Staff DevOps Engineer

Caris Life Sciences
2 hours ago
Remote friendly (Kansas, United States)
United States
IT
Position Summary
- Serve as a Staff DevOps Engineer specializing in AWS and Kubernetes to design, implement, and optimize scalable, secure cloud-native infrastructure. Lead PoC initiatives, oversee monitoring solutions, and translate SOX compliance requirements into actionable cloud implementation plans. Provide technical leadership in cloud migration, security, and DevOps best practices.

Job Responsibilities
- Lead design, implementation, and management of Kubernetes clusters on AWS EKS, ensuring high availability, scalability, and security (autoscaling, monitoring, logging, security policies).
- Lead proof-of-concept (PoC) initiatives for new tools and environments.
- Manage Kubernetes cluster lifecycle: upgrades, patch management, version control, and performance optimization.
- Support teams deploying and optimizing applications on Kubernetes (container orchestration and service mesh).
- Design and implement monitoring/alerting using CloudWatch, Prometheus, and Datadog.
- Build observability standards and dashboards using AI/AIOps approaches and SRE agents for anomaly detection, alert noise reduction, and automated root-cause analysis.
- Develop/maintain Infrastructure as Code (Terraform, AWS CDK) and implement CI/CD pipelines for deployment and image management.
- Design/implement security solutions and translate SOX compliance requirements into actionable cloud plans.
- Lead cloud migration and modernization of legacy applications with cross-functional teams.
- Provide technical leadership and mentorship to junior engineers; implement knowledge-sharing initiatives.
- Stay current with emerging AWS services/features and optimize cost/resource utilization.
- Develop/maintain documentation (team knowledge base, runbooks, process documentation).
- Identify inefficiencies and develop process improvement plans.
- Participate in on-call rotations to support critical infrastructure and respond to emergencies.

Required Qualifications
- Bachelor’s degree in Computer Science, IT, or related field.
- 7+ years in DevOps or Site Reliability Engineering.
- 5+ years of hands-on experience with AWS and cloud architecture.
- 5+ years of hands-on experience with Kubernetes (cluster management, troubleshooting, optimization).
- Proficiency in at least one programming language (Python, Go, Java).
- Extensive IaC experience (Terraform, CloudFormation, AWS CDK).
- Deep understanding of Docker and Kubernetes security best practices.
- CI/CD experience, especially GitLab CI and GitHub Actions.
- Strong networking knowledge for cloud environments.
- Strong problem-solving/troubleshooting skills.
- Experience leading PoCs and evaluating new technologies.
- Experience creating/maintaining technical documentation and knowledge bases.
- Ability to identify operational inefficiencies and drive process improvement.
- Strong analytical skills for translating technical insights into business recommendations.
- Strong communication and mentoring skills.
- Proficiency in Microsoft Office (Word, Excel, Outlook) and basic business internet use.

Preferred Qualifications
- AWS Professional certifications (Solutions Architect Professional, DevOps Engineer Professional).
- Kubernetes certifications (CKA, CKAD, CKS).
- Experience with multi-cloud platforms (AWS, GCP).
- Database knowledge (MySQL, PostgreSQL, DynamoDB).
- Monitoring/observability tools (Prometheus, Grafana, ELK).
- Serverless and microservices familiarity.
- Configuration management tools (Ansible, Chef, Puppet).
- Experience implementing knowledge management systems/tools in a DevOps environment.
- Open-source or personal projects demonstrating cloud expertise.

Benefits
- Medical, dental, and vision coverage options
- Health Savings Account (HSA) and Flexible Spending Account (FSA)
- Paid time off (vacation, sick time, holidays)
- 401(k) match and financial planning tools
- LTD/STD insurance and voluntary benefits options
- Employee Assistance Program, Pet Insurance, Legal Assistance, Tuition Assistance

Other/Conditions
- Periodic travel; may require some evenings/weekends/holidays and after-hours on-call response.

Training
- Job-specific, safety, and compliance training assigned based on role.