Principal Platform Reliability Engineer

Eli Lilly and Company
9 days ago
Remote friendly (Indianapolis, IN)
United States
IT
What You’ll Do
- Define and implement SLOs, SLIs, and reliability standards; improve resilience via capacity planning, failover design, and disaster recovery.
- Lead response for P1/P2 incidents; own mitigation/recovery, conduct root cause analysis, and implement corrective actions.
- Develop and maintain runbooks, playbooks, and operational standards.
- Implement and optimize observability (monitoring, logging, tracing, alerting) to improve visibility and reduce alert noise.
- Use Splunk, Prometheus, and CloudWatch (or equivalent tools) to enable proactive detection, diagnosis, and resolution.
- Build and maintain CI/CD pipelines and deployment automation; drive Infrastructure as Code and GitOps adoption.
- Support integration of SRE principles throughout the software lifecycle.
- Implement secure-by-design practices; support vulnerability remediation and secure configurations; align with enterprise security/compliance.
- Partner with teams to improve reliability, performance, and deployment practices.
- Provide technical guidance/mentorship and communicate health/incident impact to stakeholders.

Your Basic Requirements
- Bachelor’s degree in a related technical field.
- 7+ years of hands-on AWS experience.
- Extensive Kubernetes/container experience (e.g., Docker, EKS).
- Experience operating production distributed systems.
- Incident management/on-call experience.
- Experience defining/managing SLOs/SLIs/error budgets.
- Observability tooling experience (e.g., Splunk, the Grafana LGTM stack: Loki, Grafana, Tempo, Mimir).
- CI/CD pipeline experience.
- Infrastructure as Code (Terraform, CloudFormation).
- Scripting: Python, Bash, or PowerShell.
- Networking/cloud architecture fundamentals.
- Cloud security best practices.
- Troubleshooting complex system/performance issues.

What You Should Bring
- Experience with Argo CD, GitHub Actions, and GitOps workflows.
- Experience in large-scale enterprise environments.
- Regulated industry experience (healthcare/pharma).
- Experience with global, follow-the-sun support models.
- Strong written communication (incident updates, postmortems, status summaries).

Role Details
- Hybrid; in office 3 days/week; no travel required.

Compensation & Benefits
- Anticipated wage: $126,000–$224,400.
- Eligible for company bonus (depending on company/individual performance) and comprehensive benefits (e.g., 401(k), pension, vacation, medical/dental/vision, flexible benefits, life insurance, time off/leave, well-being benefits).