Role Summary
Pfizer’s mission to deliver breakthroughs that change patients’ lives is rooted in our commitment to science and innovation. Within Discovery, Preclinical, and Translational Solutions (DP&TS), we accelerate the journey from target identification to clinical translation by leveraging advanced digital technologies, AI, and data-driven insights.
We’re building a forward-thinking platform engineering team dedicated to delivering secure, scalable, and resilient infrastructure. As a Site Reliability/Operations Engineering Lead, you’ll play a pivotal role in ensuring the reliability, performance, and operational excellence of our cloud-native platforms.
This role is perfect for a high-caliber, well-rounded generalist who thrives in dynamic environments, takes initiative, and enjoys solving complex problems across infrastructure, automation, and observability. You’ll be joining a team that values curiosity, collaboration, and continuous learning. While we expect you to take ownership and solve meaningful problems, you’ll be supported by a friendly, inclusive environment with clear goals, strong mentorship, and a culture of shared success. We believe in setting our team up to thrive, not just deliver.
Responsibilities
- Ensure high availability and performance of cloud infrastructure and services (AWS, Azure)
- Build and maintain monitoring, alerting, and observability systems (e.g., Prometheus, Grafana, ELK)
- Automate operational tasks using Terraform, Ansible, and scripting languages
- Manage incident response, root cause analysis, and postmortems
- Collaborate on CI/CD pipelines and deployment strategies using GitHub Actions
- Maintain and improve container orchestration platforms (Kubernetes, Docker)
- Administer systems, databases, and networks with a focus on reliability and security
- Implement and enforce security and compliance best practices
- Continuously evaluate and integrate tools to improve operational efficiency
- Lead and grow a high-performing team of reliability and operations engineers
Qualifications
- Education: Bachelor’s degree in Computer Science, Data Science, Bioinformatics, Engineering, or a related discipline
- 6+ years of experience in site reliability, operations, or infrastructure engineering
- Strong experience with AWS or Azure
- Proficiency in Terraform, Ansible, and GitHub (including GitHub Actions)
- Solid understanding of Kubernetes, Docker, and container orchestration
- Strong scripting skills (e.g., Bash, Python, TypeScript)
- Experience with Linux/Unix system administration
- Familiarity with networking, security, and database administration
- Proven troubleshooting and incident management skills
- Fluent in English; capable of clear technical communication across scientific and engineering disciplines
- Demonstrated breadth of leadership experience, including the ability to influence and collaborate with peers, develop and coach others, and oversee and guide the work of colleagues to achieve meaningful outcomes and business impact
Preferred Qualifications
- Experience with observability and logging tools (e.g., OpenTelemetry, Prometheus, Grafana, ELK)
- Knowledge of secrets management (e.g., HashiCorp Vault, AWS Secrets Manager)
- Experience working in regulated environments or with compliance frameworks (e.g., GxP, SOC2, HIPAA)