Role Summary
Pfizer’s mission to deliver breakthroughs that change patients’ lives is rooted in our commitment to science and innovation. Within Discovery, Preclinical, and Translational Solutions (DP&TS), we accelerate the journey from target identification to clinical translation by leveraging advanced digital technologies, AI, and data-driven insights. We’re building a forward-thinking platform engineering team dedicated to delivering secure, scalable, and resilient infrastructure.
As a Site Reliability/Operations Engineer, you’ll play a pivotal role in ensuring the reliability, performance, and operational excellence of our cloud-native platforms. This role is perfect for a high-caliber, well-rounded generalist who thrives in dynamic environments, takes initiative, and enjoys solving complex problems across infrastructure, automation, and observability.
You’ll be joining a team that values curiosity, collaboration, and continuous learning. While we expect you to take ownership and solve meaningful problems, you’ll be supported by a friendly, inclusive environment with clear goals, strong mentorship, and a culture of shared success. We believe in setting our team up to thrive, not just deliver.
Responsibilities
- Monitor cloud infrastructure and respond to alerts under the guidance of senior engineers
- Deploy and maintain automation scripts and tools (e.g., Terraform, Ansible) and proactively look for improvements to tooling
- Maintain and update observability systems (e.g., Prometheus, Grafana) based on feedback from engineers
- Actively participate in incident response activities and root cause analysis to resolve issues and implement improvements with support from the team
- Collaborate with team members to implement changes and improvements to CI/CD pipelines
- Contribute to documentation and process improvements
- Proactively seek opportunities for skill development and apply security and compliance best practices in daily tasks
Qualifications
- Required: Bachelor’s degree in a relevant field (e.g., Computer Science, Data Science, Bioinformatics, Engineering, or related discipline)
- Required: 4+ years of experience in site reliability, operations, or infrastructure engineering
- Required: Familiarity with AWS or Azure
- Required: Familiarity with Terraform, Ansible, and GitHub
- Required: Understanding of Kubernetes, Docker, and container orchestration
- Required: Good scripting skills (e.g., Bash, Python, TypeScript)
- Required: Familiarity with Linux/Unix system administration
- Required: Familiarity with networking, security, and database administration
- Required: Strong problem-solving skills and eagerness to learn in a collaborative environment
- Required: Fluent in English; capable of clear technical communication across scientific and engineering disciplines
- Preferred: Experience with observability and logging tools (e.g., OpenTelemetry, Prometheus, Grafana, ELK)
- Preferred: Knowledge of secrets management (e.g., HashiCorp Vault, AWS Secrets Manager)
- Preferred: Experience working in regulated environments or with compliance frameworks (e.g., GxP, SOC 2, HIPAA)
- Preferred: Experience working in team-based environments, either professionally or academically
Education
- Bachelor’s degree in a relevant field as specified above
Additional Requirements
- Travel up to 10% may be required for business activities
- Work Location: Hybrid