Role Summary
Pfizer is committed to applying computational science to drug discovery and development. This role focuses on designing and delivering robust High Performance Computing (HPC) solutions in a cloud environment, driving architecture, automation, migration, and operational excellence to modernize the scientific computing platform.
Responsibilities
- Platform Architecture and Engineering: design, implement, operate, and own infrastructure for HPC and ML/AI workloads in cloud environments (AWS/GCP); lead containerization, deployment, and operation of HPC platforms (Slurm, Open OnDemand, Prometheus/Grafana, batch and distributed computing) across clouds; translate stakeholder input into robust, scalable computing platforms; collaborate with HPC staff to convert manual processes into reproducible IaC workflows.
- Automation and DevOps: develop and maintain infrastructure automation using IaC tools (Terraform, CloudFormation); create reusable modules and enforce standards; operationalize containerized solutions with Docker and Kubernetes; manage the full lifecycle of production computing platforms, from provisioning to teardown; troubleshoot and benchmark platforms to maintain performance.
- Monitoring and Reliability: develop and maintain monitoring, logging, and alerting; design dashboards and workflows to improve observability and cost monitoring; document architecture and procedures; support delivery of scientific computing services including user support, Linux administration, operations, job scheduling, application management, and resource optimization.
Qualifications
- Required: B.S. in computer science, life science, data science, or a related field; 6+ years of experience in cloud infrastructure engineering, including proven IaC deployments; experience managing scientific computing workloads in an enterprise environment; advanced experience with AWS or GCP and knowledge of core compute and storage services relevant to HPC; solid understanding of cloud networking, identity, and security controls.
- Preferred: Experience with HPC deployment utilities (AWS ParallelCluster, AWS Parallel Computing Service, Google Cloud Cluster Toolkit); proficiency with distributed computing environments (EKS/GKE/Kubernetes); familiarity with HPC environments, job schedulers (Slurm), HPC application containers (Docker, Singularity, Apptainer), and NVIDIA GPU computing.
Skills
- Cloud computing (AWS, GCP)
- Infrastructure as Code (Terraform, CloudFormation)
- Containerization and orchestration (Docker, Kubernetes)
- HPC platforms and job scheduling (Slurm)
- Monitoring and observability (Prometheus, Grafana, CloudWatch)
- Linux administration and automation
- Security and network fundamentals in cloud environments
Additional Requirements
- Occasional international travel for team meetings and conferences.
- Hybrid work location: must be able to work from a Pfizer office 2–3 days per week, or as needed by the business.