Role Summary
Pfizer's commitment to applying computational science in drug discovery and development includes a large-scale migration of our computational infrastructure to the cloud. This role requires extensive cloud engineering and DevOps experience, along with hands-on design and delivery of robust High Performance Computing (HPC) solutions that support computational workloads across the organization. Location: Hybrid. Must be able to work from the assigned Pfizer office 2-3 days per week, or as needed by the business.
Responsibilities
- Design, implement, operate, and own robust and dependable infrastructure for HPC and ML/AI workloads in a cloud environment (AWS/GCP).
- Lead containerization, deployment, and operation of user- and admin-facing HPC platforms (Slurm, Open OnDemand, Prometheus/Grafana, batch and distributed computing platforms) across cloud environments.
- Translate stakeholder input into robust, high-performance, scalable, cost-effective computing platforms.
- Partner with HPC specialists to capture institutional knowledge and manual processes in IaC workflows, transforming ad-hoc deployment practices into reproducible, version-controlled, automated procedures.
- Develop and maintain infrastructure automation using IaC tools like Terraform and CloudFormation to ensure repeatable environment provisioning and scaling.
- Create reusable Terraform modules, develop and enforce standards, and drive the implementation and maintenance of all cloud infrastructure using IaC tools.
- Operationalize containerized solutions using Docker and Kubernetes.
- Own the full lifecycle of infrastructure management, from provisioning through operations, support, updates, and teardown of production computing platforms.
- Perform troubleshooting, system analysis, and benchmarking to resolve issues and maintain a high-performance environment.
- Develop and maintain monitoring, logging, and alerting for the infrastructure (e.g., CloudWatch, Prometheus/Grafana).
- Design dashboards, workflows, and utilities to improve observability, cost monitoring, workload efficiency, and the user and administrator experience.
- Document architecture, deployment processes, and operational procedures.
- Collaborate with team members to support delivery of scientific computing services including user support, Linux administration, operations, job scheduling, application management, and resource optimization.
Qualifications
- Required: B.S. in computer science, life science, data science, or a similar field.
- Required: 6+ years of experience in cloud infrastructure engineering with a proven track record of developing and supporting robust IaC deployments.
- Required: Experience managing scientific computing workloads in an enterprise environment.
- Required: Advanced experience with at least one of AWS or GCP, including knowledge of core compute and storage services relevant to HPC.
- Required: Solid understanding of cloud networking, identity, and security controls.
- Preferred: Prior experience with HPC deployment utilities, including AWS ParallelCluster, AWS Parallel Computing Service, and Google Cloud Cluster Toolkit.
- Preferred: Proficiency with distributed computing environments, especially EKS/GKE/Kubernetes.
- Preferred: Familiarity with HPC environments, job schedulers (Slurm), HPC application containers (Docker, Singularity, Apptainer) and NVIDIA GPU computing.
Additional Requirements
- Occasional international travel for team meetings and conferences.