Sr Manager, Cloud Infrastructure Engineer, Scientific Computing and HPC

Pfizer
Full-time
Remote friendly (New London County, CT)
United States
$108,700 - $201,400 USD yearly
IT

Role Summary

Pfizer's commitment to applying computational science in drug discovery and development includes a large-scale migration of our computational infrastructure to the cloud. This role draws on extensive experience in cloud engineering and DevOps and requires hands-on design and delivery of robust High Performance Computing (HPC) solutions that support computational workloads across the organization. Location: Hybrid. Must be able to work from the assigned Pfizer office 2-3 days per week, or as needed by the business.

Responsibilities

  • Design, implement, operate, and own robust and dependable infrastructure for HPC and ML/AI workloads in a cloud environment (AWS/GCP).
  • Lead containerization, deployment, and operation of user- and admin-facing HPC platforms (Slurm, Open OnDemand, Prometheus/Grafana, batch and distributed computing platforms) across cloud environments.
  • Translate stakeholder input into robust, high-performance, scalable, cost-effective computing platforms.
  • Partner with HPC specialists to capture institutional knowledge and manual processes in IaC workflows, transforming ad-hoc deployment practices into reproducible, version-controlled, automated procedures.
  • Develop and maintain infrastructure automation using IaC tools like Terraform and CloudFormation to ensure repeatable environment provisioning and scaling.
  • Create reusable Terraform modules, develop and enforce standards, and drive the implementation and maintenance of all cloud infrastructure using IaC tools.
  • Operationalize containerized solutions using Docker and Kubernetes.
  • Own the full lifecycle of infrastructure management, from provisioning to operations, support, updating, and teardown of production computing platforms.
  • Perform troubleshooting, system analysis, and benchmarking to resolve issues and maintain a high-performance environment.
  • Develop and maintain monitoring, logging, and alerting for the infrastructure (e.g., CloudWatch, Prometheus/Grafana).
  • Design dashboards, workflows, and utilities that improve observability, cost monitoring, workload efficiency, and the user and administrator experience.
  • Document architecture, deployment processes, and operational procedures.
  • Collaborate with team members to support delivery of scientific computing services including user support, Linux administration, operations, job scheduling, application management, and resource optimization.

Qualifications

  • Required: B.S. in computer science, life science, data science, or a related field.
  • Required: 6+ years of experience in cloud infrastructure engineering with a proven track record of developing and supporting robust IaC deployments.
  • Required: Experience managing scientific computing workloads in an enterprise environment.
  • Required: Advanced experience with at least one of AWS or GCP, including knowledge of core compute and storage services relevant to HPC.
  • Required: Solid understanding of cloud networking, identity, and security controls.
  • Preferred: Prior experience with HPC deployment utilities such as AWS ParallelCluster, AWS Parallel Computing Service, and Google Cloud Cluster Toolkit.
  • Preferred: Proficiency with distributed computing environments, especially EKS/GKE/Kubernetes.
  • Preferred: Familiarity with HPC environments, job schedulers (Slurm), HPC application containers (Docker, Singularity, Apptainer) and NVIDIA GPU computing.

Additional Requirements

  • Occasional international travel for team meetings and conferences.