Sr Manager, Cloud Infrastructure Engineer, Scientific Computing and HPC

Pfizer
Full-time
Remote friendly (Memphis, TN)
United States
$108,700 - $201,400 USD yearly
IT

Role Summary

Pfizer is committed to applying computational science to drug discovery and development. The role focuses on designing and delivering robust High Performance Computing (HPC) solutions in a cloud environment, driving architecture, automation, migration, and operational excellence to modernize the scientific computing platform.

Responsibilities

  • Platform Architecture and Engineering: design, implement, operate, and own infrastructure for HPC and ML/AI workloads in cloud environments (AWS/GCP); lead containerization, deployment, and operation of HPC platforms (Slurm, Open OnDemand, Prometheus/Grafana, batch and distributed computing) across clouds; translate stakeholder input into robust, scalable computing platforms; collaborate with HPC staff to convert manual processes into reproducible IaC workflows.
  • Automation and DevOps: develop and maintain infrastructure automation using IaC tools (Terraform, CloudFormation); create reusable modules and enforce standards; operationalize containerized solutions with Docker and Kubernetes; manage the full lifecycle of production computing platforms from provisioning to teardown; perform troubleshooting and benchmarking to maintain performance.
  • Monitoring and Reliability: develop and maintain monitoring, logging, and alerting; design dashboards and workflows to improve observability and cost monitoring; document architecture and procedures; support delivery of scientific computing services including user support, Linux administration, operations, job scheduling, application management, and resource optimization.

Qualifications

  • Required: B.S. in computer science, life science, data science, or a related field; 6+ years of cloud infrastructure engineering with proven IaC deployments; experience managing scientific computing workloads in an enterprise environment; advanced experience with AWS or GCP and knowledge of core compute and storage services relevant to HPC; solid understanding of cloud networking, identity, and security controls.
  • Preferred: experience with HPC deployment utilities (AWS ParallelCluster, AWS Parallel Computing Service, Google Cloud Cluster Toolkit); proficiency with distributed computing environments (EKS/GKE/Kubernetes); familiarity with HPC environments, job schedulers (Slurm), HPC application containers (Docker, Singularity, Apptainer), and NVIDIA GPU computing.

Skills

  • Cloud computing (AWS, GCP)
  • Infrastructure as Code (Terraform, CloudFormation)
  • Containerization and orchestration (Docker, Kubernetes)
  • HPC platforms and job scheduling (Slurm)
  • Monitoring and observability (Prometheus, Grafana, CloudWatch)
  • Linux administration and automation
  • Security and network fundamentals in cloud environments

Additional Requirements

  • Occasional international travel for team meetings and conferences.
  • Hybrid work location: must be able to work from a Pfizer office 2–3 days per week, or as needed by the business.