AI GPU Platform Engineering

Full-time

Remote friendly (Indianapolis, IN)

United States

$135,000 - $213,400 USD yearly

Role Summary

AI GPU Platform Engineer responsible for the engineering and operations of advanced Linux platforms for AI/ML and HPC workloads, with expertise in Nvidia DGX systems, Spectrum X networking, and WEKA storage. Leads the strategy and development of advanced Linux computing capabilities for AI/ML and advises on the global Linux strategy for on-premises private cloud and public IaaS Linux services.

Responsibilities

Drive the engineering and operations of advanced Linux platforms supporting AI and HPC workloads.
Manage Nvidia DGX systems using Mission Control, Base Command, and Run:AI.
Optimize Spectrum X networking and WEKA storage for AI/ML applications.
Boost productivity for Advanced Intelligence and Data Science teams through AI/HPC infrastructure tooling and operational excellence.
Lead the strategy, engineering, and development of advanced Linux computing capabilities for AI/ML.
Work with the senior Linux platform engineer to direct the global Linux strategy for on-premises private cloud and public IaaS Linux services.

Qualifications

Required: Expertise in Linux system administration, HPC environments, and Nvidia DGX server management; experience with Spectrum X networking and parallel file systems.
Required: 6+ years of demonstrated experience in AI/ML and HPC workloads and infrastructure.
Required: Hands-on experience with HPC-grade infrastructure; knowledge of accelerated computing (GPU), storage (WEKA), scheduling/orchestration (Slurm, Kubernetes, LSF), high-speed networking (Ultra-Ethernet, RoCE), and container technologies (Docker).
Required: Proficiency in at least one scripting language (Bash, Python, etc.).
Preferred: Experience running and optimizing large-scale distributed training workloads using PyTorch (DDP, FSDP), NeMo, or JAX; understanding of AI/ML workflows from data processing to inference.
Required: Bachelor’s degree in computer science, IT, or a related technical field.
Required: 10+ years’ experience as a Linux OS/Platform Engineer.

Education

Bachelor’s degree in computer science, information technology, or a related technical field.

Skills

Linux system administration
HPC environments
Nvidia DGX server management
Spectrum X networking
WEKA storage
Containerization and automation tools
Python/Bash scripting
PyTorch, NeMo, or JAX for large-scale distributed training
Slurm/Kubernetes/LSF, GPU acceleration
AI/ML infrastructure optimization

Additional Requirements

Hybrid role located in Indianapolis, IN (relocation required)
Less than 5% travel
