Role Summary
The Senior AI/ML Platform Engineer helps realize a vision of transformative AI and machine learning across GSK's data ecosystem. The role sits within the AI/ML Platform Engineering team, which is building a first-in-class platform of tools and services covering MLOps/DevOps across Cloud and High-Performance Computing. The goal is to shorten development time and raise the engineering quality bar across AI/ML teams and products. The team emphasizes ownership, accountability, continuous development, and collaboration, with career development supported from day one.
Responsibilities
- Serve as a key engineer for the AIML Platform and contribute technical expertise to teams in closely aligned technical areas such as GenAI Platform, DevOps, Compute and Cloud.
- Lead the design of major software components of the AIML Platform, contribute to production code in Python, and participate in both design reviews and PR reviews.
- Take accountability for key component(s) of the AIML Platform, with a particular focus on usability, reproducibility and performance at scale.
- Integrate with DataOps, HPC and Data Engineering products for best performance and ease of use in ML training at scale.
- Participate in or lead project teams.
- Design innovative strategies and ways of working that create a better environment for end users.
- Champion best practices in ways of working and engineering discipline, and proactively contribute to improvements within your engineering area.
Qualifications
- Required:
- Bachelor’s, Master’s or PhD degree in Computer Science, Software Engineering, or related discipline.
- 6+ years of industry experience in software engineering with a Bachelor’s.
- 4+ years of industry experience in software engineering with a Master’s.
- 2+ years of industry and/or academic experience in software engineering with a PhD.
- 2+ years of experience in AIML engineering, including large-scale model training and production deployment.
- Experience with delivering projects primarily using Python.
- Preferred:
- Deep knowledge and use of the Python programming language, including toolchains for documentation, testing, and operations / observability
- Deep expertise in modern software development tools / ways of working (e.g. git/GitHub, DevOps tools, metrics / monitoring, …)
- Deep cloud expertise (e.g., AWS, Google Cloud, Azure), including infrastructure-as-code tools (Terraform, Ansible, Packer, …) and scalable cloud compute technologies, such as Google Batch and Vertex AI
- Deep hands-on experience with ML frameworks such as PyTorch or TensorFlow, as well as external libraries such as Hugging Face and/or DeepSpeed
- Hands-on experience with frameworks for building agentic AI systems, such as LangGraph, LangChain
- Experience with ML application performance tuning and optimization, for both training and inference/deployment, including large-scale multi-GPU and/or multi-TPU, multi-node distributed training for large models such as LLMs
- Experience with CI/CD implementations using git and a common CI/CD stack (e.g., Azure DevOps, Cloud Build, Jenkins, CircleCI, GitLab)
- Experience in ML workflow orchestration and pipelines with tools such as Vertex Pipelines, MLflow, etc.
- Experience with MLOps tools and model deployments (including LLMs) such as Kubeflow, Vertex AI Predictions, vLLM, Ollama
- Deep expertise with Docker, Kubernetes, and the larger CNCF ecosystem including experience with application deployment tools such as Helm
- Experience with High-Performance Computing (HPC) at both the software stack and hardware level, and an understanding of performance within HPC systems
- Deep familiarity with tools, techniques, and optimizations in the AIML and AIML Platform/MLOps space, including engagement with the open-source community
- Demonstrated excellence with agile software development environments using tools like Jira and Confluence
Education
- Bachelor’s, Master’s or PhD degree in Computer Science, Software Engineering, or related discipline