
Data Architect II

GSK
Remote friendly (San Francisco, CA)
United States
$109,725 - $182,875 USD yearly
IT

Role Summary

As a Data Architect II, you'll apply your expertise in big data and AI/GenAI workflows to support GSK's complex, regulated R&D environment. You'll contribute to designing Data Mesh/Data Fabric architectures while enabling modern AI and machine learning capabilities across our platform.

Responsibilities

  • Partner with the Scientific Knowledge Engineering team to develop physical data models to build fit-for-purpose data products
  • Design data architecture aligned with enterprise-wide standards to promote interoperability
  • Collaborate with the platform teams and data engineers to maintain architecture principles, standards, and guidelines
  • Design data foundations that support GenAI workflows including RAG (Retrieval-Augmented Generation), vector databases, and embedding pipelines
  • Work across business areas and stakeholders to ensure consistent implementation of architecture standards
  • Lead reviews and maintain architecture documentation and best practices for Onyx and our stakeholders
  • Adopt security-first design with robust authentication and resilient connectivity
  • Provide best practices, leadership, and subject-matter and GSK expertise to architecture and engineering teams composed of GSK FTEs, strategic partners, and software vendors

Qualifications

  • Required: Bachelor's degree in Computer Science, Engineering, Data Science, or a similar discipline
  • Required: 5+ years of experience in data architecture, data engineering, or related fields in pharma, healthcare, or life sciences R&D
  • Required: 3+ years’ experience defining architecture standards and patterns on big data platforms
  • Required: 3+ years’ experience with data warehouse, data lake, and enterprise big data platforms
  • Required: 3+ years’ experience with enterprise cloud data architecture (preferably Azure or GCP) and delivering solutions at scale
  • Required: 3+ years of hands-on experience with relational, dimensional, and/or analytic data modeling (using RDBMS, dimensional, and NoSQL data platform technologies, as well as ETL and data ingestion tools)
  • Preferred: Master’s or PhD in Computer Science, Engineering, Data Science, or a similar discipline
  • Preferred: Deep knowledge of, and hands-on experience with, at least one common programming language (Python, Scala, Java)
  • Preferred: Experience with AI/ML data workflows: feature stores, vector databases, embedding pipelines, model serving architectures
  • Preferred: Familiarity with GenAI/LLM data patterns: RAG architectures, prompt engineering data requirements, fine-tuning data preparation
  • Preferred: Experience with GCP data/analytics stack: Spark, Dataflow, Dataproc, GCS, BigQuery
  • Preferred: Experience with enterprise data tools: Ataccama, Collibra, Acryl
  • Preferred: Experience with Agile frameworks: SAFe, Jira, Confluence, Azure DevOps
  • Preferred: Experience applying CI/CD principles to data solutions
  • Preferred: Experience with Spark and RAG-based architectures for data science and ML use cases
  • Preferred: Strong communication skills, including the ability to explain technical concepts to non-technical stakeholders
  • Preferred: Pharmaceutical, healthcare, or life sciences background

Education

  • Bachelor’s degree in Computer Science, Engineering, Data Science, or a similar discipline
  • Master’s or PhD in Computer Science, Engineering, Data Science, or a similar discipline (preferred)