Role Summary
As a Data Architect II, you'll apply your expertise in big data and AI/GenAI workflows to support GSK's complex, regulated R&D environment. You'll contribute to designing Data Mesh/Data Fabric architectures while enabling modern AI and machine learning capabilities across our platform.
Responsibilities
- Partner with the Scientific Knowledge Engineering team to develop physical data models that underpin fit-for-purpose data products
- Design data architecture aligned with enterprise-wide standards to promote interoperability
- Collaborate with the platform teams and data engineers to maintain architecture principles, standards, and guidelines
- Design data foundations that support GenAI workflows including RAG (Retrieval-Augmented Generation), vector databases, and embedding pipelines
- Work across business areas and stakeholders to ensure consistent implementation of architecture standards
- Lead reviews and maintain architecture documentation and best practices for Onyx and our stakeholders
- Adopt security-first design with robust authentication and resilient connectivity
- Provide leadership, best practices, and subject-matter and GSK-specific expertise to architecture and engineering teams composed of GSK FTEs, strategic partners, and software vendors
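For context on the GenAI data-foundation responsibility above: a RAG setup pairs an embedding pipeline with a vector store so that relevant document chunks can be retrieved and injected into an LLM prompt. The sketch below is purely illustrative, not a GSK or Onyx implementation — the toy hashing "embedding", the `VectorStore` class, and the sample documents are all invented for this example; a production pipeline would use a real embedding model and a managed vector database.

```python
import math
import zlib
from collections import Counter

def embed(text: str, dim: int = 256) -> list[float]:
    """Toy hashing-based embedding: each token is hashed into a bucket of a
    fixed-size vector. A stand-in for a real embedding model, illustration only."""
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        vec[zlib.crc32(token.encode()) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-normalised, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

class VectorStore:
    """Minimal in-memory index standing in for a vector database."""
    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, text: str) -> None:
        # Ingest side of the pipeline: embed each chunk and store it.
        self._items.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Retrieval side: embed the query and rank chunks by similarity.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

# Ingest: chunked documents flow through the embedding pipeline into the store.
store = VectorStore()
store.add("Assay results for compound GSK-001 in phase one trials")
store.add("Office holiday schedule and parking information")

# Retrieve: in a full RAG flow, the top-k chunks would be prepended to the LLM prompt.
context = store.search("phase one assay results", k=1)
```

The same ingest/retrieve split is what the architecture role designs at scale: chunking and embedding as a data pipeline, the vector index as a governed data product, and retrieval as the contract exposed to GenAI applications.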
Qualifications
- Required: Bachelor's degree in Computer Science, Engineering, Data Science, or a similar discipline
- Required: 5+ years of experience in data architecture, data engineering, or related fields in pharma, healthcare, or life sciences R&D
- Required: 3+ years’ experience defining architecture standards and patterns on big data platforms
- Required: 3+ years’ experience with data warehouse, data lake, and enterprise big data platforms
- Required: 3+ years’ experience with enterprise cloud data architecture (preferably Azure or GCP) and delivering solutions at scale
- Required: 3+ years of hands-on experience with relational, dimensional, and/or analytic data modeling (using RDBMS, dimensional, and NoSQL data platform technologies, plus ETL and data ingestion tooling)
- Preferred: Master’s or PhD in Computer Science, Engineering, Data Science, or a similar discipline
- Preferred: Deep knowledge and use of at least one common programming language (Python, Scala, Java)
- Preferred: Experience with AI/ML data workflows: feature stores, vector databases, embedding pipelines, model serving architectures
- Preferred: Familiarity with GenAI/LLM data patterns: RAG architectures, prompt engineering data requirements, fine-tuning data preparation
- Preferred: Experience with GCP data/analytics stack: Spark, Dataflow, Dataproc, GCS, BigQuery
- Preferred: Experience with enterprise data tools: Ataccama, Collibra, Acryl
- Preferred: Experience with Agile frameworks: SAFe, Jira, Confluence, Azure DevOps
- Preferred: Experience applying CI/CD principles to data solutions
- Preferred: Experience with Spark and RAG-based architectures for data science and ML use cases
- Preferred: Strong communication skills—ability to explain technical concepts to non-technical stakeholders
- Preferred: Pharmaceutical, healthcare, or life sciences background
Education
- Bachelor’s degree in Computer Science, Engineering, Data Science, or a similar discipline
- Master’s or PhD in Computer Science, Engineering, Data Science, or a similar discipline (preferred)