Role Summary
As a Data Architect II, you'll apply your expertise in big data and AI/GenAI workflows to support GSK's complex, regulated R&D environment. You'll contribute to designing Data Mesh/Data Fabric architectures while enabling modern AI and machine learning capabilities across our platform.
Responsibilities
- Partner with the Scientific Knowledge Engineering team to develop physical data models that underpin fit-for-purpose data products
- Design data architecture aligned with enterprise-wide standards to promote interoperability
- Collaborate with the platform teams and data engineers to maintain architecture principles, standards, and guidelines
- Design data foundations that support GenAI workflows including RAG (Retrieval-Augmented Generation), vector databases, and embedding pipelines
- Work across business areas and stakeholders to ensure consistent implementation of architecture standards
- Lead reviews and maintain architecture documentation and best practices for Onyx and our stakeholders
- Adopt security-first design with robust authentication and resilient connectivity
- Provide leadership, best practices, and subject-matter and GSK-specific expertise to architecture and engineering teams composed of GSK FTEs, strategic partners, and software vendors
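For context on the GenAI data-foundation responsibility above: a RAG setup pairs an embedding pipeline with a vector store so that relevant document chunks can be retrieved and injected into an LLM prompt. The sketch below is purely illustrative, not a GSK or Onyx implementation — the toy hashing "embedding", the `VectorStore` class, and the sample documents are all invented for this example; a production pipeline would use a real embedding model and a managed vector database.

```python
import math
import zlib
from collections import Counter

def embed(text: str, dim: int = 256) -> list[float]:
    """Toy hashing-based embedding: each token is hashed into a bucket of a
    fixed-size vector. A stand-in for a real embedding model, illustration only."""
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        vec[zlib.crc32(token.encode()) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-normalised, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

class VectorStore:
    """Minimal in-memory index standing in for a vector database."""
    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, text: str) -> None:
        # Ingest side of the pipeline: embed each chunk and store it.
        self._items.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Retrieval side: embed the query and rank chunks by similarity.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

# Ingest: chunked documents flow through the embedding pipeline into the store.
store = VectorStore()
store.add("Assay results for compound GSK-001 in phase one trials")
store.add("Office holiday schedule and parking information")

# Retrieve: in a full RAG flow, the top-k chunks would be prepended to the LLM prompt.
context = store.search("phase one assay results", k=1)
```

The same ingest/retrieve split is what the architecture role designs at scale: chunking and embedding as a data pipeline, the vector index as a governed data product, and retrieval as the contract exposed to GenAI applications.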
Qualifications
- Required: Bachelor's degree in Computer Science, Engineering, Data Science, or a similar discipline
- Required: 5+ years of experience in data architecture, data engineering, or related fields in pharma, healthcare, or life sciences R&D
- Required: 3+ years’ experience defining architecture standards and patterns on big data platforms
- Required: 3+ years’ experience with data warehouse, data lake, and enterprise big data platforms
- Required: 3+ years’ experience with enterprise cloud data architecture (preferably Azure or GCP) and delivering solutions at scale
- Required: 3+ years of hands-on experience with relational, dimensional, and/or analytic data modeling (using RDBMS, dimensional, and NoSQL data platform technologies, plus ETL and data ingestion tooling)
- Preferred: Master’s or PhD in Computer Science, Engineering, Data Science, or a similar discipline
- Preferred: Deep knowledge and use of at least one common programming language (Python, Scala, Java)
- Preferred: Experience with AI/ML data workflows: feature stores, vector databases, embedding pipelines, model serving architectures
- Preferred: Familiarity with GenAI/LLM data patterns: RAG architectures, prompt engineering data requirements, fine-tuning data preparation
- Preferred: Experience with GCP data/analytics stack: Spark, Dataflow, Dataproc, GCS, BigQuery
- Preferred: Experience with enterprise data tools: Ataccama, Collibra, Acryl
- Preferred: Experience with Agile frameworks: SAFe, Jira, Confluence, Azure DevOps
- Preferred: Experience applying CI/CD principles to data solutions
- Preferred: Experience with Spark and RAG-based architectures for data science and ML use cases
- Preferred: Strong communication skills—ability to explain technical concepts to non-technical stakeholders
- Preferred: Pharmaceutical, healthcare, or life sciences background
Education
- Bachelor’s degree in Computer Science, Engineering, Data Science, or a similar discipline
- Master’s or PhD in Computer Science, Engineering, Data Science, or a similar discipline (preferred)