Data Engineer II

GSK
Full-time
Remote friendly (Cambridge, MA)
United States
$116,325 - $193,875 USD yearly
IT

Role Summary

As a Data Engineer II on the Onyx Research Data Platform at GSK, you will design, deliver, and maintain automated end-to-end data services and pipelines. You will work across structured, unstructured, and scientific data domains to deliver reliable, scalable, and governed data products, with opportunities to contribute to GenAI-enabled data capabilities.

Responsibilities

  • Build modular code, libraries, and services using modern data engineering tools (Python/Spark, Kafka, Storm) and orchestration tools (e.g., Google Cloud Workflows, Airflow/Cloud Composer)
  • Produce well-engineered software, including automated test suites and technical documentation
  • Develop, measure, and monitor key metrics for all tools and services, and iterate to improve them
  • Ensure consistent application of platform abstractions for logging and data lineage
  • Participate in code reviews and uphold coding best practices, improving the team's standards
  • Adhere to QMS framework and CI/CD best practices
  • Provide L3 support for existing tools, pipelines, and services

Qualifications

  • Required: Bachelor’s degree in Data Engineering, Computer Science, Software Engineering, or a related discipline
  • Required: 4+ years of data engineering experience
  • Required: Software engineering experience
  • Required: Familiarity with orchestration tooling
  • Required: Cloud experience (GCP, Azure or AWS)
  • Required: Experience with automated testing and software design
  • Preferred: A recently completed PhD, or a Master's degree with 2+ years of experience
  • Preferred: Experience overcoming high-volume, high-compute challenges
  • Preferred: Proficiency in programming languages such as Python, Scala, or Java, including their toolchains for documentation, testing, and operations/observability
  • Preferred: Strong experience with modern software development tools and practices (Git/GitHub, DevOps tools, metrics/monitoring)
  • Preferred: Cloud and container experience (AWS, Google Cloud, Azure, Kubernetes)
  • Preferred: Experience implementing CI/CD with Git and common stacks (e.g., Jenkins, CircleCI, GitLab, Azure DevOps)
  • Preferred: Experience with agile software development environments using Jira and Confluence
  • Preferred: Demonstrated experience with data engineering tools (e.g., Spark, Kafka, Storm)
  • Preferred: Knowledge of data modeling, database concepts and SQL
  • Preferred: Exposure to GenAI or ML data workflows (vector stores, embeddings, feature pipelines)

Skills

  • Proficiency with Spark, Kafka, Storm and related data processing frameworks
  • Experience with data orchestration and workflow tools (Airflow, Google Cloud Workflows, etc.)
  • Strong software engineering practices: testing, documentation, version control, and observability
  • Knowledge of data governance, logging, and data lineage
  • Familiarity with GenAI-enabled data concepts and vectorized data flows