Role Summary
Data Engineer II within the Onyx Research Data Platform at GSK, focused on designing, delivering, and maintaining automated end-to-end data services and pipelines. Work across structured, unstructured, and scientific data domains to deliver reliable, scalable, and governed data products, with opportunities to contribute to GenAI-enabled data capabilities.
Responsibilities
- Build modular code, libraries, and services using modern data engineering tools (Python/Spark, Kafka, Storm) and orchestration tools (e.g., Google Workflow, Airflow Composer)
- Produce well-engineered software, including automated test suites and technical documentation
- Develop, measure, and monitor key metrics for all tools and services, and iterate to improve them
- Ensure consistent application of platform abstractions for logging and data lineage
- Participate in code reviews and uphold coding best practices, improving the team's standards
- Adhere to QMS framework and CI/CD best practices
- Provide L3 support for existing tools, pipelines, and services
Qualifications
- Required: Bachelor's degree in Data Engineering, Computer Science, Software Engineering, or a related discipline
- Required: 4+ years of data engineering experience
- Required: Software engineering experience
- Required: Familiarity with orchestration tooling
- Required: Cloud experience (GCP, Azure or AWS)
- Required: Experience in automated testing and design
- Preferred: Newly completed PhD, or a Master's degree with 2+ years of experience
- Preferred: Experience overcoming high-volume, high-compute challenges
- Preferred: Knowledge and use of programming languages such as Python, Scala, Java, including toolchains for documentation, testing, and operations/observability
- Preferred: Strong experience with modern software development tools and practices (Git/GitHub, DevOps tools, metrics/monitoring)
- Preferred: Cloud experience (AWS, Google Cloud, Azure, Kubernetes)
- Preferred: Experience with CI/CD implementations using Git and common stacks (e.g., Jenkins, CircleCI, GitLab, Azure DevOps)
- Preferred: Experience with agile software development environments using Jira and Confluence
- Preferred: Demonstrated experience with data engineering tools (e.g., Spark, Kafka, Storm)
- Preferred: Knowledge of data modeling, database concepts, and SQL
- Preferred: Exposure to GenAI or ML data workflows (vector stores, embeddings, feature pipelines)
Skills
- Proficiency with Spark, Kafka, Storm, and related data processing frameworks
- Experience with data orchestration and workflow tools (Airflow, Google Workflow, etc.)
- Strong software engineering practices: testing, documentation, version control, and observability
- Knowledge of data governance, logging, and data lineage
- Familiarity with GenAI-enabled data concepts and vectorized data flows