Senior Data Engineer

GSK
Full-time
Remote friendly (Cambridge, MA)
United States
$136,950 - $228,250 USD yearly
IT

Role Summary

Senior Data Engineer responsible for turning ambiguous scientific or technical challenges into well-defined data solutions. You will lead technical design, mentor engineers, and drive high-impact work across the data ecosystem, with deep expertise in distributed systems, data processing, cloud platforms, and modern software engineering. You will ensure robustness of services, serve as an escalation point, and contribute to GenAI-enabled data services and related workflows.

Responsibilities

  • Designs, builds, and operates data tools, services, and workflows that deliver high value by solving key business problems, leveraging modern data engineering tools (e.g., Spark, Kafka, Storm) and orchestration tools (e.g., Google Workflow, AirFlow Composer).
  • Confidently optimizes design and execution of complex data ingestion and data transformation solutions.
  • Enables data products optimized for AI/ML and GenAI workloads: high-throughput, observable, feature-ready, and governed.
  • Produces well-engineered software, including automated test suites, technical documentation, and operational strategy.
  • Implements modular, reusable components and microservices to accelerate development and reduce operational overhead.
  • Provides input into roadmaps of upstream teams to improve the overall program of work.
  • Ensures consistent application of platform abstractions for logging and data lineage.
  • Participates in code reviews and advocates for coding standards and best practices.
  • Follows QMS framework and CI/CD best practices, contributing to improvements in processes.
  • Provides technical leadership, code reviews, architectural guidance, and mentorship to junior engineers; acts as escalation point for complex operational issues across pipelines and data services.

Qualifications

  • Required: PhD + 2 years, Master's + 4 years, or Bachelor's + 6+ years of data engineering experience; software engineering experience; experience handling high-volume, high-compute challenges; familiarity with orchestration tooling; cloud experience; experience with automated testing and design; DevOps-forward ways of working.
  • Preferred: Proficiency in at least one programming language (e.g., Python, Scala, Java) with tooling for documentation, testing, and operations/observability; strong experience with modern software development tools (Git/GitHub, DevOps tools, metrics/monitoring); cloud infrastructure experience (AWS, Google Cloud, Azure, Kubernetes) including infrastructure-as-code; CI/CD implementations using common stacks (e.g., Jenkins, CircleCI, GitLab, Azure DevOps); experience in agile software development environments (Jira, Confluence); deep familiarity with data engineering tools (Spark, Kafka, Storm) and orchestration (Google Workflow, AirFlow); strong data modeling and SQL skills; experience building GenAI-related pipelines (embeddings, RAG, LLM data prep, scalable inference data flows).