Senior Data Engineer, Translational Data Products

Bristol Myers Squibb
Full-time
Remote friendly (San Diego, CA)
United States
IT

Role Summary

As part of the Translational Data Products team, you will directly support translational medicine leaders in their mission to discover biomarkers that guide patient selection and treatment response for BMS assets. Your work will enable exploratory data analysis that drives crucial biomarker decisions at the heart of translational research.

You will bridge data engineering with innovation: orchestrating advanced pipelines, ensuring auto-generated ETL and schema mappings are correct, and experimenting with the newest techniques (such as MCP servers, prompt engineering strategies like ReAct and chain-of-thought, and LLM-assisted tooling) to make biomarker data accessible, trustworthy, and actionable.

Responsibilities

  • Enable biomarker discovery: Deliver data pipelines and mappings that help translational leaders identify biomarkers (molecular, digital, imaging) for patient stratification and treatment response.
  • Innovate with AI/LLMs: Explore and apply cutting-edge approaches (MCP servers, prompt orchestration, auto-schema mapping, LLM-based ETL generation) to accelerate and improve data workflows.
  • Data orchestration: Oversee ingestion from diverse sources (vendor feeds, raw instrument output, CSV, PDF, etc.), ensuring automated ETL and source-to-target mapping and transformation (STTM) outputs meet stakeholder needs.
  • Quality and profiling: Assess and validate source data, documenting any cleaning, normalization, or semantic mapping needed for sound QC, and identify where improvements are required versus merely convenient.
  • Hands-on implementation: Build or adapt tools and scripts (Python, SQL, AWS Glue, Databricks, etc.) when automation falls short.
  • Stakeholder collaboration: Act as a partner to translational medicine leaders, communicating progress and brainstorming next steps as priorities evolve.
  • Agile team contribution: Participate actively in standups, design sessions, sprint demos, and innovation discussions.

Qualifications

  • Bachelor's or Master's degree in Computer Science, Data Engineering, Bioinformatics, or related field.
  • 5+ years of experience in data engineering, ideally with exposure to life sciences or healthcare.
  • Strong experience with data integration from heterogeneous sources (structured, semi-structured, unstructured).
  • Proficiency in AWS, Python and SQL, with ability to prototype and automate workflows.
  • Hands-on expertise with ETL frameworks (AWS Glue, Databricks, Airflow).
  • Familiarity with modern AI/LLM approaches for data transformation and semantic mapping is highly desirable.
  • Excellent communication skills to engage both technical and scientific stakeholders.
  • Comfortable in agile, exploratory, scientific environments.

What Makes This Role Unique

  • Direct scientific impact: Your work connects directly to patient-centric translational decisions.
  • Innovation: You are encouraged to explore new technologies and approaches, not just maintain existing ones.
  • Automation first: Instead of building every pipeline from scratch, you orchestrate and validate auto-generated ETLs and mappings.
  • Collaborative science and engineering: You will brainstorm with scientists, demo working solutions, and help shape the future of translational data products.