Principal Data Engineer

Takeda

Remote

United States

$56,000 - $88,000 USD yearly

Role Summary

The Principal Data Engineer will architect and deliver cloud-based data pipelines using Python, Spark, and Airflow to automate ETL/ELT processes across data lakes and warehouses. They will design and implement AI/ML and GenAI-driven solutions using supervised/unsupervised learning, statistical modeling, and NLP to enhance data quality, automate workflows, detect similarities, and support evidence-based clinical decision-making. The role involves developing data integration workflows for structured and unstructured data, creating interactive dashboards and real-time visualization platforms to deliver actionable insights, and mentoring junior engineers while guiding enterprise-wide adoption of scalable, AI-powered data engineering solutions. Location: Cambridge, MA; 100% remote work allowed anywhere in the U.S.

Responsibilities

Engineer cloud-based data pipelines using Python, Spark, and Airflow to automate ETL/ELT processes, enabling efficient data ingestion, transformation, and storage across data lakes and warehouses.
Design and implement AI/ML and GenAI-driven solutions using supervised/unsupervised learning, statistical modeling, and NLP to enhance data quality, automate workflows, detect similarities, and support evidence-based clinical decision-making.
Develop robust data integration workflows for structured and unstructured data, ensuring adherence to Good Clinical Practices (GCP), FDA regulations, and SOPs through SQL-based data validation frameworks.
Create interactive dashboards and real-time visualization platforms to deliver actionable insights from clinical and operational data, enabling stakeholders to monitor performance and drive data-informed strategies.
Develop custom automation tools using Python, R, and APIs to streamline data entry, reduce manual processing, and enhance operational efficiency across clinical research systems.
Drive strategic alignment by partnering with cross-functional teams, mentoring junior engineers, and advising leadership on AI/ML adoption, automation strategies, and emerging data technologies.
Influence industry practices by presenting technical innovations at leading conferences and guiding enterprise-wide adoption of scalable, AI-powered data engineering solutions.

Qualifications

30 months of related experience designing, developing, testing, and deploying software applications and features based on client and project requirements.
Experience implementing automated testing and regression testing using Selenium and Python to improve test coverage, reduce manual effort, and ensure application stability.
Experience collaborating with cross-functional teams, including developers, business analysts, and QA leads, and participating in Agile/Scrum ceremonies to plan, deliver, and communicate software progress iteratively.
Experience performing data wrangling, transformation, and management to create structured datasets stored in databases in support of data analyses.

Education

Master’s degree in Computer Science, Data Science, Engineering, or related field.

Skills

Python
Spark
Airflow
SQL-based data validation frameworks
AI/ML, GenAI, NLP
Data integration for structured and unstructured data
Dashboard and real-time visualization
APIs and automation using Python and R
Collaboration, mentoring, and leadership communication
