Principal Data Engineer

Takeda
Remote
United States
$56,000 - $88,000 USD yearly
IT

Role Summary

The Principal Data Engineer will architect and deliver cloud-based data pipelines using Python, Spark, and Airflow to automate ETL/ELT processes across data lakes and warehouses. They will design and implement AI/ML and GenAI-driven solutions using supervised/unsupervised learning, statistical modeling, and NLP to enhance data quality, automate workflows, detect similarities, and support evidence-based clinical decision-making. The role involves developing data integration workflows for structured and unstructured data, creating interactive dashboards and real-time visualization platforms to deliver actionable insights, and mentoring junior engineers while guiding enterprise-wide adoption of scalable, AI-powered data engineering solutions.

Location: Cambridge, MA; 100% remote work allowed anywhere in the U.S.

Responsibilities

  • Engineer cloud-based data pipelines using Python, Spark, and Airflow to automate ETL/ELT processes, enabling efficient data ingestion, transformation, and storage across data lakes and warehouses.
  • Design and implement AI/ML and GenAI-driven solutions using supervised/unsupervised learning, statistical modeling, and NLP to enhance data quality, automate workflows, detect similarities, and support evidence-based clinical decision-making.
  • Develop robust data integration workflows for structured and unstructured data, ensuring adherence to Good Clinical Practices (GCP), FDA regulations, and SOPs through SQL-based data validation frameworks.
  • Create interactive dashboards and real-time visualization platforms to deliver actionable insights from clinical and operational data, enabling stakeholders to monitor performance and drive data-informed strategies.
  • Develop custom automation tools using Python, R, and APIs to streamline data entry, reduce manual processing, and enhance operational efficiency across clinical research systems.
  • Drive strategic alignment by partnering with cross-functional teams, mentoring junior engineers, and advising leadership on AI/ML adoption, automation strategies, and emerging data technologies.
  • Influence industry practices by presenting technical innovations at leading conferences and guiding enterprise-wide adoption of scalable, AI-powered data engineering solutions.

Qualifications

  • 30 months of related experience designing, developing, testing, and deploying software applications and features based on client and project requirements.
  • Experience implementing automated testing and regression testing using Selenium and Python to improve test coverage, reduce manual effort, and ensure application stability.
  • Experience collaborating with cross-functional teams, including developers, business analysts, and QA leads, and participating in Agile/Scrum ceremonies to plan, deliver, and communicate software progress iteratively.
  • Experience performing data wrangling, transformation, and management to create structured datasets stored in databases in support of data analyses.

Education

  • Master’s degree in Computer Science, Data Science, Engineering, or related field.

Skills

  • Python
  • Spark
  • Airflow
  • SQL-based data validation frameworks
  • AI/ML, GenAI, NLP
  • Data integration for structured and unstructured data
  • Dashboard and real-time visualization
  • APIs and automation using Python and R
  • Collaboration, mentoring, and leadership communication