Principal Scientist, Data Science – R&D DSDH - Therapeutics Development & Supply (TDS)
Position Summary
- Design, build, and optimize data capture, processing, and storage solutions to enable advanced analytics, digital process transformation, and AI/ML applications across the development-to-supply continuum for TDS.
- Serve as a hands-on technical contributor, delivering AI-ready data pipelines and data products across Process Development, Manufacturing, Supply Chain, Quality, and Digital/Data Science teams.
- Create robust, future-proof data systems, engineering workflows, and high-value data repositories supporting scientific, technical, and operational decision-making.
Key Responsibilities
- Data Engineering & Pipeline Development
  - Design, build, and maintain scalable data pipelines to acquire, integrate, and manage TDS data from lab systems, MES, clinical supply, quality systems, and external partners.
  - Create and optimize data flows for structured and unstructured data using Python, R, SQL, cloud services, and modern engineering tools.
  - Develop and maintain TDS-specific data repositories with enterprise-level data models.
  - Enable AI/ML readiness via well-structured, versioned, traceable data semantically aligned to enterprise standards.
- Data Product & Architecture Partnership
  - Translate business needs into data products and engineering requirements with data scientists and domain experts.
  - Implement semantic models and future-proof architectures with ontology/knowledge graph teams.
- Quality, Compliance & Performance
  - Implement data quality/performance standards; define KPIs for accuracy, completeness, and consistency.
  - Use data versioning and lineage tracking for compliance, traceability, and audit readiness.
  - Follow software best practices (code versioning, DevOps, documentation).
- Cross-Functional Collaboration
  - Engage stakeholders to design solutions and drive adoption.
  - Manage multiple concurrent projects and deliver maximum business value.
Qualifications
- Required
  - Advanced degree in Engineering, Data Science, Life Sciences, Computer Science, or a related field.
  - 3+ years in data engineering, including data modeling and database design (scientific, manufacturing, or healthcare preferred).
  - Proficiency with Python, R, SQL, and cloud architectures (AWS services, Snowflake, Redshift).
  - Experience with NoSQL and graph databases.
  - Strong analytical/problem-solving and stakeholder-management skills.
  - Ability to drive multiple projects with strong organization and adaptability.
- Preferred
  - Regulated/standards-driven data experience (CDISC, HL7, FHIR, OMOP, DICOM, manufacturing/quality standards).
  - Experience with high-dimensional data (e.g., imaging, sensor data).
  - Knowledge of the principles behind MLOps and model deployment workflows.
  - Familiarity with manufacturing systems (MES), laboratory information systems, or industrial data systems.
  - Exposure to knowledge graph or ontology-driven architectures.
Required/Preferred Skills
- Advanced Analytics, Critical Thinking, Data Analysis, Data Quality, Data Reporting, Data Privacy Standards, Data Savvy, Data Science, Data Visualization, Digital Fluency, Workflow Analysis, Technical Credibility, Strategic Thinking, Process Improvements, Organizing
Benefits
- Vacation: 120 hours per calendar year
- Sick time: 40 hours per calendar year (Colorado: 48; Washington: 56)
- Holiday pay (including floating holidays): 13 days per calendar year
- Work, Personal and Family Time: up to 40 hours per calendar year
- Parental Leave: 480 hours within one year of birth/adoption/foster care
- Bereavement Leave: 240 hours for an immediate family member; 40 hours for an extended family member per calendar year
- Caregiver Leave: 80 hours in a 52-week rolling period
- Volunteer Leave: 32 hours per calendar year
- Military Spouse Time-Off: 80 hours per calendar year
Application Instructions
- Candidates interested in US-based locations should apply under Requisition ID R-069212.