Position Summary
The R&D Data Science organization is seeking a Data Scientist – Data Engineer to design, build, and optimize data capture, processing, and storage solutions. These solutions enable advanced analytics, digital process transformation, and AI/ML applications across the development-to-supply continuum for Therapeutics Development & Supply (TDS).
Key Responsibilities
Data Engineering & Pipeline Development
- Design, build, and maintain scalable data pipelines to acquire, integrate, and manage TDS data from diverse sources/systems (e.g., lab systems, MES, clinical supply, quality systems, external partners).
- Create and optimize data flows for structured and unstructured data using Python, R, SQL, cloud services, and modern engineering tools.
- Develop and maintain TDS-specific data repositories using enterprise-level data models.
- Enable AI/ML readiness by delivering data that is well structured, versioned, traceable, and semantically aligned.
Data Product & Architecture Partnership
- Partner with data scientists, TDS domain experts, and digital technology teams to translate business needs into data products and engineering requirements.
- Work with ontology/knowledge graph teams to implement semantic models and future-proof data architectures.
Quality, Compliance & Performance
- Implement data quality and performance standards; define KPIs for accuracy, completeness, and consistency.
- Apply data versioning and lineage tracking for compliance, traceability, and audit readiness.
- Follow software best practices (code versioning, DevOps integration, documentation).
Cross-Functional Collaboration
- Engage scientific, technical, and operations stakeholders to understand requirements, design solutions, and drive adoption.
- Support multiple concurrent projects, managing priorities and delivering business value across the TDS network.
Qualifications
Required
- Degree in Engineering, Data Science, Life Sciences, Computer Science, or a related field (advanced degree preferred).
- 3+ years of experience in data engineering, including data modeling and database design (preferably in scientific, manufacturing, or healthcare environments).
- Proficiency with Python, R, SQL, and cloud-based architectures (e.g., AWS, Snowflake, Redshift).
- Experience with NoSQL and graph databases.
- Strong analytical, problem-solving, and stakeholder-management skills; ability to translate discussions into actionable requirements.
- Ability to drive multiple projects concurrently with strong organization and adaptability.
Preferred
- Experience with regulated/standards-driven data environments (e.g., CDISC, HL7, FHIR, OMOP, DICOM; manufacturing/quality data standards).
- Familiarity with high-dimensional data (e.g., imaging, sensor data).
- Experience with MLOps/model deployment workflows.
- Knowledge of manufacturing systems (MES), laboratory information systems, or industrial data systems.
- Exposure to knowledge graphs or ontology-driven architectures.
Benefits (time off)
- Vacation: 120 hours/year.
- Sick time: 40 hours/year (48 in CO; 56 in WA).
- Holiday pay, including floating holidays: 13 days/year.
- Work, Personal and Family Time: up to 40 hours/year.
- Parental Leave: 480 hours within one year of birth/adoption/foster care.
- Bereavement Leave: 240 hours (immediate family) and 40 hours (extended family) per year.
- Caregiver Leave: 80 hours in a 52-week rolling period.
- Volunteer Leave: 32 hours/year.
- Military Spouse Time-Off: 80 hours/year.
Application Instructions
- Candidates interested in US-based locations should apply to R-069212.