Principal Data Scientist – R&D DSDH - Preclinical Sciences & Translational Safety (PSTS)
Position Summary
- Leverage advanced machine learning and robust data engineering to generate actionable insights within Preclinical Sciences & Translational Safety (PSTS).
- Create AI-ready datasets, develop predictive models, and deliver analytical solutions to improve safety evaluations and facilitate translational research.
Key Responsibilities
- Machine Learning & Modeling
  - Develop and deploy ML/AI models for safety signal detection, dose selection, PK/PD modeling, toxicology insights, and translational interpretation.
  - Use representation learning, predictive modeling, and multivariate analytics across in vivo studies, in vitro assays, exposure-response data, and pathology information.
  - Partner with scientific SMEs to align modeling strategies with PSTS decision points.
  - Apply model governance, versioning, and validation standards.
- Data Engineering & Pipeline Development
  - Build and maintain scalable pipelines integrating toxicology studies, PK/PD datasets, biomarker readouts, and animal study repositories.
  - Transform raw experimental outputs into standardized, analysis-ready, AI-ready datasets using Python, R, and cloud-native services.
  - Collaborate on harmonized scientific data models with data engineering and ontology teams.
- Scientific Domain Integration
  - Translate study designs into computational requirements with toxicology, DMPK, and safety stakeholders.
  - Apply mechanism-based toxicology, exposure-response concepts, and in vivo study structures to guide data transformations and modeling.
  - Improve cross-study comparability using standardized terminologies, metadata practices, and quality checks.
  - Ensure high-quality, scalable data solutions with PSTS functional experts, Data Science teams, and platform architects.
Qualifications
Required
- MS or PhD in Data Science, Computational Biology, Toxicology, Pharmacology, Biomedical Engineering, Computer Science, or related field.
- 3+ years applying machine learning and/or data engineering to scientific/biomedical datasets.
- Proficiency with Python and/or R, SQL, and modern data engineering tooling (cloud computing, workflow orchestration, version control).
- Experience developing, evaluating, and deploying ML model pipelines.
- Experience with biological, toxicology, PK/PD, or in vivo datasets.
Preferred
- Experience in safety sciences, ADME/DMPK, toxicogenomics, or biomarker analytics.
- Familiarity with scientific data formats (assay outputs, histopathology, PK time-course).
- Exposure to ontologies, semantic technologies, or knowledge graph integration.
- Cloud data architecture experience (AWS S3, Snowflake, Redshift).
- Understanding of regulatory data standards (SEND, CDISC).
Required/Preferred Skills
- Required Skills: Advanced Analytics, Data Analysis, Data Quality, Data Reporting, Data Science, Data Visualization, Critical Thinking, Strategic Thinking, Technical Credibility.
- Preferred Skills: Coaching, Data Privacy Standards, Data Savvy, Digital Fluency, Econometric Models, Organizing, Process Improvements, Workflow Analysis.
Benefits
- Vacation: 120 hours/calendar year
- Sick time: 40 hours/calendar year (CO: 48; WA: 56)
- Holiday pay (Floating Holidays): 13 days/calendar year
- Work, Personal and Family Time: up to 40 hours/calendar year
- Parental Leave: 480 hours within one year of birth/adoption/foster care
- Bereavement Leave: 240 hours (immediate family) / 40 hours (extended family) per calendar year
- Caregiver Leave: 80 hours in a 52-week rolling period
- Volunteer Leave: 32 hours/calendar year
- Military Spouse Time-Off: 80 hours/calendar year
Application Instructions
- Candidates interested in Europe-based locations: apply to R-069190.