Principal Data Scientist – R&D DSDH - Therapeutics Discovery (TD)

Johnson & Johnson
5 months ago
Remote friendly (Spring House, PA)
United States
Clinical Research and Development
About The Role
- Develop and apply advanced Machine Learning (ML) and Data Engineering solutions to accelerate scientific innovation across the drug discovery lifecycle.
- Collaborate with discovery scientists, automation engineers, computational biologists, and platform technology teams to turn complex, multimodal R&D data into actionable insights.

Key Responsibilities
- Machine Learning & Modeling
  - Develop ML/AI models for discovery workflows (target prioritization, multi-omics integration, mechanistic inference).
  - Use modern ML approaches (deep learning, graph learning, foundation models, generative models) on chemical, biological, imaging, and assay datasets.
  - Build/optimize models for real-world R&D use cases with scalability, interpretability, and scientific rigor.
- Data Engineering & Pipeline Development
  - Design, build, and maintain data pipelines to curate, standardize, and integrate diverse R&D datasets (chemical, biological, multi-omics, imaging, biophysical, automation logs, etc.).
  - Implement best-practice MLOps/DevOps workflows and deploy ML models into production R&D environments.
  - Build tooling to accelerate dataset preparation, feature engineering, and model lifecycle management across TD.
- Scientific Partnership
  - Partner with TD scientists to understand biological/chemical questions and shape computational strategy.
  - Translate sparse, heterogeneous experimental datasets into insights for hit discovery, mechanism studies, perturbation experiments, and compound optimization.
  - Contribute to design, interpretation, and iterative refinement of discovery experiments.
- Innovation & Collaboration
  - Partner cross-functionally to drive AI/ML adoption.
  - Evaluate new analytical methods, automation technologies, and data platforms.
  - Champion data quality, documentation, governance, and reproducibility.

Qualifications
Required
- Master’s or Ph.D. in Computational Biology, Bioinformatics, Data Science, Chemistry, Chemical Biology, Biomedical Engineering, Computer Science, or related field.
- Experience applying ML/AI in scientific domains (drug discovery, biology, chemistry, systems biology, imaging, etc.).
- Strong programming skills in Python (preferred) and experience with scientific/ML libraries (PyTorch, TensorFlow, scikit-learn, RDKit, etc.).
- Data engineering experience: data modeling, workflow orchestration, ETL/ELT pipelines, and cloud computing (AWS, GCP, or Azure).
- Ability to work directly with experimental scientists to solve R&D challenges.
Preferred
- Pharma/biotech discovery experience (target assessment, phenotypic screening, medicinal chemistry workflows, lab automation).
- Familiarity with omics, high-content imaging, chemical structure data, or biological assay data.
- Knowledge of data standards (e.g., FAIR, ontologies, controlled vocabularies) and experience working in regulated/quality-governed environments.
- Strong communication skills in a matrixed, multidisciplinary environment.

Required Skills
- Advanced Analytics, Critical Thinking, Data Analysis, Data Quality, Data Reporting, Data Science, Data Visualization, Digital Fluency, Process Improvements, Strategic Thinking, Technical Credibility, Workflow Analysis

Preferred Skills
- Coaching, Data Privacy Standards, Data Savvy, Econometric Models, Organizing

Benefits
- Vacation – 120 hours per calendar year
- Sick time – 40 hours per calendar year (Colorado: 48; Washington: 56)
- Holiday pay (including Floating Holidays) – 13 days per calendar year
- Work, Personal and Family Time – up to 40 hours per calendar year
- Parental Leave – 480 hours within one year of birth/adoption/foster care
- Bereavement Leave – 240 hours for immediate family; 40 hours for extended family per calendar year
- Caregiver Leave – 80 hours in a 52-week rolling period
- Volunteer Leave – 32 hours per calendar year
- Military Spouse Time-Off – 80 hours per calendar year

Application Instructions
- Candidates interested in EMEA-based locations should apply to R-069202.