Principal Data Scientist – R&D DSDH - Therapeutics Discovery (TD)

Johnson & Johnson
5 months ago
Remote friendly (Cambridge, MA)
United States
Clinical Research and Development

Responsibilities
- Develop ML/AI models that support discovery workflows, including target prioritization, multi-omics integration, and mechanistic inference.
- Apply modern ML approaches (deep learning, graph learning, foundation models, generative models) to chemical, biological, imaging, and assay datasets.
- Build and optimize models for real-world R&D use cases, ensuring scalability, interpretability, and scientific rigor.
- Design, build, and maintain robust data pipelines to curate, standardize, and integrate diverse R&D datasets (chemical, biological, multi-omics, imaging, biophysical, automation logs, etc.).
- Partner with platform teams to implement best-practice MLOps/DevOps workflows and deploy ML models into production R&D environments.
- Develop tooling to accelerate dataset preparation, feature engineering, and model lifecycle management across TD.
- Work with TD scientists to understand key biological/chemical questions and shape computational strategy.
- Translate sparse, heterogeneous experimental data into insights for hit discovery, mechanism studies, perturbation experiments, and compound optimization.
- Participate in design, interpretation, and iterative refinement of discovery experiments.
- Partner cross-functionally (R&D Data Science, IT, platform engineering, therapeutic area groups) to drive AI/ML adoption.
- Contribute to evaluating new analytical methods, automation technologies, and data platforms; champion data quality, documentation, governance, and reproducibility.

Qualifications
Required
- Master’s or Ph.D. in Computational Biology, Bioinformatics, Data Science, Chemistry, Chemical Biology, Biomedical Engineering, Computer Science, or related field.
- Experience applying ML/AI in scientific domains (drug discovery, biology, chemistry, systems biology, imaging, etc.).
- Strong programming skills (Python preferred) and experience with ML/scientific libraries (PyTorch, TensorFlow, scikit-learn, RDKit, etc.).
- Data engineering experience: data modeling, workflow orchestration, ETL/ELT pipelines, and cloud computing (AWS, GCP, or Azure).
- Ability to work directly with experimental scientists to solve real R&D challenges.

Preferred
- Pharma/biotech discovery experience (target assessment, phenotypic screening, medicinal chemistry workflows, lab automation).
- Familiarity with omics, high-content imaging, chemical structure data, and/or biological assay data.
- Knowledge of data standards (e.g., FAIR, ontologies, controlled vocabularies) and regulated/quality-governed environments.
- Strong communication skills; ability to thrive in a matrixed, multidisciplinary environment.

Required Skills (from posting)
- Advanced Analytics, Critical Thinking, Data Analysis, Data Quality, Data Reporting, Data Science, Data Visualization, Digital Fluency, Process Improvements, Strategic Thinking, Technical Credibility, Workflow Analysis

Preferred Skills (from posting)
- Coaching, Econometric Models, Organizing, Data Privacy Standards, Data Savvy

Benefits (explicitly stated)
- Vacation: 120 hours per calendar year
- Sick time: 40 hours per calendar year (CO: 48; WA: 56)
- Holiday pay (including floating holidays): 13 days per calendar year
- Work, Personal and Family Time: up to 40 hours per calendar year
- Parental Leave: 480 hours within one year of birth/adoption/foster care
- Bereavement Leave: 240 hours for an immediate family member; 40 hours for an extended family member, per calendar year
- Caregiver Leave: 80 hours in a 52-week rolling period
- Volunteer Leave: 32 hours per calendar year
- Military Spouse Time-Off: 80 hours per calendar year

Application Instructions
- Candidates interested in EMEA-based locations should apply to posting ID R-069202.