Machine Learning Scientist/Sr Scientist, Federated Benchmarking & Validation Engineering

Eli Lilly and Company
Full-time
Remote friendly (San Francisco, CA)
United States
$151,500 - $244,200 USD yearly
Operations

Role Summary

The Machine Learning Scientist/Sr Scientist, Federated Benchmarking & Validation Engineering plays an essential role within the TuneLab platform and is responsible for identifying, assessing, and implementing algorithmic solutions that leverage diverse datasets while ensuring data privacy and security for biotech partners. This role requires knowledge of small molecule drug development, ADME/Tox, antibody engineering, and/or genetic medicine, combined with data science and statistical analysis skills, to develop models using federated learning. The role focuses on designing algorithms and workflows that accelerate drug discovery and deliver reproducible, generalizable models across deployment scenarios.

Responsibilities

  • Federated Test Set Design: Architect and implement privacy-preserving protocols for constructing representative test sets across distributed partner datasets, ensuring statistical validity while maintaining data isolation.
  • Benchmark Suite Development: Create comprehensive benchmark suites covering small molecules (ADMET, solubility, permeability), antibodies (affinity, stability, immunogenicity), and RNA therapeutics (stability, delivery, off-target effects).
  • Cross-Domain Validation: Develop validation strategies that assess model generalization across different experimental protocols, cell lines, species, and therapeutic indications while respecting partner data boundaries.
  • Public Dataset Integration: Systematically benchmark federated models against public datasets (ChEMBL, PubChem, PDB, Therapeutic Antibody Database) to establish performance baselines and identify gaps.
  • Validation Frameworks: Implement time-split or scaffold-split validation protocols that assess model performance on prospective data, simulating real-world deployment scenarios and detecting concept drift.
  • Reproducibility Infrastructure: Build robust MLOps pipelines ensuring complete reproducibility of federated experiments, including versioning of data snapshots, model checkpoints, and hyperparameter configurations.
  • Statistical Rigor: Design statistically powered validation studies accounting for multiple testing, hierarchical data structures, and non-independent observations common in drug discovery datasets.
  • Performance Profiling: Develop comprehensive performance profiling across diverse molecular scaffolds, target classes, and property ranges, identifying systematic biases and failure modes.
  • Platform Integration: Collaborate with engineering teams to integrate validation frameworks with the TuneLab federated learning platform built on NVIDIA FLARE, ensuring scalable and automated testing across partner networks.

Qualifications

  • PhD in Computational Biology, Bioinformatics, Cheminformatics, Computer Science, Statistics, or related field from an accredited college or university
  • Minimum of 2 years of experience in the biopharmaceutical industry or related fields, with demonstrated expertise in drug discovery and early development
  • Strong foundation in experimental design, statistical validation, and hypothesis testing
  • Experience with ML model validation, cross-validation strategies, and performance metrics
  • Proficiency in data engineering, pipeline development, and automation

Additional Preferences

  • Experience with federated learning platforms and distributed computing
  • Knowledge of regulatory requirements for AI/ML in pharmaceutical development
  • Expertise in ADMET assay development and validation
  • Understanding of antibody engineering and characterization methods
  • Familiarity with RNA therapeutic design and delivery systems
  • Experience with clinical biomarker validation and translational research
  • Proficiency in workflow orchestration tools (Airflow, Kubeflow, Prefect)
  • Strong knowledge of containerization and cloud computing (Docker, Kubernetes)
  • Publications on model validation, benchmarking, or reproducibility
  • Experience with GxP compliance and quality management systems
  • Exceptional attention to detail and commitment to scientific rigor
  • Strong technical writing skills for regulatory documentation
  • Portfolio mindset balancing rigorous validation with rapid deployment for partner value

Education

  • PhD in a relevant field (as listed in Qualifications)