Key Responsibilities
- Develop and deploy agentic AI applications enabling natural language interaction with clinical data.
- Ground AI outputs in validated biological knowledge (e.g., RAG anchored in HPO, Gene Ontology, MeSH, DrugBank; clinical trial registries; curated pathway databases).
- Deploy unsupervised/self-supervised learning (clustering, representation learning, contrastive learning) to discover latent patient archetypes and molecular disease subtypes.
- Deploy survival models and dynamic treatment regime estimators using combined clinical and omics features.
- Build AI tooling to harmonize heterogeneous trial and biobank datasets into common representations.
- Evaluate and monitor model performance, safety, and reliability in production.
- Manage vendors and contractors, and maintain partner relationships with relevant teams.
Post-Trial Data Research & Analysis
- Build pipelines over locked clinical trial databases (SDTM, ADaM) to support secondary and exploratory research beyond primary endpoints.
- Apply ML to identify trial subgroup effects, treatment heterogeneity, and responder/non-responder signatures.
- Use NLP to mine adverse event narratives, clinical notes, and investigator comments to surface latent safety signals.
- Reconstruct longitudinal patient trajectories to model disease progression, drug response kinetics, and time-to-event outcomes.
- Architect cross-trial integrative and meta-analytic workflows.
- Connect findings to large-scale biobank cohorts (UK Biobank, All of Us, etc.) for validation/enrichment.
Research Rigor, Reproducibility & Governance
- Establish reproducible research data management (data versioning, containerized compute, audit-ready logs).
- Ensure compliance with HIPAA, GDPR, and relevant IRB/ethics requirements.
Basic Qualifications
- M.S. (or equivalent) in Biomedical Informatics, Computational Biology, Bioinformatics, Statistical Genetics, Epidemiology, Computer Science, or a related field, or MD/PhD with equivalent depth, plus 6+ years of research experience with clinical trial (SDTM/ADaM), biobank, or population health data.
- Alternatively, Ph.D. (or equivalent) in one of the above fields, or MD/PhD, plus 3+ years of relevant research experience.
Additional Skills & Preferences
- Experience using AI tools in production for clinical data analysis.
- Proficiency in Python and/or R; strong SQL; experience with cloud platforms (DNAnexus, AWS, GCP, Azure) and HPC environments.
- Generative AI/foundation model experience; LLM fine-tuning or training.
- Knowledge of CDISC (SDTM, ADaM).
- ML experience (survival analysis, causal inference, NLP, deep learning).
- Understanding of OMOP CDM, HL7 FHIR Genomics, biomedical ontologies.
- Experience with major biobanks (UK Biobank, All of Us).
- Experience with federated learning/differential privacy/secure computation.
- Peer-reviewed publications; knowledge of the target trial framework.
- Familiarity with pharmacogenomics/PK/PD and knowledge graphs/graph ML/ontology reasoning.
- Hands-on multi-omic data analysis.