Role Summary
Caris Life Sciences is seeking a creative, driven, and technically strong Data Scientist - Deep Learning to join our Computational Pathology team. This role focuses on developing large-scale, generalizable machine learning models that learn rich representations from complex, high-dimensional data to support translational research and biomarker discovery. The successful candidate will play a central role in shaping Caris's next-generation AI capabilities by designing scalable training pipelines, advancing representation learning approaches, and collaborating closely with scientific and clinical experts. This position is ideal for individuals with a strong background in deep learning, transformer-based architectures, and computational pathology, who are excited about building foundation-level modeling frameworks rather than task-specific solutions.
Responsibilities
- Design, train, and evaluate foundation-style machine learning models that learn robust and reusable representations from large-scale datasets.
- Develop and maintain scalable model training infrastructure using PyTorch and distributed training paradigms (e.g., multi-GPU and multi-node setups).
- Train and adapt transformer-based architectures for representation learning across diverse data sources.
- Apply self-supervised, weakly supervised, and representation learning techniques to leverage partially labeled or unlabeled data.
- Build flexible modeling frameworks capable of integrating multiple data sources and heterogeneous signals.
- Collaborate with pathologists, scientists, and engineers to ensure models are biologically meaningful and aligned with translational research goals.
- Process, curate, and analyze large, complex datasets using efficient and reproducible workflows.
- Support exploratory analyses, downstream modeling, and internal research initiatives using learned representations.
- Contribute to internal technical documentation, research outputs, and long-term modeling strategy.
- Follow best practices in software engineering, experiment tracking, and collaborative model development.
Qualifications
- Required: PhD in Computer Science, Data Science, Computational Biology, Bioinformatics, Engineering, Mathematics, or a related quantitative field with exposure to biological or medical data.
- Required: 0–4 years of experience applying machine learning or deep learning in research or industry settings (postdoctoral experience acceptable).
- Required: Strong understanding of deep learning model training, optimization, and evaluation.
- Required: Hands-on experience with transformer-based models, including both language-focused and vision-focused architectures.
- Required: Proficiency in Python and PyTorch.
- Required: Hands-on experience with distributed training (e.g., PyTorch DDP, multi-GPU or multi-node workflows).
- Required: Experience working in Linux environments and using Git for version control.
- Required: Ability to work with large datasets and complex data pipelines.
- Required: Strong written and verbal communication skills.
- Preferred: Background in computational pathology or experience working with large-scale imaging data.
- Preferred: Experience training large representation models or foundation models.
- Preferred: Familiarity with self-supervised and representation learning techniques, such as contrastive learning, DINO-style approaches, or related methods.
- Preferred: Experience working with multiple data sources in unified modeling frameworks.
- Preferred: Experience with cloud-based machine learning environments, including distributed training workflows (e.g., AWS, SageMaker).
- Preferred: Strong engineering mindset with attention to reproducibility, scalability, and model robustness.
- Preferred: Background in biomedical, translational, or applied research environments.
Additional Requirements
- Physical Demands: This position requires extended periods of computer-based work, along with collaboration with subject matter experts and business partners in person or via remote conferencing.
- Travel: Periodic travel may be required.