AI Data Engineer--In Vivo Data

Pfizer
2023 years ago
Remote friendly (Cambridge, MA)
United States
$106,000 - $171,500 USD yearly
IT

Role Summary

As a member of our cross-functional Data Ecosystem Team, you will help build and scale an AI-ready data architecture supporting in vivo biology labs. In this role, you will leverage your expertise to design innovative software solutions that extract valuable insights from Pfizer's proprietary data and external datasets, enabling the generation of testable hypotheses across the entire drug discovery value chain.

Responsibilities

  • Development and implementation of a data platform to enable efficient and scalable correlation and analysis of in vitro and in vivo data
  • Development of innovative data products and machine learning methods that support translational studies, in collaboration with machine learning experts within Pfizer
  • Processing, analysis and integration of internal in vivo pharmacodynamics and toxicology data sets
  • Curation and integration of relevant datasets from the public domain
  • Development of data analysis pipelines
  • Development and rollout of data products engineered to meet specific data access patterns
  • Implementation, testing and validation of new methods for data analysis and visualization techniques
  • Drive collaborations with external companies and academic institutions
  • Develop Pfizer’s in vivo data capture, metadata tagging and storage strategy together with Pfizer’s Digital organization
  • Onboarding of Pfizer colleagues to the data platform and organization of workshops, hackathons, trainings and scientific talks
  • Strengthen external visibility and scientific excellence by publishing and presenting work in reputable journals and at conferences and workshops, and by engaging with the scientific community

Qualifications

  • Required: PhD in Biology, Pharmacology, Toxicology, Computer Science, Physics, Statistics, or a related technical discipline OR Master’s degree and 2+ years of experience building AI powered research applications
  • Required: Experience in in vivo pharmacology
  • Required: Strong background in data handling, integration and analysis
  • Required: Thorough understanding of drug discovery and biology with a particular focus on in vivo / in vitro translational research
  • Required: Research experience in developing data products and data integration solutions
  • Required: Experience solving complex analyses/problems in a timely fashion
  • Required: Exceptional programming skills in Python
  • Required: Strong full-stack development experience with a focus on Python, and in-depth database expertise with a focus on PostgreSQL and ETL frameworks
  • Required: Strong communication skills (verbal, written, and presentation)
  • Preferred: Nextflow pipeline development experience
  • Preferred: Hands-on experience handling, processing, integrating, and analyzing large heterogeneous data sets in a drug discovery research environment
  • Preferred: Proficiency in front-end technologies such as TypeScript and React, and in browser-based visualization techniques
  • Preferred: Proficiency with AI/ML libraries including PyTorch and Lightning
  • Preferred: Experience with LLMs/RAG systems
  • Preferred: Proven expertise in software engineering, package development, cloud architectures, CI/CD and software engineering tooling
  • Preferred: Familiarity with pertinent libraries within the Python scientific stack
  • Preferred: Experience with Claude Code or equivalent and vibe coding paradigms
  • Preferred: Strong publication record and demonstrated contributions to the field
  • Preferred: Experience taking ideas from prototype to production