Role Summary
The Associate Director, Statistical Data Scientist will lead and execute the development, validation, and automation of analytical pipelines and statistical models that support metadata-driven clinical data processing, reporting, and regulatory submissions. This is a hands-on technical leadership role focused on coding, problem-solving, and cross-functional collaboration, bringing rigor, reproducibility, and automation to clinical reporting workflows.
Responsibilities
- Lead the design, development, and validation of R/Python code that automates generation of analytical datasets and tables, listings, and figures (TLFs) within a metadata-driven pipeline.
- Translate statistical analysis plans (SAPs) and metadata specifications (YAML/CSV) into executable, reproducible code.
- Build and validate R packages and data science tools that support both exploratory and confirmatory analyses, ensuring full traceability and audit readiness.
- Implement and validate statistical models (e.g., MMRM, ANCOVA, logistic regression) using R packages such as mmrm and emmeans.
- Collaborate with IT to integrate data science and statistical programming workflows into Databricks and CI/CD pipelines for continuous validation and reproducibility.
- Collaborate across programming, biostatistics, and data standards functions to ensure dataset definitions, derivations, and metadata align with controlled standards.
- Conduct peer code reviews, unit testing, and automated validation, ensuring deliverables meet submission-quality and reproducibility standards.
- Mentor and guide team members in best practices for programming, validation, and automation.
Qualifications
- Bachelor’s or Master’s degree in Statistics, Biostatistics, Data Science, or a related field with 8+ years of statistical programming experience in the pharmaceutical/biotech industry including hands-on experience with R and/or Python.
- Proven experience preparing or supporting R-based regulatory submissions (e.g., R package validation, R-based analysis delivery, or submission readiness).
- Strong understanding of CDISC ADaM and SDTM data structures and their use in analytical workflows.
- Experience developing and validating reusable R/Python libraries and functions.
- Proficiency with Git, Bitbucket, and CI/CD automation pipelines.
- Working knowledge of GxP and 21 CFR Part 11 compliance.
- Familiarity with YAML/JSON configuration and metadata-driven programming workflows preferred.
- Prior experience migrating from SAS to R/Python environments.
- Knowledge of R validation frameworks (e.g., risk-based testing, reproducibility documentation).
- Experience with exploratory analytics or visualization in R or Python within a regulated framework strongly preferred.
- Excellent documentation and validation practices.
- Collaborative and proactive mindset; able to operate independently in a small, agile team.
- Physical and mental requirements include regular computer use, clear communication, and sustained focus. Reasonable accommodations may be made to enable individuals with disabilities to perform these functions.
Skills
- R and Python programming for analytical datasets and modeling
- Statistical modeling (MMRM, ANCOVA, logistic regression)
- CDISC ADaM/SDTM knowledge
- Data engineering integration with Databricks and CI/CD
- Software development practices: version control, testing, validation, documentation
- Metadata-driven workflow design
Education
- Bachelor’s or Master’s degree in Statistics, Biostatistics, Data Science, or a related field