Senior Data Engineer (Dallas, TX -or- Phoenix, AZ)

Caris Life Sciences
Full-time
Remote friendly (Tempe, AZ)
United States
IT

Role Summary

Caris Life Sciences is seeking a Senior Data Engineer to join our data engineering group. You’ll play a critical part in shaping and advancing our data ecosystem to support cutting-edge cancer research and data licensing initiatives. We’re looking for a detail-oriented, self-motivated engineer who’s passionate about enabling precision medicine through innovative data solutions.

Responsibilities

  • Collaborate closely with other data engineers, computational scientists, and researchers to make complex, multimodal data easily accessible for scientific discovery.
  • Maintain and enhance our AWS-based data platforms (Glue, Athena, S3) while evaluating and implementing new tools and approaches for data delivery.
  • Design, build, and optimize data pipelines that integrate diverse data sources into a scalable and secure data lake.
  • Continuously improve data architecture, automation, quality control, and testing processes.
  • Proactively troubleshoot, optimize, and modernize existing systems to ensure reliability and performance.
  • Contribute to best practices in data engineering, documentation, and cross-team knowledge sharing.
  • Help architect solutions with scalability in mind to support future growth in data volume and complexity.
  • Provide technical mentorship to other data engineers as needed and contribute to team development.

Qualifications

  • Bachelor’s degree in Computer Science or a related technical field, or equivalent practical experience.
  • 6+ years of software development experience, including at least 3 years focused on data engineering.
  • Proficiency with Python and experience working with data frames for transformation and analysis.
  • Hands-on experience with relational (SQL) and NoSQL databases.
  • Solid understanding of cloud platforms (preferably AWS) and ETL/ELT pipeline development.
  • Familiarity with CI/CD for data workflows, Git, and infrastructure as code (e.g., Terraform, CloudFormation).
  • Strong communication skills and the ability to work effectively in cross-functional teams.

Preferred Qualifications

  • Deep technical expertise with modern data engineering technologies, including distributed computing frameworks (e.g., Apache Spark, Dask, AWS EMR).
  • Experience building and optimizing large-scale data pipelines, architectures, and datasets.
  • Proficiency in data modeling (e.g., dimensional modeling, data vault, star schema).
  • Experience with data observability, including monitoring, logging, alerting, and automated testing.
  • Familiarity with metadata, lineage tracking, and workflow orchestration tools (e.g., Metaflow or similar).
  • Practical experience with AWS services such as Glue, Athena, S3, and Lambda.
  • Passion for advancing cancer research and familiarity with genomic data (e.g., DNA/RNA sequencing).

Physical Demands

  • Ability to sit, stand, and work at a computer for extended periods.