Role Summary
We are seeking a Senior Data Engineer to join our data engineering group at Caris Life Sciences. In this role, you’ll play a critical part in shaping and advancing our data ecosystem to support cutting-edge cancer research and data licensing initiatives. We’re looking for a detail-oriented, self-motivated engineer who’s passionate about enabling precision medicine through innovative data solutions.
Responsibilities
- Collaborate closely with other data engineers, computational scientists, and researchers to make complex, multimodal data easily accessible for scientific discovery.
- Maintain and enhance our AWS-based data platforms (Glue, Athena, S3) while evaluating and implementing new tools and approaches for data delivery.
- Design, build, and optimize data pipelines that integrate diverse data sources into a scalable and secure data lake.
- Continuously improve data architecture, automation, quality control, and testing processes.
- Proactively troubleshoot, optimize, and modernize existing systems to ensure reliability and performance.
- Contribute to best practices in data engineering, documentation, and cross-team knowledge sharing.
- Help architect solutions with scalability in mind to support future growth in data volume and complexity.
- Provide technical mentorship to other data engineers as needed and contribute to team development.
Qualifications
- Required: Bachelor’s degree in Computer Science or a related technical field, or equivalent practical experience.
- Required: 6+ years of software development experience, including at least 3 years focused on data engineering.
- Required: Proficiency with Python and experience working with data frames for transformation and analysis.
- Required: Hands-on experience with relational (SQL) and NoSQL databases.
- Required: Solid understanding of cloud platforms (preferably AWS) and ETL/ELT pipeline development.
- Required: Familiarity with CI/CD for data workflows, Git, and infrastructure as code (e.g., Terraform, CloudFormation).
- Required: Strong communication skills and the ability to work effectively in cross-functional teams.
Preferred Qualifications
- Deep technical expertise with modern data engineering technologies, including distributed computing frameworks and platforms (e.g., Apache Spark, Dask, Amazon EMR).
- Experience building and optimizing large-scale data pipelines, architectures, and datasets.
- Proficiency in data modeling (e.g., dimensional modeling, data vault, star schema).
- Experience with data observability, including monitoring, logging, alerting, and automated testing.
- Familiarity with metadata management, lineage tracking, and workflow orchestration tools (e.g., Metaflow).
- Practical experience with AWS services such as Glue, Athena, S3, and Lambda.
- Passion for advancing cancer research and familiarity with genomic data (e.g., DNA/RNA sequencing).
Education
- None specified beyond the required degree qualifications.
Additional Requirements
- Physical Demands: Ability to sit, stand, and work at a computer for extended periods.