Responsibilities:
- Design and implement data pipelines and ETLs that process protein measurement data at scale, turning instrument outputs into reliable, queryable scientific results.
- Improve the architecture of existing cloud systems: identify structural weaknesses, propose better approaches, and drive implementation alongside the technical lead.
- Maintain and evolve the APIs and database schemas that serve internal teams including bioinformatics, science, and product development, adapting as needs grow.
- Contribute to the team's DevOps practice: optimize AWS costs, manage cloud deployments, improve system security, and drive performance improvements through infrastructure changes.
- Work cross-functionally with scientific and software teams to define data quality metrics, understand downstream consumer usage, and ensure the platform meets their needs.
- Surface and advocate for changes to project priorities and architecture across the cloud pipeline and adjacent projects.
Requirements:
- 7+ years of relevant experience in a software engineering organization, delivering production-quality systems.
- Bachelor's degree in Computer Science or related field, or equivalent practical experience.
- Fluency in multiple programming languages; currently invested in Python for data pipelines.
- Solid experience with AWS cloud infrastructure, including cost management and deployment practices.
- Experience with CI/CD pipelines and infrastructure-as-code (e.g., Terraform, CDK).
- Experience with relational and non-relational database design.
- Demonstrated experience building and maintaining data pipelines/ETL systems at production scale.
- Ability to independently pick up new technologies across domains.
- Strong communication skills across engineering, science, and product stakeholders.
- Ability to identify when a change in direction is necessary.
- Familiarity with AI-driven development tools and methodologies.
Nice to Haves:
- Docker and container orchestration (Kubernetes, ECS).
- Workflow orchestration tools (e.g., Nextflow, Step Functions, Airflow, Prefect).
- Data observability, pipeline monitoring, or data quality frameworks.
- Background in biotech/life sciences/scientific data processing.
- Familiarity with NoSQL data stores and when to use them alongside relational databases.