Role Summary
Locations: South San Francisco, Cambridge, Seattle.
We are seeking a highly skilled individual contributor to design, build, and operate data platforms that support machine learning engineering teams. You will translate poorly defined problems into clear specifications; drive implementations with a focus on metrics, system health, and performance; and promote software development best practices, including code quality, documentation, DevOps, and testing. You will ensure the robustness of services, act as an escalation point for pipelines and workflows, and engage with the broader open-source community as appropriate.
Responsibilities
- Lead the design, development, and implementation of data frameworks, pipelines and services to support data operations for machine learning engineering teams.
- Collaborate with cross-functional teams to identify requirements and provide technical guidance. Integrate APIs with systems and platforms for seamless data exchange and enhanced system functionality.
- Produce well-engineered software, including appropriate automated test suites, technical documentation, and necessary operations.
- Explore emerging GenAI/agentic tools and frameworks, adopting those that deliver clear gains while managing the risk of failure.
- Apply platform abstractions consistently to maintain quality and uniformity in logging and lineage.
- Consult, educate, and coach/mentor on developer best practices and production standards, participate in code reviews and design sessions.
- Adhere to the QMS framework and CI/CD best practices, and help guide improvements to them that strengthen ways of working.
- Stay up-to-date on emerging technologies, trends, and best practices. Identify areas for improvement across the stack.
Qualifications
- Required: Bachelor's or graduate degree in Computer Science (with a focus on software or data engineering, high-performance computing, or machine learning), or 5+ years of industry experience in software, data, or machine learning engineering.
- Required: 2+ years of industry experience in software engineering.
- Required: Experience in Python.
- Required: Experience in applying CI/CD implementations using git and a common CI/CD stack (e.g. Jenkins, CircleCI, GitLab, Azure DevOps).
- Required: Experience in API design principles, protocols, and tools (REST, GraphQL, Swagger, etc.).
- Required: Experience in data system/ETL design principles, protocols, and tools.
- Required: Experience developing infrastructure using Cloud services in GCP or similar cloud environment.
- Required: Experience working with Docker and Kubernetes, or closely related containerization and cluster computing frameworks.
- Preferred: Candidates with explicit experience working on machine learning orchestration in cloud or HPC environments (using MLFlow, Kubeflow, or similar) will be highly competitive.
- Preferred: Technical leadership and experience leading development projects and/or teams is desirable but not required. Demonstrated experience leading efforts in large programs involving multiple cross-functional teams and stakeholders is a plus.
- Preferred: Experience upholding software/production best practices, along with mentoring and coaching, is strongly preferred; a willingness and demonstrated aptitude in these areas is sufficient.
- Preferred: Experience with event-driven architectures and implementing event hooks/triggers in API systems. Familiarity with webhooks, message queues, and pub/sub systems. Ability to design API integrations that enable real-time data updates and notifications.
- Preferred: Agile-minded, with clear proficiency in iterative software development and prototyping.
- Preferred: Candidates who’ve worked on Search products, vector databases, knowledge graphs, or other production AI/ML products—particularly with agentic interfaces and integrations—will be highly competitive.