Role Summary
The Associate Principal Architect, Data and Analytics Platforms will support the design and creation of cloud-based products and services to meet the technology needs of Exelixis. This position will help define and design our cloud platform, which includes cloud automation tools and standards, CI/CD pipelines, DevOps tooling, and AWS account provisioning automation. This position will also work with application support teams to redefine the interface between our cloud platform products and application installation and configuration processes.
Responsibilities
- Create and lead the implementation of robust, scalable, and secure data architecture strategies on AWS and Databricks to support advanced analytics and AI/ML initiatives.
- Design and implement data ingestion, processing, storage, and reporting solutions leveraging AWS services such as S3, Glue, Redshift, Lake Formation, Athena, Kinesis, MSK, EC2, and ECS, as well as Databricks
- Provide expert guidance and hands-on development on AWS and Databricks data technologies
- Develop and maintain conceptual, logical, and physical data models, and design data lakes and data warehouses for optimal performance.
- Establish and enforce data governance policies, data quality standards, security protocols, and compliance frameworks (e.g., FAIR, GxP, HIPAA), along with access-control and architectural models such as IAM and data mesh.
- Implement and manage AWS Lake Formation and Databricks Unity Catalog to enable centralized, fine-grained data governance across all data assets.
- Design and develop scalable and resilient application compute infrastructure using AWS EKS and ECS
- Lead the development of automated CI/CD pipelines for notebooks, jobs, and infrastructure-as-code across AWS, dbt, and Databricks environments.
- Utilize tools like GitHub Actions or Jenkins to automate deployments and promote consistent releases.
- Manage and optimize AWS, Databricks, and other data platform resource utilization to maintain cost-effectiveness
- Design and implement a multi-layered Medallion architecture (Bronze, Silver, Gold) using Delta Lake to ensure data quality and reliability.
- Develop and publish data engineering best practices and patterns
Qualifications
- Proven experience architecting and implementing lakehouse architectures on AWS data technologies (Glue, Lake Formation, Redshift, Athena) and Databricks
- Proven experience designing, developing, and managing data engineering pipelines at scale
- Expertise in various data modeling techniques (conceptual, logical, and physical)
- Expertise in programming languages and frameworks such as dbt, Python, Spark SQL, and SQL
- Deep knowledge of AWS services, including IAM, security, data storage, data processing, compute, and analytics services.
- Strong understanding of modern data architecture patterns (e.g., data mesh, lakehouse).
- Hands-on experience with big data technologies such as Apache Spark, AWS Glue, Databricks, Apache Iceberg, and Delta Lake tables
- Certifications: Professional AWS and Databricks certifications are highly valued.
Skills
- Knowledge of Amazon Web Services (AWS) hosting services (EC2, RDS, Systems Manager, IAM, etc.)
- Infrastructure as code tools, especially Terraform
- Cloud and infrastructure automation
- Continuous integration and continuous deployment practices
Education
- Master's degree in a relevant field and 9 years of experience; or
- Bachelor's degree in a relevant field and 11 years of related experience; or
- Equivalent combination of education and years of experience
- Technical certification may be required
Experience
- At least 10 years of experience in cloud infrastructure in a highly available and production environment.
- Experience working in a SOX and FDA regulated environment is a plus.
Additional Requirements
- Advanced level knowledge of Amazon Web Services (AWS), especially with hosting services like EC2, RDS, Systems Manager, IAM
- Advanced level knowledge with infrastructure as code (IaC) tools, especially Terraform
- Intermediate level knowledge with cloud and infrastructure automation
- Intermediate level knowledge of modern CI/CD development and deployment practices