Role Summary
The Staff Data Architect - AI supports the design and implementation of modern data architecture with a focus on enabling AI and machine learning capabilities across our bio-manufacturing operations.
Responsibilities
- Develop solutions by studying data needs, analyzing user requirements, and following the Regeneron software development lifecycle.
- Design and document data architecture standards, with a focus on building infrastructure ready for AI/ML workloads.
- Examine processes and systems to optimize, consolidate, and analyze diverse data sets, including structured, semi-structured, and unstructured data.
- Create the technical documentation of solutions utilizing standards, templates, and procedures.
- Design scalable data pipelines that feed predictive and generative AI models, as well as process monitoring tools.
- Lead the development and maintenance of enterprise data models and reference architectures, with an emphasis on clean, well-structured data that AI systems can reliably consume.
- Implement cloud-native data infrastructure (AWS, Azure) including data lakes, feature stores, and model serving layers.
- Collaborate with data and system owners, data scientists, and AI users to understand data requirements and ensure the architecture supports both current and future AI use cases.
- Participate in architecture reviews, documenting design decisions and flagging potential risks or gaps.
- Learn and apply governance and data integrity standards in a GxP environment.
- Assist in the technical documentation of solutions utilizing standards, templates, and procedures.
- Independently manage small project-related assignments, ensuring on-time delivery.
Qualifications
- Knowledge of data modeling, database design, and data pipeline development.
- Hands-on experience or strong academic/project exposure to architecture design and implementation of data pipelines in Azure and AWS. Experience with Databricks is a plus.
- Experience with AI/ML data concepts such as feature engineering, data versioning, model training pipelines, or vector databases is a strong plus.
- Understanding of integration patterns and APIs for connecting disparate data sources.
- Curiosity about how modern data approaches (data mesh, data fabric, lakehouse architectures) can support AI at scale.
- Strong communication skills and a willingness to collaborate across technical and operational teams.
- Experience with version control software (e.g., SVN, Git).
- Quality-focused with strong attention to detail.
- Staff: 10+ years of relevant experience.
- Senior Staff: 12+ years of relevant experience.
- Experience in biotech, pharmaceutical, or other life sciences industries preferred.
- Experience with cloud platforms (AWS, Azure), workflow orchestration tools (Airflow, Luigi, Prefect, or similar), containerization technologies, and scientific data management systems, as well as experience using GenAI to enhance one's own work.
Education
- BA/BS in Computer Science, Bioinformatics, or related field.