Role Summary
Data Engineer role focused on designing, developing, and maintaining data pipelines and data solutions for Lilly Medicines Foundry. You will integrate IT/OT systems with cloud data lakehouse architectures (AWS/Azure) to enable advanced analytics and AI/ML capabilities while ensuring data quality, integrity, and regulatory compliance. You will collaborate with business stakeholders, Data Architects, and Data Scientists to deliver data as a product and reusable data domains.
Responsibilities
- Engage with business stakeholders to design, develop, and maintain data pipelines and data solutions that ensure availability and quality of data sets and actionable insights for the Foundry.
- Cover data capture, integration, acquisition, contextualization, and harmonization to deliver data-as-a-product and reusable data domains and products.
- Integrate IT/OT systems with cloud data lakehouse architecture (AWS/Azure) to enable advanced analytics and AI/ML capabilities while ensuring data integrity and regulatory compliance.
- Collaborate closely with the Data Architect, Data Scientists, and other business and IT groups to understand enterprise infrastructure and source systems.
- Analyze large, complex data domains and craft practical solutions for data exploitation via analytics.
- Design, develop, and maintain data solutions for data capture, storage, integration, and analytics in partnership with Tech at Lilly teams.
- Review and provide practical recommendations on design patterns, performance considerations, optimization, database versions, and deployment strategies.
- Ensure data solutions adhere to regulatory requirements including FDA guidelines and GMP.
- Demonstrate knowledge in Data Governance, Master Data Management, and Business Intelligence.
Qualifications
- Required: Bachelor's degree in Computer Science, Data Science, Engineering, or a related field
- Required: At least 3 years of experience in statistical methods, data modeling, ETL/ELT, ontology development, semantic graph construction and linked data, and relational schema design
- Required: At least 1 year of experience in a pharmaceutical GxP or scientific environment
- Required: Authorized to work in the United States on a full-time basis (no sponsorship)
- Preferred: 1-3 years of experience designing large-scale data models for functional, operational, and analytical environments (Conceptual, Logical, Physical & Dimensional)
- Preferred: Demonstrated SQL and data modeling proficiency
- Preferred: Experience with data modeling tools such as ER/Studio, Erwin, or TOAD
- Preferred: Experience with cloud platforms (AWS, Azure)
- Preferred: Experience with AI/ML/LLM concepts and building agentic AI solution sets
- Preferred: Experience with data integration technologies such as data streaming, Industrial IoT, MQTT, AMQP, and Kafka
- Preferred: Understanding of modern data architecture, data lakehouse, data warehousing, and/or big data concepts
- Preferred: Experience with security models and handling large data sets
- Preferred: Experience with multiple databases (PostgreSQL, Redshift, Aurora, Athena, Neptune, DynamoDB, MongoDB) and 3NF/Dimensional designs
- Preferred: Experience with Agile, CI/CD, GitHub, and automation platforms
- Preferred: Prior pharma or GMP work experience
- Preferred: Solid knowledge of Computer System Validation
- Preferred: Strong problem-solving, learning agility, and cross-functional communication skills
Skills
- Data modeling and SQL proficiency
- Cloud data platforms (AWS, Azure)
- Data governance, master data management, business intelligence
- Experience with data integration technologies and protocols (MQTT, AMQP, Kafka)
- Understanding of data lakehouse and data warehousing concepts
- Regulatory awareness (FDA, GMP)