Role Summary
We’re seeking a highly skilled, hands-on AI Solution Architect to help drive the development of a cutting-edge AIOps platform. In this pivotal role, you will design and implement AI-driven solutions that transform IT Operations by enhancing automation, observability, and incident response across complex enterprise environments. As a key member of our architecture team, you will collaborate with data scientists, engineers, and IT stakeholders to build solutions that proactively detect, diagnose, and resolve operational issues across infrastructure and application landscapes.
Responsibilities
- Architect and lead the development of an enterprise-grade AIOps platform leveraging machine learning, deep learning, and advanced analytics.
- Design scalable AI pipelines for anomaly detection, predictive analytics, root cause analysis, and intelligent alerting.
- Collaborate cross-functionally with engineering, DevOps, and IT teams to integrate AI capabilities into existing operational workflows and tools.
- Evaluate and select appropriate technologies, frameworks, and cloud-native services on Azure and GCP to support real-time data ingestion, processing, and model deployment.
- Ensure platform reliability and performance, with a focus on scalability, security, and maintainability.
- Mentor and guide engineering teams on best practices in AI architecture and model lifecycle management.
- Stay current with emerging trends in AIOps, MLOps, and IT automation to continuously evolve the platform.
Qualifications
- Required: Bachelor’s or Master’s degree in Computer Science, Data Science, Machine Learning, or a related field
- Required: Experience as a Solution Architect or AI/ML Architect in enterprise environments
- Required: 3+ years of hands-on experience building AIOps platforms and solutions in IT environments
- Required: Experience in IT Operations, infrastructure monitoring, incident management, and observability tools
- Required: Experience with AI/ML frameworks (e.g., TensorFlow, PyTorch, scikit-learn), data engineering tools (e.g., Spark, Kafka, Airflow), and knowledge graphs
- Required: Proficiency in Python and familiarity with key AI/ML libraries (e.g., TensorFlow, PyTorch, Hugging Face)
- Required: Experience with cloud platforms such as Azure and Google Cloud Platform (GCP), including their native data and AI/ML services, such as Azure Data Lake Storage (ADLS), Azure Machine Learning, Azure AI Foundry, Google Cloud Storage (GCS), and Vertex AI
- Required: Experience with container orchestration technologies such as Kubernetes
- Required: Hands-on expertise with Large Language Models (LLMs) and Generative AI tools
- Preferred: Experience with log and metrics analysis, time-series forecasting, and NLP for IT ticket classification
- Preferred: Knowledge of MLOps practices for model deployment, monitoring, and governance
- Preferred: Exposure to tools such as ServiceNow, Datadog, Prometheus, Grafana, and the ELK Stack
- Preferred: AI engineering certifications for Azure and GCP
Skills
- AI/ML architecture and model lifecycle management
- Real-time data ingestion, processing, and model deployment on Azure and GCP
- Cloud-native services, data lakes, and AI tooling
- Kubernetes and container-based deployment
- LLMs and Generative AI tooling
- Strong collaboration with cross-functional teams
Education
- Bachelor’s or Master’s degree in Computer Science, Data Science, Machine Learning, or a related field