Role Summary
Location: Bengaluru, Durham Blackwell Street, Luxor North Tower.
We’re seeking a highly skilled, hands-on AI Solution Architect to drive the development of a cutting-edge AIOps platform. In this role you will design and implement AI-driven solutions that transform IT operations by enhancing automation, observability, and incident response. As a key member of the architecture team, you will collaborate with data scientists, engineers, and IT stakeholders to proactively detect, diagnose, and resolve operational issues across complex infrastructure and application landscapes.
Responsibilities
- Architect and lead the development of an enterprise-grade AIOps platform leveraging machine learning, deep learning, and advanced analytics.
- Design scalable AI pipelines for anomaly detection, predictive analytics, root cause analysis, and intelligent alerting.
- Collaborate cross-functionally with engineering, DevOps, and IT teams to integrate AI capabilities into existing operational workflows and tools.
- Evaluate and select appropriate technologies, frameworks, and cloud-native services on Azure and GCP to support real-time data ingestion, processing, and model deployment.
- Ensure platform reliability and performance, with a focus on scalability, security, and maintainability.
- Mentor and guide engineering teams on best practices in AI architecture and model lifecycle management.
- Stay current with emerging trends in AIOps, MLOps, and IT automation to continuously evolve the platform.
Qualifications
- Required: Experience as a Solution Architect or AI/ML Architect in enterprise environments.
- Required: 3+ years of hands-on experience building AIOps platforms and solutions in IT environments.
- Required: Strong understanding of IT Operations, infrastructure monitoring, incident management, and observability tools.
- Required: Hands-on experience with AI/ML frameworks (e.g., TensorFlow, PyTorch, Scikit-learn), data engineering tools (e.g., Spark, Kafka, Airflow), and knowledge graphs.
- Required: Proficiency in Python and familiarity with key AI/ML libraries (e.g., TensorFlow, PyTorch, HuggingFace).
- Required: Strong familiarity with cloud platforms such as Azure and Google Cloud Platform (GCP), including their native data and AI/ML services such as ADLS, Azure Machine Learning, Azure Foundry, GCS, and GCP Vertex AI.
- Required: Experience with container orchestration technologies such as Kubernetes, and hands-on expertise with Large Language Models (LLMs) and Generative AI tools.
- Required: Excellent communication and stakeholder management skills.
- Required: Strong mentoring skills to guide team members.
Skills
- AI/ML frameworks: TensorFlow, PyTorch, Scikit-learn
- Data engineering tools: Spark, Kafka, Airflow; knowledge graphs
- Programming: Python; AI/ML libraries such as HuggingFace
- Cloud and tools: Azure, GCP, ADLS, Azure Machine Learning, Azure Foundry, GCS, GCP Vertex AI; Kubernetes; Large Language Models (LLMs) and Generative AI tools
- IT Operations and observability: infrastructure monitoring, incident management
- Communication and stakeholder management; mentoring
Education
- Bachelor’s or master’s degree in Computer Science, Data Science, Machine Learning, or a related field