Role Summary
We are seeking a multifaceted and innovative Senior Manager, Process & AI Automation, Application Reliability to lead efforts to improve application reliability through observability, AI, and automation. The role designs and implements strategies that improve operational efficiency, predictability, and customer satisfaction, and integrates AI-driven insights into processes to enable intelligent automation. Based in Raritan, NJ, this role collaborates with Application Reliability service managers, ITSM process owners, process users, and platform engineers to identify automation opportunities and deliver scalable, automated solutions.
Responsibilities
- Own the development and execution of strategies using AI, observability tools, and automation to improve application reliability and support functions.
- Deploy self-healing, auto-remediation, and automated incident response mechanisms.
- Define and track key metrics for automation impact and reliability improvements, and report automation value and reliability metrics to senior leadership.
- Collaborate with multi-functional teams to identify automation opportunities and implement scalable solutions that streamline incident management, troubleshooting, and system monitoring.
- Drive the adoption of observability platforms and analytics to proactively detect and resolve potential issues before they impact end users.
- Develop innovative methodologies using AI/ML models for predictive maintenance, anomaly detection, and root cause analysis.
- Establish guidelines for monitoring, alerting, and incident response workflows grounded in automation and AI insights. Design and document automated workflows, exception handling, and escalation paths.
- Lead the evaluation, recommendation, and integration of emerging technologies to enhance operational capabilities.
- Promote a data-driven culture through continuous improvement initiatives based on insights from observability and AI tools.
Qualifications
- Required: Bachelor's degree
- Required: Solid understanding of core ITSM processes: Incident, Problem, Change, and Request Management
- Required: Expertise in mapping and re-engineering processes and user journeys to assess and improve their suitability for automation, including experience with process mining tools or BPMN
- Technical Expertise - Required: Demonstrated experience with application support, reliability engineering, or DevOps in complex environments
- Technical Expertise - Required: Hands-on experience designing and building automations in ServiceNow (Flow Designer, Business Rules, Workflows, etc.) or other workflow-based platforms
- Technical Expertise - Required: Advanced scripting and programming skills (e.g., Python, PowerShell, Go) for automation and integrations
- Technical Expertise - Required: Strong knowledge of observability tools (e.g., Grafana, AppDynamics, Splunk) and AIOps platforms
- Technical Expertise - Required: Hands-on experience with AI/ML techniques and frameworks (e.g., TensorFlow and scikit-learn in Python), including deploying models in production IT environments
- Technical Expertise - Required: Familiarity with automation tools (e.g., Ansible, Puppet, Jenkins, Terraform) and cloud-native automation (AWS, Azure, GCP)
- Technical Expertise - Required: Solid SQL and data visualization skills (Power BI, Tableau) for telemetry and operational analytics
- Technical Expertise - Required: Experience with system integrations across diverse platforms
- Strategic & Analytical Thinking - Required: Ability to design and articulate strategic plans for integrating AI and observability into operational workflows
- Strategic & Analytical Thinking - Required: Strong analytical skills with the ability to interpret large data sets and derive actionable insights
- Leadership & Collaboration - Required: Demonstrated experience in multi-functional teams or projects focused on digital transformation
- Leadership & Collaboration - Required: Ability to collaborate multi-functionally and translate partner needs into practical automation designs
- Leadership & Collaboration - Required: Excellent communication skills, including the ability to translate technical concepts into business value
- Preferred Skills - Preferred: ServiceNow certifications
- Preferred Skills - Preferred: ITIL v3 or ITIL 4 certification
- Preferred Skills - Preferred: IT Operations experience (application or infrastructure)
Skills
- Observability and AI/ML-driven optimization
- Automation design and implementation across platforms
- ServiceNow and other workflow-based platforms
- Data analysis, telemetry, and operational analytics
- Cross-functional collaboration and leadership in digital transformation
Education