Role Summary
As Director, Disaster Recovery & Incident Management, you’ll manage the enterprise framework of Incident, Change, and Crisis Management for preventing, responding to, and recovering from technology and operations disruptions at scale, as part of Disaster Recovery Program Management. You’ll lead the major incident program (24×7), facilitate recovery objectives (RTO/RPO) for critical services, orchestrate cross-functional crisis response, and run an evidence-based exercise program so the company can withstand outages including, but not limited to, infrastructure or facilities failures, data corruption, cyber events, and operational disruptions. This role requires strategic planning to ensure service excellence, operational readiness, and enterprise resilience. You’ll also manage the response efforts of cross-functional teams during high-pressure recovery activities in line with audit and regulatory compliance, and prioritize ongoing improvements to enhance operations, employee experience, and operational business continuity.
Responsibilities
- Manage the enterprise Incident Management and IT Disaster Recovery strategies, aligning with Business Continuity objectives and risk management goals.
- Manage the operational effectiveness of incident, change, and major incident management, coordinating triage, response, communications and rapid restoration activities.
- Establish a “single source of truth” for service-impacting incidents, with live status updates that are timely, accurate, clear, and concise.
- Drive robust, blameless root cause analysis (RCA) reviews, with action tracking and trend reporting that provide indicators for Problem Management and engineering continuous improvement roadmaps.
- Conduct routine disaster recovery drills, tabletop exercises, and post-incident reviews, identifying areas for improvement, and updating plans accordingly.
- Collaborate with infrastructure, security, software development and operations teams to ensure the resilience of critical systems and applications.
- Develop and maintain a disaster recovery framework that includes detailed runbooks, recovery time objectives (RTOs), and recovery point objectives (RPOs).
- Ensure compliance with relevant audit, information security, and regulatory standards, such as SOC 2, FDA, ISO, and SOX, related to data protection and disaster recovery.
- Provide regular updates to the leadership team on disaster recovery readiness, incident outcomes, and continuous improvement efforts.
- Lead cross-functional teams during disaster recovery and incident management events, ensuring clear communication and well-coordinated response actions.
- Prepare project charters, identify stakeholders, and plan, execute, and monitor all recovery projects from inception to closure.
- Continuously evaluate technology trends and emerging threats to ensure disaster recovery plans remain effective and up to date.
- Identify systemic risks and single points of failure, recommending strategic mitigation plans to the leadership team.
- Champion continuous improvement initiatives, leveraging lessons learned and benchmarking to enhance resiliency posture.
- Mentor and guide junior staff, fostering skill development and succession planning within the disaster recovery function.
- Maintain a steady pulse on ongoing changes across the infrastructure, systems, and services in the environment, capturing resiliency requirements at intake in order to design response and recovery plans in collaboration with systems engineering, architecture, and operations teams.
Qualifications
- Required: Bachelor’s degree in Business Management, Information Systems Management, or related field.
- Required: Experience in IT systems engineering, infrastructure, or cloud architecture, ideally with hands-on recovery implementations.
- Required: Strong technical understanding of infrastructure and cloud recovery, with at least 5 years of DR experience designing and testing DR strategies and plans, with the ability to communicate this knowledge effectively to both technical and business audiences.
- Required: Experience managing complex projects across multiple technology disciplines and business units, including requirements gathering to meet recovery goals.
- Required: Experience creating and defining new operational models and procedures, and explaining complex problems or situations.
- Required: Ability to work under pressure and within constrained parameters.
- Required: Comfortable bridging technical recovery strategies with business impact considerations.
- Required: Experience with governance frameworks, risk management, and regulatory compliance (e.g., GDPR, SOX, HIPAA, ISO, NIST, ITIL).
- Required: Critical skills including strategic planning, communication, leadership, problem-solving, and change management.
- Required: Proven ability to work independently with efficiency, agility, and momentum.
- Required: Excellent problem solving, critical thinking, and analytical skills with the ability to communicate concepts to technical and non-technical audiences.
- Required: Effective written and oral communication skills, including experience developing and delivering status reports, KPIs, and after-action review reports.
- Required: Strong organizational, multi-tasking, and project management skills.
- Required: Ability to work nights and weekends during crisis events.
- Preferred: Certification in one or more business continuity, risk, or resilience disciplines.
- Preferred: Experience working in a Healthcare or Life Sciences environment.
- Preferred: Forward-thinking, enterprise mindset with a business and customer focus and a strong sense of ownership, growth, and adaptability.
- Preferred: Certified Business Continuity Manager (CBCM), Certified Disaster Recovery Engineer (CDRE), Business Continuity and Resiliency Professional (BCRP), Disaster Recovery Certified Specialist (DRCS), or Disaster Recovery Certified Expert (DRCE).
- Preferred: ITIL 4 certifications.
Education
- Bachelor’s degree in Business Management, Information Systems Management, or related field
Additional Requirements
- Physical Demands: Must possess the ability to sit, stand, and/or work at a computer for long periods of time. Ability to lift items up to 35 lbs. may be required when installing IT equipment. The person in this role must be prepared to be on call at any time and will periodically be expected to work off-hours to support disaster recovery test activities.
- Other: Job may require after-hours response to emergency issues. Job may require travel to other sites. Other duties as assigned.