Incident & Request Manager - Non-Production Environments (Onsite) Job at Sumeru Solutions, Atlanta, GA

  • Sumeru Solutions
  • Atlanta, GA

Job Description

Job Title: Incident & Request Manager

Location: Atlanta, GA or Bellevue, WA (local candidates only)

Role Overview

  • The Incident & Request Manager leads the incident response and request management function for all non-production environments (Dev, QA, UAT, Performance).
  • Acting as the escalation point for project/product delivery teams, this role ensures incidents are resolved quickly, requests are fulfilled efficiently, and learnings are embedded into continuous improvement.
  • The Incident & Request Manager directly manages a team of Incident Analysts and SREs, partners with DevOps teams to automate detection and response, and works closely with Environment and Change Managers to reduce recurrence of issues.

Key Responsibilities

Incident Management

  • Own the incident lifecycle: detection, triage, response, resolution, and closure.
  • Act as the primary escalation point for project/product delivery teams during non-production environment (NPE) incidents.
  • Lead war rooms for critical incidents, coordinating with technical and delivery stakeholders.
  • Ensure timely escalation to Environment, Change, DevOps, Infra, and Security teams when required.
  • Track and improve incident SLAs and metrics (MTTR, MTTD, availability SLOs).

Request Management

  • Own request fulfilment for project/product delivery teams (e.g., access, entitlements, environment service requests).
  • Standardize and automate common request types in collaboration with Intake and DevOps teams.
  • Ensure requests are logged, prioritized, and fulfilled within SLA.
  • Provide transparency to stakeholders on request status.

Team Leadership

  • Manage and mentor Incident Analysts and SREs.
  • Ensure follow-the-sun coverage via offshore/onshore teams.
  • Build a culture of blameless incident management, automation-first practices, and continuous learning.

Governance & RCA

  • Ensure all incidents have documented Root Cause Analysis (RCA).
  • Track corrective and preventive actions, and feed them into Change and Environment management processes.
  • Provide trend reporting and insights to leadership.

SRE & DevOps Alignment

  • Work with SREs and DevOps teams to automate incident detection, rollback, and recovery.
  • Integrate observability tools (Splunk, Prometheus, Grafana) into proactive monitoring.

Stakeholder Communication

  • Provide timely updates during incidents and delays in request fulfilment.
  • Publish regular reports on incident trends, RCA outcomes, and SLA adherence.
  • Maintain trust with project/product delivery teams by ensuring transparent communication.

Required Skills & Experience

  • 8-10 years in Incident Management, Service Operations, or SRE leadership.
  • Experience managing Incident Analysts and SRE teams.
  • Strong knowledge of AWS, Kubernetes, CI/CD pipelines, and observability tools (Splunk, Prometheus, Grafana).
  • Deep understanding of ITIL Incident, Problem, and Request Management processes.
  • Excellent crisis management, communication, and stakeholder engagement skills.

Job Tags

Local area
