
DevOps Incident Response Playbook
During incident Roles: incident commander, comms lead, ops/feature SMEs, scribe. Declare severity quickly; open shared channel/bridge; timestamp actions. Stabilize first: roll back, feature-flag off, scale up, or shed load. Runbooks & tooling Prebuilt runbooks per service: restart/rollback steps, dashboards, logs, feature flags. One-click access to dashboards (metrics, traces, logs), recent deploys, and toggles. Paging rules with escalation; avoid noisy alerts. Comms Single source of truth: incident doc; external status page if needed. Regular updates with impact, scope, mitigation, ETA. After incident Blameless postmortem; timeline, root causes, contributing factors. Action items with owners/deadlines; track to completion. Add tests/alerts/runbook updates; reduce time-to-detect and time-to-recover.