AI Incidents

An Incident-Response Playbook for AI Systems

Generic IR runbooks assume the failing component is a server you can patch. AI incidents add a model whose behavior you can't fully explain. This playbook is mapped to NIST SP 800-61r3, NIST AI 600-1, MITRE ATLAS, and the OWASP LLM Top 10.

By Theo Voss · 8 min read

Our beat is the public record of AI failures, and a recurring pattern in that record is that the response was improvised. The organization had an incident-response process built for servers, networks, and credentials — and then the failing component was a model whose behavior nobody could fully explain, producing outputs nobody could fully predict, through an attack path the runbook never anticipated. This post is the playbook we wish more of the incidents we catalog had followed. It is built on published frameworks, not invented, and it is explicit about the steps a standard IR runbook leaves out.

Why a generic IR runbook is necessary but not sufficient

The standard incident-response lifecycle did not stop being correct because AI arrived. NIST SP 800-61 Revision 3, “Incident Response Recommendations and Considerations for Cybersecurity Risk Management” (April 2025), restructures incident response around the six Cybersecurity Framework 2.0 functions (Govern, Identify, Protect, Detect, Respond, Recover), and that structure holds for AI systems. The phases are still preparation, detection, containment, eradication, recovery, and post-incident review.

What changes is what each phase has to account for. Three properties of AI systems break assumptions a generic runbook makes:

  1. The failure is often behavioral, not a breach. A model that begins producing harmful, biased, or hallucinated outputs after a deployment change is an incident with real-world impact, but there is no intrusion, no malware, no exfiltrated credential to find. The standard “contain the host, rotate the keys” reflex addresses none of it.
  2. The attack surface includes the model’s inputs at inference time. A poisoned document in a retrieval corpus, a prompt-injection payload in a web page the agent browses, or a crafted image — these are attack vectors a network-centric runbook does not enumerate.
  3. Root cause may not be fully knowable. With a deterministic service you can usually reconstruct exactly what happened. With a model you may only be able to characterize the failure statistically. The investigation has to tolerate that and respond anyway.
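One way to make "characterize the failure statistically" concrete is to replay a sample of prompts against the suspect model version and report the observed failure rate with an interval rather than a single number. The sketch below uses the Wilson score interval. The function name and the sample counts are illustrative assumptions, not something any of the cited frameworks prescribes.

```python
import math


def wilson_interval(failures: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Return a Wilson score interval for the true failure rate.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= failures <= trials:
        raise ValueError("failures must be between 0 and trials")
    p = failures / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical replay: 12 harmful outputs in 400 sampled prompts.
low, high = wilson_interval(failures=12, trials=400)
print(f"observed rate {12 / 400:.1%}, plausible range {low:.1%} to {high:.1%}")
```

Reporting the range keeps the write-up honest about how much the sample can actually support, which matters when the exact cause cannot be reconstructed.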

So the move is not to throw out the IR runbook. It is to extend it with AI-specific detection sources, containment levers, and evidence types — and to know which published frameworks supply each.

The frameworks, and what each one is for

Four documents do the heavy lifting. Mixing them up produces gaps; using each for its purpose produces coverage.

  • NIST SP 800-61r3: the process spine. Use it for the lifecycle and for integrating IR into governance. It is a general framework, not an AI-specific one, which is the point: it tells you the phases, not the AI details.
  • NIST AI 600-1, the Generative AI Profile of the AI Risk Management Framework: the risk vocabulary. It enumerates GenAI-specific risks (CBRN information; confabulation; dangerous, violent, or hateful content; data privacy; harmful bias; and others) so your detection and impact assessment use a recognized taxonomy rather than ad-hoc labels.
  • MITRE ATLAS: the adversary model. ATLAS is a living knowledge base of adversary tactics and techniques against AI systems, structured like MITRE ATT&CK and backed by real-world case studies. Use it to map an observed attack to a named technique and to anticipate the attacker’s next move. Recent expansions extend ATLAS coverage toward agentic-AI and large-language-model threats.
  • OWASP Top 10 for LLM Applications 2025: the application-layer checklist. Use it during triage to classify what kind of LLM-application failure you’re looking at (for example prompt injection, sensitive information disclosure, improper output handling, excessive agency, or system-prompt leakage).

The discipline: 800-61r3 tells you what phase you’re in; AI 600-1 and OWASP tell you what kind of thing went wrong; ATLAS tells you how an adversary did it. An incident write-up that names all three (the phase, the failure class, and the technique) is one a reader can verify and cross-reference.

The playbook

Phase 0 — Prepare (before anything happens)

The work that determines whether the response is competent is done before the incident.

  • Inventory the AI systems and, for each, record the model and version, the data sources it reads at inference time (RAG corpora, tool outputs, browsing), what tools/actions it can invoke, and what a “bad output” means for this application. You cannot detect drift from a baseline you never recorded.
  • Define AI-specific severity. A generic SEV scale keyed to availability and data loss misses behavioral harm. Add criteria for output-driven impact: did the model give harmful advice, leak data, take an unauthorized action, discriminate?
  • Capture the right telemetry. Log prompts, retrieved context, tool calls, and outputs with enough fidelity to reconstruct an interaction, subject to the data-privacy considerations AI 600-1 raises. You cannot investigate an interaction you didn’t log.
  • Establish a model-rollback path. Know how to revert to a previous model version or disable a feature in minutes, not hours.
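The Phase 0 items above can be sketched as one inventory record per system plus a check that reports what is still missing. This is a minimal illustration, not a standard schema: the class, field names, and the example system are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Fields the telemetry bullet asks for; the names are illustrative.
REQUIRED_TELEMETRY = {"prompt", "retrieved_context", "tool_calls", "output"}


@dataclass
class AISystemRecord:
    name: str
    model: str
    model_version: str
    inference_data_sources: list[str]      # RAG corpora, tool outputs, browsing
    tools: list[str]                       # actions the model can invoke
    bad_output_definition: str             # what "bad output" means for this app
    logged_fields: set[str] = field(default_factory=set)
    rollback_target: Optional[str] = None  # known-good version to revert to


def preparation_gaps(record: AISystemRecord) -> list[str]:
    """List the Phase 0 items this record is still missing."""
    gaps = []
    if not record.model_version:
        gaps.append("model version not recorded")
    if not record.bad_output_definition:
        gaps.append("no definition of a bad output")
    missing = REQUIRED_TELEMETRY - record.logged_fields
    if missing:
        gaps.append("telemetry missing: " + ", ".join(sorted(missing)))
    if record.rollback_target is None:
        gaps.append("no rollback target")
    return gaps


# Hypothetical system used only to exercise the check.
record = AISystemRecord(
    name="support-assistant",
    model="example-llm",
    model_version="2025-01",
    inference_data_sources=["help-center RAG corpus"],
    tools=["create_ticket"],
    bad_output_definition="gives refund or legal advice",
    logged_fields={"prompt", "output"},
)
print(preparation_gaps(record))
```

Running a check like this on a schedule turns "we should have logged that" from a post-incident finding into a pre-incident ticket.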

Phase 1 — Detect and report

AI-incident signals come from sources a SOC doesn’t usually watch: a spike in guardrail trigger rates, a shift in output distribution, user reports of wrong or harmful answers, an anomaly in tool-call patterns, or external disclosure (a researcher, a journalist, a regulator). Treat all of these as detection sources and route them to the same intake.
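As a rough sketch of two ideas in this paragraph, the code below flags a guardrail trigger-rate spike against a recorded baseline and sends every signal, whatever its source, to a single intake. The 3x ratio, the minimum request volume, and all names are assumptions chosen for illustration; a real deployment would tune them and use its own ticketing system in place of the list.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    source: str   # e.g. "guardrail", "user_report", "external_disclosure"
    system: str
    summary: str


INTAKE: list[Signal] = []  # stand-in for a real ticket queue


def report(signal: Signal) -> None:
    """Every detection source lands in the same intake."""
    INTAKE.append(signal)


def guardrail_spike(
    baseline_triggers: int,
    baseline_requests: int,
    window_triggers: int,
    window_requests: int,
    ratio: float = 3.0,       # assumed threshold, not a recommended value
    min_requests: int = 200,  # ignore windows too small to judge
) -> bool:
    """True when the window's trigger rate is at least `ratio` times baseline."""
    if window_requests < min_requests or baseline_requests == 0:
        return False
    baseline_rate = baseline_triggers / baseline_requests
    window_rate = window_triggers / window_requests
    if baseline_rate == 0:
        return window_triggers > 0
    return window_rate / baseline_rate >= ratio


# Hypothetical numbers: 2% baseline trigger rate, 9% in the latest window.
if guardrail_spike(200, 10_000, 45, 500):
    report(Signal("guardrail", "support-assistant", "trigger rate well above baseline"))
report(Signal("user_report", "support-assistant", "user flagged a wrong refund answer"))
```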

The reporting step is where our methodology starts to apply to you: record the four dates we use for every catalog entry — when the harmful behavior occurred, when it was first acknowledged internally, when public reporting (if any) appeared, and when you confirmed it. Conflating “when we noticed” with “when it started” produces a misleading timeline, and a misleading internal timeline produces a misleading disclosure.
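The four dates can be kept as separate fields so that "when it started" and "when we noticed" can never be merged by accident. The record below is a minimal sketch; the class name, field names, and example timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class IncidentDates:
    occurred: datetime       # when the harmful behavior happened
    acknowledged: datetime   # first internal acknowledgment
    confirmed: datetime      # when the team confirmed it
    publicly_reported: Optional[datetime] = None  # None if never reported publicly

    def __post_init__(self) -> None:
        later = [self.acknowledged, self.confirmed]
        if self.publicly_reported is not None:
            later.append(self.publicly_reported)
        if any(d < self.occurred for d in later):
            raise ValueError("no other date can precede the occurrence date")

    def undetected_for(self) -> timedelta:
        """Gap between when it started and when anyone inside noticed."""
        return self.acknowledged - self.occurred


# Hypothetical timeline.
dates = IncidentDates(
    occurred=datetime(2025, 3, 1, 9, 0),
    acknowledged=datetime(2025, 3, 4, 15, 0),
    confirmed=datetime(2025, 3, 5, 10, 0),
)
print(dates.undetected_for())
```

The check deliberately allows public reporting to precede internal acknowledgment, since external disclosure is one of the detection sources listed above.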

Phase 2 — Triage and classify

Before containment, classify the event, because the class drives the response:

  • Is anyone actually harmed yet? If yes, it’s an incident; if not, it’s a vulnerability and the urgency curve is different. (This is the same distinction we hold across the taxonomy.)
  • Was the model working as designed? If a working model was used as a tool for harm, that’s misuse, and “better alignment” will not fix it.
  • Which OWASP LLM risk and which ATLAS technique fit? Name them. This is what makes the incident cross-referenceable.
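The three triage questions above can be sketched as a small decision function plus a record that refuses to count as complete until both mappings are named. The precedence (misuse checked before incident) is an assumption made for this sketch, and the example labels should be checked against the current OWASP list and ATLAS matrix.

```python
from dataclasses import dataclass
from enum import Enum


class EventClass(Enum):
    INCIDENT = "incident"
    VULNERABILITY = "vulnerability"
    MISUSE = "misuse"


def classify(harm_occurred: bool, worked_as_designed: bool, used_to_cause_harm: bool) -> EventClass:
    # Assumed precedence: misuse is checked first, because a model that
    # worked as designed needs a different fix than one that failed.
    if worked_as_designed and used_to_cause_harm:
        return EventClass.MISUSE
    if harm_occurred:
        return EventClass.INCIDENT
    return EventClass.VULNERABILITY


@dataclass
class Triage:
    event_class: EventClass
    owasp_risk: str       # e.g. "LLM01:2025 Prompt Injection"
    atlas_technique: str  # technique name as listed in the ATLAS matrix

    def is_cross_referenceable(self) -> bool:
        """Both mappings must be named before the triage counts as done."""
        return bool(self.owasp_risk.strip()) and bool(self.atlas_technique.strip())


# Hypothetical event: an injected instruction caused a real data leak.
triage = Triage(
    event_class=classify(harm_occurred=True, worked_as_designed=False, used_to_cause_harm=False),
    owasp_risk="LLM01:2025 Prompt Injection",
    atlas_technique="LLM Prompt Injection",
)
print(triage.event_class.value, triage.is_cross_referenceable())
```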

Phase 3 — Contain

AI containment levers are not the same as host containment:

  • Roll back the model or disable the feature. Often the fastest containment for a behavioral failure.
  • Tighten or trip the guardrails. Raise thresholds, switch a guardrail from monitor to block, narrow the model’s allowed scope.
  • Cut the contaminated input path. If the cause is a poisoned RAG document or a compromised tool, isolate that source — for indirect prompt injection, the contaminated source is the thing to contain, not the user.
  • Revoke the model’s agency. For an agentic system taking unauthorized actions, pull the tool permissions at the boundary. This is faster and more reliable than trying to prompt the model out of the behavior.
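To illustrate what revoking agency "at the boundary" can look like, the sketch below puts a gateway between the agent and its tools, so that containment is a permission change enforced outside the model rather than an instruction the model may or may not follow. The class and tool names are hypothetical, and a production gateway would also need authentication and audit logging.

```python
from typing import Callable


class ToolGateway:
    """Sits between the agent and its tools; every tool call goes through it."""

    def __init__(self, tools: dict[str, Callable[..., object]]) -> None:
        self._tools = tools
        self._allowed = set(tools)

    def revoke(self, name: str) -> None:
        """Containment lever: remove one tool from the allowed set."""
        self._allowed.discard(name)

    def revoke_all(self) -> None:
        """Containment lever: remove every tool at once."""
        self._allowed.clear()

    def call(self, name: str, *args, **kwargs):
        if name not in self._allowed:
            raise PermissionError(f"tool '{name}' is revoked or unknown")
        return self._tools[name](*args, **kwargs)


# Hypothetical tools for an agent that has started sending unauthorized email.
gateway = ToolGateway({
    "send_email": lambda to: f"sent to {to}",
    "search": lambda query: f"results for {query}",
})
gateway.revoke("send_email")
print(gateway.call("search", "refund policy"))
```

Because the check runs in ordinary code, the revocation takes effect on the next call regardless of what the model's context contains.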

A standard runbook’s containment (isolate host, rotate credentials) still applies when the incident is a classic breach of the serving infrastructure. The point is to have both sets of levers and to pick by failure class.

Phase 4 — Eradicate and recover

Eradication for AI failures means removing the cause, which may be a poisoned data source, a regressed model version, a misconfigured guardrail, or an over-broad tool grant. Recovery means restoring service with the cause removed and the baseline re-validated. Re-run your adversarial test suite and your evaluation set before declaring recovery — a model that passed before the change should pass again, and the specific failure that triggered the incident should now be caught.
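The recovery condition in this paragraph can be written as a gate with two checks: nothing that passed before the change may fail now, and the case that reproduces the incident must pass. The sketch assumes evaluation results are available as sets of passing case names; the case names are invented for the example.

```python
def recovery_blockers(passed_before: set[str], passed_now: set[str], incident_case: str) -> list[str]:
    """Return the reasons recovery cannot be declared yet (empty list means go)."""
    blockers = [f"regression: {case}" for case in sorted(passed_before - passed_now)]
    if incident_case not in passed_now:
        blockers.append(f"incident case still failing: {incident_case}")
    return blockers


# Hypothetical evaluation results.
passed_before = {"refuses_weapon_request", "no_pii_in_summary"}
passed_now = {"refuses_weapon_request", "no_pii_in_summary", "ignores_injected_rag_instruction"}

print(recovery_blockers(passed_before, passed_now, "ignores_injected_rag_instruction"))
```

Adding the incident's reproduction case to the permanent evaluation set is what makes the second check possible on every later release.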

Phase 5 — Post-incident review

This is where most of the value, and most of the public record, lives. A good AI post-mortem records the four dates, the classification (incident/vulnerability/misuse), the OWASP and ATLAS mappings, the detection-to-containment gap, and the root cause to the degree it is knowable. Where the root cause is not knowable, it says so explicitly rather than inventing a clean narrative. It records what changed in the process so the same gap doesn’t recur. And it issues corrections visibly if earlier internal framing turns out to be wrong; silent edits are how timelines get rewritten.
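A post-mortem template can enforce two of these points mechanically: the detection-to-containment gap is computed from timestamps rather than estimated, and an unknown root cause has to be stated as unknown. The structure below is one possible sketch with hypothetical field names and example values.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PostMortem:
    classification: str        # "incident", "vulnerability", or "misuse"
    owasp_risk: str
    atlas_technique: str
    detected_at: datetime
    contained_at: datetime
    root_cause: Optional[str]  # None when the cause could not be established
    unknowns: str = ""         # plain statement of what is not knowable
    process_changes: list[str] = field(default_factory=list)

    def detection_to_containment(self) -> timedelta:
        return self.contained_at - self.detected_at

    def review_findings(self) -> list[str]:
        """Gaps a reviewer should send back before the write-up is published."""
        findings = []
        if self.root_cause is None and not self.unknowns.strip():
            findings.append("root cause is unknown but the write-up does not say so")
        if not self.process_changes:
            findings.append("no process change recorded")
        return findings


# Hypothetical post-mortem with an honestly unknown root cause.
pm = PostMortem(
    classification="incident",
    owasp_risk="LLM01:2025 Prompt Injection",
    atlas_technique="LLM Prompt Injection",
    detected_at=datetime(2025, 3, 4, 15, 0),
    contained_at=datetime(2025, 3, 4, 15, 40),
    root_cause=None,
    unknowns="could not determine which retrieved document carried the payload",
)
print(pm.detection_to_containment(), pm.review_findings())
```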

The unglamorous conclusion

There is no novel AI-IR framework you need to buy. The process spine is NIST SP 800-61r3; the AI risk vocabulary is NIST AI 600-1; the adversary model is MITRE ATLAS; the application checklist is the OWASP LLM Top 10. The playbook is the standard lifecycle with AI-specific detection sources, containment levers, and evidence types bolted into each phase — plus the dating and classification discipline that turns a chaotic response into a record someone can cite later. The incidents in our catalog that were handled well were not handled with secret tools. They were handled by teams that had done Phase 0 before they needed it.

For the source-vetting that any external-facing disclosure should clear, see our source tiers; for why the post-mortem should not name an actor the evidence doesn’t support, see our attribution policy.

Sources

  1. NIST SP 800-61r3 — Incident Response Recommendations (CSF 2.0 Community Profile)
  2. NIST AI 600-1 — Artificial Intelligence Risk Management Framework: Generative AI Profile
  3. MITRE ATLAS — Adversarial Threat Landscape for AI Systems
  4. OWASP Top 10 for LLM Applications 2025