Info Tech Incident Post-Mortem Theater

Thanks for following along with our six-part AI Cost Ripples series. In this article, we’re switching gears.

In my TPM career, I’ve attended countless post-mortem meetings and exercises. Each one has been a learning experience, almost like a living environment with constantly changing variables. Yet, within that dynamic setting, certain rules remain surprisingly rigid. I’ve outlined some of these, along with details and examples, below.

“Incident Post-Mortem Theater” is a creative framework used in the software industry to transform routine post-incident reviews into engaging, transparent learning sessions. It blends the structured discipline of a traditional IT post-mortem with the narrative, reflective, and human elements of a theatrical performance.

Here is how an Incident Post-Mortem Theater session can be designed:

A Play Based on a Particular Incident (The Reenactment)

Move beyond a dry timeline review. Use theatrical framing to humanize the incident. Responders become “actors,” narrating their experience as a story.

  • Cast of Characters:
    Roles such as Incident Commander, On-Call Engineer, Customer Support, “System Itself,” or “Rogue Deploy.”
  • Screenplay Timeline:
    Transform your technical timeline into a narrative script with timestamps, actions, observations, and beliefs captured at that moment.
    Example:
      • Timestamp: 14:02 UTC
      • Character: On-Call Engineer
      • Action: Noticed a spike in error rates on the Datadog dashboard.
      • Belief: Suspected a caching issue triggered by recent marketing traffic.
  • Stage Directions & Setting:
    Describe the environment, including “props” such as monitoring tools, runbooks, and Slack channels, plus any contributing conditions (for example, “the on-call engineer was on the bridge while fielding an unrelated page”).
  • Plot Points:
      • Lead-Up: Circumstances before the incident.
      • Climax: The moment of maximum impact or discovery.
      • Turning Point: The action that drove mitigation.
      • Final Resolution: Service restored.
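
The screenplay timeline is essentially structured data, so it can be kept as simple records and rendered into script lines automatically. The Python sketch below shows one way to do that. It assumes a hypothetical `TimelineEntry` record whose fields mirror the example above (timestamp, character, action, belief). It is an illustration, not a standard tool or format.

```python
from dataclasses import dataclass


@dataclass
class TimelineEntry:
    """One beat of the incident, as captured in the post-mortem timeline.

    Hypothetical record type; the fields mirror the screenplay example.
    """
    timestamp: str  # e.g. "14:02 UTC" (zero-padded HH:MM, same day assumed)
    character: str  # the role being played, not the person's name
    action: str     # what the character did or observed
    belief: str     # what the character believed at that moment


def to_script_line(entry: TimelineEntry) -> str:
    """Render one timeline entry as a screenplay-style line."""
    return (
        f"[{entry.timestamp}] {entry.character.upper()}: {entry.action} "
        f"(Belief: {entry.belief})"
    )


def to_screenplay(entries: list[TimelineEntry]) -> str:
    """Order entries chronologically and join them into one script.

    Sorting the timestamp strings works only because they are assumed to be
    zero-padded HH:MM values from a single day; multi-day incidents would
    need real datetime values instead.
    """
    ordered = sorted(entries, key=lambda e: e.timestamp)
    return "\n".join(to_script_line(e) for e in ordered)
```

Keeping the belief as its own field, rather than folding it into the action, preserves what responders thought at the time, which is exactly what the blameless critique needs to examine.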

A Blameless Review (The Critique)

Following this “performance,” your larger project team becomes the audience in a blameless critique focused on systems, not individuals.

  • Set the Stage for Safety:
    A facilitator (your “Director”) reinforces psychological safety and a no-blame culture.
  • Blameless Discussion Prompts:
      • What went well?
      • Where did we get lucky?
      • What could have gone better?
      • Why didn’t we detect this sooner?
      • Why did our process allow this? (Use the “5 Whys” or a similar technique.)

Actionable Outcomes (Your “Next Season” Plan)

Insights must translate into clear improvements.

  • Call Sheet for Action Items:
    Create concise, prioritized, trackable tasks with owners and deadlines.
    Example:
      • Action Item XYZ: Update runbook for API latency issues
      • Owner: @engineer_name
      • Due: Nov 30
  • Documentation & Sharing:
    Publish a final post-mortem report, much like a playbill, that summarizes the story, timeline, root causes, and planned improvements. Circulate it widely to drive organizational learning.
  • Awards Ceremony:
    Celebrate responders and highlight positive contributions to reinforce the blameless culture.
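
Because every action item needs an owner, a deadline, and a priority, the call sheet can be checked mechanically. Here is a minimal Python sketch that orders the sheet and flags items that are unassigned or past due. It assumes a hypothetical `ActionItem` record and a priority scale where 1 is highest; neither comes from any particular ticketing tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    """Hypothetical call-sheet entry; not tied to any specific tracker."""
    title: str
    owner: str      # e.g. "@engineer_name"; an empty string means unassigned
    due: date
    priority: int   # assumed scale: 1 = highest priority


def call_sheet(items: list[ActionItem]) -> list[ActionItem]:
    """Order items by priority, then by due date (earliest first)."""
    return sorted(items, key=lambda i: (i.priority, i.due))


def needs_attention(items: list[ActionItem], today: date) -> list[str]:
    """Flag items that are not trackable: missing an owner or past due."""
    flags = []
    for item in items:
        if not item.owner:
            flags.append(f"{item.title}: no owner")
        if item.due < today:
            flags.append(f"{item.title}: overdue since {item.due.isoformat()}")
    return flags
```

For example, the runbook item above could be entered as `ActionItem("Update runbook for API latency issues", "@engineer_name", date(2024, 11, 30), 2)`, where the year is illustrative because the original deadline gives only “Nov 30.” Running `needs_attention` before each follow-up keeps the “Next Season” plan from silently stalling.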

Using this framework turns a necessary yet often tedious process into a more engaging, insightful, and effective learning experience.

Don’t forget to subscribe to get notifications for new articles.
