Objective

Help organizations monitor, coordinate, and measure their incident response process so that it is more transparent and effective. The focus is on the following areas, in order of priority:

  • Information security

  • Site reliability

  • Build pipeline

  • Urgent support

Target release

Document status

10% Draft

Document owner

Ian Tao

Designer

Asaad Mahmood

Tech lead

Jesse Hallam

Technical writers

N/A

QA

Prapti Shrestha

Release Process

  • v1.0.0 will be the first public launch

  • v0.x will be feature releases

  • v0.0.x will be quality releases

  • We aim for a bi-weekly cadence for feature releases
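The versioning scheme above can be sketched as a small classifier. This is an illustration only: the function name `release_kind` is hypothetical, and it assumes that "v0.x" means a minor-version bump with patch 0 and that "v0.0.x" means a patch bump within a v0 minor version.

```python
import re


def release_kind(version: str) -> str:
    """Classify a vMAJOR.MINOR.PATCH string under the scheme listed above.

    Assumption: a v0 release with patch 0 is a feature release, and any
    other v0 release is a quality release. Versions after v1.0.0 are not
    covered by the scheme, so they are reported as "unspecified".
    """
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor, patch) == (1, 0, 0):
        return "public launch"
    if major == 0 and patch == 0:
        return "feature release"
    if major == 0:
        return "quality release"
    return "unspecified"
```

Under these assumptions, `release_kind("v0.2.0")` is a feature release and `release_kind("v0.2.1")` is a quality release.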

Recent releases

v0.1.x

  • Value: As a user, I can facilitate incidents within Mattermost so that there is a clear start/end time and channel for discussion.

  • Target date: April 3, 2020

  • Installation: Download v0.1.0 and install via System Console

  • Demo:

    Apr-07-2020 14-18-50.mp4

Upcoming

v0.2.x

  • Value: As a user, I can stay aligned with my team by sharing a checklist during an incident so that it’s clear what to do.

  • Target date: April 17, 2020

v0.3.x

  • Value: As a user, I can remind my teammates of the standardized procedure by predefining checklist items so that they don’t forget a step.

  • Target date: May 1, 2020

v0.4.x

  • Value: As a user, I can access any past incident from a central place so that my team can use that information to make postmortem easier.

  • Target date: May 15, 2020
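The target dates listed above follow the bi-weekly cadence, starting from the v0.1.x target of April 3, 2020. A minimal sketch of that arithmetic follows; the helper name `feature_release_targets` and the 14-day default are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import List


def feature_release_targets(first_target: date, count: int,
                            cadence_days: int = 14) -> List[date]:
    """Return `count` target dates spaced `cadence_days` apart."""
    return [first_target + timedelta(days=cadence_days * i) for i in range(count)]


# v0.1.x through v0.4.x, using the first target date from this page.
targets = feature_release_targets(date(2020, 4, 3), 4)
for minor, target in enumerate(targets, start=1):
    print(f"v0.{minor}.x: {target.isoformat()}")
```

The four generated dates match the targets listed for v0.1.x through v0.4.x.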

User stories

Version

Monitor

Coordinate

Measure

v0.1.0

As a user, I can facilitate incidents within Mattermost so that there is a clear start/end time and channel for discussion.

  • As a user, I can view a list of active incidents within my team in the RHS so that I get an overview of what’s currently happening.

  • As a user, I can select an active incident in the RHS to view its details so that it’s accessible from anywhere.

  • As a user, I can see and click on the channel associated with an incident in the RHS so that I can quickly navigate to it.

  • As a user, I can view the commander of each incident in the RHS without opening it so that I know who the point person is.

  • As a user, I can start an incident within my team with a slash command, RHS, or post action so that response can start quickly.

  • As the user who starts an incident, I am the commander by default so that ownership is clear.

  • As the Incident plugin, a new channel is automatically created when an incident starts so that there is a place to log activities.

  • As a user, I can choose the name of the new channel so that it’s easily recognized.

  • As the Incident plugin, a message is posted to the channel when the incident starts and ends so that there is a timestamp record of who made the change.
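The v0.1.0 stories above describe a simple incident lifecycle. The sketch below models it in plain Python to make the expected behavior concrete. It is not the plugin's implementation: the `Incident` class, its fields, and the `start_incident` and `end_incident` helpers are hypothetical names, and channel creation is reduced to storing the chosen channel name.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Incident:
    channel_name: str            # chosen by the user who starts the incident
    commander: str               # defaults to the user who starts the incident
    started_at: datetime
    ended_at: Optional[datetime] = None
    channel_posts: List[str] = field(default_factory=list)

    @property
    def is_active(self) -> bool:
        return self.ended_at is None


def start_incident(channel_name: str, started_by: str, now: datetime) -> Incident:
    """Start an incident; the starter becomes commander by default."""
    incident = Incident(channel_name=channel_name, commander=started_by, started_at=now)
    incident.channel_posts.append(f"{now.isoformat()} {started_by} started the incident")
    return incident


def end_incident(incident: Incident, ended_by: str, now: datetime) -> None:
    """End an active incident and record who ended it and when."""
    if not incident.is_active:
        raise ValueError("incident has already ended")
    incident.ended_at = now
    incident.channel_posts.append(f"{now.isoformat()} {ended_by} ended the incident")


def active_incidents(incidents: List[Incident]) -> List[Incident]:
    """The list a user would see in the RHS: incidents that have not ended."""
    return [incident for incident in incidents if incident.is_active]
```

Passing `now` explicitly keeps the start and end timestamps deterministic, which makes the sketch easy to test.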

v0.2.0

As a user, I can stay aligned with my team by sharing a checklist during an incident so that it’s clear what to do.

  • As a user, I can see the checklist for each active incident in the RHS so that it’s clear what’s been done and what needs to be done.

  • As a member of the incident channel, I can add items to the end of the incident checklist in the RHS so that the team can adapt to the situation.

  • As a member of the incident channel, I can remove items from the incident checklist in the RHS so that the team can adapt to the situation.

  • As a member of the incident channel, I can check off items for an incident in the RHS so that it accurately represents the team’s progress.

  • As the Incident plugin, a message is posted to the incident channel when a checklist item has been added/removed so that there is a timestamp record of who made the change.

  • As the Incident plugin, a message is posted to the incident channel when a checklist item has been checked/unchecked so that there is a timestamp record of who made the change.
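The v0.2.0 checklist stories above can be sketched as a small data structure in which every change also produces a channel message. All names here (`ChecklistItem`, `Checklist`, `channel_posts`) are hypothetical, and the sketch assumes the timestamp comes from the channel post itself, so only the acting user and the change are recorded.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChecklistItem:
    title: str
    checked: bool = False


@dataclass
class Checklist:
    items: List[ChecklistItem] = field(default_factory=list)
    # Stand-in for messages posted to the incident channel.
    channel_posts: List[str] = field(default_factory=list)

    def add_item(self, user: str, title: str) -> None:
        """Append an item to the end of the checklist and announce it."""
        self.items.append(ChecklistItem(title))
        self.channel_posts.append(f"{user} added checklist item: {title}")

    def remove_item(self, user: str, index: int) -> None:
        """Remove the item at `index` and announce it."""
        removed = self.items.pop(index)
        self.channel_posts.append(f"{user} removed checklist item: {removed.title}")

    def set_checked(self, user: str, index: int, checked: bool) -> None:
        """Check or uncheck the item at `index` and announce it."""
        item = self.items[index]
        item.checked = checked
        verb = "checked" if checked else "unchecked"
        self.channel_posts.append(f"{user} {verb} checklist item: {item.title}")

    def remaining(self) -> List[str]:
        """Titles of items that still need to be done."""
        return [item.title for item in self.items if not item.checked]
```

Routing every mutation through a method that also appends a post keeps the checklist state and the channel record from drifting apart.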

v0.3.0

As a user, I can remind my teammates of the standardized procedure by predefining checklist items so that they don’t forget a step.

  • As a user, I can configure the incident playbook with a checklist so that I save time setting up each future incident.

  • As a member of the incident channel, I can change the incident commander from the RHS so that it remains accurate.

v0.4.0

As a user, I can refer to any past incident by reviewing them in a central place so that my team can use that information to make postmortem easier.

  • As a user of a team, I can see a list of all past and current incidents within the team so that the information is not lost.

  • As the Incident plugin, the channel is automatically archived when the incident ends so that it reduces clutter.

  • As a user, I can export the channel transcript from the incident detail page so that it can be saved for the record.

  • As the incident commander, I receive an incident summary and a link to more info after ending an incident so that I know where to go for post-mortem.

  • As a user, I can see a summary of the following on the incident detail page so that I can get the gist at a glance:

    • Channel, start time, duration, commander, number of people in the channel, number of messages posted in channel

  • As a user, I can review when each checklist item was completed so that I can identify the steps that are taking the most time.
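One way to read the last story above is that the time spent on a step is the gap between its completion and the previous completion (or the incident start, for the first step). The sketch below makes that assumption explicit; `step_durations` and `slowest_step` are hypothetical helpers, not part of the plugin.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def step_durations(started_at: datetime,
                   completions: List[Tuple[str, datetime]]) -> List[Tuple[str, timedelta]]:
    """Return (title, elapsed) pairs in completion order.

    Assumption: each step's elapsed time is measured from the previous
    completion, or from the incident start for the first completed step.
    """
    previous = started_at
    durations = []
    for title, completed_at in sorted(completions, key=lambda pair: pair[1]):
        durations.append((title, completed_at - previous))
        previous = completed_at
    return durations


def slowest_step(durations: List[Tuple[str, timedelta]]) -> Tuple[str, timedelta]:
    """Return the step with the largest elapsed time."""
    return max(durations, key=lambda pair: pair[1])
```

This measure only works if steps are completed one after another; steps worked in parallel would need explicit start times, which the stories above do not mention.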

v0.5.0

Welcome message

  • As a person that’s just been added to an incident channel, I receive a templated ephemeral message so that I have the resources to help me start contributing.

  • As a user, I can configure the playbook to send a specific message whenever someone is added to the incident channel so that I can include instructions and links to resources.

  • As a user, I can review when each person was added to the incident channel so that I can understand the human resource that was invested.
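The templated welcome message described above could work roughly as follows. This is a sketch under assumptions: the placeholder names (`$user`, `$incident`, `$commander`) and the `render_welcome` helper are invented for illustration, and the page does not specify a template syntax.

```python
from string import Template


def render_welcome(template_text: str, user: str, incident: str, commander: str) -> str:
    """Fill a playbook's welcome template for a newly added channel member.

    safe_substitute leaves unknown placeholders untouched instead of raising,
    so a typo in a playbook template does not block the message.
    """
    return Template(template_text).safe_substitute(
        user=user, incident=incident, commander=commander
    )


playbook_welcome = (
    "Welcome, $user. You were added to $incident. "
    "The commander is $commander. Runbook: $runbook_link"
)
message = render_welcome(playbook_welcome, "dana", "db-outage", "alice")
```

Here `$runbook_link` is deliberately left unfilled to show that unknown placeholders pass through unchanged.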

v0.6.0

Metadata

v0.7.0

Aggregate reporting

v0.8.0

Multiple playbooks

User interaction and design

Key Learnings

  • Most customers neither expect nor want automation at the start; they want prompts that get their teams to execute a procedure consistently.

Open Questions

Question

Answer

Date Answered

Out of Scope