Incident Management - Major Incident / Problem

Overview

Large-scale incidents must be qualified and fully addressed because they can severely impact the organization's ability to function, especially as systems become increasingly complex.

Using Major Incident tickets creates a more consistent process for handling large-scale incidents, restoring functionality, and documenting the in-the-moment workaround or fix.

Once the Major Incident is resolved, an administrator or manager will create a Problem ticket and set it as the parent of the Major Incident ticket. The focus of a Problem ticket is to identify the root cause of the issue and determine what is required to ensure it does not happen again. The reasoning behind the chosen course of action will also be documented in the Problem ticket.

This approach to addressing large-scale incidents ensures they are dealt with properly and thoroughly, increasing system robustness and reducing overall technical debt.

Major Incident - Ticket Creation / Life Cycle

A major incident is either an issue that can severely affect many people or an issue that affects only a few people but has potentially devastating results. In either case, a Major Incident ticket is to be filled out.

Examples of major incidents include:

  • UWinsite Student / BrightSpace / <any major system> is down unexpectedly
  • Large-scale security breach

Once a Major Incident ticket is created, the workflow "Major Incident Confirmation and Notification" is automatically applied to it. The Team Lead of Client Services will qualify the issue with the client and the administrator to determine whether it is:

  • a major incident
  • an issue with only the client
  • not an issue and working as intended

Note: There is also an option in the workflow for the administrator to confirm that there is in fact a major incident to help expedite the process. 
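
The three qualification outcomes, together with the administrator's option to confirm directly, can be sketched as a small decision function. This is an illustrative Python sketch only: the names `Qualification` and `qualify` are hypothetical, and the assumption that an administrator's confirmation overrides the Team Lead's review is an interpretation of the note above, not a description of how the ticketing system implements the workflow.

```python
from enum import Enum


class Qualification(Enum):
    """Possible outcomes of the qualification step (hypothetical names)."""
    MAJOR_INCIDENT = "major incident"
    CLIENT_ONLY = "issue with only the client"
    WORKING_AS_INTENDED = "not an issue, working as intended"


def qualify(team_lead_decision, admin_confirmed=False):
    """Return the final qualification of a ticket.

    Assumption: an administrator's confirmation short-circuits the
    Team Lead's review and marks the ticket as a major incident.
    """
    if admin_confirmed:
        return Qualification.MAJOR_INCIDENT
    return team_lead_decision
```

Only a ticket qualified as `Qualification.MAJOR_INCIDENT` would continue to the notification step described next.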

If it is determined to be a major incident, the next step of the workflow has the Team Lead reassign the ticket and decide which level of notification should go out: 

  • IT Services only (this will generate an automated message with the Ticket ID and description)
  • IT Services + Department Technicians (this will generate an automated message with the Ticket ID and description)
  • All campus (the communications coordinator in ITS is brought in for consultation)
  • No notification
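
The four notification levels above can be modeled as a simple lookup. The following is a minimal Python sketch under stated assumptions: `NotificationLevel`, `AUTOMATED_LEVELS`, and `notification_action` are hypothetical names, and the message wording is invented for illustration, since the article only states that the automated message contains the Ticket ID and description.

```python
from enum import Enum


class NotificationLevel(Enum):
    """Notification levels the Team Lead can choose (names are hypothetical)."""
    IT_SERVICES = "IT Services"
    IT_SERVICES_AND_TECHNICIANS = "IT Services + Department Technicians"
    ALL_CAMPUS = "All campus"
    NONE = "No notification"


# Levels for which the workflow generates an automated message.
AUTOMATED_LEVELS = {
    NotificationLevel.IT_SERVICES,
    NotificationLevel.IT_SERVICES_AND_TECHNICIANS,
}


def notification_action(level, ticket_id, description):
    """Return a short description of what happens for the chosen level."""
    if level in AUTOMATED_LEVELS:
        # The message wording is an assumption; the article only says it
        # contains the Ticket ID and description.
        return f"Automated message to {level.value}: [{ticket_id}] {description}"
    if level is NotificationLevel.ALL_CAMPUS:
        # All-campus messages are not automated; a person is consulted.
        return "Consult the ITS communications coordinator"
    return "No notification sent"
```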

While the notification level is being determined, the administrator can start working on a resolution or hotfix for the issue. The workflow generates a task to track this step and assign responsibility. The intent is that this quick, perhaps temporary, resolution is documented in the Major Incident ticket.

Any tickets that come in as a result of the major incident should be set as children of the Major Incident ticket. When completing the Major Incident ticket, the cascade check box should be checked to complete all child tickets and communicate the resolution to them.
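
The effect of the cascade check box can be pictured with a small parent/child model. This is a hedged Python sketch, not the ticketing system's actual behavior: the `Ticket` class, its fields, and the `complete` method are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ticket:
    """Hypothetical ticket with optional child tickets."""
    ticket_id: int
    status: str = "open"
    resolution: Optional[str] = None
    children: List["Ticket"] = field(default_factory=list)

    def add_child(self, child):
        """Attach a related ticket as a child of this ticket."""
        self.children.append(child)

    def complete(self, resolution, cascade=False):
        """Complete this ticket; with cascade, complete all children too."""
        self.status = "completed"
        self.resolution = resolution
        if cascade:
            for child in self.children:
                child.complete(resolution, cascade=True)


# Usage: one major incident with a related ticket reported by a client.
major = Ticket(ticket_id=1)
related = Ticket(ticket_id=2)
major.add_child(related)
major.complete("Service restarted", cascade=True)
```

Without `cascade=True`, only the parent in this sketch would be completed and the related ticket would remain open, which is why the check box matters.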

When the notification and resolution tasks are both completed, an automatic notification goes to the manager of the responsible person or group, letting them know that a Major Incident ticket has been completed and that a Problem ticket should now be created to follow up.

Problem - Ticket Creation / Life Cycle

After the Major Incident ticket is completed, the Problem ticket is created. The purpose of the Problem ticket is to determine the root cause. Once the Problem ticket is created, the Major Incident ticket that prompted it is to be set as its child ticket. This ensures the two tickets are linked.
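
The resulting relationship, with the Problem ticket as parent and the Major Incident ticket as its child, can be sketched as follows. The `Ticket` class and the `link_problem` helper are hypothetical names used only to illustrate the direction of the link; the requirement that the Major Incident be completed first is taken from the paragraph above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Ticket:
    """Hypothetical ticket that knows its parent and children."""
    ticket_id: int
    kind: str  # "Major Incident" or "Problem"
    status: str = "open"
    # repr=False avoids a parent -> child -> parent loop when printing.
    parent: Optional["Ticket"] = field(default=None, repr=False)
    children: List["Ticket"] = field(default_factory=list)


def link_problem(problem, major_incident):
    """Set the completed Major Incident ticket as a child of the Problem ticket."""
    if major_incident.status != "completed":
        raise ValueError("The Major Incident ticket must be completed first")
    major_incident.parent = problem
    problem.children.append(major_incident)


# Usage: the Problem ticket is created after the incident is completed.
incident = Ticket(ticket_id=1, kind="Major Incident", status="completed")
problem = Ticket(ticket_id=2, kind="Problem")
link_problem(problem, incident)
```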

The goal is to answer why and how the major incident happened and to document the steps taken toward resolving the issue (or the approved decision to take no steps).

Details

Article ID: 151294
Created: Tue 5/30/23 3:10 PM
Modified: Thu 8/3/23 11:49 AM