What is ITIL Incident Management

Businesses aim for uninterrupted services to achieve higher efficiency and productivity. ITIL Incident Management is the first process most organizations adopt, because it targets speedy service recovery and resolves users’ issues as fast as possible. It belongs to the Service Operation stage of the ITIL service lifecycle. An incident is defined as an unplanned interruption or disturbance to normal service. Through the service desk, Incident Management gives end users a single point of contact for reporting issues. Much of the work is fire-fighting to keep services in a working state. Service desk implementation follows the Adopt and Adapt approach: adopt the best practices of the ITIL Incident Management process, then adapt them to organizational needs. Successful Incident Management results in improved efficiency and higher productivity.

An IT Service Desk is the single point of contact between the IT team and end users. ITIL Service Operation covers Incident Management techniques whose primary objective is to ensure seamless business operations with minimal or no interruption. A competent Incident Management process reduces the communication gap between end users and IT. The ITIL Incident Management process is a set of best practices for effective incident handling and resolution. Let us look at some of the basics of Incident Management.

An incident may be caused by an asset that is not working properly. Incident Management and Asset Management are therefore closely tied and share information. Examples of incidents include WiFi issues, application errors, email service issues, laptop crashes, Active Directory (AD) errors, and authentication issues.

Incident Management Process flow

The Incident Management process comprises a series of steps, described below.

Incident Logging

Incident logging is the first step in Incident Management, in which an identified incident is reported. Tickets are raised either by end users themselves through any configured ticket source or by agents on behalf of end users. An Incident form template captures the details of the issue, and the captured field values can then drive automation, which speeds up recovery. Configure the relevant channels so that users can raise tickets; common channels include email, the self-service portal, and a mobile app.

Incident Classification

Classifying incidents helps categorize tickets properly and assign them to the right agent. The Incident template provides category and sub-category fields for choosing the associated Incident category. Configure the Incident form with the right set of fields and automate ticket classification, prioritization, and assignment to save time. Correct classification also supports better decision making.
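Automated classification can be as simple as a set of keyword rules applied to the ticket subject. The sketch below is a minimal illustration under that assumption; the rule table, the category names, and the `classify` function are hypothetical and not taken from any particular service desk product.

```python
from typing import List, Tuple

# Hypothetical keyword rules: (keyword, category, sub-category).
RULES: List[Tuple[str, str, str]] = [
    ("wifi", "Network", "Wireless"),
    ("email", "Software", "Email"),
    ("password", "Access", "Authentication"),
    ("laptop", "Hardware", "Laptop"),
]

# Fallback used when no rule matches, so a human can triage the ticket.
DEFAULT_CATEGORY: Tuple[str, str] = ("General", "Uncategorized")


def classify(subject: str) -> Tuple[str, str]:
    """Return (category, sub_category) for the first rule whose keyword appears in the subject."""
    text = subject.lower()
    for keyword, category, sub_category in RULES:
        if keyword in text:
            return category, sub_category
    return DEFAULT_CATEGORY
```

Rules are checked in order, so more specific keywords should be listed first. Tickets that fall through to the default category are good candidates for new rules.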

Incident Prioritization

The Service Level Agreement (SLA) depends on ticket priority to define response and resolution targets. Priority determines the due-by date before which the ticket has to be resolved, so it is essential to assign the right priority. A priority matrix takes the impact and urgency reported for the ticket and derives the priority from them, which ensures business-critical issues are addressed on time. Configure a realistic SLA definition so that customer commitments can be met.
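A priority matrix and the due-by date it implies can be sketched as a lookup table plus a time offset. The three-level scale, the P1 to P4 labels, and the resolution targets below are illustrative assumptions; a real SLA policy would use the organization's own values and would normally count business hours only, which this sketch does not.

```python
from datetime import datetime, timedelta

# Hypothetical 3x3 priority matrix: (impact, urgency) -> priority.
PRIORITY_MATRIX = {
    ("high", "high"): "P1",
    ("high", "medium"): "P2",
    ("high", "low"): "P3",
    ("medium", "high"): "P2",
    ("medium", "medium"): "P3",
    ("medium", "low"): "P4",
    ("low", "high"): "P3",
    ("low", "medium"): "P4",
    ("low", "low"): "P4",
}

# Hypothetical resolution targets, in calendar hours, per priority.
RESOLUTION_HOURS = {"P1": 4, "P2": 8, "P3": 24, "P4": 72}


def priority_for(impact: str, urgency: str) -> str:
    """Look up the ticket priority from impact and urgency."""
    return PRIORITY_MATRIX[(impact.lower(), urgency.lower())]


def due_by(created_at: datetime, priority: str) -> datetime:
    """Compute the due-by time as creation time plus the resolution target."""
    return created_at + timedelta(hours=RESOLUTION_HOURS[priority])
```

For example, a ticket with high impact and high urgency created at 09:00 would be a P1 under these assumed targets and due by 13:00 the same day.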

Investigation & Diagnosis

The Tier I team handles low-priority incidents, while complex incidents are handled by the Tier II and Tier III teams. Tier I performs the initial analysis and investigation. If no resolution is found, the incident is escalated to Tier II and Tier III for detailed investigation. The incident is associated with the relevant Configuration Item (CI) for faster diagnosis.
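The escalation path can be pictured as a chain of handlers, where each tier either returns a resolution or passes the incident on. This is only a sketch of the idea: the tier functions and the fixes they return are made-up examples, not real runbook entries.

```python
from typing import Callable, List, Optional, Tuple


# Each tier is modeled as a function that returns a resolution, or None
# when it cannot resolve the issue and the incident must be escalated.
def tier_1(issue: str) -> Optional[str]:
    known_fixes = {"password reset": "Reset the password from the self-service portal"}
    return known_fixes.get(issue)


def tier_2(issue: str) -> Optional[str]:
    if issue == "email service down":
        return "Restart the mail service"
    return None


def tier_3(issue: str) -> Optional[str]:
    # Last tier in this sketch: it always produces an outcome.
    return f"Schedule a code fix for: {issue}"


TIERS: List[Tuple[str, Callable[[str], Optional[str]]]] = [
    ("Tier I", tier_1),
    ("Tier II", tier_2),
    ("Tier III", tier_3),
]


def resolve(issue: str) -> Tuple[str, str]:
    """Try each tier in order and return (tier_name, resolution)."""
    for tier_name, handler in TIERS:
        resolution = handler(issue)
        if resolution is not None:
            return tier_name, resolution
    raise LookupError(f"No tier could resolve: {issue}")
```

Recording which tier finally resolved the incident also makes it possible to measure how many incidents are resolved without escalation.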

Incident Resolution & Closure

Timely resolution is crucial to meeting the SLA and is also how agents achieve good performance. Communicating the resolution clearly is equally important so that users can get back to normal work. Tickets are closed either automatically by the system or by users through the self-service portal.

Tiered Support vs Swarming

Methodologies such as DevOps, Agile, and Lean have introduced newer ways of working for IT teams. The overarching goal is to collaborate effectively and promote teamwork over silos.

TIERED SUPPORT

The traditional three-tiered support model follows a hierarchy: the front desk team (Tier I), technical teams (Tier II), and the application development team (Tier III). Tier I agents are mostly generalists and solve most tickets. The few tickets that require subject matter expertise are transferred to Tier II, i.e. the application management team. If a ticket is still not resolved, it is passed on to the development team (Tier III), which implements new changes. The model is hierarchical and follows an escalation procedure.

SWARMING

Swarming, on the other hand, focuses on effective collaboration among agents to resolve issues as quickly as possible. It is based on knowledge sharing, and its primary objective is for agents to learn from one another while arriving at a solution. There is no hierarchy; the structure is flat. Agents form a swarm to collaborate and brainstorm, which is especially helpful for unconventional tickets. Swarming is driven by organizational culture and eliminates knowledge silos.

Incident Management Best Practices

Channels

Provide multi-channel support to enhance the user experience, and educate users about the channels that are available. Consumerization demands that businesses be available and accessible from anywhere.

User Management

Service desk users include agents and end users. It is vital to manage user details so that the ticket resolver has more context.

Categorization & Prioritization

Proper classification supports better troubleshooting and improves resolution time. Prioritization ensures that business-critical issues are addressed first.

Automate wherever possible

Incident Management involves a lot of routine tickets and activities such as categorization, prioritization, and assignment. Automating them improves efficiency and productivity.

Communication is key

Keep your users informed about the progress of the ticket. This develops trust between IT and end users.

Integrate Asset Management & Knowledge Management

Integrating Incident Management with other ITIL processes helps share useful information and eliminate knowledge silos.

Gamify your service desk

Leverage Gamification to create a good culture among service desk agents and motivate them to push harder.

Incident Management vs Problem Management

A problem is the underlying cause, often still unknown, of one or more incidents, whereas an incident is an unplanned interruption to normal service. Incident Management is mostly reactive, whereas Problem Management aims to be proactive and prevent major incidents. The primary objective of Incident Management is to restore service as fast as possible, whereas Problem Management aims to find a permanent solution. Problem Management starts when Incident Management is unable to find a solution. A problem record is created either from one or more repeating incidents or on its own. Problem Management performs Root Cause Analysis (RCA) to find a permanent solution and maintains the Known Error Database (KEDB). Incident Management shares crucial information such as incident details, users impacted, assets impacted, urgency, impact, and criticality, and in that sense acts as a prerequisite to Problem Management. Problem Management uses proven techniques such as Ishikawa diagrams, the 5 Whys, brainstorming, and Kepner-Tregoe to find the root cause.
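One simple way to spot repeating incidents that may deserve a problem record is to count incidents that share a category and Configuration Item. The sketch below assumes each incident is a dictionary with `category` and `ci` keys and uses an arbitrary threshold of three; both the field names and the threshold are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def problem_candidates(
    incidents: List[Dict[str, str]], threshold: int = 3
) -> List[Tuple[str, str]]:
    """Return (category, ci) pairs that recur at least `threshold` times."""
    counts = Counter((incident["category"], incident["ci"]) for incident in incidents)
    return [key for key, count in counts.items() if count >= threshold]
```

A pair returned by this function is only a hint that a common root cause may exist; deciding whether to open a problem record remains a judgment call for the Problem Manager.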

Major Incident Management

A major incident causes significant disruption and has a huge impact on the working state of the business. It is defined as an issue with the highest impact and the highest urgency. The Incident Manager takes care of the resources needed for resolution, and the Problem Manager finds the underlying root cause. A separate SLA policy is set for handling major incidents. Proactive Problem Management helps identify and prevent major incidents. A Major Incident report is prepared to capture details about the incident and the users impacted.

Examples: ERP application shutdown, access control threat.

Incident Manager Roles and Responsibilities

An Incident Manager devises and manages the Incident Management process for the organization and adopts the best practices of ITIL within the process. The Incident Manager is responsible for the following tasks:

5 DON’Ts of Incident Management

Key Performance Indicators

The Incident Manager is responsible for defining the right KPIs, which ensures business alignment. KPI reports are reviewed with management periodically. KPIs are related to Critical Success Factors (CSFs), and CSFs, in turn, are related to primary objectives. A service desk solution helps you assess these KPIs with advanced analytics reports. These reports are automated and are used to improve the existing process and the business as a whole. Types of reports include ticket trends, agent performance, CSAT, and SLA reports.

Typical Incident Management metrics include:

  • Incident volume (per issue category, priority, status, requester, etc.)
  • Average resolution time
  • Average response time
  • SLA compliance (%)
  • Configuration Items (CIs) causing or being impacted by Incidents
  • Incidents resolved without escalation
  • Average cost per Incident
  • Incident reopen rate
  • First call resolution (FCR)
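Several of these metrics can be computed directly from ticket records. The sketch below assumes each ticket is a dictionary with `created_at`, `resolved_at`, and `due_by` datetimes plus `reopened` and `escalated` flags; these field names and the `incident_kpis` function are hypothetical, and a real service desk would expose its own schema and reports.

```python
from datetime import datetime
from typing import Any, Dict, List


def incident_kpis(tickets: List[Dict[str, Any]]) -> Dict[str, float]:
    """Compute a few common Incident Management metrics for resolved tickets."""
    if not tickets:
        raise ValueError("At least one ticket is required")
    total = len(tickets)
    # Resolution time per ticket, in hours.
    hours = [
        (t["resolved_at"] - t["created_at"]).total_seconds() / 3600 for t in tickets
    ]
    return {
        "volume": total,
        "avg_resolution_hours": sum(hours) / total,
        # Share of tickets resolved on or before their due-by time.
        "sla_pct": 100 * sum(t["resolved_at"] <= t["due_by"] for t in tickets) / total,
        "reopen_rate_pct": 100 * sum(bool(t["reopened"]) for t in tickets) / total,
        "no_escalation_pct": 100 * sum(not t["escalated"] for t in tickets) / total,
    }
```

Grouping the tickets by category, priority, or requester before calling the function gives the per-dimension breakdowns mentioned in the list.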

Feature Checklist for Incident Management solution

Service desk vendors provide many features relevant to Incident Management. The following list covers the minimum features an Incident Management solution requires.

  • Create Incidents through multiple channels such as email, portal, mobile, chat etc.
  • Incident creation by agents on behalf of the end user
  • Configure Incident form template and save for future reference
  • Automation capabilities for Incident categorization, prioritization and assignment
  • Ticket reassignment to another agent
  • SLA timer to track resolution time
  • Ability to stop the SLA clock and override
  • Bulk actions on Incidents such as bulk reply, bulk status change etc.
  • Adding custom status to Incidents
  • Ability to associate the relevant CIs to the Incident
  • Integrated Knowledge Management system to add a relevant solution article
  • Define priority matrix to capture impact and urgency
  • Automated email notifications for ticket creation, replies, status changes etc.
  • Escalation matrix to handle SLA violations
  • Collaborate with peer agents within an Incident
  • Forward a ticket to a third party entity
  • Audit logs to understand ticket history
  • Parent-child relationship to link related tickets
  • Merge option for duplicate Incidents
  • Configure Customer Satisfaction (CSAT) survey to understand user satisfaction level
  • Reporting mechanism for Incidents
  • Associate an Incident to a Problem or a Change record directly
  • Ability to integrate with third-party apps such as event monitoring tools etc.

Incident Management Benefits

An Incident Management system delivers the following business benefits:
