
Incident Management

In This Lesson

• Purpose and Objective
• Scope of Incident Management
• Value to the Business
• Activities of Incident Management
• Triggers to Incident Management
• Interfaces to Other Processes
• Inputs and Outputs
• Critical Success Factors and Key Performance Indicators (KPIs)
• Challenges and Risks

Definition of an Incident

Incident
An unplanned interruption to an IT service, a reduction in the quality of an IT service, or a failure of a configuration item that has not yet impacted IT services

Incident management is the process responsible for managing the lifecycle of all incidents

Examples of incidents
– Failure of the payroll service
– A database lock that prevents data use by an application
– A failure of a disk in a mirrored set that has not yet impacted services

Purpose of Incident Management

Restore normal service operation as quickly as possible

Minimize adverse impact on business operations

Normal Service Operation
A state where services and CIs are performing within their agreed service and operational levels

Objectives of Incident Management

Ensure standard methods and procedures are used for the following incident activities
– Efficient and prompt response
– Analysis
– Documentation
– Ongoing management
– Reporting

Increase visibility and communication of incidents to business and support staff

Enhance business perception of IT through use of a professional approach in quickly resolving and communicating incidents

Align incident management (IM) priorities with those of the business

Maintain user satisfaction with the quality of IT service

Scope of Incident Management

Any event which disrupts or could disrupt a service

Not all events are incidents

Incidents are not the same as service requests, although both may be reported to the service desk

SLAs and OLAs are important to the incident process: they determine whether a service has been disrupted and help distinguish between events, incidents, and service requests

Value to the Business

Reduction in unplanned labor and cost for both the business and IT support staff caused by incidents

Lower downtime to the business (which means higher levels of availability)

Alignment of IT activity to real-time business priority
– Enables the capability to capture business priority and dynamically allocate resources as necessary

The ability to identify potential improvements to services

The service desk can identify additional service or training requirements found in IT or the business

Activities of Incident Management

Activities
– Identification
– Logging
– Categorization
– Prioritization
– Initial diagnosis
– Escalation (functional or hierarchical)
– Resolution identification
– Resolution/recovery
– Incident closure

Incident sources: event management, web interface, phone call, email

Process flow
1. Incident identification. Is it an incident? If no, route it to the request process, or to SPM if it is a change proposal
2. Incident logging
3. Incident categorization
4. Incident prioritization. Major incident? If yes, follow the major incident procedure
5. Initial diagnosis. Functional escalation needed? If yes, escalate functionally. Management escalation needed? If yes, escalate hierarchically
6. Investigation and diagnosis. Resolution identified? If no, continue diagnosis, escalating where needed
7. Resolution and recovery
8. Incident closure
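The main path through the process flow can be sketched as an ordered set of stages. This is only an illustration: the `Stage` enum and `next_stage` helper are hypothetical names, and the major incident and escalation branches are deliberately left out to keep the sketch short.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Stages on the main path of the incident flow (branches omitted)."""
    IDENTIFICATION = "identification"
    LOGGING = "logging"
    CATEGORIZATION = "categorization"
    PRIORITIZATION = "prioritization"
    INITIAL_DIAGNOSIS = "initial diagnosis"
    INVESTIGATION_AND_DIAGNOSIS = "investigation and diagnosis"
    RESOLUTION_AND_RECOVERY = "resolution and recovery"
    CLOSURE = "closure"


# Enum members iterate in definition order, which is the order of the flow.
MAIN_PATH = list(Stage)


def next_stage(current: Stage, resolution_identified: bool = True) -> Optional[Stage]:
    """Return the stage after `current`, or None once the incident is closed.

    Assumption: investigation and diagnosis simply repeats until a
    resolution has been identified.
    """
    if current is Stage.INVESTIGATION_AND_DIAGNOSIS and not resolution_identified:
        return current
    index = MAIN_PATH.index(current)
    return MAIN_PATH[index + 1] if index + 1 < len(MAIN_PATH) else None
```

Calling `next_stage(Stage.LOGGING)` moves an incident on to categorization, while `next_stage(Stage.CLOSURE)` returns `None` because the flow has ended.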

Incident Identification

It is desirable to identify incidents (and resolve them) prior to business impact

It is usually unacceptable from a business perspective to wait until a user is impacted and contacts the service desk to notify IT of an incident

Monitoring of key infrastructure components is critical to the detection of failures and potential failures

Incident Logging

All incidents must be logged regardless of where they are raised

All relevant information should be captured

Information captured in the incident record
– Incident reference number
– Categorization
– Urgency
– Impact
– Priority
– Date/time
– Name of person reporting
– Method of notification
– Name of user
– Contact method for user
– Symptoms
– Incident status
– Related CI
– Support group/person to which the incident is allocated
– Related problem or known error
– Resolution activities
– Resolution date and time
– Closure category
– Closure date and time
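As a rough illustration, the logged information could be held in a single record type. The sketch below is an assumption about how a tool might model it, not a prescribed schema: the class name, field names, default values, and the `INC-0001` reference format are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class IncidentRecord:
    # Captured when the incident is logged
    reference_number: str
    category: str
    urgency: str
    impact: str
    priority: int
    reported_at: datetime
    reported_by: str
    notification_method: str
    user_name: str
    user_contact_method: str
    symptoms: str
    # Updated as the incident moves through its lifecycle
    status: str = "open"
    related_ci: Optional[str] = None
    assigned_to: Optional[str] = None
    related_problem_or_known_error: Optional[str] = None
    resolution_activities: List[str] = field(default_factory=list)
    resolved_at: Optional[datetime] = None
    closure_category: Optional[str] = None
    closed_at: Optional[datetime] = None

    def record_action(self, description: str) -> None:
        """Append one resolution activity so every action stays on the record."""
        self.resolution_activities.append(description)


# Example values are invented for illustration only.
incident = IncidentRecord(
    reference_number="INC-0001",
    category="Payroll service",
    urgency="high",
    impact="medium",
    priority=2,
    reported_at=datetime(2024, 1, 15, 9, 30),
    reported_by="Service desk agent",
    notification_method="phone",
    user_name="Example User",
    user_contact_method="phone",
    symptoms="Payroll run fails to start",
)
incident.record_action("Restarted the payroll service")
```

Fields that are only known later (resolution and closure details) default to empty values, so a record can be created at logging time and filled in as work progresses.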
Incident Categorization

Incident categorization coding is important so that the exact incident type is known

This is important for later trend analysis in problem management, supplier management, and other ITSM processes

Categorization is also often used as a basis for support team identification and routing

There is no ‘one best’ categorization schema, although ITIL® provides guidance on a method to identify an effective categorization schema

Examples of Categorization Methods

Location Impacted > Service Impacted > System Impacted > Application Impacted
or
Application Impacted > Database Impacted > Server Impacted > Disk Impacted
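A multi-level category can be treated as a path, which makes trend analysis a matter of counting incidents by path prefix. The sketch below assumes the application > database > server > disk scheme from the example; the `count_by_level` helper and all sample values are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# One category per incident: (application, database, server, disk)
Category = Tuple[str, ...]


def count_by_level(categories: Iterable[Category], depth: int) -> Counter:
    """Count incidents by the first `depth` levels of their category path."""
    return Counter(" > ".join(path[:depth]) for path in categories)


# Invented sample data for illustration only.
sample_categories = [
    ("payroll", "payroll-db", "srv-01", "disk-0"),
    ("payroll", "payroll-db", "srv-01", "disk-1"),
    ("email", "mail-db", "srv-02", "disk-0"),
]

by_application = count_by_level(sample_categories, depth=1)
by_server = count_by_level(sample_categories, depth=3)
```

Counting at a shallow depth shows which application generates the most incidents; counting deeper narrows the trend down to a specific server.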

Incident Prioritization

Prioritizing the response to an incident is critical to determine how the incident is handled

Priority is based on impact to the business, urgency to the business, and other factors
– Impact is a measure of how much the incident affects the business
– Urgency is how quickly the business needs a resolution (or how long it can wait)
– Other factors include risk, number of services affected, financial loss, etc.

Example of a Prioritization Method

Priority code by urgency (rows) and impact (columns)

Urgency \ Impact    High    Medium    Low
High                 1        2        3
Medium               2        3        4
Low                  3        4        5

Priority Code    Description    Target Resolution Time
1                Critical       1 hour
2                High           8 hours
3                Medium         24 hours
4                Low            48 hours
5                Planning       Planned
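The example matrix and target times translate directly into two lookup tables. The following is a minimal sketch using the values from this example only; the `prioritize` function and the table names are hypothetical, and a real organization would substitute its own agreed values.

```python
from datetime import timedelta
from typing import Optional, Tuple

# Outer key: urgency, inner key: impact (values taken from the example matrix).
PRIORITY_BY_URGENCY_AND_IMPACT = {
    "high":   {"high": 1, "medium": 2, "low": 3},
    "medium": {"high": 2, "medium": 3, "low": 4},
    "low":    {"high": 3, "medium": 4, "low": 5},
}

# Priority code -> (description, target resolution time).
# Priority 5 is "Planned", so it has no fixed target here.
PRIORITY_CODES = {
    1: ("Critical", timedelta(hours=1)),
    2: ("High", timedelta(hours=8)),
    3: ("Medium", timedelta(hours=24)),
    4: ("Low", timedelta(hours=48)),
    5: ("Planning", None),
}


def prioritize(urgency: str, impact: str) -> Tuple[int, str, Optional[timedelta]]:
    """Look up the priority code, description, and target resolution time."""
    code = PRIORITY_BY_URGENCY_AND_IMPACT[urgency.lower()][impact.lower()]
    description, target = PRIORITY_CODES[code]
    return code, description, target
```

For instance, `prioritize("High", "Medium")` yields priority 2 with an 8 hour target. The "other factors" mentioned earlier (risk, financial loss, and so on) are not modeled in this sketch.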

Initial Diagnosis

Once an incident is logged and prioritized, the service desk agent will usually attempt to diagnose the issue, to fully identify the symptoms of the incident and, where possible, determine corrective action

In some cases the service desk may be able to find a resolution within agreed time frames without additional support groups

Incident matching against a known error database (KEDB), as part of a knowledge management system, may enable more incidents to be resolved by the service desk

Incident Matching

Incident matching against incident classification is useful to identify known errors or problems associated with the incident

Effective matching prevents the same investigation from being repeated over and over again

Escalation occurs if the service desk cannot restore service to the user

Incident Matching
Incident matching procedure (part of initial diagnosis)
1. Match on KEDB?
– Yes: extract the resolution or workaround action from the KEDB, update the incident count on the KE record, update the incident record with the ID of the KE and with classification data, then inform the customer of the workaround
– No: go to step 2
2. Match to an existing problem record?
– Yes: update the incident count on the problem record, then update the incident record with the ID of the problem and with classification data
– No: go to step 3
3. Routine incident?
– No: log a new problem
– Yes: continue
4. Return to initial diagnosis
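The matching procedure can be sketched as two lookups, first against the KEDB and then against open problem records. This is a simplified illustration under stated assumptions: a "match" is taken to mean an identical classification string, the routine-incident check is omitted, and every class and function name here is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KnownError:
    ke_id: str
    classification: str
    workaround: str
    incident_count: int = 0


@dataclass
class Problem:
    problem_id: str
    classification: str
    incident_count: int = 0


@dataclass
class Incident:
    classification: str
    known_error_id: Optional[str] = None
    problem_id: Optional[str] = None
    workaround: Optional[str] = None


def match_incident(incident: Incident, kedb: List[KnownError],
                   problems: List[Problem]) -> str:
    """Match an incident against known errors, then against problem records."""
    for known_error in kedb:
        if known_error.classification == incident.classification:
            known_error.incident_count += 1            # update count on the KE
            incident.known_error_id = known_error.ke_id
            incident.workaround = known_error.workaround
            return "known_error"
    for problem in problems:
        if problem.classification == incident.classification:
            problem.incident_count += 1                # update count on the problem
            incident.problem_id = problem.problem_id
            return "problem"
    # No match: the caller decides whether to log a new problem.
    return "no_match"
```

Keeping the incident counts up to date on known error and problem records is what later lets problem management see which underlying faults generate the most incidents.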

Incident Escalation

As soon as it is determined that the service desk is unable to resolve the incident, or the timescales for escalation have been exceeded (whichever comes first), the incident must be escalated

Escalation procedures must be defined in OLAs and UCs with internal and external support teams respectively

Note: The service desk retains ownership of the incident regardless of where the incident is referred to in its lifecycle!
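The "whichever comes first" rule and the ownership note can be expressed in a few lines. This is a sketch under assumptions: the escalation threshold would come from the relevant OLA or UC, and the names `must_escalate`, `escalate`, `owner`, and `assigned_group` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    logged_at: datetime
    owner: str = "service desk"           # ownership never moves
    assigned_group: str = "service desk"  # who is currently working on it


def must_escalate(incident: Incident, now: datetime,
                  escalation_after: timedelta,
                  service_desk_can_resolve: bool) -> bool:
    """Escalate when the desk cannot resolve OR the timescale is exceeded."""
    timescale_exceeded = now - incident.logged_at > escalation_after
    return (not service_desk_can_resolve) or timescale_exceeded


def escalate(incident: Incident, target_group: str) -> None:
    """Reassign the work; the service desk remains the owner."""
    incident.assigned_group = target_group
```

Separating `owner` from `assigned_group` is one simple way to keep the service desk accountable for the incident while another team does the work.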

Escalation

Hierarchical escalation
– Escalation up the management chain (even if only for informational purposes)
– Hierarchical escalation may be initiated by users, customers, or support staff

Functional escalation
– Escalation to a technical specialist or support group
– May be into the supplier community, depending upon the nature of the incident

Major Incidents

Major incidents that have a high degree of business impact and urgency will often require their own incident model or handling procedures
– Shorter timescales
– Require emergency changes to be applied for resolution
– May have limited testing prior to implementation (note: the risk of not testing should be understood so as not to make a bad situation worse)
– May require invocation of the continuity plan

Examples
– A fire in the primary data center
– A security incident where confidential information is disclosed

Incident Resolution and Recovery

Once a potential resolution has been identified, the resolution should be applied and tested

This may involve
– Having users take a prescribed set of actions
– Technical specialists performing a recovery attempt
– A supplier being asked to remediate a fault

When a recovery is found, sufficient testing must be performed to ensure that service has been restored and the recovery action is complete

All actions must be recorded in the incident record

Triggers to Incident Management

Incidents can be triggered by
– Users contacting the service desk via phone, email, or web interface
– Event monitoring systems
– Technical management
– Operations management
– Application management
– Suppliers giving notice of an actual or potential difficulty

Interfaces to Other Processes

Service level management
– SLAs and OLAs are critical to defining target resolution times, impact definitions, response times, etc.

Information security management
Capacity management
Availability management
Service asset and configuration management
Change management
Problem management
Access management

Incident Closure

Once service has been restored, the service desk should
– Validate with the user that service has been restored
– Confirm the closure categorization for accuracy
– Carry out a user satisfaction survey
– Add any details still missing from the incident record
– Determine with the resolving team whether the root cause was identified and, if not, raise a problem record
– Formally close the incident

Inputs to Incident Management

Information about CIs and their status

Information about known errors and workarounds

Communication and feedback about incidents and symptoms

Communication of RFCs and releases

Communication of events from event management

Operational and service level objectives

Customer feedback on incidents

Agreed criteria for prioritizing and escalating incidents

Outputs of Incident Management

Resolved incidents and resolution actions

Updated incident records

Updated classification of incidents

Raising of problem records

Validation that incidents have not recurred for problems that have been resolved

Feedback on incidents related to changes and releases

Identification of CIs associated with or impacted by incidents

Satisfaction feedback from customers

Feedback on level and quality of monitoring technologies and event management activities

Communication about incident resolution history to identify overall service quality

Incident Management CSFs and KPIs

CSF: Resolve incidents as quickly as possible, minimizing business impact
– Mean time to restore service (MTRS)
– Breakdown of incidents by activity
– Percentage of incidents closed by service desk
– Number and percentage of incidents resolved remotely
– Number and percentage of incidents resolved without business impact

CSF: Maintain quality of IT services
– Total number of incidents (as a control)
– Size of current incident backlog for each IT service
– Number and percentage of major incidents for each IT service

CSF: Maintain user satisfaction with IT service
– Average user/customer survey score
– Percentage of satisfaction surveys answered versus number sent

CSF: Increase visibility and communication of incidents to business and IT staff
– Average number of service desk calls for incidents already reported
– Number of business user complaints about the content or quality of communication

CSF: Align incident management activities and priorities with those of the business
– Percentage of incidents handled within agreed times
– Average cost of an incident

CSF: Ensure standard methods and procedures are used for incidents
– Number and percentage of incidents incorrectly assigned
– Number and percentage of incidents incorrectly categorized
– Number and percentage of incidents processed per service desk agent
– Number and percentage of incidents related to change and release
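Several of these KPIs are simple aggregates over closed incident records. The sketch below computes three of them; it assumes MTRS is measured from logging to service restoration, and the `ClosedIncident` type, its fields, and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class ClosedIncident:
    logged_at: datetime
    restored_at: datetime
    target: timedelta             # agreed resolution time for its priority
    closed_by_service_desk: bool


def mean_time_to_restore(incidents: List[ClosedIncident]) -> timedelta:
    """MTRS, taken here as the mean time from logging to restoration."""
    if not incidents:
        raise ValueError("MTRS is undefined for an empty list of incidents")
    total = sum((i.restored_at - i.logged_at for i in incidents), timedelta())
    return total / len(incidents)


def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 when there is no data."""
    return 100.0 * part / whole if whole else 0.0


def kpi_summary(incidents: List[ClosedIncident]) -> Dict[str, float]:
    """Percentage KPIs over a set of closed incidents."""
    within_target = sum(
        1 for i in incidents if i.restored_at - i.logged_at <= i.target
    )
    by_service_desk = sum(1 for i in incidents if i.closed_by_service_desk)
    return {
        "pct_within_agreed_times": percentage(within_target, len(incidents)),
        "pct_closed_by_service_desk": percentage(by_service_desk, len(incidents)),
    }
```

Each KPI only makes sense against its CSF: a falling MTRS, for example, is evidence for "resolve incidents as quickly as possible" rather than a goal in itself.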
Challenges and Risks

Early detection of incidents

Convincing all staff that incidents must be logged

Availability of information about problems and known errors

Integration into the CMS to determine relationships between CIs and history of CIs when performing first line support

Integration into the SLM process to assist with correctly assessing priority of incidents and defining escalation procedures

Being inundated with incidents due to lack of trained resources

Inadequate support tools

Lack of information

Mismatch in objectives due to poorly aligned or non-existent OLAs and/or UCs

What We Covered

• Purpose and Objective
• Scope of Incident Management
• Value to the Business
• Activities of Incident Management
• Triggers to Incident Management
• Interfaces to Other Processes
• Inputs and Outputs
• Critical Success Factors and Key Performance Indicators (KPIs)
• Challenges and Risks
