Incident Management
In This Lesson
• Purpose and Objective
• Scope of Incident Management
• Value to the Business
• Activities of Incident Management
• Triggers to Incident Management
• Interfaces to Other Processes
• Inputs and Outputs
• Critical Success Factors and Key Performance Indicators (KPIs)
• Challenges and Risks
Definition of an Incident
Incident: An unplanned interruption to an IT service, a reduction in the quality of an IT service, or a failure of a configuration item (CI) that has not yet impacted IT services
Definition of Incident Management
Incident management is the process responsible for managing the lifecycle of all incidents
Examples of incidents
– Failure of the payroll service
– A database lock that prevents data use by an application
– A failure of a disk in a mirrored set that has not yet impacted services
Purpose of Incident Management
Restore normal service operation as quickly as possible
Minimize adverse impact on business operations
Purpose of Incident Management
Normal service operation: A state where services and CIs are performing within their agreed service and operational levels.
Objectives of Incident Management
Ensure standard methods and procedures are used for the following:
– Efficient and prompt response
– Analysis
– Documentation
– Ongoing management
– Reporting of incidents
Increase visibility and communication of incidents to business and support staff
Enhance business perception of IT through the use of a professional approach in quickly resolving and communicating incidents
Align incident management (IM) priorities with those of the business
Maintain user satisfaction with the quality of IT service
Scope of Incident Management
Any event which disrupts or could disrupt service
Not all events are incidents
Incidents are not the same as service requests, although both may be reported to the service desk
SLAs and OLAs are important to the incident process to determine whether a service has been disrupted and to assist in distinguishing between events, incidents, and service requests
Value to the Business
Reduction in unplanned labor and cost for both the business and IT support staff caused by incidents
Lower downtime to the business (which means higher levels of availability)
Alignment of IT activity to real-time business priority
– Enables the capability to capture business priority and dynamically allocate resources as necessary
The ability to identify potential improvements to services
The service desk can identify additional service or training requirements found in IT or the business
Activities of Incident Management
[Figure: incident identification (from event management, web interface, phone call, or email) → logging → categorization → prioritization → initial diagnosis → escalation (functional or hierarchical) → resolution and recovery → incident closure]
Activities of Incident Management (Process Flow)
– Incident identification: Is it an incident? If not, route it to request fulfillment, or to service portfolio management (SPM) if it is a change proposal
– Incident logging
– Incident categorization
– Incident prioritization
– Major incident? If yes, invoke the major incident procedure
– Initial diagnosis
– Functional escalation needed? If yes, escalate functionally; hierarchical escalation is raised through escalation management when needed
– Investigation and diagnosis
– Resolution identified? If not, escalate further and continue investigating
– Resolution and recovery
– Incident closure
– End
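The decision points in the flow above can be sketched as a small function that records which steps a ticket passes through. This is a minimal illustration under stated assumptions: the `Ticket` fields, the step names, and the choice to model the major incident procedure as an extra step (after which the normal flow continues) are all hypothetical, not part of any ITIL tool.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Ticket:
    """Hypothetical inputs for the decision points in the flow."""
    is_incident: bool
    is_change_proposal: bool = False
    is_major: bool = False
    needs_functional_escalation: bool = False
    # Outcome of each investigation round: True means a resolution was identified.
    diagnosis_outcomes: Tuple[bool, ...] = (True,)


def process(ticket: Ticket) -> List[str]:
    """Return the ordered steps a ticket passes through."""
    steps = ["identification"]
    if not ticket.is_incident:
        # Not an incident: hand it off instead of continuing the flow.
        steps.append("route to SPM" if ticket.is_change_proposal
                     else "route to request fulfillment")
        return steps
    steps += ["logging", "categorization", "prioritization"]
    if ticket.is_major:
        # Modelled as an extra step; the normal flow then continues.
        steps.append("major incident procedure")
    steps.append("initial diagnosis")
    if ticket.needs_functional_escalation:
        steps.append("functional escalation")
    for resolved in ticket.diagnosis_outcomes:
        steps.append("investigation and diagnosis")
        if resolved:
            steps += ["resolution and recovery", "incident closure"]
            return steps
        steps.append("escalation")  # no resolution yet: escalate and retry
    return steps  # still unresolved: the incident stays open
```

Passing `diagnosis_outcomes=(False, True)` models one unsuccessful investigation round, an escalation, and then a successful one.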
Incident Identification
It is desirable to identify incidents (and resolve them) before they impact the business
It is usually unacceptable from a business perspective to wait until a user is impacted and contacts the service desk to notify IT of an incident
Monitoring of key infrastructure components is critical to the detection of failures and potential failures
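The mirrored-disk example from earlier shows why monitoring matters. A degraded mirror is worth raising as an incident before users notice anything. The check below is a minimal sketch; the function name, its parameters, and the message format are hypothetical and do not come from any real monitoring tool.

```python
from typing import Optional


def check_mirror(name: str, disks_total: int, disks_healthy: int) -> Optional[str]:
    """Return an incident summary if the mirrored set is degraded, else None."""
    if disks_healthy == disks_total:
        return None  # fully redundant: nothing to raise
    if disks_healthy == 0:
        return f"{name}: all disks failed, service impacted"
    # Redundancy is lost but the service still runs: raise the incident now,
    # before a second failure reaches the business.
    failed = disks_total - disks_healthy
    return f"{name}: {failed} of {disks_total} disks failed, service not yet impacted"
```

A monitoring loop would call this on each reading and log an incident whenever the result is not `None`.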
Incident Logging
All incidents must be logged, regardless of where they are raised
All relevant information should be captured
Incident Logging
– Incident reference number
– Categorization
– Urgency
– Impact
– Priority
– Date/time
– Name of person reporting
– Method of notification
– Name of user
– Contact method for user
– Symptoms
– Incident status
– Related CI
– Support group/person to whom the incident is allocated
– Related problem or known error
– Resolution activities
– Resolution date and time
– Closure category
– Closure date and time
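The logging fields listed above map naturally onto a record type. The dataclass below is one hypothetical way to hold them. The attribute names and the default status `"logged"` are assumptions made for illustration. Fields that are unknown at logging time, such as resolution and closure details, default to empty.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class IncidentRecord:
    """Hypothetical incident record mirroring the logging fields."""
    reference: str
    category: str
    urgency: str
    impact: str
    priority: int
    reported_at: datetime
    reported_by: str
    notification_method: str  # e.g. "phone", "email", "web"
    user_name: str
    user_contact_method: str
    symptoms: str
    status: str = "logged"
    related_ci: Optional[str] = None
    assigned_group: Optional[str] = None
    related_problem_or_known_error: Optional[str] = None
    # default_factory gives each record its own list of activities.
    resolution_activities: List[str] = field(default_factory=list)
    resolved_at: Optional[datetime] = None
    closure_category: Optional[str] = None
    closed_at: Optional[datetime] = None
```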
Incident Categorization
Incident categorization coding is important so that the exact incident type is known
This is important for later trend analysis in problem management, supplier management, and other ITSM processes
Categorization is also often used as a basis for identifying the appropriate support team and routing the incident to it
There is no ‘one best’ categorization schema, although ITIL® provides guidance on a method for identifying an effective categorization schema
Examples of Categorization Methods
– Location impacted, service impacted, system impacted, application impacted
or
– Application impacted, database impacted, server impacted, disk impacted
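Multi-level categories can drive routing to a support team. The sketch below looks up the most specific category prefix it can find. The routing table, the group names, and the fallback to the service desk are hypothetical examples, not an ITIL-defined schema.

```python
from typing import Dict, Tuple

# Hypothetical routing table: category prefix -> support group.
ROUTING: Dict[Tuple[str, ...], str] = {
    ("Payroll",): "Payroll application team",
    ("Payroll", "Database"): "Database team",
    ("Email",): "Messaging team",
}
DEFAULT_GROUP = "Service desk"


def route(category: Tuple[str, ...]) -> str:
    """Return the support group for the most specific matching category prefix."""
    for length in range(len(category), 0, -1):
        group = ROUTING.get(category[:length])
        if group is not None:
            return group
    return DEFAULT_GROUP  # no match: the service desk keeps working the incident
```

Matching the longest prefix first means a new lower-level category still routes somewhere sensible, even before anyone adds it to the table.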
Incident Prioritization
Prioritizing the response to an incident is critical, because priority determines how the incident is handled
Priority is based on impact to the business, urgency to the business, and other factors
– Impact is a measure of the effect the incident has on the business
– Urgency is a measure of how quickly the business needs a resolution (how long it can wait)
– Other factors include risk, the number of services affected, financial loss, etc.
Example of a Prioritization Method

                      Impact
                High   Medium   Low
Urgency High      1       2      3
        Medium    2       3      4
        Low       3       4      5

Priority Code   Description   Target Resolution Time
1               Critical      1 hour
2               High          8 hours
3               Medium        24 hours
4               Low           48 hours
5               Planning      Planned
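The example matrix and target table above can be encoded directly as lookups. This sketch copies the values from those two tables. The function name is hypothetical. Priority 5 is given no fixed target because the table lists it as planned work.

```python
from datetime import timedelta
from typing import Dict, Optional, Tuple

# (urgency, impact) -> priority code, copied from the example matrix.
PRIORITY_MATRIX: Dict[Tuple[str, str], int] = {
    ("High", "High"): 1, ("High", "Medium"): 2, ("High", "Low"): 3,
    ("Medium", "High"): 2, ("Medium", "Medium"): 3, ("Medium", "Low"): 4,
    ("Low", "High"): 3, ("Low", "Medium"): 4, ("Low", "Low"): 5,
}

# Priority code -> (description, target resolution time).
TARGETS: Dict[int, Tuple[str, Optional[timedelta]]] = {
    1: ("Critical", timedelta(hours=1)),
    2: ("High", timedelta(hours=8)),
    3: ("Medium", timedelta(hours=24)),
    4: ("Low", timedelta(hours=48)),
    5: ("Planning", None),  # planned work: no fixed target
}


def prioritize(urgency: str, impact: str) -> Tuple[int, str, Optional[timedelta]]:
    """Return (priority code, description, target resolution time)."""
    code = PRIORITY_MATRIX[(urgency, impact)]
    description, target = TARGETS[code]
    return code, description, target
```

Adding the target to the logging time gives a resolution deadline for every priority except 5.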
Initial Diagnosis
Once an incident is logged and prioritized, the service desk agent usually attempts to diagnose the issue, to fully identify its symptoms and, where possible, determine corrective action
In some cases, the service desk may be able to find a resolution within agreed time frames without involving additional support groups
Matching incidents against a known error database (KEDB) as part of a knowledge management system may enable more incidents to be resolved by the service desk
Incident Matching
Matching incidents against incident classifications is useful to identify known errors or problems associated with the incident
Effective matching prevents the same investigation from being repeated over and over again
Escalation occurs if the service desk cannot restore service to the user
Incident Matching Procedure (part of initial diagnosis)
– Match on KEDB? If yes: update the incident record with the ID of the known error (KE), update the incident count on the KE, extract the resolution or workaround action from the KEDB, and inform the customer of the workaround
– If there is no KEDB match: match to an existing problem record? If yes: update the incident record with the ID of the problem and with classification data, and update the incident count on the problem
– If there is no problem match: routine incident? If yes: update the incident record with classification data; if no: log a new problem
– Return to initial diagnosis
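The matching procedure above can be sketched as a function that tries each branch in order and reports which one it took. Matching on exact symptom text is a simplifying assumption; real KEDB searches are usually fuzzier. The class names and the `PRB0001`-style problem IDs are also hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class KnownError:
    ke_id: str
    symptom: str
    workaround: str
    incident_count: int = 0


@dataclass
class Problem:
    problem_id: str
    symptom: str
    incident_count: int = 0


@dataclass
class Incident:
    symptom: str
    classification: str
    linked_id: Optional[str] = None
    notes: List[str] = field(default_factory=list)


def match_incident(incident: Incident, kedb: List[KnownError],
                   problems: List[Problem], routine: Set[str]) -> str:
    """Hypothetical matching step; returns the branch taken."""
    # 1. Match on the KEDB (exact symptom text, an assumption).
    for ke in kedb:
        if ke.symptom == incident.symptom:
            incident.linked_id = ke.ke_id
            ke.incident_count += 1
            incident.notes.append(f"Workaround given to customer: {ke.workaround}")
            return "known error"
    # 2. Match to an existing problem record.
    for problem in problems:
        if problem.symptom == incident.symptom:
            incident.linked_id = problem.problem_id
            problem.incident_count += 1
            return "existing problem"
    # 3. Routine incidents return to initial diagnosis without a problem record.
    if incident.classification in routine:
        return "routine"
    # 4. Otherwise log a new problem and link the incident to it.
    new = Problem(f"PRB{len(problems) + 1:04d}", incident.symptom, incident_count=1)
    problems.append(new)
    incident.linked_id = new.problem_id
    return "new problem"
```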
Incident Escalation
As soon as it is determined that the service desk is unable to resolve the incident, or the timescales for escalation have been exceeded (whichever comes first), the incident must be escalated
Escalation procedures must be defined in OLAs and UCs with internal and external support teams, respectively
Note: The service desk retains ownership of the incident regardless of where the incident is referred to in its life cycle!
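The escalation rule above has two triggers, and whichever occurs first applies. A second point is that referral does not change ownership. Both can be sketched as follows. The function names, the dictionary keys `owner` and `assigned_group`, and any escalation timescale you pass in are hypothetical; real values would come from the relevant OLA or UC.

```python
from datetime import datetime, timedelta
from typing import Dict


def should_escalate(logged_at: datetime, now: datetime,
                    escalation_after: timedelta, desk_can_resolve: bool) -> bool:
    """True as soon as either trigger holds, whichever comes first."""
    timescale_exceeded = now - logged_at > escalation_after
    return (not desk_can_resolve) or timescale_exceeded


def escalate(ticket: Dict[str, str], group: str) -> Dict[str, str]:
    """Refer the incident to another group without transferring ownership."""
    updated = dict(ticket)
    updated["assigned_group"] = group
    # "owner" is deliberately left unchanged: the service desk keeps ownership.
    return updated
```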
Escalation
Hierarchical escalation
– Escalation up the management chain (even if only for informational purposes)
– May be initiated by users, customers, or support staff
Functional escalation
– Escalation to a technical specialist or support group
– May extend into the supplier community, depending on the nature of the incident
Major Incidents
Major incidents, which have a high degree of business impact and urgency, often require their own incident model or handling procedures. Such incidents:
– Have shorter timescales
– Require emergency changes to be applied for resolution
– May have limited testing prior to implementation (the risk of not testing should be understood, so as not to make a bad situation worse)
– May require invocation of the continuity plan
Examples
– A fire in the primary data center
– A security incident in which confidential information is disclosed
Incident Resolution and Recovery
Once a potential resolution has been identified, it should be applied and tested
This may involve:
– Having users take a prescribed set of actions
– Technical specialists performing a recovery attempt
– A supplier being asked to remediate a fault
When a recovery action has been applied, sufficient testing must be performed to ensure that service has been restored and the recovery action is complete
All actions must be recorded in the incident record
Triggers to Incident Management
Incidents can be triggered by:
– Users contacting the service desk via phone, email, or web interface
– Event monitoring systems
– Technical management
– Operations management
– Application management
– Suppliers notifying IT of actual or potential difficulties
Interfaces to Other Processes
Service level management
– SLAs and OLAs are critical to defining target resolution times, impact definitions, response times, etc.
Information security management
Capacity management
Availability management
Service asset and configuration management
Change management
Problem management
Access management
Incident Closure
Once service has been restored, the service desk should:
– Validate with the user that service has been restored
– Confirm that the closure categorization is accurate
– Carry out a user satisfaction survey
– Record any additional details needed to fully document the incident
– Determine with the resolving team whether the root cause was identified and, if not, raise a problem record
– Formally close the incident
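The closure steps above work as a checklist: formal closure should wait until the other steps are done, and a missing root cause should trigger a problem record. The sketch below expresses that rule. The step labels and the shape of the returned dictionary are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical labels for the closure steps listed above.
CLOSURE_STEPS = (
    "service restoration validated with user",
    "closure category confirmed",
    "satisfaction survey carried out",
    "incident fully documented",
    "root cause checked with resolving team",
)


def try_close(completed: Set[str], root_cause_identified: bool) -> Dict[str, object]:
    """Close only when every step is done; flag a problem record if no root cause."""
    missing: List[str] = [step for step in CLOSURE_STEPS if step not in completed]
    if missing:
        return {"closed": False, "missing": missing, "raise_problem": False}
    return {"closed": True, "missing": [], "raise_problem": not root_cause_identified}
```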
Inputs to Incident Management
Information about CIs and their status
Information about known errors and workarounds
Communication and feedback about incidents and symptoms
Communication of RFCs and releases
Communication of events from event management
Operational and service level objectives
Customer feedback on incidents
Agreed criteria for prioritizing and escalating incidents
Outputs of Incident Management
Resolved incidents and resolution actions
Updated incident records
Updated classification of incidents
Raising of problem records
Validation that incidents have not recurred for resolved problems
Feedback on incidents related to changes and releases
Identification of CIs associated with or impacted by incidents
Satisfaction feedback from customers
Feedback on the level and quality of monitoring technologies and event management activities
Communication about incident resolution history to identify overall service quality
Incident Management CSFs and KPIs
CSF: Resolve incidents as quickly as possible, minimizing business impact
– Mean time to restore service (MTRS)
– Breakdown of incidents by activity
– Percentage of incidents closed by the service desk
– Number and percentage of incidents resolved remotely
– Number and percentage of incidents resolved without business impact
CSF: Maintain quality of IT services
– Total number of incidents (as a control)
– Size of current incident backlog for each IT service
– Number and percentage of major incidents for each IT service
CSF: Maintain user satisfaction with IT service
– Average user/customer survey score
– Percentage of satisfaction surveys answered versus the number sent
CSF: Increase visibility and communication of incidents to business and IT staff
– Average number of service desk calls for incidents already reported
– Number of business user complaints about the content or quality of communication
CSF: Align incident management activities and priorities with those of the business
– Percentage of incidents handled within agreed times
– Average cost of an incident
CSF: Ensure standard methods and procedures are used for incidents
– Number and percentage of incidents incorrectly assigned
– Number and percentage of incidents incorrectly categorized
– Number and percentage of incidents processed per service desk agent
– Number and percentage of incidents related to changes and releases
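Several of the KPIs above are simple aggregates over closed incidents. The sketch below computes MTRS, the percentage handled within agreed times, and the percentage closed by the service desk. It assumes that restoration time is measured from logging to restoration. It also assumes that each incident carries its own target time, for example from the priority table. The `ClosedIncident` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ClosedIncident:
    logged_at: datetime
    restored_at: datetime
    target: timedelta  # agreed resolution time for this incident's priority
    resolved_by_service_desk: bool


def _require(incidents: List[ClosedIncident]) -> None:
    if not incidents:
        raise ValueError("KPIs need at least one closed incident")


def mtrs(incidents: List[ClosedIncident]) -> timedelta:
    """Mean time to restore service, measured from logging to restoration."""
    _require(incidents)
    total = sum((i.restored_at - i.logged_at for i in incidents), timedelta())
    return total / len(incidents)


def pct_within_target(incidents: List[ClosedIncident]) -> float:
    """Percentage of incidents restored within their agreed time."""
    _require(incidents)
    met = sum(1 for i in incidents if i.restored_at - i.logged_at <= i.target)
    return 100.0 * met / len(incidents)


def pct_closed_by_service_desk(incidents: List[ClosedIncident]) -> float:
    """Percentage of incidents resolved without leaving the service desk."""
    _require(incidents)
    return 100.0 * sum(i.resolved_by_service_desk for i in incidents) / len(incidents)
```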
Challenges and Risks
Early detection of incidents
Convincing all staff that incidents must be logged
Availability of information about problems and known errors
Integration with the CMS to determine relationships between CIs and the history of CIs when performing first-line support
Integration with the SLM process to assist with correctly assessing the priority of incidents and defining escalation procedures
Being inundated with incidents due to a lack of trained resources
Inadequate support tools
Lack of information
Mismatch in objectives due to poorly aligned or non-existent OLAs and/or UCs
What We Covered
• Purpose and Objective
• Scope of Incident Management
• Value to the Business
• Activities of Incident Management
• Triggers to Incident Management
• Interfaces to Other Processes
• Inputs and Outputs
• Critical Success Factors and Key Performance Indicators (KPIs)
• Challenges and Risks
