ITS Procedure - Critical Incident Management
1. Introduction
These instructions provide a standard procedure for managing Priority
1 and 2 service problems for the IT services provided by ITS.
This procedure is linked to the ITS Service Problem
and Critical Incident Management Policy.
2. Responsible Officer
Director - Client Services, ITS
3. Definitions
Technical support staff - ITS staff drawn from a number
of operational units working on diagnosis and resolution of the problem
- typically from Infrastructure Services or Application Services.
Senior Problem Manager - a senior staff member
of ITS Client Services, Application Services or Infrastructure Services.
This role must always be filled by a designated staff member.
| Campus | Service area | Senior Problem Manager (SPM) | Alternative SPM |
| Berwick & Peninsula | All local services | ITS Campus Manager, Client Services | Service Desk Officer, Client Services / Service Desk Officer, Client Services |
| Caulfield | All local services | ITS Campus Manager, Client Services | Service Desk Officer, Client Services |
| Clayton | Client Services | Senior Service Desk Officer, Client Services | Call Centre Manager |
| Clayton | Infrastructure Services | Data Centre Manager, Infrastructure Services | Senior/Systems Operator |
| Clayton | Application Services (Administrative) | Technical Support Manager, Application Services | Senior Technical Support Officer, Application Services |
| Clayton | Application Services (Web) | Senior Portal Developer, Application Services | Portal Developer, Application Services |
| Gippsland | All local services | ITS Campus Manager, Client Services | Technology Services Coordinator, Client Services |
Critical Incident Manager - senior ITS operational manager
for the service impacted. If more than one service is affected as the
result of failure or fault with an infrastructure device, then the ITS
operational manager for Infrastructure Services (Manager, Production Facilities)
will be the Critical Incident Manager.
| Service Area | Critical Incident Manager (CIM) | On Call |
| Infrastructure Services | Manager, Production Facilities | 0409 350 613 |
| Client Services | Clayton ITS Campus Manager, Client Services | 0409 430 936 |
| Application Services | Manager, IAS, Application Services | 0400 894 220 |
See Key Staff for names of current SPM and
Critical Incident Managers.
The role of the Critical Incident Manager is to assure that the correct
problem management procedures are being followed and that the right resources
are working on the problem and, if necessary, to use the position's authority
to ensure the correct focus is applied to resolving the problem.
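The rule for choosing the Critical Incident Manager can be sketched as a small lookup. This is an illustration only: the role titles are taken from the CIM table, but the names `CIM_BY_SERVICE_AREA` and `select_cim` are hypothetical, and the sketch assumes that "more than one service" can be modelled as more than one service area.

```python
# Role titles are copied from the CIM table; everything else is hypothetical.
CIM_BY_SERVICE_AREA = {
    "Infrastructure Services": "Manager, Production Facilities",
    "Client Services": "Clayton ITS Campus Manager, Client Services",
    "Application Services": "Manager, IAS, Application Services",
}


def select_cim(affected_areas, infrastructure_fault=False):
    """Return the role title of the Critical Incident Manager."""
    areas = set(affected_areas)
    if not areas:
        raise ValueError("at least one affected service area is required")
    unknown = areas - set(CIM_BY_SERVICE_AREA)
    if unknown:
        raise ValueError(f"unknown service area(s): {sorted(unknown)}")
    # More than one service affected by an infrastructure device failure:
    # the Infrastructure Services operational manager takes the role.
    if len(areas) > 1 and infrastructure_fault:
        return CIM_BY_SERVICE_AREA["Infrastructure Services"]
    if len(areas) == 1:
        return CIM_BY_SERVICE_AREA[next(iter(areas))]
    # The procedure does not say who leads a multi-service incident
    # that is not caused by an infrastructure device.
    raise ValueError("procedure does not define a CIM for this case")
```

The final branch raises rather than guessing, because the procedure leaves that case undefined.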
4. Scope
Covers Priority 1 and Priority 2 problems where a failure in an IT system
has resulted in a total inability, or a severely restricted ability, to perform the
normal operation of a significant business function of the University.
5. Procedures
High priority problems may be identified in a number of
ways:
- calls coming into the ITS Service Desk, the IAS Service Desk and Production
Facilities;
- alerts from systems monitoring tools such as Patrol and Spectrum;
and
- technical support staff identifying a problem.
The Senior Problem Manager identifies Priority 1 and 2 problems referred
from the ITS or IAS Service Desks, or Production Facilities and ensures
that the correct follow up/escalation procedure is carried out. All Priority
1 and 2 problems must be referred to the Senior Problem Manager immediately
if a 'quick fix' is not available so that a central point of coordination
occurs. The Senior Problem Manager is also responsible for keeping stakeholders
informed of the problem progress.
Where technical support staff identify the existence of a Priority 1
or 2 problem, they should assess what to do from the following:
- If the problem can be fixed quickly (i.e. less than 2 minutes), technical
staff may proceed to fix the problem without sending a downtime notice.
- If the problem requires more time to resolve, the SPM should be notified
as soon as possible (5-10 minutes after the problem is realised), and a downtime
message issued as soon as possible.
It is important that technical support staff notify the SPM as soon as
possible and do not spend 5-10 minutes (which can quickly lead into much
longer) working on diagnosis and resolution before they notify anyone.
If it is a real quick fix scenario, or immediate action is required to
prevent damage, then work should be undertaken straight away.
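The decision technical support staff face can be written out as a short sketch. The 2-minute threshold comes from the procedure; the function name `triage`, the action strings and the `damage_imminent` flag are hypothetical, and the ordering of actions in the damage case is one reasonable reading rather than something the procedure spells out.

```python
from datetime import timedelta

QUICK_FIX_LIMIT = timedelta(minutes=2)  # "less than 2 minutes" in the procedure


def triage(estimated_fix_time, damage_imminent=False):
    """Return the ordered actions for technical support staff."""
    if estimated_fix_time < QUICK_FIX_LIMIT:
        # Genuine quick fix: proceed without sending a downtime notice.
        return ["fix immediately"]
    actions = []
    if damage_imminent:
        # Work needed to prevent damage is undertaken straight away.
        actions.append("take immediate protective action")
    # Anything longer than a quick fix is reported before further diagnosis.
    actions += ["notify SPM", "issue downtime message", "continue diagnosis"]
    return actions
```

For example, `triage(timedelta(minutes=15))` puts notifying the SPM ahead of any further diagnosis, which is the point the paragraph above stresses.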
The Senior Problem Manager is responsible for:
- Receiving/detecting high priority problems
SPM receives/detects potentially Priority 1 and 2 problems from customers,
ITS or IAS Service Desk staff or ITS staff.
- Performing initial assessment
Carries out initial assessment to determine extent of service/s affected
and decides whether the problem is Priority 1 or 2. Liaises with the
appropriate area to perform technical checks to determine where the
problem lies and which Support Section the problem should belong to
and to allocate the problem to them.
These initial technical checks should take no more than 10 minutes
and feedback must be provided to the Senior Problem Manager so that
he/she may continue to co-ordinate the problem. The Senior Problem
Manager must take initiative to ensure feedback is provided if none
has been received within the allotted time.
- Allocating the problem to a Support Section
Ensure that the correct Support Section has been notified and is either
ready to work or already working on the problem.
- Initial client communications
- Ensure the Clayton Service Desk staff (Bld 28) are notified and
request the Service Desk notify the IAS Service Desk and ITS staff
at all campuses.
- Ensure initial user notification via downtime (if possible) is
sent. Downtime messages should be sent by technical support staff,
but SPM should negotiate with Tech Support and send downtime message
if Tech Support are too busy resolving the problem.
In the case of Application Services (Administrative), it will be
necessary to notify the registered user base in lay terms of the
problem, in addition to the downtime notification, (e.g. Callista
users, SAP users).
- Communicate incident details to Portal team for broadcast to affected
parties (when the portal broadcast facility is available).
- Request and brief Clayton Service Desk to set up a recorded message
on the Service Interruption Hotline (27000) so that callers are
advised of the problem, services affected and, if possible, estimated
fix time.
- After consultation with Critical Incident Manager, activate voicemail
notification system for both the CI Administrators Voicemail List
(97000) and the CI Technical Voicemail List (95000). If the incident
is a Priority 1 and it is not possible to send a downtime message
or end-user message (Application Services), then activate voicemail
notification immediately.
The subscribers to these lists are distinct and the communication
must be appropriate for the audience. Communication to the CI Administrators
Voicemail List must describe the CI in non-technical terms, whereas
the communication to the CI Technical Voicemail List should provide
technical details where possible.
- Notify the Critical Incident Manager
As soon as possible after extent of service difficulty is known, contact
the Critical Incident Manager to provide initial details of the problem,
extent of services affected and resolution efforts.
Priority 1 - escalate to the Critical Incident Manager
under the Critical Incident Manager Procedure at 30 minutes after initial
detection.
Priority 2 - escalate to the Critical Incident Manager
under the Critical Incident Manager Procedure at 2 hours after initial
detection.
If the problem does not look resolvable within the above
timeframes, escalate to Critical Incident Manager immediately.
The Senior Problem Manager is required to use discretion and decide
if it is necessary to escalate the problem to the Critical Incident
Manager before the above timeframes.
- Liaising with Critical Incident Manager
Continue to liaise with the Critical Incident Manager until the problem
is resolved. Further co-ordinate and organize information updates to
stakeholders (e.g. ITS management, TWP and business owners) until resolution
of problem is reached.
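The escalation timing the Senior Problem Manager works to can be expressed as a small calculation. The 30-minute and 2-hour windows are taken from the procedure; the names `ESCALATE_AFTER`, `escalation_deadline` and `must_escalate_now` are hypothetical, and the SPM's discretion to escalate earlier is deliberately not modelled.

```python
from datetime import datetime, timedelta

# Escalation windows stated in the procedure, keyed by priority.
ESCALATE_AFTER = {1: timedelta(minutes=30), 2: timedelta(hours=2)}


def escalation_deadline(priority, detected_at):
    """Latest time to escalate to the Critical Incident Manager."""
    if priority not in ESCALATE_AFTER:
        raise ValueError("only Priority 1 and 2 problems are covered")
    return detected_at + ESCALATE_AFTER[priority]


def must_escalate_now(priority, detected_at, now, looks_resolvable_in_time=True):
    """True when the problem should already be with the CIM."""
    if not looks_resolvable_in_time:
        # Escalate immediately if the fix will not fit inside the window.
        return True
    return now >= escalation_deadline(priority, detected_at)


detected = datetime(2004, 2, 24, 9, 0)
print(escalation_deadline(1, detected))  # 2004-02-24 09:30:00
print(escalation_deadline(2, detected))  # 2004-02-24 11:00:00
```

The deadline is measured from initial detection, not from the time the SPM was notified, which matches the wording of the procedure.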
Role of Critical Incident Manager (CIM)
When a Priority 1 or 2 problem occurs, the Critical Incident Manager
is responsible for:
- Client communications
Liaise with Senior Problem Manager to ensure all affected clients
are aware of the problem status and ITS have communicated the relevant
information to them in a timely manner. Ensure that ITS and IAS Service Desk staff are continually updated via Senior Problem Manager.
- ITS Senior Management communications
Ensure applicable senior ITS Managers have been made aware of the
problem. This would include the Executive Director’s Office,
Director Client Services, the Director of the area responsible for
providing the affected service, and the Client Communications Manager.
- ITS Service Coordinators
Ensure the responsible ITS Service Coordinator, and any other relevant
ITS Service Coordinators, have been made aware of the problem.
- Correct resources have been allocated to the problem
Check that correct resources have been allocated to resolving the
problem. For example, do we have the correct team working on the problem
and are the team members sufficiently experienced to cope with such
a problem?
Remove any obstacles hindering the resources from concentrating on
the problem. Examples of this would be too many people requesting
problem updates, access to certain skill groups, etc. Whatever is
needed, it is the Critical Incident Manager's job to ensure they receive
it.
- Initial Contingency Actions
Meet with ITS service area manager and/or support team and determine
what immediate contingency actions are possible.
- Vendor support
Ascertain whether consideration has been given to calling upon the
vendors. As a general rule, vendors should always be called
on-site, if possible, for critical incidents.
- Ongoing problem resolution
If the problem is not resolved within 3 hours, the CIM should call a CI meeting
to:
- get a full briefing from support staff working on the problem;
- undertake analysis of effects and ongoing risks;
- review the approach to problem resolution;
- determine alternative courses of action (if possible);
- develop contingency plans;
- ensure appropriate resources are allocated (including vendor
support);
- confirm client communications with Director, Client Services;
and
- confirm the brief to ITS senior management and business process
owners.
Depending on the type of problem, the CIM should determine frequency
of team meetings and ensure meetings are held. It is recommended that
Critical Incident meetings are held every 3 hours. Given a protracted
outage, a situational assessment should be made near close of business
each day (i.e. 4.00 pm).
- Service restoration and review
Once the problem is resolved, the CIM must:
- notify clients (ITS Advisory Notice, and other stakeholders directly
where appropriate);
- notify ITS Senior Management;
- conduct formal review of problem/problem resolution process to
document lessons learnt and compile an outstanding issues list for
ongoing work. Each outstanding issue will be assigned to the CI
Manager of the relevant ITS Department and Director and tracked
by the ITS Executive Officer through the regular ITS Director’s
meetings.
- log critical incident in central repository.
Client communications
- When problem appears
Priority 1 & 2: Ensure that appropriate downtime
messages are sent -- SPM
Priority 1 & 2: Ensure Service Interruption Hotline
has appropriate message -- SPM
Priority 1: Draft 'allstaff' or 'alluser'
email advising existence and nature of problem, services affected,
etc. and pass to Director Client Services or Director, Application
Services for approval and dissemination -- CIM
- After problem resolved
Priority 1 & 2: Ensure that appropriate ITS Advisory
Notices are sent -- CIM
Priority 1: Draft 'allstaff' and 'alluser'
email advising that the problem has been resolved and pass to Director
Client Services or Director, Application Services for approval and
dissemination -- Client Communications
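The client communications responsibilities above amount to a small matrix of stage, priority and owner, which can be sketched as a lookup. The actions and owners are paraphrased from the list above; the names `COMMS` and `communications_for` and the stage keys `"appears"` and `"resolved"` are hypothetical.

```python
# stage -> list of (priorities covered, action, responsible role)
COMMS = {
    "appears": [
        ({1, 2}, "send downtime messages", "SPM"),
        ({1, 2}, "set Service Interruption Hotline message", "SPM"),
        ({1}, "draft 'allstaff' or 'alluser' email for Director approval", "CIM"),
    ],
    "resolved": [
        ({1, 2}, "send ITS Advisory Notices", "CIM"),
        ({1}, "draft 'allstaff' and 'alluser' resolution email for Director approval",
         "Client Communications"),
    ],
}


def communications_for(stage, priority):
    """Return (action, owner) pairs that apply at this stage and priority."""
    return [
        (action, owner)
        for priorities, action, owner in COMMS[stage]
        if priority in priorities
    ]


for action, owner in communications_for("appears", 1):
    print(f"{owner}: {action}")
```

Keeping the matrix as data makes it easy to see that Priority 2 problems never trigger the all-staff email at either stage.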
6. Approval for Procedures
| Authorising Body/Officer | ITS Directors |
| Meeting No and Date | ITS Directors meeting 11/01; 24 April 2001 |
| Issue of Policy | 1 May 2001 |
| Amended | 20/7/2002 – S Bamber; 20/2/2004 – Hellyer, Macdonald, White |
| Approved | ITS Directors 06/04; 24 Feb 2004 |