ITS Procedure - Critical Incident Management

1. Introduction

These instructions provide a standard procedure for managing Priority 1 and 2 service problems for the IT services provided by ITS.

This procedure is linked to the ITS Service Problem and Critical Incident Management Policy.

2. Responsible Officer

Director - Client Services, ITS

3. Definitions

Technical support staff - ITS staff drawn from a number of operational units working on diagnosis and resolution of the problem - typically from Infrastructure Services or Application Services.

Senior Problem Manager - a senior staff member of ITS Client Services, Application Services or Infrastructure Services. This role must always be occupied and designated to a named staff member.

Campus | Service area | Senior Problem Manager (SPM) | Alternative SPM
Berwick & Peninsula | All local services | ITS Campus Manager, Client Services | Service Desk Officer, Client Services
Caulfield | All local services | ITS Campus Manager, Client Services | Service Desk Officer, Client Services
Clayton | Client Services | Senior Service Desk Officer, Client Services | Call Centre Manager
Clayton | Infrastructure Services | Data Centre Manager, Infrastructure Services | Senior/Systems Operator
Clayton | Application Services (Administrative) | Technical Support Manager, Application Services | Senior Technical Support Officer, Application Services
Clayton | Application Services (Web) | Senior Portal Developer, Application Services | Portal Developer, Application Services
Gippsland | All local services | ITS Campus Manager, Client Services | Technology Services Coordinator, Client Services

Critical Incident Manager - the senior ITS operational manager for the service impacted. If more than one service is affected as the result of a failure or fault in an infrastructure device, then the ITS operational manager for Infrastructure Services (Manager, Production Facilities) will be the Critical Incident Manager.

Service area | Critical Incident Manager (CIM) | On call
Infrastructure Services | Manager, Production Facilities | 0409 350 613
Client Services | Clayton ITS Campus Manager, Client Services | 0409 430 936
Application Services | Manager, IAS, Application Services | 0400 894 220

See Key Staff for the names of the current SPMs and Critical Incident Managers.

The role of the Critical Incident Manager is to assure that the correct problem management procedures are being followed and that the right resources are working on the problem, and, if necessary, to use the position's authority to ensure the correct focus is applied to resolving it.

4. Scope

This procedure covers Priority 1 and Priority 2 problems where a failure in an IT system has resulted in a total inability, or a severely restricted ability, to perform the normal operation of a significant business function of the University.

5. Procedures

High priority problems may be identified in a number of ways:

  • calls coming into the ITS Service Desk, the IAS Service Desk and Production Facilities;
  • alerts from systems monitoring tools such as Patrol and Spectrum; and
  • technical support staff identifying a problem.

The Senior Problem Manager identifies Priority 1 and 2 problems referred from the ITS or IAS Service Desks, or from Production Facilities, and ensures that the correct follow-up/escalation procedure is carried out. All Priority 1 and 2 problems must be referred to the Senior Problem Manager immediately if a 'quick fix' is not available, so that there is a single central point of coordination. The Senior Problem Manager is also responsible for keeping stakeholders informed of the problem's progress.

Where technical support staff identify the existence of a Priority 1 or 2 problem, they should decide on a course of action as follows:

  • If the problem can be fixed quickly (i.e. in less than 2 minutes), technical staff may proceed to fix the problem without sending a downtime notice.
  • If the problem requires more time to resolve, the SPM should be notified as soon as possible (no later than 5-10 minutes after the problem is identified), and a downtime message issued as soon as possible.

It is important that technical support staff notify the SPM as soon as possible and do not spend 5-10 minutes (which can quickly turn into much longer) working on diagnosis and resolution before they notify anyone. If it is genuinely a quick-fix scenario, or immediate action is required to prevent damage, then work should be undertaken straight away.
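The decision rule for technical support staff can be summarised as a small function, for example in a hypothetical on-call checklist tool. This is only a sketch: the names `triage` and `TriageDecision` are illustrative and not part of any ITS system, and it encodes just the thresholds stated above, leaving judgement calls (such as estimating the fix time) to staff.

```python
from dataclasses import dataclass

# Threshold taken from the procedure: a "quick fix" takes less than 2 minutes.
QUICK_FIX_LIMIT_MINUTES = 2


@dataclass(frozen=True)
class TriageDecision:
    start_work_immediately: bool  # begin fixing before anyone is notified
    notify_spm: bool              # refer to the Senior Problem Manager (within 5-10 minutes)
    send_downtime_notice: bool    # issue a downtime message as soon as possible


def triage(estimated_fix_minutes: float, prevents_damage: bool = False) -> TriageDecision:
    """Hypothetical encoding of the technical staff decision for a P1/P2 problem."""
    if estimated_fix_minutes < QUICK_FIX_LIMIT_MINUTES:
        # Genuine quick fix: fix it straight away, no downtime notice or SPM referral needed.
        return TriageDecision(start_work_immediately=True, notify_spm=False,
                              send_downtime_notice=False)
    # Longer fix: notify the SPM and issue a downtime message. Work starts before
    # notification only if immediate action is needed to prevent damage.
    return TriageDecision(start_work_immediately=prevents_damage, notify_spm=True,
                          send_downtime_notice=True)
```

For instance, `triage(30)` yields a decision to notify the SPM and send a downtime notice before starting work, while `triage(30, prevents_damage=True)` also starts work immediately.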

The Senior Problem Manager is responsible for:

  • Receiving/detecting high priority problems
    SPM receives/detects potential Priority 1 and 2 problems from customers, ITS or IAS Service Desk staff or ITS staff.

  • Performing initial assessment
    Carries out an initial assessment to determine the extent of the service/s affected and decides whether the problem is Priority 1 or 2. Liaises with the appropriate area to perform technical checks that determine where the problem lies and which Support Section it belongs to, and allocates the problem to that section.

    These initial technical checks should take no more than 10 minutes, and feedback must be provided to the Senior Problem Manager so that he/she can continue to co-ordinate the problem. The Senior Problem Manager must take the initiative to obtain feedback if none has been received within the allotted time.

  • Allocating the problem to a Support Section
    Ensure that the correct Support Section has been notified and is ready to work, or is working, on the problem.

  • Initial client communications
    • Ensure the Clayton Service Desk staff (Bld 28) are notified, and request that the Service Desk notify the IAS Service Desk and ITS staff at all campuses.
    • Ensure an initial user notification is sent via a downtime message (if possible). Downtime messages should be sent by technical support staff, but the SPM should negotiate with Technical Support and send the downtime message if they are too busy resolving the problem.
      In the case of Application Services (Administrative), it will also be necessary to notify the registered user base (e.g. Callista users, SAP users) of the problem in lay terms, in addition to the downtime notification.
    • Communicate incident details to Portal team for broadcast to affected parties (when the portal broadcast facility is available).
    • Request and brief the Clayton Service Desk to set up a recorded message on the Service Interruption Hotline (27000) so that callers are advised of the problem, the services affected and, if possible, the estimated fix time.
    • After consultation with the Critical Incident Manager, activate the voicemail notification system for both the CI Administrators Voicemail List (97000) and the CI Technical Voicemail List (95000). If the incident is Priority 1 and it is not possible to send a downtime message or end-user message (Application Services), activate voicemail notification immediately.
      The subscribers to these lists are distinct, and the communication must be appropriate for each audience. Communication to the CI Administrators Voicemail List must describe the CI in non-technical terms, whereas communication to the CI Technical Voicemail List should provide technical details where possible.
  • Notify the Critical Incident Manager
    As soon as the extent of the service disruption is known, contact the Critical Incident Manager to provide initial details of the problem, the extent of services affected and resolution efforts so far.

    Priority 1 - escalate to the Critical Incident Manager under the Critical Incident Manager Procedure at 30 minutes after initial detection.

    Priority 2 - escalate to the Critical Incident Manager under the Critical Incident Manager Procedure at 2 hours after initial detection.

    If the problem does not appear resolvable within these timeframes, escalate to the Critical Incident Manager immediately. The Senior Problem Manager must use discretion to decide whether it is necessary to escalate the problem to the Critical Incident Manager before these timeframes elapse.

  • Liaising with Critical Incident Manager
    Continue to liaise with the Critical Incident Manager, and to co-ordinate information updates to stakeholders (e.g. ITS management, TWP and business owners), until the problem is resolved.
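The escalation timeframes in the SPM's responsibilities above amount to a deadline calculation. The following is a minimal sketch under stated assumptions: `escalation_deadline` and `must_escalate` are hypothetical helper names, and the SPM's discretion to escalate earlier cannot be captured in code, so the only judgement modelled is the `resolvable_in_window` flag.

```python
from datetime import datetime, timedelta

# Maximum time from initial detection to escalation to the Critical Incident Manager.
ESCALATION_WINDOW = {
    1: timedelta(minutes=30),  # Priority 1
    2: timedelta(hours=2),     # Priority 2
}


def escalation_deadline(priority: int, detected_at: datetime) -> datetime:
    """Latest time by which the SPM escalates to the CIM."""
    if priority not in ESCALATION_WINDOW:
        raise ValueError("this procedure covers Priority 1 and 2 problems only")
    return detected_at + ESCALATION_WINDOW[priority]


def must_escalate(priority: int, detected_at: datetime, now: datetime,
                  resolvable_in_window: bool = True) -> bool:
    """True once the window has elapsed, or at once if the problem will not be
    resolved within the window. Earlier escalation at the SPM's discretion is
    always allowed and is not modelled here."""
    deadline = escalation_deadline(priority, detected_at)
    return not resolvable_in_window or now >= deadline
```

For a Priority 1 problem detected at 9.00 am, `escalation_deadline` returns 9.30 am; for a Priority 2 problem detected at the same time, it returns 11.00 am.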

Role of Critical Incident Manager (CIM)

When a Priority 1 or 2 problem occurs, the Critical Incident Manager is responsible for:

  • Client communications
    Liaise with the Senior Problem Manager to ensure all affected clients are aware of the problem status and that ITS has communicated the relevant information to them in a timely manner. Ensure that ITS and IAS Service Desk staff are continually updated via the Senior Problem Manager.

  • ITS Senior Management communications
    Ensure applicable senior ITS Managers have been made aware of the problem. This would include the Executive Director’s Office, Director Client Services, the Director of the area responsible for providing the affected service, and the Client Communications Manager.

  • ITS Service Coordinators
    Ensure the responsible ITS Service Coordinator, and any other relevant ITS Service Coordinators, have been made aware of the problem.

  • Correct resources have been allocated to the problem
    Check that correct resources have been allocated to resolving the problem. For example, do we have the correct team working on the problem and are the team members sufficiently experienced to cope with such a problem?
    Remove any obstacles that prevent the resources from concentrating on the problem, for example too many people requesting problem updates, or lack of access to certain skill groups. Whatever the team needs, it is the Critical Incident Manager's job to ensure they receive it.

  • Initial Contingency Actions
    Meet with ITS service area manager and/or support team and determine what immediate contingency actions are possible.

  • Vendor support
    Ascertain whether consideration has been given to calling upon the vendors. As a general rule, vendors should be called on-site, where possible, for critical incidents.

  • Ongoing problem resolution
    If the problem is not resolved within 3 hours, the CIM should call a CI meeting to:
    • get a full briefing from support staff working on problem;
    • undertake analysis of effects and ongoing risks;
    • review the approach to problem resolution;
    • determine alternative courses of action (if possible);
    • develop contingency plans;
    • ensure appropriate resources are allocated (including vendor support);
    • confirm client communications with Director, Client Services; and
    • confirm the brief to ITS senior management and business process owners.

    Depending on the type of problem, the CIM should determine the frequency of team meetings and ensure they are held. It is recommended that Critical Incident meetings be held every 3 hours. For a protracted outage, a situational assessment should be made near close of business each day (i.e. 4.00 pm).

  • Service restoration and review
    Once the problem is resolved, the CIM must:
    • notify clients (ITS Advisory Notice, and other stakeholders directly where appropriate);
    • notify ITS Senior Management;
    • conduct a formal review of the problem and the problem resolution process to document lessons learnt, and compile an outstanding issues list for ongoing work. Each outstanding issue will be assigned to the CI Manager of the relevant ITS Department and Director, and tracked by the ITS Executive Officer through the regular ITS Directors' meetings; and
    • log the critical incident in the central repository.
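The meeting cadence described under "Ongoing problem resolution" above can be sketched as a schedule generator. This is a hypothetical illustration only: the function names are not part of any ITS tool, and because the CIM decides the actual meeting frequency, the recommended 3-hour interval is just a default parameter.

```python
from datetime import date, datetime, time, timedelta

# First CI meeting once the problem has been unresolved for 3 hours,
# then (recommended) every 3 hours. The CIM may choose a different interval.
MEETING_INTERVAL = timedelta(hours=3)
# Daily situational assessment near close of business.
DAILY_ASSESSMENT_TIME = time(16, 0)


def ci_meeting_times(detected_at: datetime, until: datetime,
                     interval: timedelta = MEETING_INTERVAL) -> list[datetime]:
    """Suggested CI meeting times while the problem remains unresolved."""
    times = []
    t = detected_at + interval
    while t <= until:
        times.append(t)
        t += interval
    return times


def daily_assessment_times(detected_at: datetime, until: datetime) -> list[datetime]:
    """4.00 pm situational assessments falling within a protracted outage."""
    times = []
    day: date = detected_at.date()
    while day <= until.date():
        t = datetime.combine(day, DAILY_ASSESSMENT_TIME)
        if detected_at <= t <= until:
            times.append(t)
        day += timedelta(days=1)
    return times
```

For a problem detected at 9.00 am and still unresolved at 6.00 pm, `ci_meeting_times` suggests meetings at 12.00 pm, 3.00 pm and 6.00 pm.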

Client communications

  • When problem appears
    Priority 1 & 2: Ensure that appropriate downtime messages are sent -- SPM
    Priority 1 & 2: Ensure the Service Interruption Hotline has an appropriate message -- SPM
    Priority 1: Draft 'allstaff' or 'alluser' email advising of the existence and nature of the problem, services affected, etc. and pass to Director Client Services or Director, Application Services for approval and dissemination -- CIM

  • After problem resolved
    Priority 1 & 2: Ensure that appropriate ITS Advisory Notices are sent -- CIM
    Priority 1: Draft 'allstaff' and 'alluser' email advising that the problem has been resolved and pass to Director Client Services or Director, Application Services for approval and dissemination -- Client Communications

6. Approval for Procedures

Authorising Body/Officer: ITS Directors
Meeting No. and Date: ITS Directors meeting 11/01; 24 April 2001
Issue of Policy: 1 May 2001
Amended: 20/7/2002 – S Bamber; 20/2/2004 – Hellyer, Macdonald, White
Approved: ITS Directors, 06/04; 24 Feb 2004