A foreign bank operating in Japan faced regulatory fines and escalating oversight from the SEC in that country. One cause was a growing pattern of incorrect or incomplete data being sent to the stock exchange. Another was very late notification to the exchange that these problems had occurred.
A major source of these problems was the sprawling complexity of the environment. At the time, the bank's IT operation comprised some 3,000 people, most of them on-site contractors from firms both domestic and foreign. There were over 100 applications in production, and four major organizational divisions within IT that mirrored divisions across the rest of the organization. Further complicating matters were the international nature of the business and the bank's reliance on infrastructure and application support organizations located in other corporate entities in various parts of the world. The bank's many units around the world employed some 400,000 people at the time.
I was tasked with implementing the ITIL functions for incident management and problem management. This required a good deal of collaboration with Compliance and with the technical and support teams globally, and it meant exerting influence on the organization without the authority to demand cooperation. I began by understanding the challenges confronting each team and how the standard ITIL processes could shorten recovery times during an incident as well as the longer process of fixing the underlying error. At the same time, I had my team take over the responsibility of involving Compliance, relieving the teams that were busy solving the technical issue, in order to improve how quickly the exchange was notified when an error had occurred.
As the process of discovery and process development continued, I produced reports that broke down the steady stream of incidents by department, application, severity, duration, and support teams. This helped us understand the trends and any bottlenecks that were preventing improvement.
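A breakdown of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used at the bank: the `Incident` record, the `summarize` helper, and all sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    department: str
    application: str
    severity: int
    duration_hours: float
    support_team: str


def summarize(incidents, dimension):
    """Return {value: (incident_count, total_outage_hours)} for one dimension."""
    totals = defaultdict(lambda: [0, 0.0])
    for incident in incidents:
        key = getattr(incident, dimension)
        totals[key][0] += 1
        totals[key][1] += incident.duration_hours
    return {key: (count, hours) for key, (count, hours) in totals.items()}


# Invented sample data, for illustration only.
sample = [
    Incident("Equities", "order-router", 1, 4.0, "APAC-L2"),
    Incident("Equities", "order-router", 2, 1.5, "APAC-L2"),
    Incident("Fixed Income", "pricing", 1, 6.0, "EMEA-L2"),
]

by_department = summarize(sample, "department")
by_application = summarize(sample, "application")
```

Running the same helper over each dimension in turn (department, application, severity, support team) gives the set of views described above, and comparing the totals period over period is what exposes trends and bottlenecks.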
Serving such reports to the operations and development teams involved has a number of constructive effects: it helps those teams understand their responsibilities and the impacts of their processes; it helps them see where they are, or are not, making progress; and it brings to the surface questions about whether the impacted applications are at all suited to the purpose to which they're being put.
Doing so served as an invaluable education in understanding which metrics really matter. For instance, if you cannot fix a certain factor—either because you cannot control that factor or because nothing you try seems to have an impact—it's likely that the metric is masking root causes that will require further analysis.
Nine months after the initial kick-off, the bank was still experiencing incidents on a regular basis, but several improvements were noticeable. Regulatory reporting times had fallen from two weeks to a matter of hours, well within the required time frames. One of the front office departments was able to demonstrate a steady decline in the total number of incidents it experienced. The company as a whole was experiencing far fewer total outage hours. And many groups within the vast IT department were working together in a far more collaborative way.
In response to these successes, I was asked to take on the ITIL-based change management function. Having regularly suffered from incidents caused by changes to the technical environment, I instituted a number of improvements to the way changes were made.
By requiring that all planned production changes first be made to the business-continuity infrastructure for testing, I had the teams making the changes do a dry run of each production change. This had the knock-on effect of bringing the company's business-continuity infrastructure up to production readiness for the first time.
By instituting an instant rejection of any planned change whose originators didn't come to the weekly firm-wide change management meeting, I got people not only to attend the meetings but also to hear, for the first time, the changes that were being planned in other groups.
The metrics in change management were somewhat easier: I only had to show how many incidents occurred as a result of a change. Within six months of my taking on the function, this number had fallen sharply, to only one incident for every 600 changes.
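The arithmetic behind this metric is simple enough to sketch. The function names and the sample counts below are hypothetical; only the ratio of one incident per 600 changes comes from the account above.

```python
def incident_rate(total_changes, change_incidents):
    """Fraction of changes that led to an incident."""
    if total_changes <= 0:
        raise ValueError("total_changes must be positive")
    return change_incidents / total_changes


def changes_per_incident(total_changes, change_incidents):
    """Average number of changes made per change-related incident.

    Returns None when no incidents were recorded, since the ratio
    is undefined in that case.
    """
    if change_incidents == 0:
        return None
    return total_changes / change_incidents


# Illustrative counts only: 1,200 changes and 2 change-related incidents
# correspond to one incident for every 600 changes.
ratio = changes_per_incident(1200, 2)
rate = incident_rate(1200, 2)
```

Expressing the result as "changes per incident" rather than as a small percentage tends to be easier to communicate to a mixed audience, which is why both forms are shown.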
Having successfully passed a number of internal and regulatory audits of our function, our system of change and incident management was selected as a model for use across the region. The winds of the financial disaster were already gathering force by this time, however, and my time with the bank drew to a close. The project to take my function to the region did not materialize.