
Disaster recovery: FIA holds annual test

31 January 2014

By John Rapa

For the last decade, the U.S. futures industry has conducted a disaster recovery test each fall to assess industry-wide preparedness to respond to business disruptions.

The test, which was first established by FIA after the terrorist attacks in 2001, typically involves conducting a mock trading session on the weekend. This allows exchanges, clearinghouses and brokers to test how well their back-up systems operate and identify any problems that might disrupt trading during a real-life disaster.

Over the last 10 years, the industry has undergone a number of changes, including the growth of algorithmic and high-frequency trading, the increasing reliance on co-location of trading infrastructure in data centers, and a restructuring of the exchange landscape through mergers, divestitures, and the introduction of new venues. As a result, market participants are constantly changing and enhancing their technology environments, and no single player or system is static.

The most recent test, which was conducted on Oct. 5, focused on back-up connectivity and functionality. The test benefited from the participation of major derivatives exchanges and clearinghouses, many of which are based outside the U.S. In addition, 64 firms participated in the test. Those firms handle between 83% and 95% of the volume on the participating exchanges, indicating that the test covered a critical mass of the industry.  

Although the results of the 2013 test were generally positive, they also showed where certain improvements are needed. In addition, a survey of how firms functioned during Superstorm Sandy, conducted in conjunction with the annual test, shed light on both strengths and weaknesses in the industry's preparedness for disruptions.

Goals of the test

The key objective of the annual test is to assess round-trip communications connectivity among the various market participants (exchanges, clearinghouses, and clearing and non-clearing firms) operating from their secondary or back-up sites and systems. This includes successfully exercising systems such as order management, FIX messaging gateways, risk management, trade matching and execution reporting.

The pre-test planning, preparation and execution involved are substantial. Preparations for the 2013 test started six months before Oct. 5 and involved between 800 and 1,000 support staff and managers from market participants and exchanges across the U.S., Canada and Europe.

As in the past, the test was organized by FIA's Information Technology Division, specifically its business continuity management committee.

During the test, the exchanges and clearinghouses typically operate from their back-up sites using their back-up systems. The test plans specify the pre-test information that firms and service providers must provide to the exchanges, including key contacts and the systems and interfaces to be used during the test.

The test script provided by the exchanges dictates which products the firms are expected to create orders for, what functionality to exercise, and what information to retrieve from the exchange web portals or clearing systems.  

The exchanges offer pre-test connectivity testing to ensure that firms and service providers can connect from their back-up systems and sites, and to facilitate an orderly start to the actual test.

Testing takes place on a Saturday, when markets are closed. As is the norm, systems used in the test are prepared and backed up for recovery purposes on Friday evening. Data created during the test (e.g., orders, quotes, trades and output reports) are captured by the exchanges and stored for future research and analysis. All systems are then returned to a production-ready state in anticipation of trading on Sunday evening or Monday.

These tests are not intended to be high-volume stress tests. The rationale is that if a firm can use its back-up systems to connect to an exchange, enter a small but meaningful number of orders, and receive execution reports or clearinghouse outputs, it can conduct business.

2013 test results

During the 2013 test, firms operated from back-up sites in locations as widespread as Florida, Illinois, Missouri, New Jersey, New York, North Carolina and Utah, as well as Frankfurt, London, Madrid, Montreal, Paris and Toronto. In addition, two new markets joined the effort: ICE Clear Credit and TrueEX.

Firms indicated that the test helped them:  

  • Exercise their business continuity/disaster recovery plans,
  • Identify internal single points of failure,
  • Test other in-house applications and systems at the same time,
  • Tighten up and document their business continuity procedures,
  • Better understand the need for cross-training, and
  • Test connectivity with exchanges’ DR sites.

The exchanges and clearinghouses indicated that the test helped them:

  • Test connectivity to/from DR sites,
  • Identify/refine pre-test and post-test procedures for connectivity testing,
  • Tighten up and document their business continuity and system fail-over procedures,
  • Improve test scripts and plans for future tests,
  • Identify some internal single points of failure, and
  • Better understand the need for cross-training.

The test uncovered some areas for improvement. Most of the problems encountered were categorized as "real world" issues, such as incorrect network IP addresses or MQ channels, hardware or software configuration anomalies, and a lack of technical or domain knowledge on test day. The majority were promptly resolved with help from exchange and clearinghouse staff, and those that could not be resolved were documented for follow-up.

One recommendation that emerged from the test was that exchanges should investigate more efficient ways to achieve seamless fail-over from primary to back-up systems. The need became apparent when firms whose primary systems were co-located in an exchange's data center were affected as the exchange failed over to its DR data center.

Superstorm Sandy

The committee also conducted a survey among the participants to assess the impact of Superstorm Sandy on their business continuity performance. There was general agreement among those surveyed that regular business continuity and disaster recovery testing paid off when this storm hit the greater New York metropolitan area in the fall of 2012.

Because of regular testing, firms were well prepared to cope with power outages and disruptions caused by the storm. Most of the firms surveyed put their business continuity plans in place ahead of the storm and therefore did not need to rely on disaster recovery operations. In addition, market participants that heeded the storm warnings communicated with critical staff and decision-makers early and often, and this proved to be key to the incident management process.

One of the most disruptive aspects of Superstorm Sandy was the closure of mass transit, highways, bridges and tunnels. Employees in affected areas such as New Jersey, Manhattan and Long Island were unable to leave home because of storm damage, lack of transportation or an unwillingness to leave their families. While many staff could work remotely, widespread power outages hampered their ability to telecommute and collaborate.

Depending on their location, some firms suffered significant damage to mission-critical systems, infrastructure or facilities. As a result, firms surveyed said they are reconsidering the proximity and locations of their primary and back-up facilities. They are also contemplating changes to key locations, adding more redundant power and communications, using diverse carriers, hardening data centers and sites, and increasing business continuity training.

Looking ahead to 2014  

The 2014 test has been scheduled for Saturday, Oct. 25. The committee plans to expand its scope to test firms' ability to operate from remote recovery sites (i.e., out-of-region capabilities) and to introduce variations on what has been tested in the past. This is partly a response to evolving regulatory requirements that affect market participants' business continuity and disaster recovery.

The committee will continue to coordinate and refine the annual test and strive to bring in more participants. These may include additional clearing and non-clearing firms, designated contract markets, swap execution facilities and swap data repositories.

As in prior years, the committee will liaise with its equities-market counterparts at the Securities Industry and Financial Markets Association, since both industry groups have conducted similar tests on common dates and many firms are members of both organizations.

John Rapa is the president and chief executive officer of Tellefsen and Company. 
