FlyerTalk Forums - View Single Post - IT outsourcing prediction
Old May 29, 2017, 2:19 pm
  #154  
Plato90s
Originally Posted by Jimmie76
Whilst I agree... does the substance of what is alleged to have actually happened sound likely? i.e. running patches on both data centres, not just one?
Yes, it could easily happen that way. Patching is managed by centralized software tools these days, and someone could easily have scheduled both data centres' servers to kick off at once, particularly if it was the high-urgency patch for the recent WannaCry outbreak.

Usually there'd be change control procedures, but those sometimes get bypassed for high-urgency patches.
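
To make the scheduling point concrete, here's a rough Python sketch of that failure mode. The tool, host groups, and function names are all made up for illustration - the point is just that when one job owns both data centres, a zero-hour stagger reboots the primary and the backup site in the same window.

```python
from datetime import datetime, timedelta

# Hypothetical host groups - illustrative names, not any real estate.
HOST_GROUPS = ["dc_primary", "dc_backup"]

def schedule_patch(groups, start, stagger_hours=0):
    """Return (group, window_start) pairs for a patch-and-reboot job.

    stagger_hours=0 is the shortcut someone might take for an
    'emergency' WannaCry-style patch: every data centre reboots in
    the same window, so a bad patch or a server that hangs on
    restart hits the primary and the DR site at the same time.
    """
    return [(group, start + timedelta(hours=i * stagger_hours))
            for i, group in enumerate(groups)]

window = datetime(2017, 5, 20, 22, 0)  # arbitrary example start time

# Normal change control: backup site patched 24 hours after the primary.
print(schedule_patch(HOST_GROUPS, window, stagger_hours=24))

# Bypassed change control: both sites kick off together.
print(schedule_patch(HOST_GROUPS, window))
```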

Despite the mockery in some of the comments on The Register, reboots really do spike the power draw of servers, and there are often component failures that only show themselves at initialization. Many's the time a previously working server has hung on restart.
Originally Posted by Egoldstein
I find this astonishing if true, but some chatter in another forum seems to confirm that airlines tend to test failover very rarely, unlike financial institutions that have to comply with regulatory minimums and audits on DR compliance.

Talk about penny wise, one hundred million pound foolish!
Having worked in the financial industry, I assure you that the failover testing of financial institutions is on the level of "good enough for the auditors".

It's rarely "good enough to restore the business". In my time in the mutual fund arm of a top-10 American bank, the core systems were all we could bring back within 8 hours. Portfolio managers could do their jobs, but no one would be answering calls and the web site wouldn't be updated. If we had to operate out of the DR site and restore 100% of business functionality, it would take 3+ weeks at a minimum. And even that still wouldn't be full functionality, because CSR, QA, document, legal, HR, etc. would not be online at all.

I've only ever worked at one place that was sufficiently dedicated to the idea of disaster recovery to be able to restore 100% of business functionality - including office space for the bare minimum of required business users. At that place we ran weekly minimal IT-only DR tests, validated by the QA team. Then a monthly test with more systems brought up. Then quarterly tests that required business users to enter synthetic transactions. And then a full annual DR test where at least one trade was executed from the DR site, with a minimal staff on-site using the DR site's computers, to make sure that part worked as well.
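
For anyone curious what that cadence looks like written down, here's a rough sketch in Python - the tiers and descriptions are illustrative, modelled on what I described above rather than the fund's actual runbook - with each level exercising more of the business than the one before it.

```python
from dataclasses import dataclass

@dataclass
class DrTest:
    cadence: str       # how often the test runs
    scope: str         # what gets exercised
    validated_by: str  # who signs off on the result

# Illustrative tiers only; each level builds on the previous one.
DR_TEST_PLAN = [
    DrTest("weekly",    "minimal IT-only failover of core systems",       "QA team"),
    DrTest("monthly",   "wider failover with more systems brought up",    "QA team"),
    DrTest("quarterly", "business users entering synthetic transactions", "business + QA"),
    DrTest("annually",  "full failover with at least one trade executed "
                        "from the DR site by on-site staff",              "business + IT"),
]

for test in DR_TEST_PLAN:
    print(f"{test.cadence:>9}: {test.scope} (validated by {test.validated_by})")
```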

This was only possible by devoting close to 30% of the entire IT budget to the DR functionality. It was a quant hedge fund, and they understood the value of lost time.

Every time I've mentioned that DR site experience, it's been met with shock - and that includes people who work at other top-10 banks.

Even then... that still wasn't a really good DR site, because the facility was less than 70 miles from the primary site.