IT outsourcing prediction

May 29, 17, 11:48 am
  #151  
 
Join Date: Dec 2015
Location: UK
Programs: BAEC Silver, *A, Marriott
Posts: 175
Originally Posted by toothy
More likely - the failover plan has never been tested
I find this astonishing if true, but some chatter in another forum seems to confirm that airlines tend to test failover very rarely, unlike financial institutions that have to comply with regulatory minimums and audits on DR compliance.

Talk about penny wise, one hundred million pound foolish!
Egoldstein
May 29, 17, 1:18 pm
  #152  
 
Join Date: Apr 2014
Location: London
Programs: Don't even mention it. Grrrrrrr.
Posts: 945
Originally Posted by Egoldstein
I find this astonishing if true, but some chatter in another forum seems to confirm that airlines tend to test failover very rarely, unlike financial institutions that have to comply with regulatory minimums and audits on DR compliance.

Talk about penny wise, one hundred million pound foolish!
It's actually very risky executing a DR plan. You may leave yourself with nothing if it doesn't work. So....why test it if you'll probably never need it anyway?

And if you do end up needing it, you can fix the problems at that time.
Banana4321
May 29, 17, 2:13 pm
  #153  
 
Join Date: Mar 2013
Location: EMEA and US of A
Programs: BA Silver, Delta Gold, United Gold, Marriott Gold, Starwood Gold, Hilton Gold, Amex Platinum
Posts: 1,749
Originally Posted by Banana4321
It's actually very risky executing a DR plan. You may leave yourself with nothing if it doesn't work. So....why test it if you'll probably never need it anyway?

And if you do end up needing it, you can fix the problems at that time.
As insurance that it is there WHEN you'll need it, because these days it is not a matter of IF. A well-thought-out, well-engineered DR plan carries a degree of risk, but that is why it must be well prepared, with back-out contingencies at every stage. You have systems and processes that have been tested rigorously to ensure that whatever backup you have does work, and all components have individual redundancy so that there is no domino effect.
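A minimal Python sketch of the "back-out contingency at every stage" idea, assuming each stage of a DR plan can be written as a run step that reports success plus a matching back-out step. The `Stage` and `execute_dr_plan` names and the example stage names are hypothetical, not any real DR tool's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One step of a DR plan (hypothetical structure, for illustration only)."""
    name: str
    run: Callable[[], bool]       # performs the step; returns True on success
    back_out: Callable[[], None]  # undoes the step


def execute_dr_plan(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order; on the first failure, back out in reverse order."""
    log: List[str] = []
    completed: List[Stage] = []
    for stage in stages:
        if stage.run():
            log.append(f"ok: {stage.name}")
            completed.append(stage)
            continue
        log.append(f"failed: {stage.name}")
        # The failed stage may be half-applied, so back it out first,
        # then unwind every completed stage, newest first.
        for done in [stage] + completed[::-1]:
            done.back_out()
            log.append(f"backed out: {done.name}")
        return False, log
    return True, log
```

Unwinding newest-first matters because later stages usually depend on earlier ones (for example, traffic should be pointed away from a DR database before that database is demoted again).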
techie
May 29, 17, 2:19 pm
  #154  
A FlyerTalk Posting Legend
 
Join Date: Feb 2000
Location: Cambridge
Posts: 57,618
Originally Posted by Jimmie76
Whilst I agree... does the substance of what is alleged to have actually happened sound likely? I.e. doing patches on both data centres, not just one?
Yes, it could easily happen that way. Patching is managed by centralized software tools these days, and someone could easily have scheduled both sets to kick off if it was the high-urgency patch for the recent WannaCry outbreak.

Usually there'd be change control procedures, but sometimes those get bypassed for high-urgency patches.
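One way a change-control process could catch the "both data centres at once" scenario is a pre-flight check on the patch schedule. This is a hypothetical sketch: the site names, the `(site, start)` schedule format and the 24-hour minimum gap are assumptions, and real patch-management tools expose their own scheduling interfaces.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

PatchWave = Tuple[str, datetime]  # (data centre name, scheduled start): assumed format


def cross_site_conflicts(waves: List[PatchWave],
                         min_gap: timedelta = timedelta(hours=24)) -> List[Tuple[str, str]]:
    """Return pairs of different sites whose patch waves start less than min_gap apart.

    Checking neighbours in time order is enough: if two different sites are
    scheduled within min_gap, some adjacent pair of different sites between
    them is scheduled within min_gap too.
    """
    ordered = sorted(waves, key=lambda wave: wave[1])
    conflicts = []
    for (site_a, start_a), (site_b, start_b) in zip(ordered, ordered[1:]):
        if site_a != site_b and start_b - start_a < min_gap:
            conflicts.append((site_a, site_b))
    return conflicts
```

A non-empty result would be a reason to block the change, even under an emergency-patch fast track.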

Despite the mockery of some comments on The Register, reboots actually do spike the power usage of servers, and there are often component failures that only show themselves upon initialization. Many's the time when a working server hung upon restart.
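Because restarts both draw extra power and expose latent hardware faults, a common mitigation is a rolling restart in small batches that halts as soon as something fails to come back. A minimal sketch, assuming a hypothetical `restart` callable that reboots one server and reports whether it passed its health check:

```python
from typing import Callable, List, Tuple


def rolling_restart(servers: List[str], batch_size: int,
                    restart: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Restart servers a batch at a time, halting after any batch with a failure.

    Small batches limit how many machines draw start-up power at once, and
    halting early leaves the untouched servers running if restarts start to hang.
    Returns (restarted_ok, failed).
    """
    restarted: List[str] = []
    for i in range(0, len(servers), batch_size):
        batch = servers[i:i + batch_size]
        results = {server: restart(server) for server in batch}
        restarted.extend(s for s in batch if results[s])
        failed = [s for s in batch if not results[s]]
        if failed:
            return restarted, failed
    return restarted, []
```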
Originally Posted by Egoldstein
I find this astonishing if true, but some chatter in another forum seems to confirm that airlines tend to test failover very rarely, unlike financial institutions that have to comply with regulatory minimums and audits on DR compliance.

Talk about penny wise, one hundred million pound foolish!
Having worked in the financial industry, I assure you that the failover testing of financial institutions is on the level of "good enough for the auditors".

It's rarely "good enough to restore the business". In my time in the mutual fund part of a top-10 American bank, we could only restore the core systems within 8 hours. Portfolio managers could do their jobs, but no one would be answering calls and the website wouldn't be updated. If we had to operate out of the DR site and restore 100% of business functionality, it'd take 3+ weeks at a minimum. And even that wouldn't be full functionality, because CSR, QA, document, legal, HR, etc. would not be online at all.

I've only ever worked at one place that was sufficiently dedicated to disaster recovery to be able to restore 100% of business functionality, including office space for the bare minimum of required business users. There we ran weekly minimal IT-only DR tests, validated by the QA team. Then monthly tests with more systems brought up. Then quarterly tests that required business users to put in synthetic transactions. And then a full annual DR test where at least one trade was executed from the DR site, with minimal staff on-site using the DR site's computers, to make sure that part worked as well.
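The tiered schedule described above can be expressed as a small calendar rule. This is a sketch under an assumed calendar, since the post doesn't say which days the tests fell on: tests run on Saturdays, the first Saturday of January is the annual test, the first Saturday of April/July/October is quarterly, the first Saturday of other months is monthly, and every other Saturday is the weekly IT-only test.

```python
from datetime import date
from typing import Optional

# What each tier exercised, as described in the post above.
SCOPE = {
    "weekly": "minimal IT-only test, validated by QA",
    "monthly": "more systems brought up",
    "quarterly": "business users enter synthetic transactions",
    "annual": "full test: at least one trade executed from the DR site",
}


def dr_test_tier(day: date) -> Optional[str]:
    """Return which DR test tier runs on `day` (assumed calendar), or None.

    A bigger test subsumes the smaller ones, so only the largest tier is returned.
    """
    if day.weekday() != 5:  # Monday == 0, so Saturday == 5
        return None
    if day.day <= 7:  # first Saturday of the month
        if day.month == 1:
            return "annual"
        if day.month in (4, 7, 10):
            return "quarterly"
        return "monthly"
    return "weekly"
```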

This was only possible by devoting close to 30% of the entire IT budget to the DR functionality. It was a quant hedge fund, and they understood the value of lost time.

Every time I mention that DR site experience, it's met with shock, and that includes people who work at other top-10 banks.



Even then... that wasn't a really good DR site: the facility was less than 70 miles from the primary site.
Plato90s
May 29, 17, 2:29 pm
  #155  
 
Join Date: Apr 2014
Location: London
Programs: Don't even mention it. Grrrrrrr.
Posts: 945
Good to get more reality on here, Plato90s.

There's obviously a perception amongst the general public that every company somehow has a perfect DR plan that is guaranteed to work every time. The reality is that it is a cost/risk decision like everything else.
Banana4321
May 29, 17, 3:57 pm
  #156  
 
Join Date: Sep 2004
Programs: BA Gold
Posts: 402
Originally Posted by Plato90s

It's rarely "good enough to restore the business". In my time in the mutual fund part of a top-10 American bank, we could only restore the core systems within 8 hours. Portfolio managers could do their jobs, but no one would be answering calls and the website wouldn't be updated. If we had to operate out of the DR site and restore 100% of business functionality, it'd take 3+ weeks at a minimum. And even that wouldn't be full functionality, because CSR, QA, document, legal, HR, etc. would not be online at all.

I've only ever worked at one place that was sufficiently dedicated to disaster recovery to be able to restore 100% of business functionality, including office space for the bare minimum of required business users. There we ran weekly minimal IT-only DR tests, validated by the QA team. Then monthly tests with more systems brought up. Then quarterly tests that required business users to put in synthetic transactions. And then a full annual DR test where at least one trade was executed from the DR site, with minimal staff on-site using the DR site's computers, to make sure that part worked as well.

This was only possible by devoting close to 30% of the entire IT budget to the DR functionality. It was a quant hedge fund, and they understood the value of lost time.

Every time I mention that DR site experience, it's met with shock, and that includes people who work at other top-10 banks.

Even then... that wasn't a really good DR site: the facility was less than 70 miles from the primary site.
I don't think that's truly representative. I also work in the financial industry, at a top-10 US institution, and we regularly (every 6 months) run DR/SR failovers (Disaster Recovery / Sustained Resilience): we fail over to the DR systems and site, run our business from there for the next 6 months, then fail back. So we don't really have the concept of a 'DR' site and system, since both are used to run our production platform for 6 months at a time.
I can see how this would be difficult to implement in the airline industry since it does require some downtime at the weekends to flip/flop back and forth.
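The rotation bookkeeping for a six-monthly flip/flop can be sketched as below, assuming roles swap on a fixed cadence from a known anchor date. The site names and anchor are hypothetical, and the weekend downtime for each flip is not modelled; this only answers "which site is live on a given day".

```python
from datetime import date
from typing import Tuple


def live_site(day: date, anchor: date,
              sites: Tuple[str, str] = ("Site A", "Site B"),
              period_months: int = 6) -> str:
    """Which site runs production on `day` if roles swap every period_months.

    `anchor` is a hypothetical date on which sites[0] became primary.
    """
    months = (day.year - anchor.year) * 12 + (day.month - anchor.month)
    if day.day < anchor.day:
        months -= 1  # this month's flip date has not arrived yet
    # Floor division also handles days before the anchor (negative months).
    return sites[(months // period_months) % 2]
```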
FliGuy
May 29, 17, 4:05 pm
  #157  
A FlyerTalk Posting Legend
 
Join Date: Feb 2000
Location: Cambridge
Posts: 57,618
Originally Posted by FliGuy
I don't think that's truly representative. I also work in the financial industry, at a top-10 US institution, and we regularly (every 6 months) run DR/SR failovers (Disaster Recovery / Sustained Resilience): we fail over to the DR systems and site, run our business from there for the next 6 months, then fail back. So we don't really have the concept of a 'DR' site and system, since both are used to run our production platform for 6 months at a time.
I can see how this would be difficult to implement in the airline industry since it does require some downtime at the weekends to flip/flop back and forth.
It depends a lot on which components of the bank you're talking about. The parts of the business based on mainframe technology are indeed fully redundant, with geographic diversity, etc.

The parts of the business based on LUM (Linux-Unix-Microsoft) are almost never DR-ready. At the mutual fund unit, that meant client data and holdings were on the mainframe, fully redundant. But the PMs could only access the mainframe data through an application, and the same was true for statements, client communication, etc. In the event of a total outage, a PM could still pick up the phone and have orders placed and holdings printed on paper for overnight delivery. But that's hardly 100% business functionality. From a customer perspective, the mutual fund would effectively have been crippled until we could restore the mixed Solaris-Windows environment, no matter how functional the back office was.

Airlines like BA don't spend enough on IT to run their internal tech on mainframe technology. The reservation/ticketing systems are on mainframe and they're robust, but I'm confident that the BA-specific technology is based on LUM.
Plato90s
May 29, 17, 4:06 pm
  #158  
 
Join Date: Mar 2014
Posts: 189
So happy I'm no longer in IT. Pity whoever was on call when this disaster happened!
AmaaiZeg
May 29, 17, 4:08 pm
  #159  
 
Join Date: Sep 2004
Programs: BA Gold
Posts: 402
I am referring to our trading platforms: front office, middle office and back office. A reasonable mix of mainframe, Linux (who runs on Unix these days :-) ) and cloud platforms, which simplify the whole process even further.
FliGuy
May 29, 17, 4:14 pm
  #160  
 
Join Date: Jun 2009
Location: UK
Programs: Lemonia. Best Greek ever.
Posts: 1,362
The IT spend in the Fin businesses is huge. I don't have the Gartner numbers to hand, but c. 14% of t/o for merchant-bank-type businesses rings a bell. The timing-critical guys and girls see 25% of t/o.

BA are chiselling away at their IT. Anyone know what their spend is?

Maybe they were sold the outsourcing with the "good" folk fronting up, right down to the code writers and the spannerfolk. What most clients don't know is that these good folk move on after a couple of weeks and are replaced by, er, others. That is the culture of consultants and outsourcers worldwide.
Ancient Observer
May 29, 17, 4:14 pm
  #161  
A FlyerTalk Posting Legend
 
Join Date: Feb 2000
Location: Cambridge
Posts: 57,618
Originally Posted by FliGuy
I am referring to our trading platforms: front office, middle office and back office. A reasonable mix of mainframe, Linux (who runs on Unix these days :-) ) and cloud platforms, which simplify the whole process even further.
This was some years ago when Solaris was still viable.

If your data was on mainframe and there were no intermediate stages holding data - just UI and middleware, it's a lot easier to fail over. Especially since there are market holidays every week.

Consider that Amazon has had outages to their cloud platform which lasted for days. Even for professional large-scale providers, the LUM ecosystem is just not designed for that level of redundancy especially when there's near-continuous data streams.

I currently work with an environment where we have active-active for parts of the environment, but that's hugely expensive (reflected in what the client is charged), and there's still the headache of bad data getting replicated across to both sites.
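The "bad data replicated to both sites" headache comes from the fact that active-active replication copies whatever it is given. A minimal sketch of a validate-before-replicate step, assuming each site is a simple key/value store and `is_valid` is a hypothetical business rule (here, "balances can't be negative"):

```python
from typing import Callable, Dict, List, Tuple

Write = Tuple[str, float]  # (key, value): hypothetical record shape


def replicate_writes(writes: List[Write], sites: List[Dict[str, float]],
                     is_valid: Callable[[str, float], bool]) -> List[Write]:
    """Apply each valid write to every site and quarantine the rest.

    Returns the quarantined writes. Anything that passes is_valid is copied
    to all sites, so the check is the only thing stopping bad data from
    landing everywhere at once.
    """
    quarantined: List[Write] = []
    for key, value in writes:
        if not is_valid(key, value):
            quarantined.append((key, value))
            continue
        for site in sites:
            site[key] = value
    return quarantined
```

The limitation is exactly the headache described: a check like this only catches bad data you can recognise, and a well-formed but wrong write still lands on both sites.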

(sigh) I miss working with a mainframe back end. You could depend on those guys to keep their promises of uptime.

Then again... it did cost an arm and a leg.
Originally Posted by Ancient Observer
The IT spend in the Fin businesses is huge. I don't have the Gartner numbers to hand, but c. 14% of t/o for merchant-bank-type businesses rings a bell. The timing-critical guys and girls see 25% of t/o.
At the quant hedge fund, IT spending was the 2nd biggest cash line item - after disbursements to partners.

We spent more on IT than on labor, and that's with some pretty highly paid folks on staff (not me - the folks doing the financial models).



I suspect the BA chief is parsing his words very carefully, in that the data center was in the UK and the on-site staff are British.

But the team actually supporting the software are likely NOT to be 100% local.


As I recall, there was a major outage at a British bank some years back, caused by the India-based team making a procedural mistake that fouled up the system horribly: payments weren't scheduled, deposits weren't processed, etc.

A system which is precarious needs experienced hands, and outsourcing is often the worst thing you can do.

ETA:

Here we go: the 2012 RBS outage

http://www.telegraph.co.uk/finance/p...-in-India.html

Last edited by Plato90s; May 29, 17 at 4:19 pm
Plato90s
May 30, 17, 3:44 pm
  #162  
 
Join Date: Apr 2017
Location: FAI, DUB
Programs: AS MVPG 75K, LH, Marriott Plat, Hertz Pres Circ, National Executive Elite
Posts: 96
Originally Posted by Egoldstein
I find this astonishing if true, but some chatter in another forum seems to confirm that airlines tend to test failover very rarely, unlike financial institutions that have to comply with regulatory minimums and audits on DR compliance.

Talk about penny wise, one hundred million pound foolish!
A lot of DR compliance is tested "in theory" without actually doing it in practice, even in highly regulated industries. It's both an infrastructure cost issue and a business disruption issue.

In order to be fully redundant you would need 2N infrastructure, whereas most companies run on 1.2N to maybe 1.7N. So they can fail over in emergencies but will have to discontinue some services; that's why full failover is so rarely tested.
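The N-factor arithmetic can be made concrete, assuming (for illustration) that total capacity is split evenly across two sites and that N is the capacity needed to carry the normal load. Losing one of two equal sites leaves half the total, so only 2N still covers everything.

```python
def surviving_fraction(capacity_n: float, sites: int = 2) -> float:
    """Fraction of normal load N that can still be served after losing one site.

    Assumes total capacity (capacity_n * N) is split evenly across `sites`
    sites, so losing one leaves capacity_n * N * (sites - 1) / sites.
    Capped at 1.0 because spare capacity beyond N doesn't add service.
    """
    return min(1.0, capacity_n * (sites - 1) / sites)


for n in (1.2, 1.7, 2.0):
    print(f"{n}N -> {surviving_fraction(n):.0%} of normal load after losing a site")
```

With the figures from the post, 1.2N leaves 60% of normal capacity after losing a site and 1.7N leaves 85%, which is why some services have to be switched off in a real failover.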

For highly transactional workloads, failover creates potential data integrity issues during consolidation after the failover, which is again why most outfits don't actually do it for real.
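The consolidation problem can be illustrated with a toy reconciliation step, assuming each side kept an ordered log of `(key, value)` writes made after the sites diverged (for example, late writes on the old primary that never replicated, plus new writes on the DR side). The keys and states are hypothetical; real systems need richer conflict detection such as timestamps or version vectors.

```python
from typing import Dict, List, Tuple


def reconcile(old_primary_writes: List[Tuple[str, str]],
              dr_writes: List[Tuple[str, str]]) -> Tuple[Dict[str, str], List[str]]:
    """Split post-divergence writes into auto-mergeable ones and conflicts.

    Keys written on only one side can be merged automatically; keys written on
    both sides are conflicts that need a business rule or a human to resolve.
    """
    old = dict(old_primary_writes)  # within one side, the last write to a key wins
    new = dict(dr_writes)
    conflicts = sorted(old.keys() & new.keys())
    merged = {k: v for k, v in {**old, **new}.items() if k not in conflicts}
    return merged, conflicts
```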

Originally Posted by Ancient Observer
The IT spend in the Fin businesses is huge. I don't have the Gartner numbers to hand, but c. 14% of t/o for merchant-bank-type businesses rings a bell. The timing-critical guys and girls see 25% of t/o.

BA are chiselling away at their IT. Anyone know what their spend is?

Maybe they were sold the outsourcing with the "good" folk fronting up, right down to the code writers and the spannerfolk. What most clients don't know is that these good folk move on after a couple of weeks and are replaced by, er, others. That is the culture of consultants and outsourcers worldwide.
I am not convinced.
In US healthcare, IT spend is about 3-5% of revenue; if memory serves, US financial institutions spend about 7-9% of revenue on IT. I don't have access to the report anymore since I left healthcare, but I was in charge of IT infrastructure with a 22M annual budget (at a healthcare company where we spent about 4.2% of revenue on IT in total, not just infrastructure).

I also have to object to the claim that outsourcers constantly change staff, at least as a general statement. It's true if you go with the lowest bidder, but in the managed data center world especially, turnover is pretty low because the staff are well paid. TDS is one such example (https://www.transitionaldata.com).

It's simply not true that outsourcing is always less expensive and always of ...... quality. The problem is that consumers never see the magnitude of outsourcing; all they know is that sometimes they pick up the phone and are greeted by substandard outsourced support.

Lots of other operations in the airline industry, and in most other industries, are outsourced. Why doesn't BA have its own food services, baggage handlers, aircraft cleaning, aircraft fueling, etc. etc. etc.?

Originally Posted by Plato90s
Then again... it did cost an arm and a leg.
At the quant hedge fund, IT spending was the 2nd biggest cash line item - after disbursements to partners.

A system which is precarious needs experienced hands, and outsourcing is often the worst thing you can do.
Ehm..., of course IT is the 2nd biggest cash line item; in most companies it's actually the largest line item. That's because it's a corporate service, whereas other product areas or business lines are rarely, if ever, corporate-wide. That's just accounting for you. IT is actually less expensive than, say, product development: product dev is spread across a dozen line items, and if you were to combine all of them, product dev would be much more expensive than IT.

The line item size is completely irrelevant, what matters is what percentage of the budget IT operations is in comparison to revenue.

Again, the implication that outsourcing is always low quality and that if you want "experienced hands" you have to do it in-house is simply not true. I really wish that people would just think for a moment before making such a baseless claim.

Continuing with the "you know what else is outsourced" line: emergency physicians in the US are generally not employed by the hospital they are staffing. They are an "outsourced" resource precisely because experienced staff are needed at all times, and a staffing company can provide a level of redundancy and resiliency that in-house staff never could unless you vastly overstaff. It's simple math, really.
MeanwhileBackAtFAI
May 30, 17, 5:15 pm
  #163  
 
Join Date: Jan 2017
Programs: BAEC Gold
Posts: 29
Originally Posted by Banana4321
It's actually very risky executing a DR plan. You may leave yourself with nothing if it doesn't work. So....why test it if you'll probably never need it anyway?

And if you do end up needing it, you can fix the problems at that time.
Disasters are inevitable; I've personally had to work on infrastructure hit by floods, lightning strikes, DDoS attacks... Some environments were well maintained and failed over cleanly, and some were not. Regular failover testing may be a pain, but it saves a lot of finger-pointing and RFO (reason for outage) paperwork in the long run.
onylon
May 30, 17, 5:51 pm
  #164  
 
Join Date: Sep 2014
Location: Melbourne, Australia
Posts: 419
Originally Posted by Banana4321
It's actually very risky executing a DR plan. You may leave yourself with nothing if it doesn't work. So....why test it if you'll probably never need it anyway?

And if you do end up needing it, you can fix the problems at that time.
You do realise that a DR plan involves more than hitting the off switch and crossing your fingers, don't you?
Guvner067
May 30, 17, 5:58 pm
  #165  
 
Join Date: Jan 2000
Location: SoCal to the rest of the world...
Programs: AA EXP with lots of BA and CX. (Disgruntled UA Lifetime Plat) - No hotel loyalty anymore
Posts: 6,343
Originally Posted by Plato90s
Despite the mockery of some comments on The Register, reboots actually do spike the power usage of servers, and there are often component failures that only show themselves upon initialization. Many's the time when a working server hung upon restart.
Having worked in the financial industry, I assure you that the failover testing of financial institutions is on the level of "good enough for the auditors".
Those reboot failures are really due to that platform having hit its MTBF. Running the same servers for 10+ years is ASKING for trouble. Even IBM midrange and mainframe systems, DEC mainframes, etc. had a service regime to replace components before MTBF failures. This included logic boards, power supplies, I/O controllers, etc. Buying an off-the-shelf Dell server and crossing your fingers that you can run it 24x7 for 10 years should ensure that decision maker gets a PhD in Stupidity. Five years is the industry max for servers. Your process for running the datacenter should be so virtualized that new HW can come in and replace existing running platforms mid-cycle with no or limited downtime.
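Enforcing a refresh ceiling starts with knowing which machines have passed it. A minimal sketch, assuming a hypothetical inventory mapping hostnames to in-service dates and using the poster's five-year figure as the default limit:

```python
from datetime import date
from typing import Dict, List


def due_for_refresh(inventory: Dict[str, date], today: date,
                    max_age_years: int = 5) -> List[str]:
    """Hostnames whose in-service date is at least max_age_years before today."""
    overdue = []
    for host, in_service in inventory.items():
        try:
            limit = in_service.replace(year=in_service.year + max_age_years)
        except ValueError:  # 29 February with no leap day in the target year
            limit = in_service.replace(year=in_service.year + max_age_years, day=28)
        if today >= limit:
            overdue.append(host)
    return sorted(overdue)
```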

Even AWS expects a 1-3 year lifecycle. You should be building ANY platform to have compute nodes that can come online and replace others for processing, e.g. an elastic cloud, so that if you have to take resources down, you can bring others up elsewhere to replace them. Basically, you plan for never going down.
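The "bring others up before taking resources down" pattern is often called make-before-break. A minimal sketch, assuming the pool is just a set of node names and `is_healthy` is a hypothetical health check; real elastic platforms and orchestrators provide this through their own APIs.

```python
from typing import Callable, Set


def replace_node(pool: Set[str], old: str, new: str, min_nodes: int,
                 is_healthy: Callable[[str], bool]) -> Set[str]:
    """Bring `new` into service before retiring `old` (make before break).

    Raises instead of retiring `old` if the replacement is unhealthy or if
    the pool would drop below min_nodes. The input set is not modified.
    """
    if old not in pool:
        raise KeyError(old)
    if not is_healthy(new):
        raise RuntimeError(f"replacement {new} failed health check; keeping {old}")
    grown = pool | {new}
    if len(grown) - 1 < min_nodes:
        raise RuntimeError(f"retiring {old} would leave fewer than {min_nodes} nodes")
    return grown - {old}
```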
NickP 1K