Originally Posted by
RustyC
It's a bit unsettling to know there are vulnerabilities that could lead to the entire system basically shutting down like that, as you have to worry not only about accidents but also malice. Isn't the idea with redundancy to have data stored in several places geographically so that something like a fire or power outage in one place can't take the whole thing down? I dunno how realistic a proposition that is for an airline, but you hear about it for large Web-based businesses. Bastian should have some 'splainin' to do to investors over why there's no back-up plan.
Having multiple copies of the data replicated to multiple data centers isn't the problem. It's very easy to maintain the integrity of the data with replication. The problem is leveraging that data to make the system available.
If data center A burns to the ground, it's an easy call to declare a disaster and begin recovering at data center B. If data center A is still there and there is just an issue with the ATS / switchgear, then it becomes a tougher decision. How long will it take to change IPs and DNS, start the application up at the DR site, confirm data consistency, test the application and make it available? Is that longer than it will take to just get the broken DC running?
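To put that trade-off in concrete terms, here is a rough Python sketch of the comparison. The step names come from the list above, but every duration is a made-up placeholder, and `FailoverStep`, `failover_minutes` and `should_fail_over` are hypothetical names for illustration, not anything an airline actually runs.

```python
from dataclasses import dataclass


@dataclass
class FailoverStep:
    name: str
    minutes: float  # assumed estimate, not a measured value


def failover_minutes(steps):
    """Total time to bring the DR site up if the steps run one after another."""
    return sum(step.minutes for step in steps)


def should_fail_over(steps, repair_estimate_minutes):
    """Fail over only if the DR site would be serving users sooner than the repaired DC."""
    return failover_minutes(steps) < repair_estimate_minutes


# Hypothetical durations, purely for illustration.
DR_STEPS = [
    FailoverStep("change IPs and DNS", 30),
    FailoverStep("start application at DR site", 60),
    FailoverStep("confirm data consistency", 90),
    FailoverStep("test application and make it available", 60),
]

if __name__ == "__main__":
    for repair_estimate in (120, 480):
        decision = "fail over" if should_fail_over(DR_STEPS, repair_estimate) else "wait for repair"
        print(f"repair estimate {repair_estimate} min: {decision}")
```

With these placeholder numbers the failover takes four hours, so a two-hour switchgear repair means you wait and an eight-hour repair means you fail over. The hard part in real life is that the repair estimate is a guess made under pressure.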
Applications that just keep running when a data center goes offline are normally what would be considered cloud native: they create availability at the application layer, not at the infrastructure layer. That works great for many types of business, but the types of transactions done by airlines and banks don't work well with that type of application architecture. So you need to build large monolithic applications due to the nature and volume of the business transactions. These applications can't create availability at the application layer, so it must be provided by a very redundant infrastructure layer.