FlyerTalk Forums - View Single Post

Aug 9, 2016 | 7:34 am

#526

scongro

Join Date: Sep 2007

Location: JAX

Programs: Delta DM, SPG Gold, Marriott Gold

Posts: 320

Quote:

Originally Posted by appleguru

I don't think anyone is saying it is trivial... But I also don't think it needs to be that hard either. I see a few possible solutions to this problem (I am by no means an expert here):

1) DB failover: redundant DBs that automatically replicate and self-elect a master. Requires 3 or more members so you can have a quorum during master election. During normal operations, all reads and writes go to an appointed master instance. All instances send "are you still online?" heartbeat messages to each other to verify status, and slaves replicate data from the master as soon as it is written. If the master ever goes offline, the remaining members vote and elect a new master, automatically resuming operations.

2) blockchain-like distributed ledger. Biggest problem I see with this in current implementations is the significant time delay while transactions are verified (proof of work currently takes lots of time, by design). But on the upside, even "untrusted" parties can help contribute. Doesn't make a lot of sense for an airline.

3) fully redundant DBs: similar to a RAID disk array. Writes happen across multiple DBs in parallel. Writes are verified for congruency across all mirrors when a read occurs. If a mirror drops, the set is degraded, but still remains functional until the mirror comes back online. Biggest downside to this approach is the additional time needed to verify reads, which may not scale fantastically. But then again, we're talking milliseconds... If you have to wait a few extra ms to confirm that seat/ticket/weight balance call/whatever, it's probably not the end of the world.
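
To make options 1 and 3 above a bit more concrete, here is a rough Python sketch. It is illustrative only: the function names (has_quorum, elect_master, verified_read) and the node names are made up for this example, and it ignores everything that makes this hard in practice (network partitions, replication lag, timeouts, legacy interfaces).

```python
from collections import Counter
from typing import Dict, List, Optional


def has_quorum(count: int, cluster_size: int) -> bool:
    """True when count is a strict majority of the whole cluster."""
    return count > cluster_size // 2


def elect_master(votes: Dict[str, str], cluster_size: int) -> Optional[str]:
    """Option 1: pick a new master from the votes of the reachable nodes.

    votes maps each reachable node to the candidate it voted for.
    Returns the winner only if it is backed by a majority of the
    full cluster (not just of the nodes that answered), else None.
    """
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    return candidate if has_quorum(count, cluster_size) else None


def verified_read(mirror_values: List[Optional[str]]) -> str:
    """Option 3: read the same record from every mirror and compare.

    None stands for a mirror that is offline (degraded set).
    Raises if no mirror is online or if the online mirrors disagree.
    """
    online = [v for v in mirror_values if v is not None]
    if not online:
        raise RuntimeError("no mirror online")
    if len(set(online)) != 1:
        raise ValueError("mirrors disagree")
    return online[0]
```

With three nodes a, b and c, if a goes offline and both survivors vote for b, elect_master({"b": "b", "c": "b"}, 3) picks b; a single survivor voting for itself gets None, which is why you need 3 or more members. Likewise, verified_read(["14C", None, "14C"]) still answers with one mirror down, but raises as soon as the online mirrors hold different values.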

All of that is great until you look at time to implement. What you're talking about could take many years to get to production, given the legacy systems involved. I've had the advantage in my professional career of working mostly in the ecommerce world - we aren't dealing with any 15+ year old systems because the system/need didn't exist 15 years ago.

I think it's more than fair to feel like whatever happened is unacceptable (and it is very unacceptable in this day and age), yet at the same time understand the position they are in. The bottom line is whatever happened is preventable, given time and money. No decision to build redundancy into their systems is free, and they have to weigh the cost vs. benefit. Obviously a day after a major incident, the natural inclination is to spend every dime of profit to make sure this never happens again, but I guarantee you that sentiment will change in some people's minds on Virginia Ave in three years, assuming they don't have another major, public incident. It's natural and, in many cases, human nature. I've been turned down for money multiple times to make systems more redundant and it was really frustrating. In some cases, the exact component I wanted to harden failed, in some cases, the component I wanted to harden never failed, and there were a couple of times where something I didn't think of failed. Bottom line: it was my responsibility to ensure uptime.

In terms of why they haven't issued a statement, what would you like them to say, and how would it change anything? I think as a customer, it's quite clear this isn't weather, terrorism, ATC, or other act of god. Further, as a customer, it's not my job to care exactly what happened. Are they really going to promise it will never happen again? They can't guarantee that.

I have my opinions, working in the industry I do, but the main reason I've been on this thread since yesterday morning is because I'm flying tomorrow and I'm hoping I get where I need to go (DFW) with some semblance of being on time.
