Last edit by: WineCountryUA
Technology Issue
Systemwide
Original travel date(s): January 22 - 23, 2017
Flight changes: The change fee and any difference in fare will be waived for new flights departing on or before January 25, 2017, as long as travel is rescheduled in the same cabin (any fare class) and between the same cities as originally ticketed.
UA mainline domestic flights grounded -- 22 Jan 2017 IT issue [STOP LIFTED]
#106
FlyerTalk Evangelist

Join Date: Jul 2008
Location: IAH
Programs: DL DM, AC 50K, Hyatt Ist-iest, Starriot Platinum, Hilton Diamond
Posts: 12,507
I was in the air when this happened, so when we landed at IAH there were no gates available. Per the captain, they were all filled with planes that couldn't take off. We finally pulled up to Terminal C over an hour after landing.
I guess I should be happy we were in the air before the system went down, otherwise I'd still be on a plane and not at home with a drink.
#107
Moderator: Budget Travel forum & Credit Card Programs, FlyerTalk Evangelist
Join Date: Aug 2002
Location: YYJ/YVR and back on Van Isle ....... for now
Programs: UA lifetime MM / *A Gold
Posts: 14,281
#108
Moderator: United Airlines; FlyerTalk Evangelist
Join Date: Jun 2007
Location: SFO
Programs: UA Plat 1.9MM, Hyatt Discoverist, Marriott Plat/LT Gold, Hilton Silver, IHG Plat
Posts: 64,491
CNN - United Airlines resumes flights after temporary ground order
The sources said the flights were grounded due to a problem with the communication system that airplanes use to send information to United operations. Aircraft Communications Addressing and Reporting System, or ACARS, is used to record and transmit a range of information, including departure times, as well as weight and balance, which is used to calculate takeoff speeds.
#110
Join Date: Jan 2008
Programs: UA 1K
Posts: 246
Having a backup system means there's two things to go wrong.
That seems kind of glib, but it's not really. What would the backup system actually look like? What would be the conditions to trigger using it rather than the primary system? Would it use the same data? If so, how would you prevent data errors causing the same problems that might cause the primary system to stop working?
Designing complex IT solutions to keep working when there's a failure of an important system is a very hard problem. I'd be somewhat astonished if United didn't have a backup - I'd also be pretty surprised if United didn't end up failing over to the backup system moderately frequently without us knowing, and it wouldn't surprise me if this case were due to a failure of both the primary and backup systems.
United's certainly big enough that they should have sufficient IT to consider this kind of failure and plan for it, but the reality is that computer systems are very complicated and almost nobody is good at making sure they work all the time. Google's entire business is based on having working computers, but even they've had outages.
Is this bad? Yes. Lessons should be learned and I'd hope this specific problem (and similar problems!) never reoccur. But it's very hard to draw general conclusions about IT competence from occurrences like this.
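To make the "would it use the same data?" worry concrete, here is a minimal Python sketch. Everything in it is hypothetical (the `parse_weight` handler, the `with_failover` wrapper, the record format) and it has nothing to do with how United's systems are actually built. It only shows that a backup running the same logic on the same input fails on exactly the records that break the primary.

```python
from typing import Callable, Optional


def parse_weight(record: str) -> float:
    """Hypothetical handler: parse a takeoff weight in kg from a text record."""
    return float(record)


def with_failover(primary: Callable[[str], float],
                  backup: Callable[[str], float],
                  record: str) -> Optional[float]:
    """Try the primary; on a parse error, retry the same record on the backup."""
    for system in (primary, backup):
        try:
            return system(record)
        except ValueError:
            continue  # this system failed; fall through to the next one
    return None  # both systems failed on the same input


# A healthy record is handled by the primary.
print(with_failover(parse_weight, parse_weight, "70500"))     # 70500.0

# A malformed record takes out primary and backup alike, because both
# run the same logic on the same data.
print(with_failover(parse_weight, parse_weight, "70,500kg"))  # None
```

Failing over only helps here if the backup differs in some way that matters: different code, different validation, or a different copy of the data.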
#115
Join Date: Jan 2010
Location: Philadelphia, PA
Programs: United MP 1K, SPG Gold
Posts: 185
"Flight changes: The change fee and any difference in fare will be waived for new flights departing on or before January 25, 2017, as long as travel is rescheduled in the same cabin (any fare class) and between the same cities as originally ticketed."
Not giving an unconditional full refund? What if someone's travel plans got totally hosed and they don't need to travel anymore? Just going to get a credit then?
#116
Join Date: Jan 2016
Location: SFO
Programs: AA PP, OZ *G, Hyatt Expl, Marriott Gold, Lots of Ex-statuses
Posts: 299
Having a backup system means there's two things to go wrong.
That seems kind of glib, but it's not really. What would the backup system actually look like? What would be the conditions to trigger using it rather than the primary system? Would it use the same data? If so, how would you prevent data errors causing the same problems that might cause the primary system to stop working?
Designing complex IT solutions to keep working when there's a failure of an important system is a very hard problem. I'd be somewhat astonished if United didn't have a backup - I'd also be pretty surprised if United didn't end up failing over to the backup system moderately frequently without us knowing, and it wouldn't surprise me if this case were due to a failure of both the primary and backup systems.
United's certainly big enough that they should have sufficient IT to consider this kind of failure and plan for it, but the reality is that computer systems are very complicated and almost nobody is good at making sure they work all the time. Google's entire business is based on having working computers, but even they've had outages.
Is this bad? Yes. Lessons should be learned and I'd hope this specific problem (and similar problems!) never reoccur. But it's very hard to draw general conclusions about IT competence from occurrences like this.
Google guarantees 99.9% uptime for their cloud services [1]. That means that in one year their services will be down for no more than 9 hours [2]. Now they can break their guarantee and in that case you get service credit, but in 2014 Amazon services had 2.5 hours of downtime and Google services had 4.5 hours of downtime for the whole year [3].
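For anyone who wants to check the arithmetic, a quick Python sketch follows. The function names are my own and it assumes a 365-day year; the 2.5 and 4.5 hour inputs are the 2014 figures quoted above.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Hours per year a service may be down while still meeting the target."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def achieved_uptime_percent(downtime_hours: float) -> float:
    """Uptime percentage implied by a given number of down hours in a year."""
    return 100 * (1 - downtime_hours / HOURS_PER_YEAR)


print(round(allowed_downtime_hours(99.9), 2))   # 8.76
print(round(achieved_uptime_percent(2.5), 3))   # 99.971
print(round(achieved_uptime_percent(4.5), 3))   # 99.949
```

So both providers came in well under the roughly 8.8 hours that a 99.9% target allows.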
It is totally possible for United to dramatically reduce their issues if they invest in the IT. The problem is that the cost of dealing with outages is probably less than the cost of upgrading all their infrastructure, so they don't do anything. There's also the issue that SHARES/GDS is pretty old infrastructure that can't just be swapped out, but they can certainly improve a lot internally.
[1] https://support.google.com/work/answer/6056635?hl=en
[2] https://uptime.is/99.9
[3] http://www.networkworld.com/article/...last-year.html
#117
FlyerTalk Evangelist
Join Date: May 2006
Location: DTW, but drive to/from YYZ/ORD
Programs: Chase Ultimate Rewards 2MM, Diner Club points
Posts: 27,333
It's nice that large airports in the US don't routinely use remote bus gates, but wouldn't they be easily available for situations like this?
#118
FlyerTalk Evangelist
Join Date: Dec 2006
Location: Pacific Northwest
Programs: UA 1MM, AS MVPG, Bonvoyed Gold, Honors Dia, Hyatt Explorer, IHG Plat, ...
Posts: 14,269
Google guarantees 99.9% uptime for their cloud services [1]. That means that in one year their services will be down for no more than 9 hours [2]. Now they can break their guarantee and in that case you get service credit, but in 2014 Amazon services had 2.5 hours of downtime and Google services had 4.5 hours of downtime for the whole year.
http://www.zdnet.com/article/amazon-...uffers-outage/
How will this be counted against the uptime of the entire cloud? These services have many separate services, and it's not as simple as saying "Amazon Cloud is up" or "Amazon cloud is down".
(tech employee, here, too)
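A rough Python sketch of why one number doesn't settle it. The region names and downtime hours below are made up for illustration, not real AWS figures, and averaging across regions is only one of several ways a provider could count.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

# Hypothetical downtime per region, in hours over one year.
downtime_by_region = {
    "sydney": 10.0,
    "region-b": 0.0,
    "region-c": 0.0,
    "region-d": 0.0,
}


def availability(downtime_hours: float) -> float:
    """Uptime percentage for one region over a year."""
    return 100 * (1 - downtime_hours / HOURS_PER_YEAR)


per_region = {name: availability(h) for name, h in downtime_by_region.items()}
fleet_average = sum(per_region.values()) / len(per_region)
worst_region = min(per_region, key=per_region.get)

print(worst_region, round(per_region[worst_region], 3))  # sydney 99.886
print(round(fleet_average, 3))                           # 99.971
```

With these made-up numbers the average across regions clears 99.9% while the one affected region misses it, so "the cloud" was both up and down depending on where you were.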
#119
Join Date: Feb 2005
Location: CLE, DCA, and 30k feet
Programs: Honors LT Diamond; United 1K; Hertz PC
Posts: 3,891
random google shows a local outage for Sydney AWS last year.
http://www.zdnet.com/article/amazon-...uffers-outage/
How will this be counted against the uptime of the entire cloud? These services have many separate services, and it's not as simple as saying "Amazon Cloud is up" or "Amazon cloud is down".
(tech employee, here, too)
If this was an ACARS issue, as has been reported upthread, how much of that infrastructure is in UA's direct control vs. ARINC/SITA? [Though if it was an ACARS issue, I wonder why the effects were concentrated on UA mainline and didn't seem to spread to UAX/AA*/DL*/B6/F9/etc.]
Ultimately it sounds like there was a relatively small failure that was magnified by lack of communication/expectation management/resources on the front line -- i.e. available gates/staff to move aircraft off gates for inbound aircraft to deplane, etc.
#120
Join Date: Aug 2010
Location: KEWR
Programs: Marriott Platinum
Posts: 755
Liability exposure is the main reason remote gates aren't prevalent in the US.