FlyerTalk Forums - View Single Post - Sun 17 July : 'FLY' check-in disruption at various BA airports.
Jul 18, 2016 | 5:02 am
  #74  
chrismk
Join Date: Nov 2010
Location: Milton Keynes
Programs: BA Blue
Posts: 375
Originally Posted by flatlander
I don't have inside knowledge, but based on some years of experience of this sort of thing in IT and the reported problems, my guess is that they have built a set of software components to talk to each other and some of them are not fast enough, or do not operate with the same speed under increasing load (not linearly scalable). This is pretty common in several sorts of modern software systems, where there are non-linear effects that mean it works fine until a certain load or traffic level, then stops working spectacularly.

(Examples are systems that assume all data is in main memory, so performance falls off a cliff when the data set exceeds it; systems that allocate and free a lot of memory for each work item, which puts a lot of stress on the memory allocator and garbage collection (Java is particularly prone to this); and database systems whose processing scales in proportion to the number of items or the number of items squared (O(N) or O(N*N)), when what you really want is O(1) or O(log(N)).)
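
To make the O(N) versus O(log(N)) versus O(1) point concrete, here is a rough Python sketch. It is only an illustration, not anything from the system under discussion: the function names `linear_lookup` and `binary_lookup` and the sizes used are made up for the example. It counts how much work a single lookup takes as the number of items grows.

```python
def linear_lookup(keys, target):
    """O(N): compare against each key in turn. Returns (index, comparisons)."""
    comparisons = 0
    for index, key in enumerate(keys):
        comparisons += 1
        if key == target:
            return index, comparisons
    return -1, comparisons


def binary_lookup(sorted_keys, target):
    """O(log N): halve the search range each step. Returns (index, steps)."""
    steps = 0
    lo, hi = 0, len(sorted_keys)
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == target
    return (lo if found else -1), steps


if __name__ == "__main__":
    # Hypothetical data sizes, chosen only to show how the work grows.
    for n in (1_000, 1_000_000):
        keys = list(range(n))  # already sorted
        index_by_key = {k: i for i, k in enumerate(keys)}  # hash index, O(1) on average
        target = n - 1  # worst case for the linear scan
        _, scanned = linear_lookup(keys, target)
        _, halvings = binary_lookup(keys, target)
        assert index_by_key[target] == n - 1
        print(f"N={n}: linear scan {scanned} comparisons, binary search {halvings} steps")
```

The linear scan does a thousand times more work when the data grows a thousandfold, while the binary search only needs a handful of extra steps and the hash lookup stays roughly constant. A system built on the first kind of lookup can look fine in testing and then fall over at production volumes.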

Building linearly scalable systems is hard, and some modern software abstraction frameworks make it harder. I suspect this project had more new software design people and fewer grumpy old experienced people than it needed.

Experienced scalability engineers are usually somewhat grumpy. A couple of decades of telling people they're going to fail and how they will fail, then having to fix it when it goes how you said, will do that to you.
Great post - thank you.