How Connexion Works (in a badly written nutshell)
#1
Original Poster
Join Date: Jul 2005
Location: BOS
Programs: CO Silver; DL FO; SPG Gold; HH Gold
Posts: 880
So I was replying to a post about Connexion outages in a Skype-over-Connexion thread and ended up writing out this long-winded explanation about the technical details of your data getting from the plane to the ground...
This covers those 2 - 4 minute dead spots you see on a Connexion flight, with details specific to a LAX-ICN flight on KE018 on May 9:
So as far as the short dead spots... Connexion routes data from the plane in a rather hackish fashion. I'll try to explain this in somewhat-simple terms: each plane in flight is assigned a /24 (a block of 256 IP addresses, which is the smallest size you can advertise to the world; "advertising" means letting the rest of the Internet know how to get to you). The plane then advertises the /24 using BGP (Border Gateway Protocol, the protocol used to let routers on the Internet know where each block of IP addresses goes) via whichever satellite/ground station it's connected to. When you change satellites, and therefore change your BGP advertisement, it can take a couple of minutes for the change to propagate across the Internet.
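To make the "/24 = 256 addresses" bit concrete, here's a minimal sketch using Python's standard `ipaddress` module. The prefix 192.0.2.0/24 is just a documentation range I picked for illustration; it is not a block Connexion actually used.

```python
import ipaddress

# Hypothetical per-plane block. 192.0.2.0/24 is a reserved documentation
# range (RFC 5737), chosen only as a stand-in for a real plane's /24.
plane_block = ipaddress.ip_network("192.0.2.0/24")

print(plane_block.num_addresses)      # how many addresses the /24 covers
print(plane_block.network_address)    # first address in the block
print(plane_block.broadcast_address)  # last address in the block
```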
I did some traceroutes back to the plane from a server elsewhere after the three short dead spots, and was very intrigued to see the specifics of what I suspected. When we left LAX, the plane was communicating via a ground station near Vancouver. We stayed on that one until somewhere over eastern Asia (even on both sides of the 2 hour gap over northeastern Russia), where we switched over to a ground station in Japan (BGP change #1.) Then we switched to one in Moscow for a little bit as we went a little further west (BGP change #2.) Didn't stay on that one for long though; about 30 minutes later we were back to the Japan routing (BGP change #3) which lasted for the rest of the flight. So that explains what happens when you see a short drop in Connexion service but with stable connectivity other than that brief period.
When you change advertisements in this fashion, you create a phenomenon known as "flapping" as routers across the Internet see the routing topology for that advertisement change. Sometimes a single router will see the single change come in several times (the reasons behind that are probably beyond the scope of this forum). Anyways, there's also a technique known as "dampening". This happens when a router sees a particular advertisement change too much too quickly, and the router suppresses that advertisement for some time (longer the more flapping there is). Most networks don't have dampening enabled, but for those that do, the BGP change can make those networks unreachable from the plane for a few minutes until the dampening times out.
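For the curious, dampening is usually described as a penalty that grows with each flap and decays exponentially over time. The sketch below is a simplified model of that idea, not any vendor's implementation: the constants are values commonly cited as defaults, which I'm treating as assumptions, and the function only checks the suppress threshold after the last flap rather than tracking state continuously.

```python
import math

# Assumed parameters (commonly cited defaults; real routers are configurable).
PENALTY_PER_FLAP = 1000   # penalty added each time the route flaps
SUPPRESS_LIMIT = 2000     # route is suppressed once the penalty exceeds this
REUSE_LIMIT = 750         # route is usable again once the penalty decays below this
HALF_LIFE_MIN = 15.0      # penalty halves every 15 minutes


def decay(penalty, minutes):
    """Exponentially decay a penalty over the given number of minutes."""
    return penalty * 0.5 ** (minutes / HALF_LIFE_MIN)


def simulate(flap_times_min):
    """Return (penalty after last flap, suppressed?, minutes until reuse)."""
    penalty = 0.0
    last = None
    for t in flap_times_min:
        if last is not None:
            penalty = decay(penalty, t - last)
        penalty += PENALTY_PER_FLAP
        last = t
    suppressed = penalty > SUPPRESS_LIMIT
    # Time for the penalty to decay from its current value down to REUSE_LIMIT.
    wait = HALF_LIFE_MIN * math.log2(penalty / REUSE_LIMIT) if suppressed else 0.0
    return penalty, suppressed, wait


# Three handoffs spread over a flight vs. the same change seen three times in a row.
print(simulate([0, 60, 90]))
print(simulate([0, 1, 2]))
```

Under these assumed numbers, three changes spread over an hour or more never cross the suppress threshold, while a router that sees three updates within a couple of minutes would suppress the prefix for roughly half an hour.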
So anyways... Connexion's handling of plane traffic is really just a big hack. The latency via the geosynchronous satellites they use is somewhere around 600 - 700ms roundtrip anyways. Internet traffic on the ground from, say, NYC to LAX is around 55ms roundtrip best case, and NYC to LON 70ms or so. So for most of these cases it wouldn't add appreciable latency to switch satellites but keep the IP traffic routed via a more central location. (Thus not using a /24 for each plane, and also not flapping routes just to track a moving plane.)
Oh and after I wrote all of that I googled and found an explanation from one of the folks over at Renesys with graphics and all. Differing opinions on the use of BGP for this type of task; I do agree minimizing latency is important though, and there really isn't a GOOD way to do it past what they're doing with the rest of the setup as-is. Maybe we'll eventually see Internet on planes via non-geosynchronous satellites or via radio to ground stations which will make it actually fast (and not just moderately high bandwidth). Amusingly enough I was around for the NANOG that the further linked Boeing presentation occurred at, though I wasn't actually attending the sessions.
#3
Join Date: Jul 2000
Location: Commuting around the mid-atlantic and rust-belt on any number of RJs
Programs: TSA Random Selectee Platinum, * Gold, SPG/HH/MR mid-tier, and a tiny bag of pretzels.
Posts: 9,255
I boldly predict that when/if enough carriers and planes offer this, Boeing is gonna have to switch to a routing methodology that does not suck, since half the world (or the intelligent world, anyway) is going to dampen these prefixes into the stone age if they are flapping that much.
I'm also curious as to the "most networks don't have dampening enabled" comment--in my past work for an NSP, I found that most people did (and we certainly encouraged it on the part of our multihomed downstream customers). Or, to put it another way--I did not attend the NANOG where this was presented, but I'd be shocked if the presenter was not raked over the coals, as Boeing basically admitted that they are going to have /24s coming from inconsistent ASNs constantly and that they had to pin a /19 announcement up to keep the planes visible to the "dark corners of the internet." Not to mention blowing /24s not in the swamp out all over the place.
The correct way to do this is to pin some much shorter prefix to a global IP provider (or single ASN, if you will) who can provide decent connectivity to all of the ground stations. They are basically admitting to doing something which a great deal of the BGP speaking world would otherwise frown upon.
And the latency to a geostationary orbit bird is only about 240-280ms up and down. Everything else is either internal to the plane, or a function of the terrestrial connections.
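A back-of-the-envelope check of that figure, assuming pure speed-of-light propagation and nothing else (no modem, processing, or terrestrial delay). The constants are standard physical values; the variable names and the "best case / worst case" framing are mine.

```python
import math

GEO_ALTITUDE_KM = 35_786         # geostationary altitude above the equator
EARTH_RADIUS_KM = 6_378.137      # equatorial radius
LIGHT_SPEED_KM_S = 299_792.458   # speed of light in vacuum


def leg_ms(distance_km):
    """Propagation time for one leg (one way, one hop) in milliseconds."""
    return distance_km / LIGHT_SPEED_KM_S * 1000.0


# Best case: satellite directly overhead. Worst case: satellite on the horizon.
best_leg_km = GEO_ALTITUDE_KM
worst_leg_km = math.sqrt((EARTH_RADIUS_KM + GEO_ALTITUDE_KM) ** 2 - EARTH_RADIUS_KM ** 2)

# "Up and down" = one direction only (e.g. plane -> satellite -> ground station).
best_up_down_ms = 2 * leg_ms(best_leg_km)
worst_up_down_ms = 2 * leg_ms(worst_leg_km)

# A request plus its reply crosses the satellite twice.
best_round_trip_ms = 2 * best_up_down_ms
worst_round_trip_ms = 2 * worst_up_down_ms

print(round(best_up_down_ms), round(worst_up_down_ms))
print(round(best_round_trip_ms), round(worst_round_trip_ms))
```

On these assumptions one up-and-down hop comes out in roughly the 240-280 ms range, and a full request/reply round trip is about twice that, before anything on the plane or on the ground is added.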
A solid to Boeing--they are intentionally doing something poorly, probably to reduce their own costs (in terms of finding or building a single ASN that has decent connectivity at all the ground stations).
#4
Original Poster
Join Date: Jul 2005
Location: BOS
Programs: CO Silver; DL FO; SPG Gold; HH Gold
Posts: 880
Originally Posted by ClueByFour
I boldly predict that when/if enough carriers and planes offer this, Boeing is gonna have to switch to a routing methodology that does not suck, since half the world (or the intelligent world, anyway) is going to dampen these prefixes into the stone age if they are flapping that much.
Originally Posted by ClueByFour
I'm also curious as to the "most networks don't have dampening enabled" comment--in my past work for an NSP, I found that most people did (and we certainly encouraged it on the part of our multihomed downstream customers). Or, to put it another way--I did not attend the NANOG where this was presented, but I'd be shocked if the presenter was not raked over the coals, as Boeing basically admitted that they are going to have /24s coming from inconsistent ASNs constantly and that they had to pin a /19 announcement up to keep the planes visible to the "dark corners of the internet." Not to mention blowing /24s not in the swamp out all over the place.
Originally Posted by ClueByFour
The correct way to do this is to pin some much shorter prefix to a global IP provider (or single ASN, if you will) who can provide decent connectivity to all of the ground stations. They are basically admitting to doing something which a great deal of the BGP speaking world would otherwise frown upon.
Originally Posted by ClueByFour
And the latency to a geostationary orbit bird is only about 240-280ms up and down. Everything else is either internal to the plane, or a function of the terrestrial connections.
Originally Posted by ClueByFour
A solid to Boeing--they are intentionally doing something poorly, probably to reduce their own costs (in terms of finding or building a single ASN that has decent connectivity at all the ground stations).
But in short--you're right, they're violating multiple unwritten common sense BGP rules, but on a small and reasonably tolerable (for now) scale. There are enough networks out there advertising pointless deaggregates, not filtering customers properly, and so on that are even worse offenders (and don't have a cool application and at least some justification to back it up.)
Last edited by karthik; Jun 10, 2006 at 1:10 pm
#5
Suspended
Join Date: Jul 2001
Location: Watchlisted by the prejudiced, en route to purgatory
Programs: Just Say No to Fleecing and Blacklisting
Posts: 102,095
Interesting thread.
[I don't know if the following has any bearing or not on this, but apparently Boeing has (or at least had) two types of internet connectivity systems on board the Orange Popsicle -- which they were using for demonstrations -- and one of the systems was (and maybe still is ???) giving them quite a bit of trouble (at least in the third quarter of last year). That said, the one that gave them more trouble, IIRC, had to do with what was not then being used/installed on commercially scheduled pax airlines. ]
Also, with which countries was it that Boeing was having an issue (i.e., being told it's a no-go over their airspace) in relation to Connexion? And would that explain any of the dropping?
#6
Join Date: Jul 2000
Location: Commuting around the mid-atlantic and rust-belt on any number of RJs
Programs: TSA Random Selectee Platinum, * Gold, SPG/HH/MR mid-tier, and a tiny bag of pretzels.
Posts: 9,255
Originally Posted by karthik
But in short--you're right, they're violating multiple unwritten common sense BGP rules, but on a small and reasonably tolerable (for now) scale. There are enough networks out there advertising pointless deaggregates, not filtering customers properly, and so on that are even worse offenders (and don't have a cool application and at least some justification to back it up.)
The other thing is this: if they did this from within a single AS, they could get around using a /24 for each plane. Burning a globally routable /24 for each aircraft is (in the IPv4 world) a horrid waste, IMHO (I say this as somebody who is constantly juggling to keep an organization with two /16s under control). If you properly aggregate everything into "some" short-ish prefix, you can assign a plane-appropriate sized prefix to each aircraft (a 737 does not need a /24, whereas a 777, 340, or 744 might in theory need something larger someday).
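A rough sketch of what "plane-appropriate sized prefix" could look like, assuming a single aggregate carved up internally. The aggregate (a benchmarking range, 198.18.0.0/19), the helper name `prefix_len_for`, and the per-aircraft device counts are all made up for illustration.

```python
import ipaddress

# Hypothetical aggregate announced once to the world; not a real Connexion block.
aggregate = ipaddress.ip_network("198.18.0.0/19")


def prefix_len_for(devices):
    """Longest IPv4 prefix whose block holds `devices` hosts plus network/broadcast."""
    needed = devices + 2
    return 32 - (needed - 1).bit_length()


# Hypothetical numbers of addressable devices per aircraft type.
device_counts = {"737": 100, "777": 300}

for aircraft, devices in device_counts.items():
    plen = prefix_len_for(devices)
    blocks_available = 2 ** (plen - aggregate.prefixlen)
    print(f"{aircraft}: /{plen}, {blocks_available} such blocks fit in {aggregate}")

# For comparison: how many fixed /24s the same aggregate holds.
print(len(list(aggregate.subnets(new_prefix=24))))
```

The point of the design is that only the aggregate is visible globally; how it is sliced per aircraft stays an internal matter.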
#7
Original Poster
Join Date: Jul 2005
Location: BOS
Programs: CO Silver; DL FO; SPG Gold; HH Gold
Posts: 880
Originally Posted by ClueByFour
The other thing is this: if they did this from within a single AS, they could get around using a /24 for each plane. Burning a globally routable /24 for each aircraft is (in the IPv4 world) a horrid waste, IMHO (I say this as somebody who is constantly juggling to keep an organization with two /16s under control). If you properly aggregate everything into "some" short-ish prefix, you can assign a plane-appropriate sized prefix to each aircraft (a 737 does not need a /24, whereas a 777, 340, or 744 might in theory need something larger someday).
"Netblock: 10.0.0.0/19
Usage: /24s allocated to individual planes. 1 IP for general use. 253 IPs reserved for...uh...umm...modem dialup pool on plane for legacy users to connect older laptops to via seatback RJ11 jacks"
I'd think any size plane should be the same. One IP to NAT all the passengers, then figure a few for internal use, so maybe a /28 or /29 onboard plus the /30 between the ground and the plane. And things like their "live TV" feed and connectivity to onboard servers should probably be in RFC1918 space.
That "live TV" point irks me slightly too. Apparently the downlink to each plane is 20 Mbit/s: 5 Mbit/s for end users, 5 Mbit/s for airline use, 5 Mbit/s for the "live TV" (which is pretty lame, 4 channels in your browser), and 5 Mbit/s "reserved." Come on, at least drop the lame "reserved" space and give us a full 10 Mbit/s. Ever heard of QoS? RSVP/MPLS-TE? I'm actually curious how they're segmenting the 20 Mbit/s into four 5 Mbit/s channels; it doesn't appear to be CAR or other simple shaping of the sort. I saturated the 5 Mbit/s of user bandwidth by abusing lftp's pget while running tcpdump and mtr, and didn't notice abnormally high latency from a big full buffer or any significant packet loss. The overall network design is a bit silly; I could see 3 access points from where I was on the upper deck of a 747 (so figure maybe 5 - 6 in the entire plane)... all on the same channel!
#8
Join Date: Feb 2005
Location: MSN
Posts: 701
Originally Posted by ClueByFour
... Burning a globally routable /24 for each aircraft is (in the IPv4 world) a horrid waste, IMHO (I say this as somebody who is constantly juggling to keep an organization with two /16s under control). ...
Serious Question: I'm pretty knowledgeable when it comes to computers. But, I've only recently gotten more in depth into Internet protocol, large network stuff, etc. And I have a question.
I know that large cable internet/broadband companies ran into problems a couple of years ago when a lot of IPs were being assigned to people, and under IPv4, the number of IPs worldwide was fixed [my understanding is that the first octet? (whatever the #'s in ###.xxx.etc are) was supposed to be the institution, but I digress], so there was a bit of a problem looming: we were going to run out of IP addresses.
But, we were saved because of the invention of NAT/NAT traversal. My understanding is that NAT allows for more IPs inside a network than are available in the connections between that network and the internet.
So, after all of that, here's my question:
Why can't Connexion/etc. use similar tech to give everyone on the plane the IP's without messing up the internet's routers?
#9
Original Poster
Join Date: Jul 2005
Location: BOS
Programs: CO Silver; DL FO; SPG Gold; HH Gold
Posts: 880
Originally Posted by dizzy
One more reason to force ourselves to fully convert to IPv6
Originally Posted by dizzy
I know that large cable internet/broadband companies ran into problems a couple of years ago when a lot of IPs were being assigned to people, and under IPv4, the number of IPs worldwide was fixed [my understanding is that the first octet? (whatever the #'s in ###.xxx.etc are) was supposed to be the institution, but I digress], so there was a bit of a problem looming: we were going to run out of IP addresses.
These days addresses are much more sanely assigned; you need to fill in justification forms for the amount of IP space you request and such. There was never any shortage of IPs or any problems with anyone getting the amount of space they needed. There is plenty of empty space still left; we're 5+ years before having to worry about getting low on space with IPv4. This is just a random estimate, but I'd like to say we can easily double the number of computers on the Internet before perhaps running low on IP space. Here's some data on current status of how much IP space is assigned, allocated to registries for assignment, and actually advertised and in use.
Originally Posted by dizzy
But, we were saved because of the invention of NAT/NAT traversal. My understanding is that NAT allows for more IPs inside a network than are available in the connections between that network and the internet.
Originally Posted by dizzy
Why can't Connexion/etc. use similar tech to give everyone on the plane the IP's without messing up the internet's routers?
The problem is that the plane still needs a public IP address in order to be able to route traffic through the closest ground station. Applying NAT to the actual plane itself does nothing (you'd have to route traffic through a "home" location; and you can accomplish this with a few public IPs anyways.)
#10
Join Date: Jul 2000
Location: Commuting around the mid-atlantic and rust-belt on any number of RJs
Programs: TSA Random Selectee Platinum, * Gold, SPG/HH/MR mid-tier, and a tiny bag of pretzels.
Posts: 9,255
There was something done in the 1990s which did (indirectly) help keep the IPv4 space from becoming full: CIDR (Classless Inter-Domain Routing).
Used to be that routers only understood 4 "classful" subnet masks: 255.0.0.0, 255.255.0.0, 255.255.255.0, and 255.255.255.255 (e.g., /8, /16, /24, /32). Each location that needed IPs basically had to have at least 256 (/24); many needed the /16 (65k IPs and change) when they might only need 500 addresses (or, for that matter, 270 IPs).
CIDR allows routing of any size subnet (or, to be more correct, any sized subnet mask).
The other thing is that many providers did not, and some still do not, recognize announcements of anything "smaller" (or "longer" in terms of subnet masking) than a /19 (32 Class C blocks, or 8192 addresses). This helps to reduce the size of the internet "full" routing table that ISPs and people with multiple connections need to carry.
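To put numbers on the classful-vs-CIDR point, here's a small sketch comparing what a site needing 270 or 500 addresses would have been handed under each scheme. The helper names are mine, and 198.18.0.0/19 is just a placeholder prefix used to check the /19 arithmetic.

```python
import ipaddress

# Block sizes for the classic network classes.
CLASSFUL_SIZES = {"C": 256, "B": 65_536, "A": 16_777_216}


def classful_allocation(hosts):
    """Smallest classful block (by class letter and size) that fits `hosts`."""
    for name in ("C", "B", "A"):
        if hosts + 2 <= CLASSFUL_SIZES[name]:
            return name, CLASSFUL_SIZES[name]
    raise ValueError("does not fit in any single classful block")


def cidr_prefix_len(hosts):
    """Longest CIDR prefix that fits `hosts` plus network and broadcast addresses."""
    return 32 - (hosts + 2 - 1).bit_length()


for hosts in (270, 500):
    cls, classful_size = classful_allocation(hosts)
    plen = cidr_prefix_len(hosts)
    cidr_size = 2 ** (32 - plen)
    print(f"{hosts} hosts: class {cls} ({classful_size} addrs) vs /{plen} ({cidr_size} addrs)")

# Sanity check on the /19 figures: number of /24s and total addresses.
slash19 = ipaddress.ip_network("198.18.0.0/19")  # placeholder prefix
print(len(list(slash19.subnets(new_prefix=24))), slash19.num_addresses)
```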
NAT has helped, but it took a few years to make most common services work with/thru it. NAT is much more useful (as mentioned above) because it offers a layer of "poor man's security."
#11
Original Poster
Join Date: Jul 2005
Location: BOS
Programs: CO Silver; DL FO; SPG Gold; HH Gold
Posts: 880
Originally Posted by ClueByFour
There was something done in the 1990s which did (indirectly) help keep the IPv4 space from becoming full: CIDR (Classless Inter-Domain Routing).
CIDR allows routing of any size subnet (or, to be more correct, any sized subnet mask).
The other thing is that many providers did not, and some still do not, recognize announcements of anything "smaller" (or "longer" in terms of subnet masking) than a /19 (32 Class C blocks, or 8192 addresses). This helps to reduce the size of the internet "full" routing table that ISPs and people with multiple connections need to carry.
Just a quick 'sh ip bgp' on route-views.oregon-ix.net shows a ton of peers carrying only a handful of routes more specific than 4.0.0.0/8 in that block; announcing those is a practice Genuity and now Level3 actively disallow, and they pursue customers who do it. If customers want to multihome, they're told to get their own space and not do it out of 4-space. If they weren't, you'd see far more smaller routes there. By comparison, there are heaps of smaller-than-/20 routes with good global visibility inside 12.0.0.0/8, as AT&T doesn't seem to care if people advertise subnets of that space to other upstreams.
#12
Join Date: Jul 2000
Location: Commuting around the mid-atlantic and rust-belt on any number of RJs
Programs: TSA Random Selectee Platinum, * Gold, SPG/HH/MR mid-tier, and a tiny bag of pretzels.
Posts: 9,255
Originally Posted by karthik
Right; I've been in the industry short enough to just take CIDR for granted. It certainly prevented things from falling apart. Not to quibble too much, but if there is that type of filtering it tends to be to /20... because that's the smallest allocation out of legacy class A space (with an exception for /24 in legacy class C space of course.) Most providers are just filtering to /24 for all routes now though... (Verio for a while was even filtering legacy B space to /16 since that's the smallest allocated out of there.)
ARIN used to (right after CIDR started--dunno what their practice is now because I've not been in the provider game for 5 or 6 years) have "starter" allocations: they would give you a /20, but "reserve" the other half of the /19 for you if you could not justify a /19. In the interest of "fairness" (loose) they'd let you announce the whole /19, but not everybody did. Ergo, some folks loosened the filter to a /20. Lots of people with /20s out of the A space took it in the shorts about 1996 or 1997 (IIRC) because of this. By 1998, you could announce a /20 from anywhere and generally expect it to be globally routed.
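The per-class filtering karthik describes in the quote above (/20 in legacy A space, /16 in legacy B space, /24 in legacy C space) can be sketched roughly like this. It's only an illustration of the idea: the function names are mine, the class boundaries are the classic first-octet ranges, and no real provider's filter is this simple.

```python
import ipaddress


def max_accepted_prefix_len(network):
    """Longest prefix accepted, based on which legacy class the address falls in."""
    first_octet = int(network.network_address) >> 24
    if first_octet < 128:   # legacy class A space
        return 20
    if first_octet < 192:   # legacy class B space
        return 16
    return 24               # legacy class C space


def accepted(prefix):
    """True if an announcement for `prefix` would pass this sketch of a filter."""
    network = ipaddress.ip_network(prefix)
    return network.prefixlen <= max_accepted_prefix_len(network)


for prefix in ("4.0.16.0/20", "4.1.2.0/24", "130.1.0.0/16", "130.1.1.0/24", "192.0.2.0/24"):
    print(prefix, accepted(prefix))
```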
Just a quick 'sh ip bgp' on route-views.oregon-ix.net shows a ton of peers carrying only a handful of routes more specific than 4.0.0.0/8 in that block; announcing those is a practice Genuity and now Level3 actively disallow, and they pursue customers who do it. If customers want to multihome, they're told to get their own space and not do it out of 4-space. If they weren't, you'd see far more smaller routes there. By comparison, there are heaps of smaller-than-/20 routes with good global visibility inside 12.0.0.0/8, as AT&T doesn't seem to care if people advertise subnets of that space to other upstreams.
The most valuable thing there is anymore is a /24 out of the swamp. Fortunately, I have one in reserve.
Edited to add: the idea of "micro-allocations" is still enough to provoke a religious war among BGP zealots of various positions on the matter.
Last edited by ClueByFour; Jun 14, 2006 at 10:57 pm