Originally Posted by
invisible
Physics is a problem here. For sync replication, RTT should not exceed 30 ms with all processing time included. That limits you to a 40-mile radius and requires DWDM and (very) expensive SFPs and cables.
Yes and no. 30 ms is a written-down number for one use case: normal operations. Thinking about DR sites is a completely different thing.
You need it for online operations, and a backlog (or none at all) for a DR restore.
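To put rough numbers on the distance question, here is a minimal sketch of the propagation arithmetic. The assumptions are mine, not from the quoted post: light travels at roughly 200,000 km/s in fiber (about two thirds of c), and the fiber route is as long as the straight-line distance, which real routes never are. The function names are made up for illustration.

```python
# Rough propagation-delay arithmetic for a replication link.
# Assumption: signal speed in fiber is ~200,000 km/s (about 2/3 of c).
SPEED_IN_FIBER_KM_PER_S = 200_000.0
KM_PER_MILE = 1.609344


def fiber_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation time over fiber, in milliseconds."""
    one_way_s = distance_km / SPEED_IN_FIBER_KM_PER_S
    return 2.0 * one_way_s * 1000.0


def remaining_budget_ms(rtt_budget_ms: float, distance_km: float) -> float:
    """What is left of an RTT budget after pure propagation delay."""
    return rtt_budget_ms - fiber_rtt_ms(distance_km)


if __name__ == "__main__":
    distance_km = 40 * KM_PER_MILE  # the 40-mile radius from the quote
    print(f"distance: {distance_km:.1f} km")
    print(f"fiber RTT: {fiber_rtt_ms(distance_km):.2f} ms")
    print(f"left of 30 ms: {remaining_budget_ms(30.0, distance_km):.2f} ms")
```

Under these assumptions, propagation alone is a small slice of a 30 ms budget at that distance; the rest goes to switching, storage acknowledgement and other processing, which is where the "all processing time included" part of the number comes in.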
Saying that a fire in or at a generator kills everything is just heinous, to say the least.
I am not an armchair expert here, and I've seen just enough "IT only" companies go out of business because the "CIO" thinks that freak accidents don't happen.
While nowadays the CIO league is all looking into cloudy and "continuous delivery", they just forget about "continuous operations" and, even more so, about "disrupted operations".
AWS was brought up earlier. All fine, but only if you use it with a clue.
We currently have one operation where recovery starts at a few milliseconds and tops out at 2 hours to restore "service" and 8 hours to catch up on data (as available). The maximum case is boot-provisioning a complete ops setup within Rackspace or AWS (yeah, you never know, right?). And that's for a company in the 10-20 million revenue range.
Mainframe? Yep, doesn't go to AWS, yet very basic principles apply even there, especially on risk from physical incidents like flood, power loss and fire. Any S/390 can be sourced from so many levels and keep up that "1960s COBOL" VM across at least that distance to contain a fire. My former company helped bring Linux to S/390, and yes, going full IPL is something you really, really want to avoid. (An IPL comes from power loss or from the main kernel going nuts because of erroneous code.)
Being totally OT already: the joke about going from mainframe to cloud goes like this:
"What's the difference?"
"Back then the bill came from IBM, now it's from Amazon"