Originally Posted by
ethernal
Endpoint detection and response (EDR) platforms work a bit differently than most other solutions. This wasn't a traditional "software update", in the sense of the underlying client being updated with new code or features. Instead, the issue was what is called a "content update". Basically, it's information that tells the EDR solution what to look for. It could be a specific malware pattern, or it could be new behaviors that might represent an external attack or an insider threat.
These content updates are often generated more or less automatically based on detections of malware in the wild. A potential malware file is loaded into an isolated sandbox managed by CrowdStrike (or another EDR provider), "detonated", and monitored automatically to understand what happens. If it starts trying to contact known command and control nodes or exhibits clearly malicious behavior, the malware pattern is added to a content update automatically and distributed to millions of endpoints within minutes.
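As a rough illustration of that pipeline, here is a minimal Python sketch. Everything in it is hypothetical (the names `SandboxReport`, `ContentUpdate`, `process_report`, the host list, and the behavior labels are made up for the example); it is not how CrowdStrike or any other vendor actually implements this, just the general shape of "sandbox verdict goes in, new detection content comes out".

```python
from dataclasses import dataclass, field

# Hypothetical indicators; a real provider would draw on live threat intelligence.
KNOWN_C2_HOSTS = {"c2.example.net", "198.51.100.7"}
MALICIOUS_BEHAVIORS = {"encrypts_user_files", "disables_backups"}


@dataclass
class SandboxReport:
    """What the sandbox observed after detonating one sample."""
    sample_hash: str
    contacted_hosts: list
    behaviors: list


@dataclass
class ContentUpdate:
    """Detection content pushed to endpoints (data, not client code)."""
    version: int = 0
    blocked_hashes: set = field(default_factory=set)


def is_malicious(report):
    """Flag a sample that contacted a known C2 host or behaved maliciously."""
    contacted_c2 = any(h in KNOWN_C2_HOSTS for h in report.contacted_hosts)
    bad_behavior = any(b in MALICIOUS_BEHAVIORS for b in report.behaviors)
    return contacted_c2 or bad_behavior


def process_report(report, update):
    """Add a newly detected sample to the content update.

    Returns True if the update changed and would need to be distributed.
    """
    if is_malicious(report) and report.sample_hash not in update.blocked_hashes:
        update.blocked_hashes.add(report.sample_hash)
        update.version += 1
        return True
    return False
```

Note that nothing in this sketch validates the update before it ships; the checks mentioned in the post would sit between `process_report` and distribution.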
This is an oversimplification (and there are supposed to be some checks to make sure things like this incident don't happen), but these rapid content updates are a selling point for these solutions. Within 5-10 minutes of malware being detected on any computer anywhere in the world (at least one with a Falcon client on it), every computer that has CrowdStrike (Falcon) is "inoculated" against that attack. Given how quickly cyber threats move, this rapid update is actually pretty important to cybersecurity protection.
Delaying the content updates to test them creates a separate risk (one of breach), and breaches can be even worse than an endpoint bricking itself.
That isn't to say this is an unforeseeable or unmitigable problem. I posted earlier about a client of mine that explicitly runs two separate EDR tech stacks in different datacenters to mitigate this sort of risk. Other providers do have "staggered update" approaches where you can tag the highest-risk devices (such as end user PCs or more externally exposed devices) as a first-tier update, with a brief (<1 hour) delay before full deployment. But, again, the flip side of that approach is "I may be vulnerable", and once malware is inside your internal network, it can spread VERY fast if it exploits a zero-day (previously unknown) defect in the OS or other software, especially if you don't have mitigating protections like macro- and micro-segmentation of networks with traffic inspection between segments.
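To make the "staggered update" idea concrete, here is a small Python sketch. It assumes a hypothetical two-tier setup with a 45-minute hold (any value under an hour would fit the description above); `Endpoint`, `rollout_plan`, `should_continue`, and the 99% health threshold are invented for illustration and do not correspond to any vendor's actual feature.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    tier: int  # 1 = first wave (highest-risk devices), 2 = everything else


def rollout_plan(endpoints, delay_minutes=45):
    """Return (minutes_after_release, endpoint_name) pairs, first tier first."""
    plan = []
    for ep in sorted(endpoints, key=lambda e: (e.tier, e.name)):
        # Tier 1 gets the update immediately; later tiers wait out the delay.
        offset = 0 if ep.tier == 1 else delay_minutes
        plan.append((offset, ep.name))
    return plan


def should_continue(first_wave_healthy, first_wave_total, min_healthy_ratio=0.99):
    """Gate the second wave on the first wave still reporting in as healthy."""
    if first_wave_total == 0:
        return False  # no evidence either way, so hold the rollout
    return first_wave_healthy / first_wave_total >= min_healthy_ratio
```

For example, with two tier-1 devices and one tier-2 server, the plan puts the server 45 minutes behind, and `should_continue(50, 100)` would halt the rollout if half the first wave had stopped checking in. The trade-off is exactly the one described above: the second tier is unprotected against the new threat during that window.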
So, I don't consider it particularly negligent. It is still not "industry standard" in most places (outside of ultra-high uptime environments: think stock exchanges, root DNS servers, etc.) to have segregated EDR stacks or staggered content updates. Perhaps it will be after this incident, though. And even those environments mostly use *nix operating systems anyway, which were not impacted by this content update.
Thank you, I certainly appreciate your detailed response. I am a software engineer and have some understanding of how this works. But any code that can brick computers and servers at the kernel level should never, ever be rolled out without IT oversight. No computer virus in history has done as much damage as this "antivirus".