CrowdStrike: When a Software Update Goes Wrong
Jul 23, 2024 Robert Villano Cyber Security 3 min read
On July 19, mayhem visited cybersecurity software maker CrowdStrike and thousands of its customers around the globe, impacting 8.5 million Windows devices.
Airlines, energy companies, and healthcare systems were among the businesses most severely impacted; the outage also forced several federal agencies to close their local offices and disrupted emergency 911 call services.
What caused this outage?
CrowdStrike founder and CEO George Kurtz announced that the outage was caused by a defect in a Falcon content update for Windows hosts. Mac and Linux hosts were not impacted.
It was not a cyberattack, and the CrowdStrike sensor was operating normally. But does that matter if the underlying operating system crashes and throws a Blue Screen of Death (BSOD)? No worries, you’re still protected, at least according to the CEO.
The obvious question is, “How did this happen?” The culprit is Channel File 291 (named with the pattern ‘C-00000291-*.sys’), which contained new detection logic to address malicious misuse of named pipes. The faulty file was served for only a short window of about one hour, between 4 and 5 AM UTC.
Named pipes are an inter-process communication (IPC) mechanism included in Microsoft Windows platforms to provide reliable one-way and two-way data communications among processes on the same computer or among processes on different computers across a network.
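To make the term concrete, here is a minimal sketch of a named pipe server on Windows using the Win32 API. The pipe name, message, and buffer sizes are illustrative choices for the example, not anything tied to CrowdStrike’s implementation.

```c
#include <windows.h>
#include <stdio.h>

int main(void) {
    /* Create a named pipe endpoint. Any local process (or, via a UNC path,
       a remote one) that knows the name can connect and exchange messages. */
    HANDLE pipe = CreateNamedPipeA(
        "\\\\.\\pipe\\demo_pipe",                        /* hypothetical name */
        PIPE_ACCESS_DUPLEX,                              /* two-way channel   */
        PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
        1,                                               /* max instances     */
        512, 512,                                        /* out/in buffers    */
        0,                                               /* default timeout   */
        NULL);                                           /* default security  */
    if (pipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateNamedPipe failed: %lu\n", GetLastError());
        return 1;
    }

    /* Block until a client connects, then send it one message. */
    if (ConnectNamedPipe(pipe, NULL)) {
        const char msg[] = "hello from the pipe server";
        DWORD written = 0;
        WriteFile(pipe, msg, (DWORD)sizeof(msg), &written, NULL);
    }

    DisconnectNamedPipe(pipe);
    CloseHandle(pipe);
    return 0;
}
```

A client would open the same name with CreateFile and read the message with ReadFile. Because the mechanism is this easy to reach from any process, it is also a popular target for malicious misuse, which is what Channel File 291’s detection logic was meant to address.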
To CrowdStrike’s credit, the company took full responsibility for the faulty software update and the subsequent global chaos, but doesn’t the behemoth in Redmond bear any responsibility? Almost immediately, Microsoft’s Chief Communications Officer, Frank X. Shaw, noted on X that a 2009 agreement between the European Commission and Microsoft required Redmond to give security software the same level of access to Windows as Microsoft itself.
Yes, security software vendors can build products that take full advantage of Windows services; that works right up until a flawed update gets pushed to millions of Windows devices. Surely, CrowdStrike would have benefited from a stronger software development life cycle (SDLC) process.
Publicly, CrowdStrike has not acknowledged its exact misstep; save that for congressional hearings with the House Oversight Committee and CISA. But it does raise the question: who has oversight of the software supply chain? Is it the underlying operating system OEM or the third-party software provider? Or, ideally, both? A collaborative software assurance certification might be helpful.
While these questions are left to be hashed out in Washington, it’s clear that the best defense is a strong offense: VIGILANCE.
Any SDLC should include robust authentication mechanisms, vigilant code auditing, and testing across a myriad of customer configurations and environments. Sprinkle in back-out procedures, a robust business continuity plan, and disaster recovery (DR) measures.
As a precautionary measure, it is advisable to roll out updates to pre-production environments in a segmented distribution, not ALL at once. As with elections, initial results are important indicators of the success of your software update campaign.
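As a rough illustration of that advice, the sketch below shows a ring-based (segmented) rollout: push the update to a small canary ring first, check telemetry, and only widen the blast radius when the early results look healthy. The ring names, sizes, and health threshold are assumptions for the example, and healthy_fraction() is a stand-in for real monitoring.

```c
#include <stdio.h>

/* Placeholder for real telemetry: fraction of hosts in a ring that
   reported healthy after receiving the update (stubbed for the sketch). */
static double healthy_fraction(const char *ring_name) {
    (void)ring_name;
    return 1.0;
}

int main(void) {
    /* Hypothetical rollout rings, smallest and least critical first. */
    struct { const char *name; int hosts; } rings[] = {
        { "canary (pre-production)", 50 },
        { "early adopters",          500 },
        { "broad production",        50000 },
    };
    const double required_health = 0.99;  /* abort threshold (assumption) */

    for (size_t i = 0; i < sizeof(rings) / sizeof(rings[0]); i++) {
        printf("Deploying update to %s (%d hosts)...\n",
               rings[i].name, rings[i].hosts);

        /* Initial results are the indicator: halt and back out on failure. */
        if (healthy_fraction(rings[i].name) < required_health) {
            printf("Health check failed in %s; halting rollout.\n",
                   rings[i].name);
            return 1;
        }
    }
    printf("Rollout completed across all rings.\n");
    return 0;
}
```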
What we can learn from the CrowdStrike outage
For IT leaders, the CrowdStrike outage serves as a reminder to:
- Develop a Business Continuity Plan
- Implement Crisis Management Best Practices
- Build a Vendor Security Assessment Service
- Optimize IT Change Management
- Implement Risk-Based Vulnerability Management
UPDATE: As of August 6, 2024, CrowdStrike’s published root cause analysis attributes the crash to:
- a mismatch between inputs validated by a Content Validator and those provided to a Content Interpreter,
- an out-of-bounds read issue in the Content Interpreter, and
- the absence of a specific test.
CrowdStrike vows to work with Microsoft on secure and reliable access to the Windows kernel.
The attempt to access the 21st value, when only 20 values had been provided, produced an out-of-bounds memory read beyond the end of the input data array and resulted in a system crash.
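In user-mode C, the same class of mistake looks roughly like the hypothetical snippet below: the array holds the 20 values that were actually supplied, but the code indexes the 21st. In an ordinary program this is undefined behavior that may or may not crash; in a kernel-mode driver, the equivalent bad read can fault and take down the whole machine.

```c
#include <stdio.h>

#define PROVIDED_VALUES 20   /* values actually supplied to the interpreter */

int main(void) {
    /* Simulated input array: only 20 values were provided. */
    int inputs[PROVIDED_VALUES];
    for (int i = 0; i < PROVIDED_VALUES; i++) {
        inputs[i] = i;
    }

    /* The template defined 21 parameters, so the code reads index 20,
       one element past the end of the array: an out-of-bounds read. */
    int expected_parameters = 21;
    int value = inputs[expected_parameters - 1];  /* undefined behavior */

    printf("read value: %d\n", value);
    return 0;
}
```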
The company says this scenario is now incapable of recurring; it also says that process improvements and mitigation steps are underway and will be deployed to ensure further enhanced resilience.
Sourcepass specializes in each of these areas and can help your company strengthen them. Speak with an IT specialist today to learn more.
Robert Villano is a Cyber Manager at Sourcepass. To learn more, reach out to Robert Villano at (877) 678-8080.