All our servers and company laptops went down at pretty much the same time. Laptops have been bootlooping to blue screen of death. It’s all very exciting, personally, as someone not responsible for fixing it.
Apparently caused by a bad CrowdStrike update.
Edit: now being told we (who almost all generally work from home) need to come into the office Monday as they can only apply the fix in-person. We’ll see if that changes over the weekend…
Reading into the updates some more… I’m starting to think this might just destroy CrowdStrike as a company altogether. Between the mountain of lawsuits almost certainly incoming and the total destruction of any public trust in the company, I don’t see how they survive this. Just absolutely catastrophic on all fronts.
Agreed, this will probably kill them over the next few years unless they can really magic up something.
They probably don’t get sued - their contracts will have indemnity clauses against exactly this kind of thing, so unless they seriously misrepresented what their product does, this probably isn’t a contract breach.
If you are running crowdstrike, it’s probably because you have some regulatory obligations and an auditor to appease - you aren’t going to be able to just turn it off overnight, but I’m sure there are going to be some pretty awkward meetings when it comes to contract renewals in the next year, and I can’t imagine them seeing much growth
Don’t most indemnity clauses have exceptions for gross negligence? Pushing out an update this destructive without it getting caught by any quality control checks sure seems grossly negligent.
deleted by creator
explain to the project manager with crayons why you shouldn’t do this
Can’t; the project manager ate all the crayons
Why is it bad to do on a Friday? Based on your last paragraph, I would have thought Friday is probably the best week day to do it.
Most companies, mine included, try to roll out updates during the middle or start of a week. That way if there are issues the full team is available to address them.
deleted by creator
And hence the term read-only Friday.
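For anyone who wants the "read-only Friday" rule enforced by tooling rather than by memory, here is a minimal sketch of a deploy-window check. The allowed days are an assumption based on the "middle or start of a week" comment above, and `deploy_allowed` is a made-up name, not part of any real deployment tool.

```python
from datetime import date

# Hypothetical policy: deploys are allowed Monday through Wednesday only.
# date.weekday() returns 0 for Monday through 6 for Sunday.
ALLOWED_DEPLOY_WEEKDAYS = {0, 1, 2}


def deploy_allowed(day: date) -> bool:
    """Return True if the given day falls inside the assumed deploy window."""
    return day.weekday() in ALLOWED_DEPLOY_WEEKDAYS
```

A release script could call this first and refuse to continue when it returns False, so a Thursday or Friday push needs an explicit override.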
rolling out an update to production that there was clearly no testing
Or someone selected “env2” instead of “env1” (#cattleNotPets names) and tested in prod by mistake.
Look, it’s a gaffe and someone’s fired. But it doesn’t mean fuck ups are endemic.
I think you’re on the nose, here. I laughed at the headline, but the more I read the more I see how fucked they are. Airlines. Industrial plants. Fucking governments. This one is big in a way that will likely get used as a case study.
The London Stock Exchange went down. They’re fukd.
Testing in production will do that
Not everyone is fortunate enough to have a separate testing environment, you know? Manglement has to cut cost somewhere.
Don’t we blame MS at least as much? How does MS let an update like this push through their Windows Update system? How does an application update make the whole OS unable to boot? Blue screens on Windows have been around for decades, why don’t we have a better recovery system?
Crowdstrike runs at ring 0, effectively as part of the kernel. Like a device driver. There are no safeguards at that level. Extreme testing and diligence is required, because these are the consequences for getting it wrong. This is entirely on crowdstrike.
What lawsuits do you think are going to happen?
Forget lawsuits, they’re going to be in front of congress for this one
For what? At best it would be a hearing on the challenges of national security with industry.
Yeah my plans of going to sleep last night were thoroughly dashed as every single windows server across every datacenter I manage between two countries all cried out at the same time lmao
I always wondered who even used windows server given how marginal its marketshare is. Now i know from the news.
It’s only marginal for running custom code. Every large organization has at least a few of them running important out-of-the-box services.
Well, I’ve seen some, but they usually don’t have automatic updates and generally do not have access to the Internet.
Almost everyone, because the Windows server market share isn’t marginal at all.
Not too long ago, a lot of Customer Relationship Management (CRM) software ran on MS SQL Server. Businesses made significant investments in software and training, and some of them don’t have the technical, financial, or logistical resources to adapt - momentum keeps them using Windows Server.
For example, small businesses that are physically located in rural areas can’t use cloud based services because rural internet is too slow and unreliable. It’s not quite the case that there’s no amount of money you can pay for a good internet connection in rural America, but last time I looked into it, Verizon wanted to charge me $20,000 per mile to run a fiber optic cable from the nearest town to my client’s farm.
How many cups of coffee have you drunk in the last 12 hours?
I work in a data center
I lost count
What was Dracula doing in your data centre?
Because he’s Dracula. He’s twelve million years old.
THE WORMS
I work in a datacenter, but no Windows. I slept so well.
Though a couple years back some ransomware that also impacted Linux ran through, but I got to sleep well because it only bit people with easily guessed root passwords. It bit a lot of other departments at the company though.
This time even the Windows folks were spared, because CrowdStrike wasn’t the solution they infested themselves with (they use other providers, who I fully expect to screw up the same way one day).
There was a point where words lost all meaning and I think my heart was one continuous beat for a good hour.
Did you feel a great disturbance in the force?
Oh yeah I felt a great disturbance (900 alarms) in the force (Opsgenie)
How’s it going, Obi-Wan?
CrowdStrike: It’s Friday, let’s throw it over the wall to production. See you all on Monday!
They did it on Thursday. All of SFO was BSODed for me when I got off a plane at SFO Thursday night.
This is going to be a Big Deal for a whole lot of people. I don’t know all the companies and industries that use Crowdstrike but I might guess it will result in airline delays, banking outages, and hospital computer systems failing. Hopefully nobody gets hurt because of it.
Big chunk of New Zealand’s banks apparently run it, cos 3 of the big ones can’t do credit card transactions right now
Several 911 systems were affected or completely down too
Clownstrike
Crowdshite haha gotem
CrowdCollapse
An offline server is a secure server!
The thought of a local computer being unable to boot because some remote server somewhere is unavailable makes me laugh and sad at the same time.
I don’t think that’s what’s happening here. As far as I know it’s an issue with a driver installed on the computers, not with anything trying to reach out to an external server. If that were the case you’d expect it to fail to boot any time you don’t have an Internet connection.
Windows is bad but it’s not that bad yet.
It’s just a fun coincidence that the azure outage was around the same time.
expect it to fail to boot any time you don’t have an Internet connection.
So, like the UbiSoft umbilical but for OSes.
Edit: name of publisher not developer.
Been at work since 5AM… finally finished deleting the C-00000291*.sys file in CrowdStrike directory.
182 machines total. Thankfully the process in and of itself takes about 2-3 minutes per machine. For virtual machines, it’s a bit of a pain, at least in this org.
lmao I feel kinda bad for those companies that have 10k+ endpoints to do this to. Eff… that. Lots of immediate short term contract hires for that, I imagine.
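For anyone scripting the delete step instead of doing it by hand, a rough Python sketch follows. It assumes you can already reach the affected disk and that you pass in the CrowdStrike directory yourself; `remove_channel_files` and its `dry_run` flag are illustrative names, not anything CrowdStrike ships. Only the `C-00000291*.sys` pattern comes from the comment above.

```python
from pathlib import Path

# File pattern taken from the comment above; everything else is an assumption.
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def remove_channel_files(driver_dir: Path, dry_run: bool = True) -> list[Path]:
    """Find files matching the pattern in driver_dir and optionally delete them.

    With dry_run=True (the default) nothing is deleted; the matches are only
    returned so they can be reviewed first.
    """
    matches = sorted(driver_dir.glob(CHANNEL_FILE_PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Run it once with the default `dry_run=True` to see what it would remove, then again with `dry_run=False` once the list looks right.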
How do you deal with places with thousands of remote endpoints??
That’s one of those situations where they need to immediately hire local contractors to those remote sites. This outage literally requires touching the equipment. lol
I’d even say, fly out each individual team member to those sites… but even the airports are down.
lol
too bad me posting this will bump the comment count though. maybe we should try to keep the vote count to 404
I can only see 368 comments rn, there must be some weird-ass puritan server blocking .ml users. It’s not beehaw as I can see comments from there.
I can only conclude that it is probably some liberals trying to block “Tankies” and no comment of value was lost.
crowdstrike sent a corrupt file with a software update for windows servers. this caused a blue screen of death on all the windows servers globally for crowdstrike clients, even people in my company. luckily i shut off my computer at the end of the day and missed the update. It’s not an OTA fix. they have to go into every data center and manually fix all the computer servers. some of these servers have encryption. I see a very big lawsuit coming…
they have to go into every data center and manually fix all the computer servers.
Jesus christ, you would think that (a) the company would have safeguards in place and (b) businesses using the product would do better due diligence. Goes to show there are no grown ups in the room inside these massive corporations that rule every aspect of our lives.
I’m calling it now. In the future there will be some software update for your electric car, and due to some jackass, millions of cars will end up getting bricked in the middle of the road where they have to manually be rebooted.
Laid off one too many persons, finance bros taking over
Stop running production services on M$. There is a better backend OS.
There’s a better frontend OS
Doesn’t mean people want to go away from what they know
I’m so exhausted… This is madness. As a Linux user I’ve been busy all day telling people with bricked PCs that Linux is better but there are just so many. It never ends. I think this outage is going to keep me busy all weekend.
What are you, an apostle? Lol. This issue affects Windows, but it’s not a Windows issue. It’s wholly on CrowdStrike for a malformed driver update. This could happen to Linux just as easily given how CS operates. I like Linux too, but this isn’t the battle.
A month or so ago a crowdstrike update was breaking some of our Linux vms with newer kernels. So it’s not just the os.
How? I’m really curious to learn.
Crowdstrike bricked networking on our linuxes for quite a few versions.
I don’t know how on either one. I just know it happened.
You’re the comment I came looking for. You get a standing ovation or something.
Annoyingly, my laptop seems to be working perfectly.
That’s the burden when you run Arch, right?
oh joy. can’t wait to have to fix this for all of our clients today…