This isn’t a gloat post. In fact, I was completely oblivious to this massive outage until I tried to check my bank balance and couldn’t log in.
Apparently Visa payWave, banks, some TV networks, EFTPOS, etc. have gone down. Flights have had to be cancelled because some airlines’ systems have also gone down. Gas stations and public transport systems are inoperable, and numerous Windows systems and Microsoft services are affected too. (At least according to one of my local MSMs.)
It seems insane to me that one company’s messed-up update could cause so much global disruption and take down so many systems :/ This is exactly why centralisation of services, and large corporations gobbling up smaller companies and becoming behemoth services, is so dangerous.
I’ve just spent the past 6 hours booting into safe mode and deleting CrowdStrike files on servers.
Feel you there. 4 hours here. All of them were cloud instances, where getting access to the actual console isn’t as easy as it should be, and trying to hit F8 to bring up the menu and get into safe mode can take a very long time.
Ha! Yes. Same issue. Clicking Reset in vSphere and then quickly switching tabs to hold down F8 has been a ball ache to say the least!
Just go into settings and add a boot delay, then set it back when you’re done.
What I usually do is set next boot to BIOS so I have time to get into the console and do whatever.
Also, instead of using a browser, I prefer to connect VMware Workstation to vCenter so all the consoles open instantly in their own tabs in the workspace.
Can’t you automate it?
Sadly not. Windows doesn’t boot. You can boot it into safe mode with networking, at which point we could maybe log in with Ansible to delete the file, but since getting Windows into safe mode is still manual work, there’s not much point.
It is theoretically automatable, but on bare metal it requires having hardware that’s not normally just sitting in every data centre, so it would still require someone to go and plug something into each machine.
On VMs it’s more feasible, but on those VMs most people are probably just mounting the disk images and deleting the bad file to begin with.
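For the disk-mounting route, the deletion step itself is easy to script once the volume is attached to a healthy machine. Here’s a minimal Python sketch, assuming the affected Windows volume is already mounted at a path you pass in, and assuming the `C-00000291*.sys` pattern under `Windows\System32\drivers\CrowdStrike` from CrowdStrike’s published workaround. The function name, the dry-run default and the example file names are my own (hypothetical), not anything CrowdStrike ships:

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Assumption: the Windows volume is mounted at `mount_root`. On a Linux helper
# box, NTFS mounts may be case-sensitive, so these casings must match the disk.
DRIVER_DIR = Path("Windows") / "System32" / "drivers" / "CrowdStrike"
# Channel-file pattern from CrowdStrike's public remediation guidance.
BAD_FILE_GLOB = "C-00000291*.sys"


def remove_bad_channel_files(mount_root: Path, dry_run: bool = True) -> list[Path]:
    """Return the matching channel files; delete them only if dry_run is False."""
    target_dir = mount_root / DRIVER_DIR
    if not target_dir.is_dir():
        # Not a Windows volume, or no CrowdStrike sensor installed on it.
        return []
    matches = sorted(target_dir.glob(BAD_FILE_GLOB))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Demo against a throwaway fake directory tree, not a real disk.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / DRIVER_DIR).mkdir(parents=True)
        # Hypothetical file names, for illustration only.
        (root / DRIVER_DIR / "C-00000291-example.sys").write_bytes(b"")
        (root / DRIVER_DIR / "C-00000290-example.sys").write_bytes(b"")
        print(remove_bad_channel_files(root))                 # dry run: just lists
        print(remove_bad_channel_files(root, dry_run=False))  # actually deletes
        print(sorted(p.name for p in (root / DRIVER_DIR).iterdir()))
```

Defaulting to a dry run means the first pass only lists what would be deleted. Attaching and detaching the disk is left out entirely, since that depends on the hypervisor or cloud provider.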
I guess it depends on numbers too. We had 200 to work on. If you’re talking hundreds more, then automation would be a better solution. In our scenario it was just easier to throw engineers at it. I honestly thought at first this was my weekend gone, but we got through them easily in the end.
The real problem with VM setups is that the host system might have crashed too.
Since it has to happen in Windows safe mode, it seems to be very hard to automate the process. I haven’t seen a solution yet.