All our servers and company laptops went down at pretty much the same time. Laptops have been boot-looping to the blue screen of death. It’s all very exciting, personally, as someone not responsible for fixing it.

Apparently caused by a bad CrowdStrike update.

Edit: now being told we (who almost all generally work from home) need to come into the office Monday as they can only apply the fix in-person. We’ll see if that changes over the weekend…

  • Monument@lemmy.sdf.org · 4 months ago

    Honestly kind of excited for the company blogs to start spitting out their disaster recovery crisis management stories.

    I mean, this is just a giant test of disaster recovery and crisis management plans. And while there are absolutely real-world consequences to this, the fix almost seems scriptable.

    If a company uses IPMI (or Intel’s equivalent, AMT, usually branded as vPro) and their network is intact / the devices are on their network, they ought to be able to remotely address this (rough sketch below).
    But that’s obviously predicated on them having already deployed and configured those tools.
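
    For laptops, vPro/AMT remediation would go through Intel’s own management tooling (e.g. Intel EMA or MeshCommander) rather than raw IPMI, but for servers with a BMC the scripted version might look roughly like the sketch below. Hostnames and credentials are placeholders, and it assumes a PXE recovery image already exists that applies the widely reported workaround (deleting the bad CrowdStrike channel file) and then reboots back to disk.

    ```python
    #!/usr/bin/env python3
    r"""Rough sketch only: mass remediation via out-of-band management (IPMI).

    Assumes each affected machine has a reachable BMC, that ipmitool is
    installed on the admin box, and that a PXE recovery image exists which
    removes the bad channel file
    (C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys, per the widely
    reported workaround) and then reboots back to disk. Hosts and credentials
    below are placeholders.
    """
    import subprocess

    BMC_HOSTS = ["bmc-01.example.internal", "bmc-02.example.internal"]
    BMC_USER = "admin"
    BMC_PASS = "change-me"  # in practice, pull this from a secrets store


    def ipmi(host: str, *args: str) -> None:
        # Thin wrapper around ipmitool's lanplus interface; raises on failure.
        subprocess.run(
            ["ipmitool", "-I", "lanplus", "-H", host,
             "-U", BMC_USER, "-P", BMC_PASS, *args],
            check=True,
        )


    for host in BMC_HOSTS:
        # Request a one-time network boot into the recovery image, then power cycle.
        ipmi(host, "chassis", "bootdev", "pxe")
        ipmi(host, "chassis", "power", "cycle")
        print(f"{host}: asked BMC to PXE-boot into the recovery image")
    ```

    Machines without any out-of-band management would still need hands on keyboard, which is presumably why so many orgs are calling people into the office.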

  • YTG123@sopuli.xyz · 4 months ago

    >Make a kernel-level antivirus
    >Make it proprietary
    >Don’t test updates… for some reason??

    • CircuitSpells@lemmy.world · 4 months ago

      I mean, I know it’s easy to be critical, but this was my exact thought: how the hell did they not catch this in testing?

      • Voroxpete@sh.itjust.works · 4 months ago

        Completely justified reaction. A lot of the time tech companies and IT staff get shit for stuff that, in practice, can be really hard to detect before it happens. There are all kinds of issues that can arise in production that you just can’t test for.

        But this… This has no justification. An issue this immediate, this widespread, would have been caught instantly with even the most basic testing. The fact that it wasn’t raises massive questions about the safety and security of CrowdStrike’s internal processes.

    • areyouevenreal@lemm.ee · 4 months ago

      Lots of security systems are kernel level (at least partially); that includes SELinux and AppArmor, by the way. It’s a necessity for these things to actually be effective.

  • Encrypt-Keeper@lemmy.world · 4 months ago

    Yeah, my plans of going to sleep last night were thoroughly dashed as every single Windows server across every datacenter I manage, spanning two countries, all cried out at the same time lmao

      • Pringles@lemm.ee · 4 months ago

        Marginal? You must be joking. A vast number of servers run on Windows Server. Where I work alone we have several hundred, and many companies have a similar setup. Statista put the Windows Server OS market share at over 70% in 2019. While I find it hard to believe it would be that high, it does clearly indicate it’s most certainly not a marginal percentage.

        • jj4211@lemmy.world · 4 months ago

          I’m not getting an account on Statista, and I agree that its market share isn’t “marginal” in practice, but something is up with those figures, since internet-hosted services overwhelmingly run on top of Linux. Internal servers may be a bit different, but I’d expect “servers” to count internet servers…

  • jedibob5@lemmy.world · 4 months ago

    Reading into the updates some more… I’m starting to think this might just destroy CrowdStrike as a company altogether. Between the mountain of lawsuits almost certainly incoming and the total destruction of any public trust in the company, I don’t see how they survive this. Just absolutely catastrophic on all fronts.

    • Bell@lemmy.world · 4 months ago

      Don’t we blame MS at least as much? How does MS let an update like this push through their Windows Update system? How does an application update make the whole OS unable to boot? Blue screens on Windows have been around for decades; why don’t we have a better recovery system?

      • sandalbucket@lemmy.world · 4 months ago

        CrowdStrike runs at ring 0, effectively as part of the kernel, like a device driver. There are no safeguards at that level. Extreme testing and diligence are required, because these are the consequences of getting it wrong. This is entirely on CrowdStrike.

  • richtellyard@lemmy.world · 4 months ago

    This is going to be a Big Deal for a whole lot of people. I don’t know all the companies and industries that use CrowdStrike, but I’d guess it will result in airline delays, banking outages, and hospital computer systems failing. Hopefully nobody gets hurt because of it.

    • RegalPotoo@lemmy.world · 4 months ago

      A big chunk of New Zealand’s banks apparently run it, cos 3 of the big ones can’t do credit card transactions right now

      • index@sh.itjust.works · 4 months ago

        > cos 3 of the big ones can’t do credit card transactions right now

        Bitcoin is still up and running, perhaps people can use that

  • Raxiel@lemmy.world · 4 months ago

    A lot of people I work with were affected; I wasn’t one of them. I had assumed it was because I put my machine to sleep yesterday (and every other day this week) and just woke it up rather than rebooting it. I assumed it was an on-startup thing, and that’s why I didn’t have it.

    Our IT provider already broke EVERYTHING earlier this month when they remotely installed “Nexthink Collector”, which forced a 30+ minute CHKDSK on every boot for EVERYONE until they rolled out a fix (which they were at least able to do remotely), and I didn’t want to have to deal with that the week before I go on leave.

    But it sounds like it even happened to running systems, so now I don’t know why I wasn’t affected, unless it’s a Windows 10-only thing?

    Our IT have had some grief lately, but at least they specified Intel 12th gen on our latest CAD machines, rather than 13th or 14th, so they’ve got at least one win.

    • wizardbeard@lemmy.dbzer0.com · 4 months ago

      Your computer was likely not powered on during the window between the fucked update going out and when they stopped pushing it.

      • Raxiel@lemmy.world · 4 months ago

        That makes sense, although I must have only just missed it, given that people I work with caught it.

  • Treczoks@lemmy.world · 4 months ago

    I was quite surprised when I heard the news. I had been working for hours on my PC without any issues. It pays off not to use Windows.

    • wizardbeard@lemmy.dbzer0.com · 4 months ago

      It’s not a flaw with Windows causing this.

      The issue is with a widely used third-party security product that installs as a kernel-level driver. It pushed an auto-update that causes blue-screening moments after booting into the OS.

      This same software is available for Linux and Mac, and had similar issues with specific Linux distros a month ago. It just didn’t get reported on because it didn’t have as wide an impact.

        • TopRamenBinLaden@sh.itjust.works · 4 months ago

          My Windows gaming PC is completely fine right now, because I don’t use CrowdStrike. Microsoft didn’t have anything to do with CrowdStrike’s rollout or support.

          I love Linux and use it as my daily driver for everything besides some online games. There are plenty of legitimate reasons to criticize Microsoft and Windows, but CrowdStrike breaking stuff isn’t one of them, at least in my opinion.

  • Damage@feddit.it · 4 months ago

    The thought of a local computer being unable to boot because some remote server somewhere is unavailable makes me laugh and feel sad at the same time.

    • rxxrc@lemmy.ml (OP) · 4 months ago

      I don’t think that’s what’s happening here. As far as I know, it’s an issue with a driver installed on the computers, not with anything trying to reach out to an external server. If that were the case, you’d expect it to fail to boot any time you don’t have an Internet connection.

      Windows is bad but it’s not that bad yet.