cross-posted from: https://feddit.org/post/28915273

[…]

That marketing may have outstripped reality. Early reports from Mythos preview users including AWS and Mozilla indicate that while the model is very good and very fast at finding vulnerabilities, and requires less hands-on guidance from security engineers - making it a welcome time-saver for the human teams - it has yet to eclipse human security researchers.

“So far we’ve found no category or complexity of vulnerability that humans can find that this model can’t,” Mozilla CTO Bobby Holley said, after revealing that Mythos found 271 vulnerabilities in Firefox 150. Then he added: “We also haven’t seen any bugs that couldn’t have been found by an elite human researcher.” In other words, it’s like adding an automated security researcher to your team. Not a zero-day machine that’s too dangerous for the world.

  • MangoCats@feddit.it
    2 months ago

    In other words, it’s like adding an automated security researcher to your team. Not a zero-day machine that’s too dangerous for the world.

    Missing the point? Hiring an elite human researcher isn’t easy, or cheap. It’s beyond the means of the vast majority of people out there. $20/Month Claude Pro subscription? Not so much.

    The question for me: How much better is Mythos than Opus 4.6 or 4.7, or Sonnet for that matter? Those models and similar from other companies are already being effectively leveraged by threat actors. If Mythos reduces the time × money cost of finding a new zero-day by a factor of 10 vs Opus 4.7 - that’s concerning. If it’s a factor of 1.1 - meh… the world is going to have to learn how to deal with these things sooner rather than later, and that means the “white hats” are going to need superior funding to the “black hats” along with cooperation to close the gaps they find, or the “black hats” are going to be getting a lot more annoying than they already are.

    • ashughes@feddit.uk
      2 months ago

      How much better is Mythos than Opus 4.6 or 4.7, or Sonnet for that matter?

      Opus 4.6 resulted in 22 fixes in Firefox 148, compared to 271 fixes with Mythos in Firefox 150.

      source

        • frongt@lemmy.zip
          2 months ago

          Firefox is a massive program, so yeah it’s gonna have a lot of bugs. Even a simple HTML rendering browser is a complex program.

            • Nalivai@lemmy.world
              2 months ago

              What do you do with your browsers so they crash? Mine haven’t done that in at least a decade.

              • MangoCats@feddit.it
                2 months ago

                More often than crashing outright, I hit situations where the browser just isn’t working, won’t load pages or won’t execute button clicks on pages or similar and the only thing (on Windows) that will fix it is a reboot. In Linux usually closing the browser and restarting will get it going again. Yeah, BSODs are rare lately (though not entirely gone), but malfunctions still abound.

                • Nalivai@lemmy.world
                  2 months ago

                  Interesting. So far, all my experiences with stuff like that turned out to be faulty hardware.

    • Nalivai@lemmy.world
      2 months ago

      People for some reason assume that you can pay $20 for a bot and it will do something. You need a person with a lot of experience to get something useful from this bot, and every time we actually measure, the results show that your experienced person would be quicker and better off not using it at all and doing the same work themselves.
      The corporate solution is to hire an inexperienced person to wrangle the bots, but that’s a sure way to introduce bugs, not fix them.