• brucethemoose@lemmy.world · 5 days ago

    Open models are going to kick the stool out. Hopefully.

    GLM 4.5 is already #2 on LM Arena, above Grok and ChatGPT, and it’s runnable on homelab rigs with only 32B active parameters (which is mad). Extrapolate that a bit and it’s just a race to the zero-cost bottom. None of this is sustainable.

    • dubyakay@lemmy.ca · 5 days ago

      I did not understand half of what you’ve written. But what do I need to get this running on my home PC?

      • brucethemoose@lemmy.world · 4 days ago

        I am referencing this: https://z.ai/blog/glm-4.5

        The full GLM? Basically a 3090 or 4090 plus a budget EPYC CPU, or maybe two GPUs on a Threadripper system.

        GLM 4.5 Air? That would work on a desktop with 16GB+ of VRAM; just slap in 96GB+ (maybe 64GB?) of fast RAM. Or the recent Framework Desktop, or any mini PC/laptop with the 128GB Ryzen AI Max 395 config, or a 128GB+ Mac.
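
        To put rough numbers on that (a back-of-envelope sketch: the ~355B and ~106B total parameter counts come from the z.ai blog, and the bits-per-weight figure assumes a typical ~4-bit quant):

        ```python
        # Back-of-envelope memory sizing for the quantized weights.
        # Parameter counts are approximate; bits/weight is an assumed ~4-bit quant.
        def quantized_size_gb(total_params_billion: float, bits_per_weight: float = 4.5) -> float:
            return total_params_billion * 1e9 * bits_per_weight / 8 / 1e9

        for name, params in [("GLM 4.5", 355), ("GLM 4.5 Air", 106)]:
            print(f"{name}: ~{quantized_size_gb(params):.0f} GB")

        # GLM 4.5     -> ~200 GB: hence the EPYC/Threadripper RAM pool plus a 24GB GPU.
        # GLM 4.5 Air -> ~60 GB:  hence 64-96GB of fast RAM plus 16GB+ of VRAM.
        ```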

        You’d download the weights, quantize them yourself if needed, and run them in ik_llama.cpp (which should get support imminently); a rough sketch of that workflow is below.

        https://github.com/ikawrakow/ik_llama.cpp/
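
        Something like this, very roughly (a sketch, not exact commands: it assumes ik_llama.cpp keeps upstream llama.cpp’s tool names, and the repo id and filenames are illustrative placeholders):

        ```python
        # Rough sketch of the download -> convert -> quantize -> serve workflow.
        # Assumes ik_llama.cpp is built and follows upstream llama.cpp tool names
        # (convert_hf_to_gguf.py, llama-quantize, llama-server); repo id and file
        # names below are placeholders, not verified.
        import subprocess
        from huggingface_hub import snapshot_download

        # 1. Pull the original weights from Hugging Face (hypothetical repo id).
        model_dir = snapshot_download(repo_id="zai-org/GLM-4.5-Air", local_dir="GLM-4.5-Air")

        # 2. Convert to GGUF, then quantize down to something that fits your RAM/VRAM.
        subprocess.run(["python", "convert_hf_to_gguf.py", model_dir,
                        "--outfile", "glm-4.5-air-f16.gguf"], check=True)
        subprocess.run(["./llama-quantize", "glm-4.5-air-f16.gguf",
                        "glm-4.5-air-q4_k_m.gguf", "Q4_K_M"], check=True)

        # 3. Serve it, offloading what fits onto the GPU and keeping the rest in system RAM.
        subprocess.run(["./llama-server", "-m", "glm-4.5-air-q4_k_m.gguf",
                        "--n-gpu-layers", "99", "--ctx-size", "8192"], check=True)
        ```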

        But these are…not lightweight models. If you don’t want a homelab, there are smaller models better suited to more typical hardware configs.

        • brucethemoose@lemmy.world · 4 days ago

          It’s going to be slow as molasses on Ollama. It needs a better runtime, and GLM 4.5 probably isn’t supported there at the moment anyway.