I did not understand half of what you’ve written. But what do I need to get this running on my home PC?
I am referencing this: https://z.ai/blog/glm-4.5
The full GLM? Basically a 3090 or 4090 paired with a budget EPYC CPU. Or maybe 2 GPUs on a Threadripper system.
GLM Air? That one would work on a desktop with a 16GB+ VRAM GPU, just slap in 96GB+ (maybe 64GB?) of fast RAM. Or the recent Framework Desktop, or any mini PC/laptop with the 128GB Ryzen AI Max+ 395 config, or a 128GB+ Mac.
You’d download the weights, quantize them yourself if needed, and run them in ik_llama.cpp (which should get support imminently).
https://github.com/ikawrakow/ik_llama.cpp/
But these are…not lightweight models. If you don’t want a homelab, there are better ones that will fit on more typical hardware configs.
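If you want to sanity-check those RAM numbers yourself, here's a rough back-of-the-envelope sketch. It assumes the parameter counts from the linked z.ai post as I read them (355B total for GLM-4.5, 106B for Air), and the bits-per-weight values are just illustrative guesses at typical quant levels, not measured file sizes. `weight_gb` is a made-up helper name, not part of any tool.

```python
# Rough weight-memory estimate: parameters * bits per weight / 8.
# Parameter counts are taken from the linked z.ai post (assumption:
# 355B and 106B total). The bits-per-weight values are illustrative,
# not the exact size of any particular quantized file.

MODELS = {"GLM-4.5": 355e9, "GLM-4.5-Air": 106e9}


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for name, params in MODELS.items():
    for bpw in (3.5, 4.5, 8.0):
        print(f"{name:12s} @ {bpw} bpw ~ {weight_gb(params, bpw):6.1f} GB")
```

At around 4.5 bits per weight, Air's weights come to roughly 60GB, which is why 64GB of RAM is a "maybe" and 96GB is comfortable once you add the context cache and the OS. The full model lands around 200GB at the same quant level, hence the server-class board.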
You can probably just use Ollama and import the model.
It’s going to be slow as molasses on Ollama. It needs a better runtime, and GLM 4.5 probably isn’t supported there at the moment anyway.