- cross-posted to:
  - fosai@lemmy.world
  - hackernews
cross-posted from: https://piefed.zip/c/fosai/p/958141/30b-a3b-glm-4-7-flash-released
Small, fast model with an MIT license for local use.
Benchmarks look good for the size, but IMO these smaller models aren't consistent enough to live up to their promises.
Anyone get this working in llama.cpp yet?
I know flash attention and PyTorch have patchy support.
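If you want to poke at it before full support lands, here's a minimal sketch using llama-cpp-python, assuming a GGUF conversion exists. The filename is hypothetical, and `flash_attn` may be ignored or fail for a new architecture until a recent enough llama.cpp build supports it:

```python
# Minimal sketch: loading a hypothetical GGUF conversion of GLM-4.7-Flash
# with llama-cpp-python. The model filename is an assumption; flash_attn
# may not work for this architecture until llama.cpp adds support.
from llama_cpp import Llama

llm = Llama(
    model_path="GLM-4.7-Flash-30B-A3B-Q4_K_M.gguf",  # hypothetical filename
    n_ctx=8192,          # context window to allocate
    n_gpu_layers=-1,     # offload all layers to GPU if available
    flash_attn=True,     # assumption: may be unsupported for this arch
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```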
Oo. I use Qwen3-30B-A3B-Thinking-2507 as my generic “workhorse” local LLM, so this looks like it might be a nice upgrade with exactly the same basic specs. I’ll try it out.




