- cross-posted to:
  - fosai@lemmy.world
  - hackernews
cross-posted from: https://piefed.zip/c/fosai/p/958141/30b-a3b-glm-4-7-flash-released
Small/fast model with MIT license for local use.
Benchmarks look good for the size. But IMO these smaller models aren’t consistent enough to live up to their promises.

Anyone get this working in llama.cpp yet?
I know flash attention and the PyTorch side still have patchy support.
It seems to be a new architecture, so llama.cpp needs custom support for it.
- Tracking issue
- PR
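
For anyone wanting to try once support lands, here's a minimal sketch of what local inference could look like through the llama-cpp-python bindings. This assumes the PR above gets merged and that a GGUF conversion exists; the filename and quant are hypothetical, and whether flash attention actually works for this architecture is still an open question per the above.

```python
# Minimal sketch, not a working recipe: assumes the llama.cpp support PR
# has been merged and that someone has published a GGUF conversion.
from llama_cpp import Llama

llm = Llama(
    model_path="./glm-4.7-flash-Q4_K_M.gguf",  # hypothetical filename/quant
    n_gpu_layers=-1,   # offload every layer to the GPU if one is available
    n_ctx=8192,        # context window; shrink if you run out of VRAM
    flash_attn=True,   # flash attention; support may be patchy, as noted above
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the MIT license in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

Until the PR is merged, llama.cpp won't recognize the architecture and will refuse to load the file, so keep an eye on the tracking issue.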