Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. (intel/auto-round on GitHub)
So… Any context on how it compares to other quantization techniques? Is it faster or slower at similar accuracy?
int4 would be faster on CPU than fp4. They show benchmarks claiming better accuracy (less quality degradation) than other 4-bit quantization methods, all of which are fp4 variants. int4 and fp4 have the same memory requirement. I don't think they claim that the quantization process itself uses fewer resources than the fp4 alternatives.
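To make the memory-footprint point concrete, here is a minimal round-to-nearest int4 sketch in plain NumPy. This is not AutoRound's actual algorithm (AutoRound tunes the rounding via signed gradient descent); `quantize_int4` and `group_size` are illustrative names. Whether the 4 bits encode an integer in [-8, 7] (int4) or an fp4 value, each weight still costs 4 bits plus a shared per-group scale, which is why the two formats take the same memory.

```python
import numpy as np

def quantize_int4(weights, group_size=32):
    """Round-to-nearest int4 quantization sketch (NOT AutoRound's method):
    each group of `group_size` weights shares one scale, and values are
    rounded to integers in [-8, 7]."""
    w = weights.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0              # avoid division by zero for all-zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Reconstruct approximate fp32 weights from int4 codes and scales."""
    return (q * scale).reshape(-1).astype(np.float32)

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
# Round-to-nearest bounds the per-weight error by half a quantization step.
print(np.abs(w - w_hat).max() <= s.max() / 2 + 1e-6)
```

The speed difference the comment mentions comes from the arithmetic, not the storage: CPUs can dequantize and multiply int4 codes with cheap integer instructions, while fp4 values first need conversion through a floating-point decode path.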