Advanced quantization toolkit for LLMs. Native support for WOQ, MXFP4, NVFP4, GGUF, AutoBits and seamless integration with Transformers, vLLM, SGLang. - intel/auto-round
int4 would be faster on CPU than fp4. They show benchmarks claiming better accuracy (less quality degradation) than other 4-bit quantization methods, all of which are fp4 variants. int4 and fp4 have the same memory footprint. I don't think they claim that the quantization process itself (the transformation that produces the quantized model) uses fewer resources than the fp4 alternatives. A quick sketch below illustrates the memory and dequantization points.
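A minimal numpy sketch (not auto-round's actual kernels) of why both claims hold: packed storage is identical at 4 bits per weight either way, but int4 dequantizes with a plain scale multiply while fp4 (E2M1, the element format used by MXFP4/NVFP4) typically goes through a 16-entry lookup table, which is one reason int4 paths tend to be faster on CPUs. The scale value and code arrays here are made up for illustration.

```python
import numpy as np

n = 8
int4_codes = np.array([3, -7, 1, 0, 5, -2, 7, -8], dtype=np.int8)  # signed int4, range [-8, 7]
fp4_codes  = np.array([1, 9, 2, 0, 5, 11, 7, 15], dtype=np.uint8)  # 4-bit E2M1 bit patterns

def pack4(codes):
    # Two 4-bit codes per byte: identical packed size for int4 and fp4.
    u = codes.astype(np.uint8) & 0xF
    return (u[0::2] | (u[1::2] << 4)).astype(np.uint8)

assert pack4(int4_codes).nbytes == pack4(fp4_codes).nbytes == n // 2

scale = 0.05  # hypothetical per-group scale

# int4 dequant: integer-to-float cast plus one multiply.
int4_dequant = int4_codes.astype(np.float32) * scale

# fp4 (E2M1) dequant: decode the 16 representable values via a LUT,
# then apply the group scale.
E2M1_LUT = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
                     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
                    dtype=np.float32)
fp4_dequant = E2M1_LUT[fp4_codes] * scale

print(int4_dequant)
print(fp4_dequant)
```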