Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. (intel/auto-round on GitHub)
So… Any context on how it compares to other quantization techniques? Is it faster or slower at similar accuracy?
int4 would be faster on CPU than fp4. They show benchmarks claiming better accuracy (less quality degradation) than other 4-bit quantization methods, all of which are fp4 variants. int4 and fp4 have the same memory requirement. I don't think they claim that the quantization process itself uses fewer resources than the fp4 alternatives.
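To make the memory-footprint point concrete, here is a minimal round-to-nearest int4 sketch in plain NumPy. This is not AutoRound's actual algorithm (AutoRound tunes the rounding via signed gradient descent); `quantize_int4` and `group_size` are illustrative names. Whether the 4 bits encode an integer in [-8, 7] (int4) or an fp4 value, each weight still costs 4 bits plus a shared per-group scale, which is why the two formats take the same memory.

```python
import numpy as np

def quantize_int4(weights, group_size=32):
    """Round-to-nearest int4 quantization sketch (NOT AutoRound's method):
    each group of `group_size` weights shares one scale, and values are
    rounded to integers in [-8, 7]."""
    w = weights.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0              # avoid division by zero for all-zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Reconstruct approximate fp32 weights from int4 codes and scales."""
    return (q * scale).reshape(-1).astype(np.float32)

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
# Round-to-nearest bounds the per-weight error by half a quantization step.
print(np.abs(w - w_hat).max() <= s.max() / 2 + 1e-6)
```

The speed difference the comment mentions comes from the arithmetic, not the storage: CPUs can dequantize and multiply int4 codes with cheap integer instructions, while fp4 values first need conversion through a floating-point decode path.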