So… Any context on how it compares to other quantization techniques? Is it faster or slower at similar accuracy?
int4 would be faster on CPU than fp4. They show benchmarks that claim better "accuracy" (less quality degradation) than other 4-bit quantization methods (all fp4 variants). int4 and fp4 have the same memory requirement. I don't think they claim that the actual post-quantization transformation process uses fewer resources than the fp4 alternatives.
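To illustrate the memory point: a 4-bit code takes half a byte no matter whether it is later decoded as an integer or as a tiny float, so the packed weights come out the same size either way. Here is a rough sketch in plain Python. The helper names (`packed_bytes`, `pack_nibbles`, `unpack_nibbles`) are made up for illustration and are not from any particular library, and the count ignores per-block scales or zero points, which real formats add on top.

```python
def packed_bytes(num_weights: int, bits_per_weight: int = 4) -> int:
    """Bytes needed for the quantized codes alone (scales/zero points not counted)."""
    return (num_weights * bits_per_weight + 7) // 8


def pack_nibbles(codes):
    """Pack 4-bit codes (0..15) two per byte, low nibble first."""
    codes = list(codes)
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("every code must fit in 4 bits")
    if len(codes) % 2:
        codes.append(0)  # pad an odd-length input with a zero nibble
    return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))


def unpack_nibbles(data: bytes, count: int):
    """Recover the first `count` 4-bit codes from packed bytes."""
    out = []
    for b in data:
        out.append(b & 0x0F)  # low nibble
        out.append(b >> 4)    # high nibble
    return out[:count]


# The same 4-bit codes could index an int4 grid or an fp4 lookup table;
# the storage cost is identical because only the decoding differs.
codes = [1, 2, 3]
packed = pack_nibbles(codes)
restored = unpack_nibbles(packed, len(codes))
```

For a hypothetical 7-billion-parameter model, `packed_bytes(7_000_000_000)` gives 3.5 billion bytes for the codes under either format. Any speed difference would come from how the codes are decoded and multiplied, not from how much memory they take.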

