The sampling acceleration from nvfp4 quantization in Krea2 is not significant.
RTX 5060, 64 GB DDR4
Same here: RTX 5060 Ti 16 GB, 96 GB RAM.
Me too. nvfp4 is even slower than mxfp8. RTX 5080, 96 GB RAM.
I possibly uploaded the wrong version, one that doesn't allow the fast nvfp4 matmuls. I've re-uploaded it now. For me it's ~15% faster than fp8 on a 5090. Not all layers could use nvfp4 matmuls because of rather bad quality loss, so it's not going to be that much faster, but it should definitely not be slower.
Thanks for the update! I just used the new nvfp4. Generating that 3840x2160 image went from 20s per step down to 15s per step.
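
For anyone comparing their own numbers, here is a rough sketch of the arithmetic behind the 20 s to 15 s per step change reported above. The helper name `speedup` is made up for illustration and is not part of any library; it only turns two per-step timings into a speedup factor and a percentage reduction in step time.

```python
def speedup(before_s: float, after_s: float) -> tuple[float, float]:
    """Return (speedup factor, percent reduction in time per step).

    Hypothetical helper for illustration only; inputs are seconds per step.
    """
    factor = before_s / after_s
    reduction_pct = (before_s - after_s) / before_s * 100
    return factor, reduction_pct


# Step times quoted in this thread for the 3840x2160 image.
factor, reduction = speedup(20.0, 15.0)
print(f"{factor:.2f}x faster, {reduction:.0f}% less time per step")
# prints "1.33x faster, 25% less time per step"
```

Note that "X% faster" can mean either higher throughput or lower time per step, so it helps to say which one you measured when posting results.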

