doubled TPS NVFP4 vs FP8

#15
by sSeeMyRolexXx - opened

[Image: nvfp4-benchmark-table-screenshot]

7afdb82da94d15e39c0aaad10214c8

Congratulations! How did you achieve this? I only get 50 t/s with vLLM 0.24 on a 5090.

I can get ~100 tokens/s with MTP on a 5090 (configuration from https://huggingface.co/nvidia/Qwen3.6-27B-NVFP4/discussions/5#6a45eadef261c02fd4dcd255). 200 would be great!
