doubled TPS NVFP4 vs FP8
#15
by sSeeMyRolexXx - opened
Congratulations! How did you manage it? I only get 50 t/s with vLLM 0.24 on a 5090.
I can get ~100 tokens/s with MTP on a 5090 (configuration from https://huggingface.co/nvidia/Qwen3.6-27B-NVFP4/discussions/5#6a45eadef261c02fd4dcd255). 200 would be great!

