Model seems to stop responding after 200-300 lines.
#2
by SuperbEmphasis - opened
I am using this NVFP4 model with 4xH100 GPUs.
This seems to occur every attempt:
I am using VLLM 0.23.0 and I am using the listed vllm serve command.
unrelated, awesome work! The model seems smart of the size. My company has a lot of restrictions on which models we can use. So thank you for open sourcing this and giving it an open license! AWESOME!