Model seems to stop responding after 200-300 lines.

by SuperbEmphasis - opened 3 days ago

I am using this NVFP4 model with 4xH100 GPUs.

This seems to occur on every attempt.

I am using vLLM 0.23.0 with the listed vllm serve command.

Unrelated: awesome work! The model seems smart for its size. My company has a lot of restrictions on which models we can use, so thank you for open-sourcing this and giving it an open license! AWESOME!
