Part of the "Nemotron in vLLM" collection: Nemotron models that have been converted and/or quantized to work well in vLLM.
This is a converted checkpoint of nvidia/Nemotron-4-340B-Base. Specifically, it was produced from the v1.2 .nemo checkpoint on NGC.
This checkpoint runs in vLLM with the changes from this PR: https://github.com/vllm-project/vllm/pull/6611. Support in transformers is still pending.
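As a rough illustration of loading the checkpoint through vLLM's offline Python API, the sketch below may help. It assumes a vLLM build that includes the PR above. The model path is a hypothetical placeholder, and `tensor_parallel_size=8` is an assumption (one node with 8 GPUs), not a tested setting from this card. Adjust both to your setup.

```python
from typing import Any, Dict

# Hypothetical placeholder: replace with this checkpoint's repo ID or a local path.
MODEL_ID = "path/to/converted-nemotron-4-340b-base"


def engine_kwargs(model: str = MODEL_ID, tensor_parallel_size: int = 8) -> Dict[str, Any]:
    """Collect the arguments passed to vllm.LLM.

    tensor_parallel_size=8 is an assumption (a single 8-GPU node),
    not a configuration confirmed by this model card.
    """
    return {"model": model, "tensor_parallel_size": tensor_parallel_size}


def generate(prompt: str, max_tokens: int = 64) -> str:
    """Run one greedy completion; vLLM is imported lazily so the helper above works without it."""
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs())
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    # Each RequestOutput holds a list of completions; take the text of the first one.
    return outputs[0].outputs[0].text
```

Since this is a base model rather than an instruction-tuned one, plain text continuation prompts such as `generate("The capital of France is")` are likely a better fit than chat-style prompts.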
Please see the FP8 checkpoint for evaluations, since I have only done single-node inference.