Nemotron models that have been converted and/or quantized to work well in vLLM
Michael Goin
mgoin
AI & ML interests: LLM inference optimization, compression, quantization, pruning, distillation
mgoin/Minitron-8B-Base-FP8 • Text Generation • 6
mgoin/Minitron-4B-Base-FP8 • Text Generation • 2
mgoin/Nemotron-4-340B-Instruct-hf • Text Generation • 8
mgoin/Nemotron-4-340B-Instruct-hf-FP8 • Text Generation • 4
mgoin/Nemotron-4-340B-Base-hf-FP8 • Text Generation • 4
mgoin/Nemotron-4-340B-Base-hf • Text Generation • 5
mgoin/nemotron-3-8b-chat-4k-sft-hf • Text Generation • 2
mgoin/Nemotron-4-340B-Instruct-FP8-Dynamic • Text Generation • 8
mgoin/Nemotron-4-340B-Instruct-vllm • Text Generation • 6
mgoin/Mistral-Nemo-Instruct-2407-FP8-KV • Text Generation • 67