Text Generation
Safetensors
Model Optimizer
gemma4
nvidia
ModelOpt
Gemma-4-31B-IT
lighthouse
quantized
NVFP4
conversational
modelopt
This model wasn't trained with FP4 or NVFP4
1
#8 opened about 1 month ago by yangus87
1*H100 with vLLM 0.19.0 Failed
#7 opened about 1 month ago by JeffreySheng
Question about q_scale / KV cache scale fallback in vLLM for Gemma-4-31B-IT-NVFP4: expected accuracy impact?
👀 4
#6 opened about 1 month ago by Shaoqing
Why not quantize the MATRICES of Wq, Wk, Wv, Wo?
1
#5 opened about 2 months ago by BeetSoup
This version is still too large for a single 5090 card
10
#4 opened about 2 months ago by iwaitu
Why does this 4-bit version have a size of 32.7 GB?
➕ 3
20
#3 opened about 2 months ago by alexcardo