nvidia/Gemma-4-31B-IT-NVFP4

Text Generation
Safetensors
Model Optimizer
gemma4
nvidia
ModelOpt
Gemma-4-31B-IT
lighthouse
quantized
NVFP4
conversational
modelopt

This model wasn't trained with FP4 or NVFP4

#8 opened about 1 month ago by yangus87 (1 comment)

1*H100 with vLLM 0.19.0 Failed

#7 opened about 1 month ago by JeffreySheng

Question about q_scale / KV cache scale fallback in vLLM for Gemma-4-31B-IT-NVFP4: expected accuracy impact?

#6 opened about 1 month ago by Shaoqing (👀 4)

Why not quantize the MATRICES of Wq, Wk, Wv, Wo?

#5 opened about 2 months ago by BeetSoup (1 comment)

This version is still too large for a single 5090 card

#4 opened about 2 months ago by iwaitu (10 comments)

Why does this 4-bit version have a 32.7 GB size?

#3 opened about 2 months ago by alexcardo (➕ 3, 20 comments)