You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

Sarvam-30B INT4 W4A16 Quantized Model

Base Model

Base model: sarvamai/sarvam-30b

This is an INT4 / W4A16 quantized version of Sarvam-30B.

Quantization Method

  • Method: GPTQ using llmcompressor
  • Scheme: W4A16
  • Source model dtype during quantization: BF16
  • Calibration samples: 128
  • Calibration sequence length: 2048
  • Saved format: Hugging Face save_pretrained format with compressed safetensors

Precision Policy

Preserved / ignored during quantization:

  • Embeddings
  • LM head
  • Attention modules and projections
  • Router / gating modules
  • MoE router-related modules

Main quantized target:

  • Linear layers outside the ignore list
  • Expert / FFN-heavy parts of the model

Serving

This submission is intended for vLLM.

Run with:

vllm serve --config vllm_config.yaml

Equivalent explicit command:

vllm serve . \
  --served-model-name sarvam-int4 \
  --trust-remote-code \
  --dtype bfloat16 \
  --host 0.0.0.0 \
  --port 8000 \
  --max-model-len 4096 \
  --gpu-memory-utilization 0.88 \
  --max-num-seqs 1

Validation

The model was validated through vLLM on seven prompts covering:

  • English reasoning
  • BoolQ-style reasoning
  • Hindi / Indian-language response
  • Math / science
  • Medical-style educational synthesis
  • Multiple choice
  • Open-ended generation

Observed result:

  • 6 PASS
  • 1 PASS_WITH_FORMAT_WARNING
  • 0 FAIL

Known Caveats

  • Requires trust_remote_code=True.
  • Tested with vLLM.
  • The provided serving config is vllm_config.yaml.
Downloads last month
189
Safetensors
Model size
7B params
Tensor type
BF16
I64
I32
Inference Providers NEW
This model isn't deployed by any Inference Provider. 馃檵 Ask for provider support

Model tree for meghanamakkapati/sarvam30b_INT4_quantisation

Quantized
(21)
this model