This is an IQ2_M quantization of sarvamai/sarvam-30b, produced using llama.cpp with a multilingual Indic calibration dataset.
The official Sarvam GGUF release provides Q4_K_M only. This repo adds:
| File | Size | Description |
|---|---|---|
| sarvam-30b-IQ2_M-indic.gguf | ~10 GB | IQ2_M quantized model |
| sarvam-30b-indic.imatrix | ~82 MB | Indic calibration imatrix |
| indic_calibration_final.txt | ~67 MB | Indic calibration data |
# Download
huggingface-cli download deepak-p-yadav/sarvam-30b-IQ2_M-indic \
--local-dir ./sarvam-30b-IQ2_M
# Run server
./llama-server \
-m ./sarvam-30b-IQ2_M/sarvam-30b-IQ2_M-indic.gguf \
--port 8080 -n 2000
# Run CLI
./llama-cli \
-m ./sarvam-30b-IQ2_M/sarvam-30b-IQ2_M-indic.gguf \
-n 2000 --no-warmup
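Once the server is running, it can be queried over HTTP. The following is a minimal sketch, assuming the server listens on port 8080 as in the command above and that the build of llama-server in use exposes the OpenAI-compatible `/v1/chat/completions` route. The helper names `json_escape`, `build_payload`, and `ask` are illustrative, not part of llama.cpp, and the escaping handles only backslashes and double quotes (not newlines).

```shell
# Escape backslashes and double quotes so the text is safe inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a chat request body: $1 = user message, $2 = token budget (default 2000).
build_payload() {
  printf '{"messages":[{"role":"user","content":"%s"}],"max_tokens":%d}' \
    "$(json_escape "$1")" "${2:-2000}"
}

# Send the request to a locally running llama-server (port 8080 assumed).
ask() {
  curl -s "http://localhost:8080/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1" "${2:-2000}")"
}

# Usage (requires the server to be running):
#   ask "भारत के बारे में बताओ" 2000
```

The default budget of 2000 tokens mirrors the `-n 2000` setting used in the commands above.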
Sarvam-30B is a reasoning model. It produces a [Start thinking] chain
before answering, typically 500 to 2000 tokens long. This is expected behaviour,
not a bug. Set -n to at least 2000 so that generation reaches the final answer.
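If only the final answer is wanted, the thinking chain can be dropped in a post-processing step. The sketch below is an assumption-heavy illustration: this card names the `[Start thinking]` marker but not the closing one, so `[End thinking]` is a placeholder default that must be replaced with whatever the model actually emits. `strip_thinking` is a hypothetical helper, not a llama.cpp feature.

```shell
# Print only the text that follows the end-of-thinking marker.
# $1 = closing marker; "[End thinking]" is a PLACEHOLDER, not confirmed by the card.
strip_thinking() {
  marker="${1:-[End thinking]}"
  awk -v m="$marker" '
    found { print; next }              # after the marker: pass lines through
    {
      i = index($0, m)                 # look for the marker on this line
      if (i) {
        found = 1
        rest = substr($0, i + length(m))
        if (rest != "") print rest     # keep any text after the marker
      }
    }
  '
}

# Usage: some_command_that_prints_model_output | strip_thinking "[End thinking]"
```

If the marker never appears, the function prints nothing, which is a hint that generation stopped inside the thinking chain and the token limit should be raised.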
[@BOS@]
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
YOUR QUESTION HERE<|im_end|>
<|im_start|>assistant
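The template above can be assembled programmatically. This is a minimal sketch that reproduces the layout exactly as printed in this card, line breaks included; `build_prompt` is an illustrative helper name, and whether the runtime adds the BOS token on its own is not covered here.

```shell
# Assemble a raw prompt: $1 = system message, $2 = user message.
# Layout copied from the template shown in this card.
build_prompt() {
  printf '[@BOS@]\n<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
    "$1" "$2"
}

# Show the assembled prompt for the sample question used in this card.
build_prompt "You are a helpful assistant." "भारत के बारे में बताओ"
```

The resulting string can be passed to llama-cli through its `-p` option.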
| Setup | Works? | Notes |
|---|---|---|
| 16 GB RAM, no GPU | Yes | ~6-8 t/s, slow but functional |
| 32 GB RAM, no GPU | Yes | ~8-10 t/s |
| Any NVIDIA GPU | Yes | Add -ngl 99 for GPU offload |
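As a rough worked example of what these throughput figures mean in practice, the sketch below divides the recommended 2000-token budget by the speeds in the table. The helper name `gen_seconds` is illustrative, and the result ignores prompt processing and model load time.

```shell
# Seconds needed to generate $1 tokens at $2 tokens per second (integer division).
gen_seconds() {
  echo $(( $1 / $2 ))
}

gen_seconds 2000 6    # slow end of the 16 GB row
gen_seconds 2000 8    # boundary between the two CPU rows
gen_seconds 2000 10   # fast end of the 32 GB row
```

On CPU-only setups a full 2000-token response therefore takes on the order of three to six minutes.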
Prompt: भारत के बारे में बताओ ("Tell me about India")
Response (after thinking chain):
भारत एक विशाल और विविधतापूर्ण देश है जो दक्षिणी एशिया में स्थित है... ("India is a vast and diverse country located in South Asia...")
Create a Modelfile:
FROM deepak-p-yadav/sarvam-30b-IQ2_M-indic/sarvam-30b-IQ2_M-indic.gguf
TEMPLATE """{{ if .System }}<|start_of_turn|><|system|>
{{ .System }}<|end_of_turn|>
{{ end }}{{ if .Prompt }}<|start_of_turn|><|user|>
{{ .Prompt }}<|end_of_turn|>
{{ end }}<|start_of_turn|><|assistant|>
"""
PARAMETER stop "<|end_of_turn|>"
PARAMETER stop "<|start_of_turn|>"
PARAMETER num_predict 3000
SYSTEM "You are a helpful Indic multilingual assistant. Answer directly in the language the user writes in."
Then run:
ollama create sarvam-30b-indic -f Modelfile
ollama run sarvam-30b-indic "भारत के बारे में बताओ"
Quantized from Sarvam's official Q4_K_M GGUF using --allow-requantize.
Qualitative testing shows coherent Hindi, Tamil, and Telugu output on
simple factual prompts. IQ2_M compression degrades quality relative to
Q4_K_M, most noticeably on complex multi-step reasoning tasks. For
maximum quality, use the official Q4_K_M from
sarvamai/sarvam-30b-gguf.
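For readers who want to attempt a similar requantization, the following sketch shows one plausible sequence using llama.cpp's `llama-imatrix` and `llama-quantize` tools. It is not the author's recorded procedure: the input filename is hypothetical (the card does not name the Q4_K_M file), and computing the imatrix on the Q4_K_M model is an assumption. The `run` and `reproduce` helpers are illustrative; setting `DRY_RUN=1` prints the commands without executing them.

```shell
# HYPOTHETICAL input path: the card does not name the official Q4_K_M file.
SRC_GGUF="${SRC_GGUF:-./sarvam-30b-Q4_K_M.gguf}"
CALIB="indic_calibration_final.txt"      # calibration text from this repo
IMATRIX="sarvam-30b-indic.imatrix"       # imatrix filename from this repo
OUT_GGUF="sarvam-30b-IQ2_M-indic.gguf"   # output filename from this repo

# Echo a command, then execute it unless DRY_RUN is set.
run() {
  echo "+ $*"
  [ -n "${DRY_RUN:-}" ] || "$@"
}

# Step 1: compute an importance matrix from the calibration text.
# Step 2: requantize to IQ2_M using that matrix.
reproduce() {
  run ./llama-imatrix -m "$SRC_GGUF" -f "$CALIB" -o "$IMATRIX" &&
  run ./llama-quantize --allow-requantize --imatrix "$IMATRIX" \
    "$SRC_GGUF" "$OUT_GGUF" IQ2_M
}

# Usage: DRY_RUN=1 reproduce   (prints the two commands without running them)
```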
@misc{sarvam_sovereign_models,
title={Introducing Sarvam's Sovereign Models},
author={{Sarvam Foundation Models Team}},
year={2026},
url={https://www.sarvam.ai/blogs/sarvam-30b-105b}
}
This quantization is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0),
inherited from the original [sarvamai/sarvam-30b](https://huggingface.co/sarvamai/sarvam-30b) model.