Sarvam-30B 8-Bit (BitsAndBytes)

This repository provides an 8-bit quantized version of the base model sarvamai/sarvam-30b using bitsandbytes.

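8-bit quantization stores the quantized weights in one byte each instead of two, roughly halving weight memory relative to FP16 while aiming to keep output quality close to the original model.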

Base model: sarvamai/sarvam-30b

Architecture: SarvamMoEForCausalLM


Quantization Details

Quantization method: BitsAndBytes 8-bit

Configuration used:

  • load_in_8bit = True
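The setting above corresponds to the `BitsAndBytesConfig` class in Transformers. The following is a minimal sketch of how the base model could be quantized to 8-bit at load time; it is not the exact script used to produce this repository, and the helper name `load_base_model_in_8bit` is hypothetical. Calling it downloads the full base checkpoint and needs a CUDA GPU.

```python
def load_base_model_in_8bit(model_id="sarvamai/sarvam-30b"):
    """Load a checkpoint with bitsandbytes 8-bit quantization applied on the fly.

    Hypothetical helper for illustration; not the script used for this repo.
    """
    # Imported lazily so that defining the helper does not require the GPU stack.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # load_in_8bit=True is the only option listed in the configuration above.
    quant_config = BitsAndBytesConfig(load_in_8bit=True)

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",        # let Accelerate place layers on available GPUs
        trust_remote_code=True,   # the Sarvam architecture ships custom code
    )
```

The checkpoint in this repository is already quantized, so this step is only needed when starting from the base model.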

Approximate GPU memory usage:

Model            GPU VRAM
FP16 original    ~60 GB
8-bit            ~30 GB

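This version is intended to provide near-FP16 quality while using roughly half the memory. The figures above are approximate and cover the model weights; activations and the KV cache need additional memory during generation.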
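The memory figures follow from simple arithmetic on the parameter count. The sketch below assumes a nominal 30 billion parameters (taken from the model name) and 1 GB = 1e9 bytes; the function name is illustrative.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


# Assumption: nominal parameter count implied by the "30B" in the model name.
NUM_PARAMS = 30e9

for label, bits in [("FP16", 16), ("8-bit", 8)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
```

Real usage is usually somewhat higher than this estimate, because bitsandbytes quantizes the linear layers while other tensors stay in higher precision.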


Installation

Install dependencies.

pip install transformers accelerate bitsandbytes torch safetensors

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "neuralnets/sarvam-30b-8bit",
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    "neuralnets/sarvam-30b-8bit",
    trust_remote_code=True
)
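After loading, it can be useful to confirm that the weights really occupy about the expected amount of memory. A small sketch, assuming the `model` object created above; `get_memory_footprint()` is a method on Transformers models that returns a size in bytes, and the helper name `footprint_gb` is hypothetical.

```python
def footprint_gb(model) -> float:
    """Return the model's parameter and buffer memory in GB (1 GB = 1e9 bytes)."""
    return model.get_memory_footprint() / 1e9


# Usage, once `model` has been loaded as shown above:
# print(f"Loaded footprint: {footprint_gb(model):.1f} GB")
```

A value far above the 8-bit estimate would suggest the checkpoint was not loaded in 8-bit.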

Example Inference

prompt = "Explain mixture of experts in simple terms."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Hardware Requirements

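Recommended GPUs:

  • A100 40GB or 80GB
  • RTX 4090 (24 GB per card, so the ~30 GB of weights must be split across more than one GPU)
  • RTX 3090 (24 GB per card, same caveat)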

CPU RAM recommendation:

  • 32 GB or more
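When a single GPU has less VRAM than the ~30 GB footprint, the weights can be split across several cards. The sketch below builds a per-device cap for the `max_memory` argument that `from_pretrained` accepts alongside `device_map="auto"`. The helper name, the two-card 24 GiB setup, and the 2 GiB headroom are assumptions for illustration.

```python
def build_max_memory(gpu_vram_gib, headroom_gib=2):
    """Map each GPU index to a cap string such as '22GiB'.

    gpu_vram_gib: list of whole-number VRAM sizes in GiB, one per GPU.
    headroom_gib: memory left free on each GPU for activations and the KV cache.
    """
    caps = {}
    for index, vram in enumerate(gpu_vram_gib):
        if vram <= headroom_gib:
            raise ValueError(f"GPU {index} has no room left after headroom")
        caps[index] = f"{vram - headroom_gib}GiB"
    return caps


# Assumed setup: two 24 GiB cards.
max_memory = build_max_memory([24, 24])
print(max_memory)
```

The resulting dictionary can be passed as `max_memory=max_memory` in the `from_pretrained` call, next to `device_map="auto"`.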

Notes

  • Uses bitsandbytes 8-bit quantization integrated with Hugging Face Transformers.
  • Requires trust_remote_code=True due to the Sarvam architecture.
  • Suitable for high-quality inference.

Base Model

Original model repository:

sarvamai/sarvam-30b

Refer to the base model page for detailed information about training and architecture.


License

This repository distributes a quantized derivative of the upstream model.

Users must comply with the license of the original model:

sarvamai/sarvam-30b
