# Saanvi-C0-12B 🤖⚡


A next-generation 12B LLM optimized for speed, efficiency, and contextual accuracy.
Powered by RAG-based enhancements • 4-bit quantization • Flash Attention 2 • bfloat16 • 128k context window


## 🚀 Why Upgrade to Saanvi-C0-12B?

Saanvi-C0-12B delivers a large step up in capability over smaller models, maintaining efficiency while significantly improving reasoning, fluency, task completion, and mathematics.

| Feature | Benefit |
|---------|---------|
| ⚡ Flash Attention 2 | Up to 2.7× faster inference |
| 🧠 4-bit Quantization | Runs on GPUs with 8 GB of VRAM |
| 🎯 Instruction-Tuned | Better task performance |
| 🔥 RAG-Enhanced | More precise contextual retrieval (see the sketch below) |
| ➗ Math-Expert | Precise mathematics knowledge |
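
A common way to use a RAG-enhanced model is prompt-level retrieval: passages returned by a retriever are prepended to the question so the answer stays grounded in them. The toy sketch below shows only that generic pattern; `PASSAGES`, `retrieve`, and `build_rag_prompt` are hypothetical stand-ins for whatever retriever you pair with the model, not code from this repo.

```python
# Toy illustration of prompt-level RAG; PASSAGES, retrieve, and
# build_rag_prompt are hypothetical stand-ins, not this repo's code.
PASSAGES = [
    "Flash Attention 2 restructures attention to reduce memory traffic.",
    "NF4 quantization stores weights in 4 bits with a higher-precision compute dtype.",
]

def retrieve(query, k=2):
    # Rank passages by naive word overlap with the query.
    words = set(query.lower().split())
    return sorted(PASSAGES, key=lambda p: -len(words & set(p.lower().split())))[:k]

def build_rag_prompt(query):
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(retrieve(query)))
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"
```

Feed `build_rag_prompt(question)` to the tokenizer in place of the raw question in the Quick Start loop below.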

## 🖥️ Optimized for Mid-Tier GPUs

- Runs on mid-range GPUs with 8 GB+ of VRAM (RTX 3050, RTX 2060, etc.).
- More robust than our 3B model, with better contextual retention and instruction-following.
- 4-bit quantization minimizes VRAM usage without a large quality hit; see the loading sketch below.
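
To actually fit the model into 8 GB of VRAM, load it in 4-bit. This is a minimal sketch assuming the standard transformers + bitsandbytes route (`pip install bitsandbytes`), not a config shipped with this repo. The Flash Attention 2 flag additionally needs the `flash-attn` package and an Ampere-or-newer GPU, so drop that argument on RTX 20-series cards.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "riple-saanvi-lab/Saanvi-C0-12B"

# NF4 4-bit weights with bfloat16 compute, the usual bitsandbytes setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",
    # Optional: requires flash-attn and an Ampere-or-newer GPU.
    attn_implementation="flash_attention_2",
)
```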

## ⚡ Quick Start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "riple-saanvi-lab/Saanvi-C0-12B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

# Simple REPL: type "exit" to quit.
while True:
    user_input = input("\n👤 You: ").strip()
    if user_input.lower() == "exit":
        break
    inputs = tokenizer(user_input, return_tensors="pt").to(model.device)
    # max_new_tokens bounds the reply length; max_length would also count the prompt.
    output = model.generate(**inputs, max_new_tokens=2048, do_sample=True)
    print("🤖 AI:", tokenizer.decode(output[0], skip_special_tokens=True))
```

## 📦 Installation

```bash
pip install torch transformers accelerate
```

`accelerate` is required for `device_map="auto"`. For the optional 4-bit and Flash Attention 2 paths shown above, also install `bitsandbytes` and `flash-attn`.

## 📊 Benchmarks

### A100-40GB Performance

| Batch Size | Throughput | Latency | VRAM Usage |
|------------|------------|---------|------------|
| 1 | 42 tok/sec | 85 ms | 8.2 GB |
| 8 | 218 tok/sec | 430 ms | 12.5 GB |

### 🚀 On Mid-Tier GPUs (RTX 3050, RTX 2060, RTX 3060 12GB)

- VRAM usage: ~8.2 GB (single batch)
- Speed: ~10-15 tok/sec
- Best practice: stick to batch size 1; the batch-8 footprint above (12.5 GB) already exceeds an 8 GB card.

## 📜 License

Licensed under the Apache 2.0 License. See the LICENSE file for details.

💡 Pro Tip: For maximum efficiency on high-end GPUs, pair `torch.compile()` with CUDA graphs; a minimal sketch follows.
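
A sketch of that tip, reusing the `model` and `tokenizer` from the Quick Start; `mode="reduce-overhead"` is the `torch.compile` setting that captures CUDA graphs. Treat it as an assumption-laden example rather than a tuned recipe: the first call compiles and is slow, so it pays off in long-running sessions.

```python
import torch

# "reduce-overhead" captures CUDA graphs to cut per-step kernel-launch cost.
model.forward = torch.compile(model.forward, mode="reduce-overhead")

inputs = tokenizer("Compute 17 * 24.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)  # first call triggers compilation
print(tokenizer.decode(output[0], skip_special_tokens=True))
```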

