---
tags:
- text-generation
- transformer
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
# Saanvi-C0-12B



**A next-generation 12B LLM optimized for speed, efficiency, and contextual accuracy.**
_Powered by RAG-based enhancements • 4-bit quantization • Flash Attention 2 • bfloat16 • 128k context window_

---
## Why Upgrade to Saanvi-C0-12B?
Saanvi-C0-12B delivers a **major step up in capability** over smaller models, keeping inference efficient while significantly improving reasoning, fluency, instruction following, and math.

| Feature | Benefit |
| --------------------- | --------------------------- |
| Flash Attention 2 | Up to **2.7× faster** inference |
| 4-bit Quantization | **Runs on 8GB VRAM** GPUs |
| Instruction-Tuned | **Better task performance** |
| RAG-Enhanced | **More precise contextual retrieval** |
| Math-Focused | **Stronger mathematical reasoning** |
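
The Flash Attention 2 and bfloat16 rows above map onto standard `transformers` loading options. Here is a minimal loading sketch; the `attn_implementation` flag and the `flash-attn` package requirement are general `transformers` behavior, not settings published with this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "riple-saanvi-lab/Saanvi-C0-12B"

tokenizer = AutoTokenizer.from_pretrained(model_path)
# Flash Attention 2 needs the flash-attn package and an Ampere-or-newer GPU;
# drop the flag to fall back to the default attention implementation.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```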
### Optimized for Mid-Tier GPUs
- **Runs on mid-range GPUs with 8GB+ VRAM** (RTX 3050, RTX 2060, etc.).
- **More robust than our 3B model** with better contextual retention and instruction-following.
- **4-bit quantization** minimizes VRAM usage without sacrificing quality.
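
A minimal 4-bit loading sketch using `bitsandbytes`; the exact quantization settings here (NF4, bfloat16 compute) are reasonable defaults, not values published for this model:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization via bitsandbytes (pip install bitsandbytes);
# keeps the 12B weights small enough for ~8GB-class GPUs.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "riple-saanvi-lab/Saanvi-C0-12B",
    quantization_config=bnb_config,
    device_map="auto",
)
```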
---
## Quick Start
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "riple-saanvi-lab/Saanvi-C0-12B"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Simple interactive chat loop; type "exit" to quit.
while True:
    user_input = input("\nYou: ").strip()
    if user_input.lower() == "exit":
        break
    inputs = tokenizer(user_input, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=2048, do_sample=True)
    print("AI:", tokenizer.decode(output[0], skip_special_tokens=True))
```
---
## Installation
```bash
pip install torch transformers accelerate
```
(`accelerate` is required for `device_map="auto"` in the Quick Start.)
---
## Benchmarks
**A100-40GB Performance**

| Batch Size | Throughput | Latency | VRAM Usage |
| ---------- | ----------- | ------- | ---------- |
| 1 | 42 tok/sec | 85ms | 8.2GB |
| 8 | 218 tok/sec | 430ms | 12.5GB |
**On Mid-Tier GPUs (RTX 3050, RTX 2060, RTX 3060 12GB)**
- **VRAM Usage**: ~8.2GB (single batch)
- **Speed**: ~10-15 tok/sec
- **Best Practices**: Stick to **smaller batch sizes** for best performance.
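
A small batched-generation sketch, reusing the `model` and `tokenizer` from the Quick Start; the prompts, batch size, and pad-token fallback are illustrative assumptions, not published settings:

```python
# Illustrative prompts -- replace with your own small batch.
prompts = ["Explain retrieval-augmented generation in one sentence.", "What is 17 * 24?"]

# Batching needs a pad token; fall back to EOS if none is set (assumption).
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True)

for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)
```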
---
## License
Licensed under the [Apache 2.0 License](LICENSE). See the [LICENSE](LICENSE) file for details.

**Pro Tip**: For **maximum efficiency**, use `torch.compile()` and CUDA graphs on high-end GPUs!
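
A hedged sketch of that tip, assuming the `model` and `tokenizer` from the Quick Start; actual speedups depend on GPU, PyTorch version, and generation settings:

```python
import torch

# "reduce-overhead" mode uses CUDA graphs under the hood; best suited to
# fixed-shape, repeated generation on high-end GPUs.
model = torch.compile(model, mode="reduce-overhead")

# The first call triggers compilation, so warm up before measuring throughput.
warmup = tokenizer("Warm-up prompt", return_tensors="pt").to(model.device)
_ = model.generate(**warmup, max_new_tokens=16)
```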
---