---
tags:
- text-generation
- transformer
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
# Saanvi-C0-12B 🤖⚡
![License](https://img.shields.io/badge/License-Apache%202.0-blue)
![Python 3.8+](https://img.shields.io/badge/Python-3.8%2B-green)
![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20Hub-yellow)
**A next-generation 12B LLM optimized for speed, efficiency, and contextual accuracy.**
_Powered by RAG-based enhancements • 4-bit quantization • Flash Attention 2 • bfloat16 • 128k context window_
---
## 🚀 Why Upgrade to Saanvi-C0-12B?
Saanvi-C0-12B delivers a **major leap in capability** over smaller models while staying efficient, with significantly improved reasoning, fluency, task completion, and mathematics.
| Feature               | Benefit                               |
| --------------------- | ------------------------------------- |
| ⚡ Flash Attention 2   | Up to **2.7× faster** inference       |
| 🧠 4-bit Quantization | **Runs on 8GB VRAM** GPUs             |
| 🎯 Instruction-Tuned  | **Better task performance**           |
| 🔥 RAG-Enhanced       | **More precise contextual retrieval** |
| ➗ Math-Expert         | **Precise mathematical knowledge**    |
### 🖥️ Optimized for Mid-Tier GPUs
- **Runs on mid-range GPUs with 8GB+ VRAM** (RTX 3050, RTX 2060, etc.).
- **More robust than our 3B model** with better contextual retention and instruction-following.
- **4-bit quantization** minimizes VRAM usage without sacrificing quality (see the loading sketch below).
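The low-VRAM path referenced above can be reproduced with `bitsandbytes` 4-bit loading. This is a minimal sketch, not the checkpoint's official recipe: the NF4 settings and the `attn_implementation="flash_attention_2"` flag are assumptions, and they require the `bitsandbytes` and `flash-attn` packages respectively.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "riple-saanvi-lab/Saanvi-C0-12B"

# Assumed NF4 4-bit settings; adjust if the repository ships its own quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",
    attn_implementation="flash_attention_2",  # optional; needs flash-attn and a supported GPU
)
```

Note that the plain `bfloat16` Quick Start below needs far more than 8GB of VRAM for a 12B model; the 4-bit path is the one aimed at mid-range cards.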
---
## ⚡ Quick Start
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "riple-saanvi-lab/Saanvi-C0-12B"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

# Simple interactive loop; type "exit" to quit
while True:
    user_input = input("\n👤 You: ").strip()
    if user_input.lower() == "exit":
        break
    inputs = tokenizer(user_input, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=2048, do_sample=True)
    print("🤖 AI:", tokenizer.decode(output[0], skip_special_tokens=True))
```
---
## 📦 Installation
```bash
pip install torch transformers accelerate
pip install bitsandbytes  # optional: only needed for the 4-bit loading sketch above
```
---
## 📊 Benchmarks
**A100-40GB Performance**
| Batch Size | Throughput | Latency | VRAM Usage |
| ---------- | ----------- | ------- | ---------- |
| 1 | 42 tok/sec | 85ms | 8.2GB |
| 8 | 218 tok/sec | 430ms | 12.5GB |
**🚀 On Mid-Tier GPUs (RTX 3050, RTX 2060, RTX 3060 12GB)**
- **VRAM Usage**: ~8.2GB (single batch)
- **Speed**: ~10-15 tok/sec
- **Best Practices**: Stick to **smaller batch sizes** for the best performance; a batched-generation sketch follows below.
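To run multi-prompt batches like the batch-size-8 row above, prompts can be tokenized together with padding. A minimal sketch, assuming `model` and `tokenizer` were loaded as in the Quick Start and that the example prompts are placeholders:

```python
prompts = [
    "Summarize the benefits of 4-bit quantization.",
    "Explain Flash Attention in one sentence.",
]

# Many causal-LM tokenizers ship without a pad token; reuse EOS for padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
# Left-padding keeps each prompt's last token adjacent to its generated continuation
tokenizer.padding_side = "left"

batch = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**batch, max_new_tokens=256, do_sample=True)

for prompt, text in zip(prompts, tokenizer.batch_decode(outputs, skip_special_tokens=True)):
    print(f"Prompt: {prompt}\nReply: {text}\n")
```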
---
## 📜 License
Licensed under the [Apache 2.0 License](LICENSE). See the [LICENSE](LICENSE) file for details.
💡 **Pro Tip**: For **maximum efficiency**, use `torch.compile()` and CUDA graphs on high-end GPUs!
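A minimal sketch of that tip, assuming PyTorch 2.x and a `model` loaded as in the Quick Start; `mode="reduce-overhead"` enables CUDA-graph capture, so the first few generations are slower while kernels compile:

```python
import torch

# Compile the forward pass; "reduce-overhead" uses CUDA graphs to cut per-step launch overhead
model = torch.compile(model, mode="reduce-overhead")
```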
---