---
tags:
- text-generation
- transformer
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
|
|
|
|
|
# Saanvi-C0-12B 🤗⚡

**A next-generation 12B LLM optimized for speed, efficiency, and contextual accuracy.**

_Powered by RAG-based enhancements • 4-bit quantization • Flash Attention 2 • bfloat16 • 128k context window_
|
|
|
---
|
|
|
## 🚀 Why Upgrade to Saanvi-C0-12B?
|
|
|
Saanvi-C0-12B brings a **huge leap in capability** over smaller models, maintaining efficiency while significantly improving reasoning, fluency, task completion, and math!
|
|
|
| Feature | Benefit |
| --------------------- | ------------------------------------- |
| ⚡ Flash Attention 2 | Up to **2.7× faster** inference |
| 🔧 4-bit Quantization | **Runs on 8GB VRAM** GPUs |
| 🎯 Instruction-Tuned | **Better task performance** |
| 🔥 RAG-Enhanced | **More precise contextual retrieval** |
| ✅ Math Expert | **Precise mathematical knowledge** |
|
|
|
|
|
### 🖥️ Optimized for Mid-Tier GPUs
|
- **Runs on mid-range GPUs with 8GB+ VRAM** (RTX 3050, RTX 2060, etc.).
- **More robust than our 3B model**, with better contextual retention and instruction-following.
- **4-bit quantization** minimizes VRAM usage without sacrificing quality (see the loading sketch below).
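
The 4-bit path uses the `bitsandbytes` integration in `transformers`. Below is a minimal loading sketch, assuming a standard NF4 `BitsAndBytesConfig` with bfloat16 compute; the exact quantization settings used for the published checkpoints aren't documented here, so treat these values as illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "riple-saanvi-lab/Saanvi-C0-12B"

# Illustrative 4-bit config (requires the bitsandbytes package).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",
)
```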
|
|
|
---
|
|
|
## ⚡ Quick Start
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "riple-saanvi-lab/Saanvi-C0-12B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Simple interactive loop; type "exit" to quit.
while True:
    user_input = input("\nYou: ").strip()
    if user_input.lower() == "exit":
        break
    inputs = tokenizer(user_input, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_length=2048, do_sample=True)
    print("AI:", tokenizer.decode(output[0], skip_special_tokens=True))
```
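
Flash Attention 2 is opt-in in `transformers` and needs the `flash-attn` package plus a supported GPU (Ampere or newer). A minimal sketch of enabling it at load time, assuming your environment meets those requirements:

```python
import torch
from transformers import AutoModelForCausalLM

# Requires flash-attn to be installed and a GPU that supports it.
model = AutoModelForCausalLM.from_pretrained(
    "riple-saanvi-lab/Saanvi-C0-12B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```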
|
|
|
---
|
|
|
## 📦 Installation
|
|
|
```bash
pip install torch transformers
```
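
If you want the optional 4-bit and Flash Attention 2 paths sketched above, two extra packages are needed. The names below are the standard PyPI packages; note that `flash-attn` builds against your local CUDA toolchain, so check its documentation if installation fails.

```bash
# Optional: 4-bit quantization and Flash Attention 2 support
pip install bitsandbytes flash-attn
```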
|
|
|
---
|
|
|
## 📊 Benchmarks
|
|
|
**A100-40GB Performance**

| Batch Size | Throughput  | Latency | VRAM Usage |
| ---------- | ----------- | ------- | ---------- |
| 1          | 42 tok/sec  | 85ms    | 8.2GB      |
| 8          | 218 tok/sec | 430ms   | 12.5GB     |
|
|
|
**🚀 On Mid-Tier GPUs (RTX 3050, RTX 2060, RTX 3060 12GB)**

- **VRAM Usage**: ~8.2GB (single batch)
- **Speed**: ~10-15 tok/sec
- **Best Practices**: Stick to **smaller batch sizes** for best performance.
|
|
|
---
|
|
|
## 📄 License
|
|
|
Licensed under the [Apache 2.0 License](LICENSE). See the [LICENSE](LICENSE) file for details.
|
|
|
💡 **Pro Tip**: For **maximum efficiency**, use `torch.compile()` and CUDA graphs on high-end GPUs!
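
A minimal sketch of that tip, assuming `model` was loaded as in the Quick Start example; `mode="reduce-overhead"` is the PyTorch 2.x option that captures CUDA graphs, and the first few generations will be slower while compilation warms up.

```python
import torch

# Assumes `model` is already loaded (see Quick Start above).
# "reduce-overhead" captures CUDA graphs to cut kernel-launch overhead.
model = torch.compile(model, mode="reduce-overhead")
```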
|
|
|
---