Qwen2.5-3B Hinglish Instruction-Tuned (QLoRA)
A LoRA adapter that fine-tunes Qwen2.5-3B-Instruct for natural code-mixed Hindi-English (Hinglish) conversation. Trained on 10,594 synthetic Hinglish instruction examples covering casual chat, customer support, Q&A, and sentiment classification.
Headline result
| Metric | Qwen-base | GPT-4o-mini | GPT-4o | This model |
|---|---|---|---|---|
| Hinglish marker density | 8.9% | 29.5% | 24.6% | 31.6% |
| English drift rate | 32% | 0% | 4% | 0% |
| Devanagari injection bug | 12.5% | 0% | 2.5% | 0% |
| Claude judge register score (/5) | 1.24 | 2.50 | 2.12 | 3.98 |
| Claude judge total (/20) | 6.72 | 13.56 | 12.90 | 12.48 |
Bottom line: Matches or exceeds GPT-4o-mini on Hinglish register naturalness, at a serving cost that ranges from parity to ~3.4× lower depending on infrastructure choice. Trails GPT-4o-mini by ~8% on the judge total (12.48 vs 13.56), which reflects weaker content quality (intent accuracy + factuality). Best suited to style-sensitive conversational use cases with sustained traffic, where dedicated GPU instances become economical versus per-token API pricing.
Cost comparison (measured)
Benchmarked on NVIDIA T4 (HuggingFace transformers, fp16, batch=16, ~294 tok/sec).
| Infrastructure | $/M tokens | vs GPT-4o-mini |
|---|---|---|
| AWS T4 on-demand | $0.50 | parity |
| GCP T4 on-demand | $0.33 | 1.5× cheaper |
| AWS T4 reserved (1yr) | $0.30 | 1.7× cheaper |
| RunPod community | $0.18 | 2.8× cheaper |
| AWS T4 spot | $0.15 | 3.4× cheaper |
| GPT-4o-mini API | $0.51 (blended 20%/80% in/out) | baseline |
Note: a vLLM or TGI deployment would likely improve self-hosted throughput by ~50-100%, shifting the comparison further in the fine-tune's favor. This was not benchmarked here due to environment constraints.
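The per-token figures above can be reproduced from throughput and hourly price. The sketch below is illustrative only: the $0.526/hour T4 rate and the $0.15/$0.60 per-million input/output API prices are assumptions not stated in this card, chosen because they land on the table's $0.50 and $0.51 values; the 294 tok/sec throughput and the 20%/80% blend come from the benchmark above.

```python
def cost_per_million_tokens(hourly_usd: float, tokens_per_sec: float) -> float:
    """Self-hosted cost in USD per 1M generated tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_usd / tokens_per_hour * 1_000_000


def blended_api_cost(input_usd_per_m: float, output_usd_per_m: float,
                     input_share: float) -> float:
    """Blended API price per 1M tokens for a given input/output token mix."""
    return input_share * input_usd_per_m + (1 - input_share) * output_usd_per_m


# Assumed prices (not from this card); throughput and blend ratio are from the card.
t4_on_demand = cost_per_million_tokens(hourly_usd=0.526, tokens_per_sec=294)
api_baseline = blended_api_cost(input_usd_per_m=0.15, output_usd_per_m=0.60,
                                input_share=0.20)

print(f"T4 on-demand: ${t4_on_demand:.2f}/M tokens")
print(f"API baseline: ${api_baseline:.2f}/M tokens")
print(f"Spot ($0.15/M) vs API: {api_baseline / 0.15:.1f}x cheaper")
```

Because cost scales inversely with throughput, the same function shows how a faster serving stack would shift the comparison: doubling `tokens_per_sec` halves the self-hosted figure.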
How to use
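from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model; device_map="auto" places the weights on the available device(s)
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "DSMJ910/qwen2.5-3b-hinglish-lora")

messages = [{"role": "user", "content": "Bhai weekend pe Bangalore mein kya karein?"}]
# return_dict=True returns input_ids together with the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)  # follow the model's device instead of hard-coding "cuda"
outputs = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))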
Training details
- Base model: Qwen2.5-3B-Instruct (4-bit NF4 quantization)
- Adapter: LoRA rank=16, alpha=32, dropout=0
- Target modules: all linear layers (q/k/v/o projections + MLP gate/up/down)
- Trainable parameters: 30M (1% of base model)
- Training data: 10,594 synthetic Hinglish instruction examples (see dataset link)
- Hyperparameters: lr=2e-4, batch_size=16 (effective), 2 epochs, AdamW 8-bit, linear schedule, bf16
- Hardware: Single Blackwell GPU (95 GB VRAM)
- Training time: 9.2 minutes
- Adapter size: 125 MB
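The trainable-parameter figure can be sanity-checked by hand: a LoRA adapter of rank r on a linear layer with shape (in, out) adds r × (in + out) weights. The sketch below assumes the layer dimensions of the public Qwen2.5-3B configuration (hidden size 2048, MLP size 11008, 36 layers, 2 key/value heads of dimension 128); those dimensions are not stated in this card and are labeled as assumptions in the code.

```python
def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) to each adapted linear layer."""
    return rank * (in_features + out_features)


# Assumed Qwen2.5-3B dimensions (from the public model config, not from this card)
HIDDEN = 2048
INTERMEDIATE = 11008
NUM_LAYERS = 36
KV_DIM = 2 * 128  # 2 key/value heads x head dimension 128

RANK = 16  # from the training details above

# (in_features, out_features) for every targeted module in one transformer layer
modules = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(i, o, RANK) for i, o in modules.values())
total = per_layer * NUM_LAYERS
print(f"{total:,} trainable parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the count comes to just under 30M, in line with the figure reported above.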
Evaluation
Quantitative
- 50-prompt hand-curated Hinglish eval set (4 categories: casual, customer support, Q&A, sentiment)
- Automated metrics: Hinglish marker density, English drift detection, Devanagari injection check
- LLM-as-judge: Claude Sonnet 4.6 evaluating pairwise on 4 axes (Register, Intent, Quality, Culture)
- Methodological note: The judge is Claude, from a different vendor than the training-data generator (GPT-4o-mini), to avoid evaluation circularity.
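The automated metrics listed above could be approximated along the following lines. This is a minimal sketch, not the evaluation code used for this card: the marker word list, the tokenizer, and the drift threshold are all hypothetical choices made for illustration. Only the Devanagari check relies on a fixed fact, the Unicode block U+0900 to U+097F.

```python
import re

# Devanagari Unicode block (U+0900 to U+097F)
DEVANAGARI = re.compile(r"[\u0900-\u097F]")

# Hypothetical marker list: common romanized Hindi function words
HINGLISH_MARKERS = {
    "hai", "nahi", "kya", "bhai", "yaar", "mein",
    "ka", "ki", "ke", "ko", "toh", "aur",
}


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of Latin letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def marker_density(text: str, markers: set[str] = HINGLISH_MARKERS) -> float:
    """Fraction of tokens that are Hinglish markers; 0.0 for empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in markers for t in tokens) / len(tokens)


def has_devanagari(text: str) -> bool:
    """True if any Devanagari character appears in a Roman-script response."""
    return bool(DEVANAGARI.search(text))


def is_english_drift(text: str, threshold: float = 0.05) -> bool:
    """Flag a response as drifted to English when marker density is below a
    hypothetical threshold."""
    return marker_density(text) < threshold


sample = "Bhai weekend pe Bangalore mein kya karein?"
print(marker_density(sample), has_devanagari(sample), is_english_drift(sample))
```

A fixed word list will miss spelling variants ("nahi" vs "nahin"), so a real implementation would likely need normalization or a larger lexicon.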
Known limitations
- Roman script only. Training data is 100% Roman Hinglish; mixed-script inputs (Devanagari) may not be handled robustly. A future v2 is intended to address this.
- Conversational > instructional. Model defaults to "friendly chat" mode which sometimes reduces precision on classification tasks (e.g., confuses sentiment vs intent classification).
- Synthetic training data. All training examples were generated by GPT-4o-mini; this introduces stylistic patterns specific to GPT-4o-mini that the fine-tune inherits.
- Small eval set. N=50 prompts; larger evaluation would tighten confidence intervals.
Citation
If you use this model, please cite:
@misc{hinglish-qwen-3b-2026,
title={Qwen2.5-3B Hinglish: QLoRA Fine-tuning for Indian Code-Mixed Conversation},
author={Muskan Jaiswal},
year={2026},
publisher={HuggingFace},
url={https://huggingface.co/DSMJ910/qwen2.5-3b-hinglish-lora}
}