NariRaksha-3B
QLoRA fine-tune of Qwen/Qwen2.5-3B-Instruct on NariRaksha-100K, a dataset of Indian women's safety scenarios. Given a free-text scenario description, the model produces a structured safety assessment: risk type, severity, reasoning, recommended action, and legal context (BNS / IT Act / PWDVA).
Status: open research artifact. This is an early-stage fine-tune released for research, replication, and community evaluation, not a validated or deployment-ready safety system. See Evaluation and Limitations before using it for anything beyond experimentation.
Quick Usage
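from unsloth import FastLanguageModel

# Load the fine-tuned checkpoint in 4-bit to reduce GPU memory use.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="vikhram-labs/NariRaksha-3B",
    load_in_4bit=True,
)
# Switch Unsloth to its inference mode before generating.
FastLanguageModel.for_inference(model)

# Free-text scenario description.
prompt = "A woman in Chennai has been receiving repeated unwanted messages from a former colleague across multiple platforms over several weeks."
# Tokenize and move the tensors to the GPU (requires CUDA).
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode the full sequence: the prompt followed by the generated assessment.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))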
Training Details
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Method | QLoRA, 4-bit (NF4) |
| LoRA rank / alpha | r=16, α=32 |
| Framework | Unsloth |
| Training data | Full NariRaksha-100K (~100K rows, single split, no held-out eval set) |
| Steps | 100 |
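To make the r=16, α=32 setting concrete, the sketch below computes the standard LoRA scaling factor (α/r) and the number of trainable parameters one adapter adds to a single linear layer. The 2048×2048 layer shape is an illustrative assumption, not a value taken from this model's config, and this card does not state which modules were adapted.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA: alpha / rank."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter.

    A has shape (rank x d_in) and B has shape (d_out x rank),
    so the total is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


RANK, ALPHA = 16, 32      # values from the Training Details table
D_IN = D_OUT = 2048       # assumed layer shape, for illustration only

scaling = lora_scaling(RANK, ALPHA)
adapter_params = lora_params(D_IN, D_OUT, RANK)
full_params = D_IN * D_OUT

print(f"scaling = {scaling}")
print(f"adapter params = {adapter_params} "
      f"({adapter_params / full_params:.2%} of the full layer)")
```

Under these assumptions the adapter trains only a small fraction of the layer's weights, which is part of why a 100-step run is feasible on modest hardware.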
Training Loss
| Step | Training Loss |
|---|---|
| 25 | 2.3122 |
| 50 | 0.2770 |
| 75 | 0.0698 |
This is the complete loss telemetry currently available: three logged points over 100 total steps, training loss only. Read it as a directional signal that the model is fitting the data quickly, not as evidence of task generalization. See Evaluation for why.
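As a rough way to quantify how quickly the loss falls, the following sketch computes the reduction factor between consecutive logged points. It uses only the three values in the table; the helper name `interval_drops` is hypothetical.

```python
# Logged (step, training loss) pairs from the Training Loss table.
loss_log = [(25, 2.3122), (50, 0.2770), (75, 0.0698)]


def interval_drops(log):
    """Return (start_step, end_step, ratio) for each pair of consecutive points."""
    drops = []
    for (s0, l0), (s1, l1) in zip(log, log[1:]):
        drops.append((s0, s1, l0 / l1))
    return drops


for start, end, ratio in interval_drops(loss_log):
    print(f"steps {start}-{end}: loss fell by a factor of {ratio:.1f}")

# Reduction factor between the first and last logged points.
overall = loss_log[0][1] / loss_log[-1][1]
print(f"steps {loss_log[0][0]}-{loss_log[-1][0]}: overall factor {overall:.1f}")
```

These ratios describe in-sample fit only; without eval loss they say nothing about behavior on unseen scenarios.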
Evaluation
No held-out validation/test split or downstream benchmark has been run on this model. NariRaksha-100K was released as a single split, and training to date has used 100% of it, so the loss curve above reflects in-sample fit only.
The drop from 2.31 at step 25 to 0.07 at step 75 is steep enough, on a 3B model with this little data exposure, that it should be read as a likely sign of memorization rather than generalization, particularly since a meaningful share of the dataset's reasoning and recommended_action fields are template-conditioned and repeat near-verbatim across many rows (documented in the dataset card).
A model can drive loss this low partly by memorizing a small set of stock phrases rather than learning to reason over novel scenarios.
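One way to check how template-heavy a field is: measure the share of rows whose value, after light normalization, also appears in another row. The sketch below is a minimal version that assumes rows are dictionaries keyed by field name. The sample rows are invented for illustration and are not taken from NariRaksha-100K.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def repeated_share(rows, field):
    """Fraction of rows whose `field` value also appears in at least one other row."""
    if not rows:
        return 0.0
    counts = Counter(normalize(row[field]) for row in rows)
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated / len(rows)


# Hypothetical rows, for illustration only.
rows = [
    {"reasoning": "Repeated unwanted contact indicates stalking."},
    {"reasoning": "repeated unwanted contact  indicates stalking."},
    {"reasoning": "Threats of physical harm require urgent action."},
    {"reasoning": "Repeated unwanted contact indicates stalking."},
]
print(repeated_share(rows, "reasoning"))
```

Exact matching after normalization catches only verbatim repeats. Near-verbatim variants would need a fuzzier comparison, which this sketch does not attempt.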
What's needed before this can be called a validated model: a held-out
eval split stratified by risk_type/severity, eval-loss tracking
alongside training loss, and qualitative testing on scenarios outside the
training distribution (including paraphrases and edge cases). None of that
exists yet for this checkpoint.
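A stratified hold-out of the kind described above could look like the following sketch. It assumes each row is a dictionary with risk_type and severity keys; the function name, the 10% fraction, and the sample rows are illustrative choices, not part of any existing tooling for this dataset.

```python
import random
from collections import defaultdict


def stratified_split(rows, keys=("risk_type", "severity"), eval_fraction=0.1, seed=0):
    """Split rows into (train, eval), sampling eval_fraction from each stratum."""
    strata = defaultdict(list)
    for row in rows:
        strata[tuple(row[k] for k in keys)].append(row)

    rng = random.Random(seed)
    train, eval_rows = [], []
    for group in strata.values():
        group = group[:]  # copy so the caller's list is not reordered
        rng.shuffle(group)
        # Hold out at least one row per stratum when the stratum has 2+ rows.
        n_eval = max(1, round(len(group) * eval_fraction)) if len(group) > 1 else 0
        eval_rows.extend(group[:n_eval])
        train.extend(group[n_eval:])
    return train, eval_rows


# Hypothetical rows, for illustration only.
rows = [
    {"id": i, "risk_type": rt, "severity": sev}
    for i, (rt, sev) in enumerate(
        [("stalking", "high"), ("stalking", "low"), ("harassment", "high")] * 10
    )
]
train, eval_rows = stratified_split(rows)
print(len(train), len(eval_rows))
```

A random split within strata does not by itself prevent leakage: if near-identical templated rows land on both sides, eval loss will still overstate generalization, so duplicates should be grouped before splitting.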
Intended Use
Released as an open research artifact for:
- Studying QLoRA fine-tuning behavior on small, template-heavy safety datasets
- Replication and ablation by other researchers
- A baseline to compare against once eval infrastructure exists
Not currently intended for:
- Production deployment in any safety, triage, or emergency-response context
- Use as a source of legal citations or helpline numbers without independent verification; the underlying dataset's legal/contact information is only partially verified (see dataset card)
- Any setting where a hallucinated or memorized-but-wrong output could cause real-world harm to a person in distress
Limitations
- No eval split / no generalization evidence. See Evaluation above.
- Likely overfitting at current checkpoint. Outputs on scenarios close to training examples may look strong; outputs on genuinely novel scenarios are untested and may default to memorized boilerplate.
- Inherits dataset limitations. Legal citations (BNS/IT Act/PWDVA sections) and helpline numbers in training data are a mix of verified and unverified entries, so the model can confidently generate incorrect ones.
- Small training run. 100 steps is a minimal fine-tune; this checkpoint should be treated as a proof of concept, not a finished model.
- Single-domain legal context. Legal references are India-specific and not applicable elsewhere.
Next Steps (Planned / Suggested)
- Carve out a stratified eval split from NariRaksha-100K and report eval loss alongside training loss
- Deduplicate or reweight template-heavy reasoning/recommended_action spans to reduce memorization pressure
- Independent verification pass on legal/helpline fields prior to any claim of factual reliability
- Longer training run with proper train/eval tracking before any deployment-readiness claims
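For the reweighting step, one simple option is inverse-frequency weighting, so that each distinct templated value contributes the same total weight however many rows repeat it. This is a sketch of that one approach, not the authors' planned method. It assumes dictionary rows, and the sample rows are invented.

```python
from collections import Counter


def template_weights(rows, field):
    """Per-row sampling weights giving each distinct `field` value equal total weight."""
    # Normalize case and whitespace so trivially different copies share a key.
    keys = [" ".join(row[field].lower().split()) for row in rows]
    counts = Counter(keys)
    return [1.0 / counts[k] for k in keys]


# Hypothetical rows, for illustration only.
rows = [
    {"recommended_action": "Preserve evidence and file a complaint."},
    {"recommended_action": "Preserve evidence and file a complaint."},
    {"recommended_action": "preserve evidence and file a complaint."},
    {"recommended_action": "Contact a trusted person and move to a safe place."},
]
weights = template_weights(rows, "recommended_action")
print(weights)
```

Weights like these could be passed to a weighted sampler during training (for example `torch.utils.data.WeightedRandomSampler`). Whether that actually reduces memorization here would need to be checked against a held-out split.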
Citation
@incollection{nariraksha2026,
title = {NariRaksha: Gender-Responsive AI for Women's Safety},
author = {Vikhram S and Jeffin Gracewell J},
booktitle = {India AI Impact Summit 2026 Casebook on AI and Gender Empowerment},
year = {2026},
publisher = {Ministry of Electronics and Information Technology (MeitY), Government of India and UN Women}
}
License
Apache 2.0.