NariRaksha-3B

QLoRA fine-tune of Qwen/Qwen2.5-3B-Instruct on NariRaksha-100K, a dataset of Indian women's safety scenarios. From a free-text scenario description, the model produces a structured safety assessment: risk type, severity, reasoning, recommended action, and legal context (BNS / IT Act / PWDVA).

Status: open research artifact. This is an early-stage fine-tune released for research, replication, and community evaluation. It is not a validated or deployment-ready safety system. See Evaluation and Limitations before using it for anything beyond experimentation.

Quick Usage

from unsloth import FastLanguageModel

# Load the checkpoint with 4-bit weights (the fine-tune itself used 4-bit QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    "vikhram-labs/NariRaksha-3B", load_in_4bit=True
)
FastLanguageModel.for_inference(model)  # switch Unsloth into inference mode

prompt = "A woman in Chennai has been receiving repeated unwanted messages from a former colleague across multiple platforms over several weeks."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")  # requires a CUDA GPU
outputs = model.generate(**inputs, max_new_tokens=256)
# The decoded text contains the prompt followed by the generated assessment
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Details

Base model          Qwen/Qwen2.5-3B-Instruct
Method              QLoRA, 4-bit (NF4)
LoRA rank / alpha   r=16, α=32
Framework           Unsloth
Training data       Full NariRaksha-100K (~100K rows, single split, no held-out eval set)
Steps               100
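How much of the dataset 100 steps can actually touch depends on the effective batch size, which this card does not report. The sketch below is a rough upper bound only; the batch sizes in it are hypothetical placeholders, not the actual training configuration.

```python
def dataset_fraction_seen(steps: int, effective_batch_size: int, num_rows: int) -> float:
    """Upper bound on the fraction of rows seen, assuming no row is sampled twice."""
    return min(1.0, steps * effective_batch_size / num_rows)


STEPS = 100          # from the table above
NUM_ROWS = 100_000   # approximate size of NariRaksha-100K

# Effective batch sizes below are hypothetical; the card does not state one.
for batch in (8, 32, 128):
    frac = dataset_fraction_seen(STEPS, batch, NUM_ROWS)
    print(f"effective batch {batch:>3}: at most {frac:.1%} of rows seen")
```

Under any of these assumed batch sizes, the run would cover well under one pass over the data.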

Training Loss

Step   Training Loss
25     2.3122
50     0.2770
75     0.0698

This is the complete loss telemetry currently available: three logged points (steps 25, 50, and 75) from a 100-step run, training loss only. Read it as a directional signal that the model is fitting the data quickly, not as evidence of task generalization. See Evaluation for why.
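One way to make these numbers more tangible is to convert them to perplexity. This sketch assumes the logged value is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not confirmed by this card.

```python
import math


def perplexity(loss: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss)


# (step, training loss) pairs copied from the table above
logged = [(25, 2.3122), (50, 0.2770), (75, 0.0698)]

for step, loss in logged:
    print(f"step {step}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```

Under that assumption, perplexity falls from roughly 10 at step 25 to roughly 1.07 at step 75, meaning the model puts nearly all probability on the next training token by that point.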

Evaluation

No held-out validation/test split exists for this model, and no downstream benchmark has been run on it. NariRaksha-100K was released as a single split, and training to date has used 100% of it, so the loss curve above reflects in-sample fit only.

The drop from 2.31 at step 25 to 0.07 at step 75 is steep enough, for a 3B model with this little data exposure, to read as a likely sign of memorization rather than generalization. A meaningful share of the dataset's reasoning and recommended_action fields are template-conditioned and repeat near-verbatim across many rows (documented in the dataset card), so a model can drive loss this low partly by memorizing a small set of stock phrases rather than learning to reason over novel scenarios.

What's needed before this can be called a validated model: a held-out eval split stratified by risk_type/severity, eval-loss tracking alongside training loss, and qualitative testing on scenarios outside the training distribution (including paraphrases and edge cases). None of that exists yet for this checkpoint.
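A stratified hold-out of the kind described above could be sketched as follows. This is a minimal standard-library illustration, not the project's tooling: it assumes each row is a dict with risk_type and severity keys, and the function name, the 10% default, and the toy rows are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, eval_fraction=0.1, seed=0):
    """Split rows into (train, eval), holding out eval_fraction of each
    (risk_type, severity) group."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["risk_type"], row["severity"])].append(row)

    rng = random.Random(seed)
    train, eval_rows = [], []
    for key in sorted(groups):
        members = groups[key]
        rng.shuffle(members)
        # Hold out at least one row per group, but never the entire group;
        # single-row groups stay in train.
        n_eval = min(len(members) - 1, max(1, round(len(members) * eval_fraction)))
        eval_rows.extend(members[:n_eval])
        train.extend(members[n_eval:])
    return train, eval_rows


# Toy rows with placeholder labels, for illustration only
rows = [{"risk_type": "type_a", "severity": "high", "scenario": f"example {i}"} for i in range(10)]
rows += [{"risk_type": "type_b", "severity": "low", "scenario": f"example {i}"} for i in range(10, 20)]

train, eval_rows = stratified_split(rows, eval_fraction=0.2)
print(len(train), len(eval_rows))
```

Because many rows share templated text, a row-level split like this would still leak near-duplicates into the eval set; grouping or deduplicating templated rows before splitting would likely be needed for the eval loss to be meaningful.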

Intended Use

Released as an open research artifact for:

  • Studying QLoRA fine-tuning behavior on small, template-heavy safety datasets
  • Replication and ablation by other researchers
  • A baseline to compare against once eval infrastructure exists

Not currently intended for:

  • Production deployment in any safety, triage, or emergency-response context
  • Use as a source of legal citations or helpline numbers without independent verification; the underlying dataset's legal/contact information is only partially verified (see dataset card)
  • Any setting where a hallucinated or memorized-but-wrong output could cause real-world harm to a person in distress

Limitations

  • No eval split / no generalization evidence. See Evaluation above.
  • Likely overfitting at current checkpoint. Outputs on scenarios close to training examples may look strong; outputs on genuinely novel scenarios are untested and may default to memorized boilerplate.
  • Inherits dataset limitations. Legal citations (BNS/IT Act/PWDVA sections) and helpline numbers in training data are a mix of verified and unverified entries, so the model can confidently generate incorrect ones.
  • Small training run. 100 steps is a minimal fine-tune; this checkpoint should be treated as a proof of concept, not a finished model.
  • Single-domain legal context. Legal references are India-specific and not applicable elsewhere.
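One rough way to probe the memorized-boilerplate concern is to measure how much of a generated output is copied verbatim from training text. The sketch below uses word n-grams; the function names and the default n=5 are illustrative choices, not an established metric for this model.

```python
def ngrams(text: str, n: int) -> set:
    """Set of lowercase word n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_overlap(output: str, training_texts, n: int = 5) -> float:
    """Fraction of the output's word n-grams that occur verbatim in any training text."""
    out = ngrams(output, n)
    if not out:
        return 0.0  # output shorter than n words
    seen = set()
    for text in training_texts:
        seen |= ngrams(text, n)
    return len(out & seen) / len(out)


# Placeholder strings, for illustration only
print(verbatim_overlap("a b c d e q", ["a b c d e f"]))
```

A high score on a scenario that is genuinely outside the training distribution would suggest the output is stock phrasing rather than scenario-specific reasoning.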

Next Steps (Planned / Suggested)

  • Carve out a stratified eval split from NariRaksha-100K and report eval loss alongside training loss
  • Deduplicate or reweight template-heavy reasoning / recommended_action spans to reduce memorization pressure
  • Independent verification pass on legal/helpline fields prior to any claim of factual reliability
  • Longer training run with proper train/eval tracking before any deployment-readiness claims
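The deduplication and reweighting step could begin with a measurement like the one below. It is a sketch only: it catches copies that are identical after lowercasing and whitespace normalization, and the field name is taken from this card while the helper names and toy rows are hypothetical.

```python
import re
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def duplicate_share(rows, field):
    """Fraction of rows whose normalized field text also appears in another row."""
    counts = Counter(normalize(row[field]) for row in rows)
    return sum(1 for row in rows if counts[normalize(row[field])] > 1) / len(rows)


def inverse_frequency_weights(rows, field):
    """Per-row sampling weight: 1 / (number of rows sharing the normalized field text)."""
    counts = Counter(normalize(row[field]) for row in rows)
    return [1.0 / counts[normalize(row[field])] for row in rows]


# Toy rows with placeholder text, for illustration only
rows = [
    {"reasoning": "Template A"},
    {"reasoning": "template   a"},
    {"reasoning": "Unique text"},
]
print(duplicate_share(rows, "reasoning"))
print(inverse_frequency_weights(rows, "reasoning"))
```

Near-verbatim variants that differ by a few words would need fuzzy matching, which this sketch does not attempt.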

Citation

@incollection{nariraksha2026,
  title     = {NariRaksha: Gender-Responsive AI for Women's Safety},
  author    = {Vikhram S and Jeffin Gracewell J},
  booktitle = {India AI Impact Summit 2026 Casebook on AI and Gender Empowerment},
  year      = {2026},
  publisher = {Ministry of Electronics and Information Technology (MeitY), Government of India and UN Women}
}

License

Apache 2.0.
