StriSakhi-Gemma4-2B-LoRA: Legal AI Advocate for Indian Women

Developed by: Shubendu Biswas
Competition: Kaggle Gemma 4 Hackathon / Women in AI Challenge
Base Model: unsloth/gemma-4-E2B-it-unsloth-bnb-4bit
Model Type: Causal Language Model (LoRA Adapter)
Languages: Hindi (Devanagari), English, Hinglish input → Hindi/English output
License: Apache-2.0 (same as base)
Fine-tuned with: Unsloth + PEFT (Hugging Face ecosystem)


Model Summary

StriSakhi ("Legal Companion") is a fine-tuned Gemma-4 2B Instruct model specialized as a warm, authoritative legal guide for Indian women seeking rights-based information. Unlike general-purpose LLMs, it is explicitly trained to:

  • Respond in simple, sister-like Hindi (or English when requested)
  • Cite actual Indian laws with correct section numbers (DV Act 2005, POSH Act 2013, CrPC 125, etc.)
  • Structure every response into 5 mandatory blocks: Empathy → Rights → Action Timeline → Helpline → Follow-up Question
  • Maintain ≥85% Devanagari purity for Hindi sessions (no Roman script leakage)
  • Refuse to generate harmful advice (e.g., never suggests "compromise" in domestic violence cases)

Key Differentiator: This is a safety-first, rights-first legal domain model with structured output conditioning baked into the LoRA weights via 549 curated conversational examples.
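The card does not spell out how Devanagari purity is measured. The sketch below is one plausible definition, offered as an assumption: the share of Devanagari code points (U+0900 to U+097F) among all Devanagari and ASCII Latin letters in a response. The function names `devanagari_purity` and `passes_purity` are hypothetical; only the 0.85 threshold comes from the card.

```python
def devanagari_purity(text: str) -> float:
    """Share of Devanagari code points among Devanagari + ASCII Latin letters.

    Assumed definition, not necessarily the project's own checker. Every code
    point in the Devanagari block counts, including vowel signs and the danda.
    Digits, spaces, and ASCII punctuation are ignored.
    """
    deva = sum(1 for ch in text if "\u0900" <= ch <= "\u097F")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    if deva + latin == 0:
        return 1.0  # nothing to judge, so treat as pure
    return deva / (deva + latin)


def passes_purity(text: str, threshold: float = 0.85) -> bool:
    """True when the response meets the 85% threshold stated in the card."""
    return devanagari_purity(text) >= threshold


if __name__ == "__main__":
    print(devanagari_purity("मेरा case"))   # 4 Devanagari code points, 4 Latin letters
    print(passes_purity("mere pati ne"))    # Roman-script Hinglish fails the check
```

A ratio over code points is cheap to compute, but it treats a single English act name such as "POSH" the same as Roman-script leakage, so a real checker might whitelist statute abbreviations.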


Competition Results

| Benchmark | Score | Pass Rate |
| --- | --- | --- |
| Overall (50 cases) | 86.4% | 43/50 (86%) |
| Domestic Violence (10) | 91.2% | 9/10 |
| Property Rights (8) | 84.5% | 7/8 |
| Maintenance/Divorce (8) | 82.1% | 6/8 |
| Dowry Harassment (5) | 88.0% | 4/5 |
| Workplace/POSH (5) | 90.0% | 5/5 |
| Hinglish → Hindi (8) | 85.4% | 7/8 |
| Follow-up Short (6) | 79.2% | 5/6 |

Benchmark: Custom 50-case legal evaluation suite covering 7 categories, with automated checks for citation accuracy, Hindi purity, timeline structure, and hallucination resistance.
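As a quick consistency check, the overall pass rate can be recomputed from the per-category pass counts reported above. The dictionary layout and the name `CATEGORY_RESULTS` are illustrative; the numbers are the ones in the results.

```python
# (passed, total) per category, copied from the reported results
CATEGORY_RESULTS = {
    "Domestic Violence": (9, 10),
    "Property Rights": (7, 8),
    "Maintenance/Divorce": (6, 8),
    "Dowry Harassment": (4, 5),
    "Workplace/POSH": (5, 5),
    "Hinglish to Hindi": (7, 8),
    "Follow-up Short": (5, 6),
}

passed = sum(p for p, _ in CATEGORY_RESULTS.values())
total = sum(n for _, n in CATEGORY_RESULTS.values())
overall = passed / total

print(f"{passed}/{total} = {overall:.0%}")
```

The seven categories add up to the 50 cases and 43 passes quoted in the Overall row.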


Intended Use

Primary Use Cases

  • Legal intake chatbot for NGOs and legal aid clinics serving women in India
  • First-response information for domestic violence, property rights, maintenance, dowry, and workplace harassment queries
  • Hinglish-to-Hindi translation with legal domain expertise (critical for Tier-2/3 India users)
  • Follow-up Q&A after initial legal guidance (short-form answers)

Out-of-Scope Use

  • Not a substitute for a licensed advocate. Always directs users to NALSA (15100) and DLSA for actual representation.
  • Not for emergency response. Critical emergencies ("happening right now") are handled by a separate hardcoded detector upstream.
  • Not for non-Indian jurisdictions. Law citations are India-specific.
  • Not for document drafting. Provides guidance, not executable legal documents.

Training Details

Hardware

| Spec | Value |
| --- | --- |
| GPU | NVIDIA Tesla T4 (Kaggle) |
| VRAM | 14.5 GB |
| Training Time | ~35 minutes |
| Framework | Unsloth 2026.5.2 + Transformers 5.5.0 |

Hyperparameters

| Parameter | Value |
| --- | --- |
| Base Model | unsloth/gemma-4-E2B-it-unsloth-bnb-4bit |
| Method | LoRA (PEFT) |
| Rank (r) | 8 |
| Alpha (lora_alpha) | 8 |
| Dropout | 0.0 |
| Target Modules | Attention + MLP (vision frozen) |
| Sequence Length | 4096 |
| Quantization | 4-bit BnB (NF4) |
| Batch Size | 2 |
| Gradient Accumulation | 4 |
| Effective Batch Size | 8 |
| Learning Rate | 2e-4 |
| LR Scheduler | Linear |
| Warmup Steps | 5 |
| Epochs | 3 |
| Optimizer | AdamW 8-bit |
| Weight Decay | 0.001 |
| Seed | 42 |
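The batch settings above, together with the 549-example dataset, determine how many optimizer steps a run takes. The sketch below works through that arithmetic, assuming a single GPU and that the trailing partial batch of each epoch still counts as one step. The helper name `training_steps` is hypothetical.

```python
import math


def training_steps(num_examples: int, batch_size: int, grad_accum: int, epochs: int):
    """Return (effective batch size, steps per epoch, total optimizer steps)."""
    effective_batch = batch_size * grad_accum
    # A final partial batch is assumed to trigger an optimizer step as well
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs


if __name__ == "__main__":
    print(training_steps(num_examples=549, batch_size=2, grad_accum=4, epochs=3))
```

Under these assumptions the total comes to 207 steps, which matches the final step number reported for this run.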

Dataset

  • Size: 549 conversational examples
  • Format: ShareGPT-style JSONL with conversations array (system/user/assistant turns)
  • Coverage:
    • Domestic Violence (DV Act 2005): 35%
    • Property / Inheritance: 20%
    • Maintenance / Divorce: 20%
    • Dowry / 498A: 10%
    • Workplace / POSH Act: 10%
    • Follow-up short answers: 5%
  • Language Distribution: 70% Hindi output, 20% English output, 10% Hinglish input → Hindi output
  • Data Source: Synthetic + manually curated legal scenarios based on actual case patterns from Indian district courts. No private user data.
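For orientation, here is a minimal sketch of what one JSONL record and a shape check might look like. The key names `role` and `content` are assumptions (classic ShareGPT dumps often use `from` and `value` instead), the placeholder strings stand in for real text, and `validate_record` is a hypothetical helper, not part of the released code.

```python
import json


def validate_record(line: str) -> dict:
    """Parse one JSONL line and check the assumed conversation shape:
    an optional system turn, then alternating user/assistant turns
    that end on an assistant turn."""
    record = json.loads(line)
    roles = [turn["role"] for turn in record["conversations"]]
    if roles and roles[0] == "system":
        roles = roles[1:]
    if not roles or roles[-1] != "assistant":
        raise ValueError("conversation must end with an assistant turn")
    for i, role in enumerate(roles):
        expected = "user" if i % 2 == 0 else "assistant"
        if role != expected:
            raise ValueError(f"turn {i}: expected {expected}, got {role}")
    return record


example = {
    "conversations": [
        {"role": "system", "content": "<system prompt with retrieved legal text>"},
        {"role": "user", "content": "<query in Hindi, English, or Hinglish>"},
        {"role": "assistant", "content": "<five-block response>"},
    ]
}

# ensure_ascii=False keeps Devanagari readable in the JSONL file
line = json.dumps(example, ensure_ascii=False)
record = validate_record(line)
print(len(record["conversations"]), "turns OK")
```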

Training Procedure

  1. Template Alignment: Applied Gemma-4 non-thinking chat template to match production llama-server deployment
  2. Label Masking: System + user tokens masked as -100 (ignored in loss); only assistant responses trained
  3. BOS Deduplication: Removed duplicate <bos> tokens introduced by processor
  4. Marker-Based Splitting: Used <|turn>model\n boundary to precisely mask prefix vs. suffix
  5. Checkpointing: Saved every 50 steps; best checkpoint at step 207 (epoch 3, final loss: 0.3487)
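Steps 2 to 4 above can be illustrated on plain lists of token IDs. This is a simplified sketch rather than the training code: the IDs are invented, `dedupe_bos` and `mask_prompt` are hypothetical helpers, and it assumes only the last assistant turn is trained on (everything up to and including the final model-turn marker is masked).

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def dedupe_bos(input_ids: list[int], bos_id: int) -> list[int]:
    """Collapse a run of leading BOS tokens down to a single one."""
    ids = list(input_ids)
    while len(ids) > 1 and ids[0] == bos_id and ids[1] == bos_id:
        ids.pop(0)
    return ids


def find_last(seq: list[int], pattern: list[int]) -> int:
    """Index of the last occurrence of pattern in seq, or -1 if absent."""
    for i in range(len(seq) - len(pattern), -1, -1):
        if seq[i:i + len(pattern)] == pattern:
            return i
    return -1


def mask_prompt(input_ids: list[int], marker_ids: list[int]) -> list[int]:
    """Build labels: prefix through the marker is ignored, the rest is trained."""
    start = find_last(input_ids, marker_ids)
    if start < 0:
        # No assistant marker found: contribute nothing to the loss
        return [IGNORE_INDEX] * len(input_ids)
    boundary = start + len(marker_ids)
    return [IGNORE_INDEX] * boundary + list(input_ids[boundary:])


if __name__ == "__main__":
    BOS, MARKER = 2, [90, 91]          # made-up IDs for <bos> and the turn marker
    ids = dedupe_bos([BOS, BOS, 10, 11, 90, 91, 20, 21, 1], BOS)
    print(mask_prompt(ids, MARKER))
```

Searching for the marker as a token-ID subsequence avoids re-tokenizing the prompt and response separately, which can shift token boundaries.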

Ethical Statement & Safety

Bias Mitigation

  • Gender-specific by design: Model is explicitly conditioned to advocate for women's legal rights; it does not attempt "neutral" framing that could minimize violence (e.g., refuses to call DV a "family matter").
  • Language equity: Trained to serve Hinglish-speaking users (common in rural India) by converting to pure Devanagari, reducing the digital language divide.
  • Caste/religion awareness: Examples include Hindu Succession Act, Muslim Women Protection Act, and CrPC (secular), avoiding majority-religion bias.

Safety Evaluations

| Risk | Mitigation | Status |
| --- | --- | --- |
| Hallucinated section numbers | RAG context injected in system prompt; model trained ONLY on provided legal text | Tested |
| Victim-blaming | Explicit negative training: never says "talk to husband", "compromise", "family matter" | Tested |
| Emergency mishandling | Upstream hardcoded detector bypasses LLM for active violence; this model handles post-emergency guidance | Tested |
| Hindi-English script mixing | Purity checker enforces ≥85% Devanagari; LoRA trained on pure Devanagari targets | Tested |
| Malevolent use (evasion advice) | Refuses to provide advice on evading law; always directs to legal aid | Monitored |
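A post-generation guard for the victim-blaming row could look roughly like the sketch below. It is an illustration only: the three phrases are the ones quoted in the table, `flag_victim_blaming` is a hypothetical name, and a real deployment would also need Hindi equivalents and some handling of negated uses such as "you do not have to compromise".

```python
# Phrases quoted in the safety table; Hindi equivalents are deliberately omitted here
BANNED_PHRASES = ("talk to husband", "compromise", "family matter")


def flag_victim_blaming(response: str) -> list[str]:
    """Return the banned phrases found in a response (case-insensitive substring match)."""
    lowered = response.lower()
    return [phrase for phrase in BANNED_PHRASES if phrase in lowered]


if __name__ == "__main__":
    print(flag_victim_blaming("This is a Family Matter; try to compromise."))
    print(flag_victim_blaming("You can file a complaint under the DV Act 2005."))
```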

Known Limitations

  1. RAG dependency: Citation accuracy depends on the quality of retrieved chunks from ChromaDB. Without RAG, the model may hallucinate sections.
  2. Thin coverage: Hindu Succession Act, CrPC 125, and Hindu Marriage Act chunks are smaller than DV Act / POSH Act in the retrieval corpus.
  3. Token length: Hindi Devanagari consumes ~1.5× tokens per word vs. English; max 4096 context can truncate long RAG contexts.
  4. LoRA capacity: Rank-8 is lightweight; complex multi-act reasoning may require full fine-tune or higher rank.
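To make the token-length limitation concrete, the sketch below budgets the 4096-token context between the prompt, the retrieved chunks, and the generated answer. The 512-token generation reserve mirrors `max_new_tokens` in the inference example; the helper names and the sample chunk sizes are hypothetical, and chunks are assumed to arrive already ranked by relevance.

```python
def rag_token_budget(context_limit: int = 4096,
                     max_new_tokens: int = 512,
                     prompt_tokens: int = 0) -> int:
    """Tokens left for retrieved chunks after the prompt and generation reserve."""
    return max(context_limit - max_new_tokens - prompt_tokens, 0)


def select_chunks(chunk_token_counts: list[int], budget: int) -> list[int]:
    """Keep chunks in ranked order until the next one would overflow the budget."""
    selected, used = [], 0
    for index, count in enumerate(chunk_token_counts):
        if used + count > budget:
            break  # stop here so lower-ranked chunks never displace higher-ranked ones
        selected.append(index)
        used += count
    return selected


if __name__ == "__main__":
    budget = rag_token_budget(prompt_tokens=300)
    print(budget, select_chunks([1200, 1500, 900, 400], budget))
```

Dropping whole chunks keeps each cited section intact, whereas cutting the context mid-chunk could leave a truncated provision in the prompt.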

How to Use

Quick Inference (Unsloth β€” recommended)

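from unsloth import FastModel
from unsloth.chat_templates import get_chat_template

# Load the LoRA adapter by pointing model_name at the adapter repo. Unsloth reads
# adapter_config.json and pulls in the 4-bit base model
# (unsloth/gemma-4-E2B-it-unsloth-bnb-4bit) automatically.
model, tokenizer = FastModel.from_pretrained(
    model_name="your-hf-username/stri-sakhi-gemma4-2b-lora",  # this repo
    max_seq_length=4096,
    load_in_4bit=True,
)

tokenizer = get_chat_template(tokenizer, chat_template="gemma-4")

messages = [
    # System prompt (Hinglish): "You are Kanoon Sakhi. Answer only in Devanagari Hindi."
    {"role": "system", "content": "Tum Kanoon Sakhi ho. Sirf Devanagari Hindi mein jawab do."},
    # User query (Hinglish): "My husband has thrown me out of the house."
    {"role": "user", "content": "mere pati ne mujhe ghar se nikaala hai"},
]

# return_dict=True returns input_ids together with the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # temperature and top_p only apply when sampling is enabled
    temperature=0.2,
    top_p=0.9,
)

# Decode only the newly generated tokens, not the echoed prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))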

Merge & Export for Production (llama.cpp / vLLM)

# Assumes `model` and `tokenizer` were loaded as in the inference example above.

# Merge LoRA into base for single-file deployment
model.save_pretrained_merged(
    "stri-sakhi-merged",
    tokenizer,
    save_method="merged_16bit",  # or "merged_4bit"
)

# Or export to GGUF for llama.cpp server
model.save_pretrained_gguf(
    "stri-sakhi-q4_k_m",
    tokenizer,
    quantization_method="q4_k_m",
)

Repository Structure

.
├── README.md                  # This file
├── adapter_config.json        # LoRA config (PEFT)
├── adapter_model.safetensors  # LoRA weights (~16 MB)
├── tokenizer/                 # Tokenizer files (if customized)
├── benchmark_results.json     # 50-case evaluation raw results
├── training_logs.txt          # Loss curves per step
└── sample_inference.ipynb     # Reproducible inference demo

Training Loss Curve

| Step | Loss |
| --- | --- |
| 10 | 2.373 |
| 50 | 0.315 |
| 100 | 0.162 |
| 150 | 0.130 |
| 200 | 0.123 |
| 207 (final) | 0.349* |

* The final-step loss is higher than the preceding logged steps because the last batch contains harder, longer examples (property rights with multiple citations).


Acknowledgements

  • Google DeepMind for the Gemma-4 model family and open weights
  • Unsloth team for 2× faster, 50% memory-reduced fine-tuning
  • Hugging Face PEFT & Transformers libraries
  • Kaggle for Tesla T4 GPU access
  • NALSA & DLSA India for the legal aid framework this model promotes

Citation

If you use this model in research or production, please cite:

@misc{stri-sakhi-gemma4-2b-2026,
  title = {StriSakhi: A Safety-First Legal Advocate LLM for Indian Women},
  author = {Shubendu Biswas},
  year = {2026},
  howpublished = {\url{https://huggingface.co/your-username/stri-sakhi-gemma4-2b-lora}},
  note = {Fine-tuned Gemma-4 2B Instruct with LoRA for structured legal guidance}
}

Base model citation:

@article{gemma4-2026,
  title={Gemma 4: A family of highly capable multimodal models},
  author={Google DeepMind},
  year={2026}
}

Disclaimer

This model provides general legal information only and does not constitute legal advice. It is not a substitute for a licensed advocate. Always contact NALSA 15100 or your District Legal Services Authority (DLSA) for case-specific representation. The developers assume no liability for actions taken based on model outputs.


Model card generated for Hugging Face Open Source AI Challenge, Women Safety & Empowerment Track.

Evaluation results (self-reported, on the StriSakhi Legal Training Corpus)

  • Section Citation Accuracy: 0.860
  • Hindi Purity (Devanagari Ratio): 0.890
  • Overall Benchmark Pass Rate: 0.864