StriSakhi-Gemma4-2B-LoRA: Legal AI Advocate for Indian Women
Developed by: Shubendu Biswas
Competition: Kaggle Gemma 4 Hackathon / Women in AI Challenge
Base Model: unsloth/gemma-4-E2B-it-unsloth-bnb-4bit
Model Type: Causal Language Model (LoRA Adapter)
Languages: Hindi (Devanagari), English, Hinglish input → Hindi/English output
License: Apache-2.0 (same as base)
Fine-tuned with: Unsloth + PEFT (Hugging Face ecosystem)
Model Summary
StriSakhi ("Legal Companion") is a fine-tuned Gemma-4 2B Instruct model specialized as a warm, authoritative legal guide for Indian women seeking rights-based information. Unlike general-purpose LLMs, it is explicitly trained to:
- Respond in simple, sister-like Hindi (or English when requested)
- Cite actual Indian laws with correct section numbers (DV Act 2005, POSH Act 2013, CrPC 125, etc.)
- Structure every response into 5 mandatory blocks: Empathy → Rights → Action Timeline → Helpline → Follow-up Question
- Maintain ≥85% Devanagari purity for Hindi sessions (no Roman script leakage)
- Refuse to generate harmful advice (e.g., never suggests "compromise" in domestic violence cases)
Key Differentiator: This is a safety-first, rights-first legal domain model with structured output conditioning baked into the LoRA weights via 549 curated conversational examples.
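The card does not say how Devanagari purity is measured. The sketch below shows one plausible definition, assumed here for illustration only: the share of script characters (Devanagari code points plus ASCII letters) that fall in the Unicode Devanagari block U+0900 to U+097F. The function names and the counting rule are hypothetical; only the 0.85 threshold comes from the summary above.

```python
DEVANAGARI_START, DEVANAGARI_END = "\u0900", "\u097F"


def devanagari_ratio(text: str) -> float:
    """Fraction of script characters that are Devanagari (assumed definition)."""
    devanagari = sum(1 for ch in text if DEVANAGARI_START <= ch <= DEVANAGARI_END)
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    total = devanagari + latin
    # Text with no letters at all (digits, punctuation) is treated as ratio 0.0.
    return devanagari / total if total else 0.0


def passes_purity(text: str, threshold: float = 0.85) -> bool:
    """True when the reply meets the purity threshold quoted in the card."""
    return devanagari_ratio(text) >= threshold


if __name__ == "__main__":
    print(devanagari_ratio("घर ab"))   # mixed-script input
    print(passes_purity("घर"))         # pure Devanagari input
```

Digits, spaces, and punctuation are ignored so that section numbers such as "125" do not count against a Hindi reply.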
Competition Results
| Benchmark | Score | Pass Rate |
|---|---|---|
| Overall (50 cases) | 86.4% | 43/50 (86%) |
| Domestic Violence (10) | 91.2% | 9/10 |
| Property Rights (8) | 84.5% | 7/8 |
| Maintenance/Divorce (8) | 82.1% | 6/8 |
| Dowry Harassment (5) | 88.0% | 4/5 |
| Workplace/POSH (5) | 90.0% | 5/5 |
| Hinglish → Hindi (8) | 85.4% | 7/8 |
| Follow-up Short (6) | 79.2% | 5/6 |
Benchmark: Custom 50-case legal evaluation suite covering the 7 categories above, with automated checks for citation accuracy, Hindi purity, timeline structure, and hallucination resistance.
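As a quick consistency check, the per-category pass counts in the table can be summed to reproduce the overall pass rate. The snippet below only re-adds the numbers from the table. It does not attempt to reproduce the Score column, whose computation the card does not describe.

```python
# (passed, total) per category, copied from the results table above.
results = {
    "Domestic Violence": (9, 10),
    "Property Rights": (7, 8),
    "Maintenance/Divorce": (6, 8),
    "Dowry Harassment": (4, 5),
    "Workplace/POSH": (5, 5),
    "Hinglish -> Hindi": (7, 8),
    "Follow-up Short": (5, 6),
}

passed = sum(p for p, _ in results.values())
total = sum(n for _, n in results.values())
pass_rate = passed / total

print(f"{passed}/{total} cases passed ({pass_rate:.0%})")
```

The category totals add up to 50 cases and 43 passes, which matches the "Overall" row.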
Intended Use
Primary Use Cases
- Legal intake chatbot for NGOs and legal aid clinics serving women in India
- First-response information for domestic violence, property rights, maintenance, dowry, and workplace harassment queries
- Hinglish-to-Hindi translation with legal domain expertise (critical for Tier-2/3 India users)
- Follow-up Q&A after initial legal guidance (short-form answers)
Out-of-Scope Use
- Not a substitute for a licensed advocate. Always directs users to NALSA (15100) and DLSA for actual representation.
- Not for emergency response. Critical emergencies ("happening right now") are handled by a separate hardcoded detector upstream.
- Not for non-Indian jurisdictions. Law citations are India-specific.
- Not for document drafting. Provides guidance, not executable legal documents.
Training Details
Hardware
| Spec | Value |
|---|---|
| GPU | NVIDIA Tesla T4 (Kaggle) |
| VRAM | 14.5 GB |
| Training Time | ~35 minutes |
| Framework | Unsloth 2026.5.2 + Transformers 5.5.0 |
Hyperparameters
| Parameter | Value |
|---|---|
| Base Model | unsloth/gemma-4-E2B-it-unsloth-bnb-4bit |
| Method | LoRA (PEFT) |
| Rank (r) | 8 |
| Alpha (lora_alpha) | 8 |
| Dropout | 0.0 |
| Target Modules | Attention + MLP (vision frozen) |
| Sequence Length | 4096 |
| Quantization | 4-bit BnB (NF4) |
| Batch Size | 2 |
| Gradient Accumulation | 4 |
| Effective Batch Size | 8 |
| Learning Rate | 2e-4 |
| LR Scheduler | Linear |
| Warmup Steps | 5 |
| Epochs | 3 |
| Optimizer | AdamW 8-bit |
| Weight Decay | 0.001 |
| Seed | 42 |
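The batch settings above, together with the 549-example dataset size stated in the summary, imply the step count seen in the loss table. The sketch below works through that arithmetic. `TrainConfig` is a hypothetical helper rather than a class from Unsloth or Transformers, and the single-GPU setup and round-up of the last partial batch are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the values in the hyperparameter table."""
    per_device_batch_size: int = 2
    gradient_accumulation_steps: int = 4
    num_epochs: int = 3
    num_examples: int = 549
    num_devices: int = 1  # assumption: one Tesla T4

    @property
    def effective_batch_size(self) -> int:
        return (self.per_device_batch_size
                * self.gradient_accumulation_steps
                * self.num_devices)

    @property
    def steps_per_epoch(self) -> int:
        # Assumption: the final partial batch still counts as an optimizer step.
        return math.ceil(self.num_examples / self.effective_batch_size)

    @property
    def total_steps(self) -> int:
        return self.steps_per_epoch * self.num_epochs


cfg = TrainConfig()
print(cfg.effective_batch_size, cfg.steps_per_epoch, cfg.total_steps)
```

Under these assumptions the run has 69 optimizer steps per epoch and 207 in total, which agrees with the final step number reported for the best checkpoint.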
Dataset
- Size: 549 conversational examples
- Format: ShareGPT-style JSONL with `conversations` array (system/user/assistant turns)
- Coverage:
  - Domestic Violence (DV Act 2005): 35%
  - Property / Inheritance: 20%
  - Maintenance / Divorce: 20%
  - Dowry / 498A: 10%
  - Workplace / POSH Act: 10%
  - Follow-up short answers: 5%
- Language Distribution: 70% Hindi output, 20% English output, 10% Hinglish input → Hindi output
- Data Source: Synthetic + manually curated legal scenarios based on actual case patterns from Indian district courts. No private user data.
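To make the record format concrete, here is a minimal sketch of parsing and validating one JSONL line. The card only names the `conversations` array and the three turn types. The `role` and `content` keys, the validation rules, and the placeholder strings are assumptions made for illustration, and the actual dataset may use different field names.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def parse_record(line: str) -> list[dict[str, str]]:
    """Parse one JSONL line and check the turn structure (assumed schema)."""
    record = json.loads(line)
    turns = record["conversations"]
    roles = [turn["role"] for turn in turns]
    unknown = set(roles) - ALLOWED_ROLES
    if unknown:
        raise ValueError(f"unexpected roles: {sorted(unknown)}")
    if not roles or roles[-1] != "assistant":
        raise ValueError("a training example must end with an assistant turn")
    return turns


# Placeholder content only; no real training example is reproduced here.
sample_line = json.dumps({
    "conversations": [
        {"role": "system", "content": "<system prompt with retrieved legal text>"},
        {"role": "user", "content": "<user question>"},
        {"role": "assistant", "content": "<structured five-block answer>"},
    ]
})

turns = parse_record(sample_line)
print([turn["role"] for turn in turns])
```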
Training Procedure
- Template Alignment: Applied Gemma-4 non-thinking chat template to match production llama-server deployment
- Label Masking: System + user tokens masked as `-100` (ignored in loss); only assistant responses trained
- BOS Deduplication: Removed duplicate `<bos>` tokens introduced by processor
- Marker-Based Splitting: Used `<|turn>model\n` boundary to precisely mask prefix vs. suffix
- Checkpointing: Saved every 50 steps; best checkpoint at step 207 (epoch 3, final loss: 0.3487)
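The BOS deduplication and marker-based masking steps can be sketched on plain lists of token IDs. This is an illustrative simplification, not the training script: the token IDs are made up, the helper names are hypothetical, and only a single assistant turn is handled (everything up to and including the last marker is masked).

```python
IGNORE_INDEX = -100  # label value that the loss function skips


def dedupe_leading_bos(input_ids: list[int], bos_id: int) -> list[int]:
    """Collapse repeated BOS tokens at the start into a single one."""
    ids = list(input_ids)
    while len(ids) >= 2 and ids[0] == bos_id and ids[1] == bos_id:
        ids.pop(0)
    return ids


def find_last(seq: list[int], sub: list[int]) -> int:
    """Index of the last occurrence of sub in seq, or -1 if absent."""
    n, m = len(seq), len(sub)
    for start in range(n - m, -1, -1):
        if seq[start:start + m] == sub:
            return start
    return -1


def mask_prompt(input_ids: list[int], marker_ids: list[int]) -> list[int]:
    """Labels with the prefix (through the marker) replaced by IGNORE_INDEX."""
    start = find_last(input_ids, marker_ids)
    if start < 0:
        raise ValueError("assistant marker not found in sequence")
    boundary = start + len(marker_ids)
    return [IGNORE_INDEX] * boundary + list(input_ids[boundary:])


# Toy IDs: 2 stands in for <bos>, [50, 51] for the tokenized turn marker.
BOS_ID, MARKER_IDS = 2, [50, 51]
raw_ids = [2, 2, 10, 11, 50, 51, 20, 21]

ids = dedupe_leading_bos(raw_ids, BOS_ID)
labels = mask_prompt(ids, MARKER_IDS)
print(ids)
print(labels)
```

Only the two trailing IDs (the assistant response) keep their label values, so the loss is computed on the response alone.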
Ethical Statement & Safety
Bias Mitigation
- Gender-specific by design: Model is explicitly conditioned to advocate for women's legal rights; it does not attempt "neutral" framing that could minimize violence (e.g., refuses to call DV a "family matter").
- Language equity: Trained to serve Hinglish-speaking users (common in rural India) by converting to pure Devanagari, reducing the digital language divide.
- Caste/religion awareness: Examples include Hindu Succession Act, Muslim Women Protection Act, and CrPC (secular), avoiding majority-religion bias.
Safety Evaluations
| Risk | Mitigation | Status |
|---|---|---|
| Hallucinated section numbers | RAG context injected in system prompt; model trained ONLY on provided legal text | Tested |
| Victim-blaming | Explicit negative training: never says "talk to husband", "compromise", "family matter" | Tested |
| Emergency mishandling | Upstream hardcoded detector bypasses LLM for active violence; this model handles post-emergency guidance | Tested |
| Hindi-English script mixing | Purity checker enforces ≥85% Devanagari; LoRA trained on pure Devanagari targets | Tested |
| Malevolent use (evasion advice) | Refuses to provide advice on evading law; always directs to legal aid | Monitored |
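The emergency row describes a hardcoded detector that sits in front of the model. The card does not include that detector, so the sketch below is only a guess at its shape: the trigger phrases, the reply placeholder, and the function names are all hypothetical, and a production detector would presumably be far more thorough.

```python
from typing import Callable

# Hypothetical trigger phrases; the real detector's rules are not published here.
EMERGENCY_PHRASES = (
    "happening right now",
    "he is hitting me now",
    "abhi maar raha hai",
)

# Placeholder for a fixed, human-reviewed message with emergency contacts.
EMERGENCY_REPLY = "<fixed emergency message with helpline numbers>"


def is_emergency(text: str) -> bool:
    """Case-insensitive substring match against the trigger phrases."""
    lowered = text.casefold()
    return any(phrase in lowered for phrase in EMERGENCY_PHRASES)


def route(text: str, generate: Callable[[str], str]) -> str:
    """Bypass the LLM for active emergencies; otherwise call the model."""
    if is_emergency(text):
        return EMERGENCY_REPLY
    return generate(text)


if __name__ == "__main__":
    fake_llm = lambda prompt: "<model answer>"
    print(route("It is HAPPENING RIGHT NOW", fake_llm))
    print(route("What are my rights under the DV Act?", fake_llm))
```

Keeping the emergency path free of any model call means its response is deterministic and cannot be affected by generation errors.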
Known Limitations
- RAG dependency: Citation accuracy depends on the quality of retrieved chunks from ChromaDB. Without RAG, the model may hallucinate sections.
- Thin coverage: Hindu Succession Act, CrPC 125, and Hindu Marriage Act chunks are smaller than DV Act / POSH Act in the retrieval corpus.
- Token length: Hindi Devanagari consumes ~1.5× tokens per word vs. English; max 4096 context can truncate long RAG contexts.
- LoRA capacity: Rank-8 is lightweight; complex multi-act reasoning may require full fine-tune or higher rank.
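One way to avoid silent truncation of retrieved context is to budget tokens before building the prompt. The following is a minimal sketch under stated assumptions: chunks arrive already ranked by relevance, `count_tokens` is any callable that returns a token count (for example one backed by the model tokenizer), and `select_chunks` is a hypothetical helper rather than part of the described pipeline. The 4096 and 512 defaults mirror the sequence length and `max_new_tokens` used elsewhere in this card.

```python
from typing import Callable, Sequence


def select_chunks(
    chunks: Sequence[str],
    count_tokens: Callable[[str], int],
    prompt_tokens: int,
    context_limit: int = 4096,
    max_new_tokens: int = 512,
) -> list[str]:
    """Keep ranked chunks, in order, until the remaining token budget runs out."""
    budget = context_limit - max_new_tokens - prompt_tokens
    kept: list[str] = []
    for chunk in chunks:
        cost = count_tokens(chunk)
        if cost > budget:
            break  # stop at the first chunk that no longer fits
        kept.append(chunk)
        budget -= cost
    return kept


if __name__ == "__main__":
    # Whitespace word count stands in for a real tokenizer in this demo.
    word_count = lambda text: len(text.split())
    ranked = ["a b c", "d e", "f g h i"]
    print(select_chunks(ranked, word_count, prompt_tokens=0,
                        context_limit=10, max_new_tokens=4))
```

Because Hindi text uses more tokens per word, counting with the real tokenizer rather than by words matters more here than it would for English-only prompts.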
How to Use
Quick Inference (Unsloth, recommended)
from unsloth import FastModel
from unsloth.chat_templates import get_chat_template

# Load the LoRA adapter repo directly; the 4-bit base model
# (unsloth/gemma-4-E2B-it-unsloth-bnb-4bit) is resolved from adapter_config.json.
model, tokenizer = FastModel.from_pretrained(
    model_name="snake4u1/strisakhi-gemma4-lora",  # this repo
    max_seq_length=4096,
    load_in_4bit=True,
)
tokenizer = get_chat_template(tokenizer, chat_template="gemma-4")

messages = [
    # System prompt: "You are Kanoon Sakhi. Answer only in Devanagari Hindi."
    {"role": "system", "content": "Tum Kanoon Sakhi ho. Sirf Devanagari Hindi mein jawab do."},
    # User query: "My husband has thrown me out of the house."
    {"role": "user", "content": "mere pati ne mujhe ghar se nikaala hai"},
]

# return_dict=True yields both input_ids and attention_mask for generate().
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # temperature and top_p only take effect when sampling
    temperature=0.2,
    top_p=0.9,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
Merge & Export for Production (llama.cpp / vLLM)
# Merge LoRA into base for single-file deployment
model.save_pretrained_merged(
    "stri-sakhi-merged",
    tokenizer,
    save_method="merged_16bit",  # or "merged_4bit_for_mlx"
)

# Or export to GGUF for llama.cpp server
model.save_pretrained_gguf(
    "stri-sakhi-q4_k_m",
    tokenizer,
    quantization_method="q4_k_m",
)
Repository Structure
.
├── README.md                     # This file
├── adapter_config.json           # LoRA config (PEFT)
├── adapter_model.safetensors     # LoRA weights (~16 MB)
├── tokenizer/                    # Tokenizer files (if customized)
├── benchmark_results.json        # 50-case evaluation raw results
├── training_logs.txt             # Loss curves per step
└── sample_inference.ipynb        # Reproducible inference demo
Training Loss Curve
| Step | Loss |
|---|---|
| 10 | 2.373 |
| 50 | 0.315 |
| 100 | 0.162 |
| 150 | 0.130 |
| 200 | 0.123 |
| 207 (final) | 0.349* |
*The loss at the final step is higher than at the preceding logged steps because the last batch contains harder, longer examples (property rights with multiple citations).
Acknowledgements
- Google DeepMind for the Gemma-4 model family and open weights
- Unsloth team for 2× faster, 50% memory-reduced fine-tuning
- Hugging Face PEFT & Transformers libraries
- Kaggle for Tesla T4 GPU access
- NALSA & DLSA India for the legal aid framework this model promotes
Citation
If you use this model in research or production, please cite:
@misc{stri-sakhi-gemma4-2b-2026,
title = {StriSakhi: A Safety-First Legal Advocate LLM for Indian Women},
author = {Shubendu Biswas},
year = {2026},
howpublished = {\url{https://huggingface.co/snake4u1/strisakhi-gemma4-lora}},
note = {Fine-tuned Gemma-4 2B Instruct with LoRA for structured legal guidance}
}
Base model citation:
@article{gemma4-2026,
title={Gemma 4: A family of highly capable multimodal models},
author={Google DeepMind},
year={2026}
}
Disclaimer
This model provides general legal information only and does not constitute legal advice. It is not a substitute for a licensed advocate. Always contact NALSA 15100 or your District Legal Services Authority (DLSA) for case-specific representation. The developers assume no liability for actions taken based on model outputs.
Model card generated for Hugging Face Open Source AI Challenge, Women Safety & Empowerment Track.
Model tree for snake4u1/strisakhi-gemma4-lora
Base model: google/gemma-4-E2B
Evaluation results (self-reported, on the StriSakhi Legal Training Corpus)
- Section Citation Accuracy: 0.860
- Hindi Purity (Devanagari Ratio): 0.890
- Overall Benchmark Pass Rate: 0.864