
Libraries

How to use snake4u1/strisakhi-gemma4-lora with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="snake4u1/strisakhi-gemma4-lora")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("snake4u1/strisakhi-gemma4-lora", dtype="auto")


vLLM

How to use snake4u1/strisakhi-gemma4-lora with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "snake4u1/strisakhi-gemma4-lora"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "snake4u1/strisakhi-gemma4-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
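
The same request can also be sent from Python. This is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on `localhost:8000`. The helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "snake4u1/strisakhi-gemma4-lora"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """Send the prompt and return the first choice's text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `chat("What is the capital of France?")` mirrors the curl command; pass `base_url="http://localhost:30000"` to target a server on a different port.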

Use Docker

docker model run hf.co/snake4u1/strisakhi-gemma4-lora

SGLang

How to use snake4u1/strisakhi-gemma4-lora with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "snake4u1/strisakhi-gemma4-lora" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "snake4u1/strisakhi-gemma4-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "snake4u1/strisakhi-gemma4-lora" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "snake4u1/strisakhi-gemma4-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use snake4u1/strisakhi-gemma4-lora with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for snake4u1/strisakhi-gemma4-lora to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for snake4u1/strisakhi-gemma4-lora to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for snake4u1/strisakhi-gemma4-lora to start chatting

Load model with FastModel

# Install Unsloth first (shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="snake4u1/strisakhi-gemma4-lora",
    max_seq_length=2048,
)


StriSakhi-Gemma4-2B-LoRA — Legal AI Advocate for Indian Women

Developed by: Shubendu Biswas
Competition: Kaggle Gemma 4 Hackathon / Women in AI Challenge
Base Model: unsloth/gemma-4-E2B-it-unsloth-bnb-4bit
Model Type: Causal Language Model (LoRA Adapter)
Languages: Hindi (Devanagari), English, Hinglish input → Hindi/English output
License: Apache-2.0 (same as base)
Fine-tuned with: Unsloth + PEFT (Hugging Face ecosystem)

Model Summary

StriSakhi ("Legal Companion") is a fine-tuned Gemma-4 2B Instruct model specialized as a warm, authoritative legal guide for Indian women seeking rights-based information. Unlike general-purpose LLMs, it is explicitly trained to:

- Respond in simple, sister-like Hindi (or English when requested)
- Cite actual Indian laws with correct section numbers (DV Act 2005, POSH Act 2013, CrPC 125, etc.)
- Structure every response into 5 mandatory blocks: Empathy → Rights → Action Timeline → Helpline → Follow-up Question
- Maintain ≥85% Devanagari purity for Hindi sessions (no Roman script leakage)
- Refuse to generate harmful advice (e.g., never suggests "compromise" in domestic violence cases)

Key Differentiator: This is a safety-first, rights-first legal domain model with structured output conditioning baked into the LoRA weights via 549 curated conversational examples.

Competition Results

Benchmark	Score	Pass Rate
Overall (50 cases)	86.4%	43/50 (86%)
Domestic Violence (10)	91.2%	9/10
Property Rights (8)	84.5%	7/8
Maintenance/Divorce (8)	82.1%	6/8
Dowry Harassment (5)	88.0%	4/5
Workplace/POSH (5)	90.0%	5/5
Hinglish → Hindi (8)	85.4%	7/8
Follow-up Short (6)	79.2%	5/6

Benchmark: Custom 50-case legal evaluation suite covering the 7 test categories listed above, with automated checks for citation accuracy, Hindi purity, timeline structure, and hallucination resistance.
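
The overall pass count can be re-derived from the per-category rows. The snippet below is only an arithmetic check on the pass/total figures copied from the table; it does not reproduce the Score column, whose aggregation is not described in this card.

```python
# (category, passed, total) copied from the results table
RESULTS = [
    ("Domestic Violence", 9, 10),
    ("Property Rights", 7, 8),
    ("Maintenance/Divorce", 6, 8),
    ("Dowry Harassment", 4, 5),
    ("Workplace/POSH", 5, 5),
    ("Hinglish -> Hindi", 7, 8),
    ("Follow-up Short", 5, 6),
]


def overall_pass_rate(results):
    """Sum passed and total cases across categories and return the ratio."""
    passed = sum(p for _, p, _ in results)
    total = sum(t for _, _, t in results)
    return passed, total, passed / total


passed, total, rate = overall_pass_rate(RESULTS)
print(f"{passed}/{total} = {rate:.0%}")  # 43/50 = 86%
```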

Intended Use

Primary Use Cases

- Legal intake chatbot for NGOs and legal aid clinics serving women in India
- First-response information for domestic violence, property rights, maintenance, dowry, and workplace harassment queries
- Hinglish-to-Hindi translation with legal domain expertise (critical for Tier-2/3 India users)
- Follow-up Q&A after initial legal guidance (short-form answers)

Out-of-Scope Use

- Not a substitute for a licensed advocate. Always directs users to NALSA (15100) and DLSA for actual representation.
- Not for emergency response. Critical emergencies ("happening right now") are handled by a separate hardcoded detector upstream.
- Not for non-Indian jurisdictions. Law citations are India-specific.
- Not for document drafting. Provides guidance, not executable legal documents.
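
The upstream emergency detector itself is not published in this card. The sketch below only illustrates the routing idea: a fixed check runs before the LLM and short-circuits it. The trigger phrases, the reply text, and the names `is_emergency` and `route` are all hypothetical placeholders, not the deployed logic.

```python
from typing import Callable

# Hypothetical trigger phrases for illustration only; a real deployment
# would need a vetted, multilingual list.
EMERGENCY_PHRASES = ("happening right now", "he is hitting me now")

# Placeholder for the fixed, pre-approved emergency instructions.
EMERGENCY_REPLY = "Emergency detected: show the fixed emergency instructions here."


def is_emergency(message: str) -> bool:
    """Return True if any trigger phrase occurs in the message (case-insensitive)."""
    text = message.casefold()
    return any(phrase in text for phrase in EMERGENCY_PHRASES)


def route(message: str, llm: Callable[[str], str]) -> str:
    """Bypass the LLM for emergencies; otherwise forward the message to it."""
    if is_emergency(message):
        return EMERGENCY_REPLY
    return llm(message)
```

Keeping the check outside the model means the emergency path does not depend on generation quality or latency.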

Training Details

Hardware

Spec	Value
GPU	NVIDIA Tesla T4 (Kaggle)
VRAM	14.5 GB
Training Time	~35 minutes
Framework	Unsloth 2026.5.2 + Transformers 5.5.0

Hyperparameters

Parameter	Value
Base Model	`unsloth/gemma-4-E2B-it-unsloth-bnb-4bit`
Method	LoRA (PEFT)
Rank (`r`)	8
Alpha (`lora_alpha`)	8
Dropout	0.0
Target Modules	Attention + MLP (vision frozen)
Sequence Length	4096
Quantization	4-bit BnB (NF4)
Batch Size	2
Gradient Accumulation	4
Effective Batch Size	8
Learning Rate	2e-4
LR Scheduler	Linear
Warmup Steps	5
Epochs	3
Optimizer	AdamW 8-bit
Weight Decay	0.001
Seed	42
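
The batch settings above, combined with the 549-example dataset, determine the number of optimizer steps. A short arithmetic sketch, assuming a single GPU, that the last partial batch is kept, and that partial accumulation windows round up:

```python
import math

num_examples = 549      # dataset size reported in this card
per_device_batch = 2    # Batch Size
grad_accum = 4          # Gradient Accumulation
epochs = 3

effective_batch = per_device_batch * grad_accum             # 8
micro_batches = math.ceil(num_examples / per_device_batch)  # 275 per epoch
steps_per_epoch = math.ceil(micro_batches / grad_accum)     # 69
total_steps = steps_per_epoch * epochs                      # 207

print(effective_batch, steps_per_epoch, total_steps)  # 8 69 207
```

Under these assumptions the total matches the final step of 207 reported in this card.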

Dataset

Size: 549 conversational examples
Format: ShareGPT-style JSONL with conversations array (system/user/assistant turns)
Coverage:
- Domestic Violence (DV Act 2005) — 35%
- Property / Inheritance — 20%
- Maintenance / Divorce — 20%
- Dowry / 498A — 10%
- Workplace / POSH Act — 10%
- Follow-up short answers — 5%
Language Distribution: 70% Hindi output, 20% English output, 10% Hinglish input → Hindi output
Data Source: Synthetic + manually curated legal scenarios based on actual case patterns from Indian district courts. No private user data.
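
The exact record schema is not shown in this card. The sketch below assumes the common ShareGPT layout (a `conversations` list whose turns carry `from` and `value` keys) and converts one JSONL line into role/content messages. The key names, the role mapping, and the sample text are assumptions for illustration.

```python
import json

# Assumed ShareGPT speaker tags mapped to chat-template roles.
ROLE_MAP = {
    "system": "system",
    "human": "user",
    "user": "user",
    "gpt": "assistant",
    "assistant": "assistant",
}


def sharegpt_to_messages(line: str) -> list[dict]:
    """Parse one JSONL line and return a list of {"role", "content"} dicts."""
    record = json.loads(line)
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in record["conversations"]
    ]


# Placeholder record; real examples contain legal scenarios and answers.
sample = json.dumps({"conversations": [
    {"from": "system", "value": "<system prompt>"},
    {"from": "human", "value": "<user question>"},
    {"from": "gpt", "value": "<assistant answer>"},
]})
print([m["role"] for m in sharegpt_to_messages(sample)])  # ['system', 'user', 'assistant']
```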

Training Procedure

Template Alignment: Applied Gemma-4 non-thinking chat template to match production llama-server deployment
Label Masking: System + user tokens masked as -100 (ignored in loss); only assistant responses trained
BOS Deduplication: Removed duplicate <bos> tokens introduced by processor
Marker-Based Splitting: Used <|turn>model\n boundary to precisely mask prefix vs. suffix
Checkpointing: Saved every 50 steps; best checkpoint at step 207 (epoch 3, final loss: 0.3487)
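
The BOS deduplication and marker-based label masking steps can be sketched on plain token-ID lists. This is an illustration under stated assumptions, not the training script: the token IDs are made up, and the choice to split at the last marker occurrence (so that only the final assistant reply is trained) is an assumption.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def dedupe_bos(input_ids: list[int], bos_id: int) -> list[int]:
    """Collapse repeated leading BOS tokens into a single one."""
    ids = list(input_ids)
    while len(ids) >= 2 and ids[0] == bos_id and ids[1] == bos_id:
        ids.pop(0)
    return ids


def find_last(seq: list[int], pattern: list[int]) -> int:
    """Return the start index of the last occurrence of pattern, or -1."""
    for start in range(len(seq) - len(pattern), -1, -1):
        if seq[start:start + len(pattern)] == pattern:
            return start
    return -1


def build_labels(input_ids: list[int], marker_ids: list[int]) -> list[int]:
    """Mask everything up to and including the last model-turn marker."""
    start = find_last(input_ids, marker_ids)
    if start == -1:
        return [IGNORE_INDEX] * len(input_ids)  # no marker: nothing is trained
    boundary = start + len(marker_ids)
    return [IGNORE_INDEX] * boundary + list(input_ids[boundary:])


# Made-up IDs: 2 = BOS, [50, 51] = tokenized model-turn marker
ids = dedupe_bos([2, 2, 10, 11, 50, 51, 20, 21], bos_id=2)
labels = build_labels(ids, marker_ids=[50, 51])
print(labels)  # [-100, -100, -100, -100, -100, 20, 21]
```

In practice `marker_ids` would come from tokenizing the marker string with the model's own tokenizer, and multi-turn conversations would need every assistant span unmasked rather than only the last one.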

Ethical Statement & Safety

Bias Mitigation

Gender-specific by design: Model is explicitly conditioned to advocate for women's legal rights; it does not attempt "neutral" framing that could minimize violence (e.g., refuses to call DV a "family matter").
Language equity: Trained to serve Hinglish-speaking users (common in rural India) by converting to pure Devanagari, reducing the digital language divide.
Caste/religion awareness: Examples include Hindu Succession Act, Muslim Women Protection Act, and CrPC (secular), avoiding majority-religion bias.

Safety Evaluations

Risk	Mitigation	Status
Hallucinated section numbers	RAG context injected in system prompt; model trained ONLY on provided legal text	Tested
Victim-blaming	Explicit negative training: never says "talk to husband", "compromise", "family matter"	Tested
Emergency mishandling	Upstream hardcoded detector bypasses LLM for active violence; this model handles post-emergency guidance	Tested
Hindi-English script mixing	Purity checker enforces ≥85% Devanagari; LoRA trained on pure Devanagari targets	Tested
Malevolent use (evasion advice)	Refuses to provide advice on evading law; always directs to legal aid	Monitored
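
The purity checker's implementation is not included in this card. One plausible way to compute a Devanagari ratio is sketched below, assuming purity means the share of Devanagari code points (Unicode block U+0900 to U+097F) among all letters, with ASCII digits, spaces, and punctuation ignored. The function names and this exact definition are assumptions.

```python
def is_devanagari(ch: str) -> bool:
    """True for code points in the Devanagari block (U+0900 to U+097F)."""
    return "\u0900" <= ch <= "\u097F"


def devanagari_purity(text: str) -> float:
    """Share of Devanagari code points among letters; 0.0 if there are none."""
    # Vowel signs are combining marks, so isalpha() alone would miss them.
    counted = [ch for ch in text if ch.isalpha() or is_devanagari(ch)]
    if not counted:
        return 0.0
    return sum(is_devanagari(ch) for ch in counted) / len(counted)


def passes_purity(text: str, threshold: float = 0.85) -> bool:
    """Apply the 85% threshold stated in this card."""
    return devanagari_purity(text) >= threshold
```

Under this definition, Latin-script act names inside an otherwise Hindi reply lower the score, so a production checker may want an allowlist for legal abbreviations.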

Known Limitations

RAG dependency: Citation accuracy depends on the quality of retrieved chunks from ChromaDB. Without RAG, the model may hallucinate sections.
Thin coverage: Hindu Succession Act, CrPC 125, and Hindu Marriage Act chunks are smaller than DV Act / POSH Act in the retrieval corpus.
Token length: Hindi Devanagari consumes ~1.5× tokens per word vs. English; max 4096 context can truncate long RAG contexts.
LoRA capacity: Rank-8 is lightweight; complex multi-act reasoning may require full fine-tune or higher rank.
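
One way to reduce truncation risk is to cap the retrieved context before it is injected into the system prompt. The sketch below keeps ranked chunks until a token budget is reached. The function names, the `Legal context:` label, and the token-counting callable are hypothetical; the retrieval code used with this model is not part of this card.

```python
from typing import Callable, Iterable


def pack_chunks(chunks: Iterable[str], count_tokens: Callable[[str], int], budget: int) -> list[str]:
    """Keep chunks in ranked order until the next one would exceed the budget."""
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept


def build_system_prompt(instructions: str, chunks: list[str]) -> str:
    """Append the selected legal text to the fixed instructions."""
    context = "\n\n".join(chunks)
    return f"{instructions}\n\nLegal context:\n{context}"


# Whitespace word count stands in for a real tokenizer here.
word_count = lambda s: len(s.split())
selected = pack_chunks(["a b c", "d e", "f g h i"], word_count, budget=5)
print(selected)  # ['a b c', 'd e']
```

With the real tokenizer as `count_tokens`, the budget would be the 4096-token context minus the space reserved for the instructions, the user turn, and the generated answer.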

How to Use

Quick Inference (Unsloth — recommended)

from unsloth import FastModel
from unsloth.chat_templates import get_chat_template

# Load the LoRA adapter from this repo; the base model
# (unsloth/gemma-4-E2B-it-unsloth-bnb-4bit) is resolved from adapter_config.json
model, tokenizer = FastModel.from_pretrained(
    model_name="snake4u1/strisakhi-gemma4-lora",  # this repo
    max_seq_length=4096,
    load_in_4bit=True,
)

tokenizer = get_chat_template(tokenizer, chat_template="gemma-4")

messages = [
    {"role": "system", "content": "Tum Kanoon Sakhi ho. Sirf Devanagari Hindi mein jawab do."},
    {"role": "user", "content": "mere pati ne mujhe ghar se nikaala hai"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,       # return input_ids and attention_mask together
    return_tensors="pt",
).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,         # required for temperature/top_p to take effect
    temperature=0.2,
    top_p=0.9,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])

Merge & Export for Production (llama.cpp / vLLM)

# Merge LoRA into base for single-file deployment
model.save_pretrained_merged(
    "stri-sakhi-merged",
    tokenizer,
    save_method="merged_16bit",  # or "merged_4bit_forced"
)

# Or export to GGUF for llama.cpp server
model.save_pretrained_gguf(
    "stri-sakhi-q4_k_m",
    tokenizer,
    quantization_method="q4_k_m",
)

Repository Structure

.
├── README.md                 # This file
├── adapter_config.json       # LoRA config (PEFT)
├── adapter_model.safetensors # LoRA weights (~16 MB)
├── tokenizer/              # Tokenizer files (if customized)
├── benchmark_results.json  # 50-case evaluation raw results
├── training_logs.txt       # Loss curves per step
└── sample_inference.ipynb  # Reproducible inference demo

Training Loss Curve

Step	Loss
10	2.373
50	0.315
100	0.162
150	0.130
200	0.123
207 (final)	0.349*

* The final-step loss is higher than the preceding logged values because the last batch contains harder, longer examples (property rights with multiple citations).

Acknowledgements

Google DeepMind for the Gemma-4 model family and open weights
Unsloth team for 2× faster, 50% memory-reduced fine-tuning
Hugging Face PEFT & Transformers libraries
Kaggle for Tesla T4 GPU access
NALSA & DLSA India for the legal aid framework this model promotes

Citation

If you use this model in research or production, please cite:

@misc{stri-sakhi-gemma4-2b-2026,
  title = {StriSakhi: A Safety-First Legal Advocate LLM for Indian Women},
  author = {Shubendu Biswas},
  year = {2026},
  howpublished = {\url{https://huggingface.co/snake4u1/strisakhi-gemma4-lora}},
  note = {Fine-tuned Gemma-4 2B Instruct with LoRA for structured legal guidance}
}

Base model citation:

@article{gemma4-2026,
  title={Gemma 4: A family of highly capable multimodal models},
  author={Google DeepMind},
  year={2026}
}

Disclaimer

This model provides general legal information only and does not constitute legal advice. It is not a substitute for a licensed advocate. Always contact NALSA 15100 or your District Legal Services Authority (DLSA) for case-specific representation. The developers assume no liability for actions taken based on model outputs.

Model card generated for Hugging Face Open Source AI Challenge — Women Safety & Empowerment Track.


Model tree for snake4u1/strisakhi-gemma4-lora

Base model: google/gemma-4-E2B
Finetuned: google/gemma-4-E2B-it
Quantized: unsloth/gemma-4-E2B-it-unsloth-bnb-4bit
Adapter (7): this model

Evaluation results (self-reported, on StriSakhi Legal Training Corpus)

Metric	Value
Section Citation Accuracy	0.860
Hindi Purity (Devanagari Ratio)	0.890
Overall Benchmark Pass Rate	0.864