Instructions to use agp9/gemma-4-e2b-sos-lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use agp9/gemma-4-e2b-sos-lora with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("unsloth/gemma-4-e2b-it-unsloth-bnb-4bit") model = PeftModel.from_pretrained(base_model, "agp9/gemma-4-e2b-sos-lora") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- Unsloth Studio new
How to use agp9/gemma-4-e2b-sos-lora with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for agp9/gemma-4-e2b-sos-lora to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for agp9/gemma-4-e2b-sos-lora to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for agp9/gemma-4-e2b-sos-lora to start chatting
Load model with FastModel
pip install unsloth from unsloth import FastModel model, tokenizer = FastModel.from_pretrained( model_name="agp9/gemma-4-e2b-sos-lora", max_seq_length=2048, )
Gemma-4-E2B-SOS-LoRA
Fine-tuned LoRA adapter for disaster triage and emergency response.
Built on Gemma 4 E2B using Unsloth + TRL SFTTrainer. Trained on a synthetic dataset of ~2000 START Protocol triage and FEMA emergency scenarios.
Base Model
unsloth/gemma-4-e2b-it-unsloth-bnb-4bit — Gemma 4 E2B (5.15B total, 2.3B effective parameters) in 4-bit NF4 quantization.
Training Details
- Hardware: Kaggle T4 (16 GB VRAM, NVIDIA Tesla T4)
- Framework: Unsloth 2026.5.2 + PEFT 0.18.1 + TRL SFTTrainer
- Quantization: 4-bit NF4 (bitsandbytes)
- LoRA Rank: 16
- LoRA Alpha: 16
- LoRA Dropout: 0
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Batch Size: 2 (gradient accumulation 4 → effective batch 8)
- Learning Rate: 2e-4 (cosine schedule, 10 warmup steps)
- Optimizer: AdamW 8-bit
- Precision: BF16 mixed precision
- Training Steps: 500 (2 epochs, 2000 examples)
- Final Loss: 0.1424
- Total Runtime: 826 seconds (13.8 min on T4)
- Trainable Parameters: 31,006,720 (0.60% of 5.15B)
- Adapter Size: 124 MB (safetensors format)
Dataset
Synthetic dataset with 2000 examples across three categories:
START Protocol triage (1200 examples) — Victim assessment based on respiratory rate, pulse, capillary refill, and mental status. Outputs RED (Immediate), YELLOW (Delayed), GREEN (Minor), or BLACK (Deceased) per the START triage system.
FEMA emergency response (500 examples) — Protocols for earthquake, fire, flood, tornado, tsunami, CPR, bleeding control, burns, hypothermia, heat stroke, snake bites, chemical exposure, choking, allergic reactions, and more.
Triage edge cases (300 examples) — Respiratory distress, mass casualty scenarios, pediatric victims, pregnant patients, amputations, and multi-casualty sorting.
Usage
With Unsloth (recommended)
from unsloth import FastLanguageModel
import torch
model, tokenizer = FastLanguageModel.from_pretrained(
model_name='unsloth/gemma-4-e2b-it-unsloth-bnb-4bit',
max_seq_length=2048,
load_in_4bit=True,
)
model.load_adapter('agp9/gemma-4-e2b-sos-lora')
FastLanguageModel.for_inference(model)
messages = [{'role': 'user', 'content': [{'type': 'text', 'text': 'START triage: Adult male found crushed under rubble. RR=6, pulse=absent, cap_refill=4s, mental=unresponsive.'}]}]
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to('cuda')
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
With PEFT
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = AutoModelForCausalLM.from_pretrained("unsloth/gemma-4-e2b-it-unsloth-bnb-4bit", device_map="auto")
model = PeftModel.from_pretrained(base, "agp9/gemma-4-e2b-sos-lora")
tokenizer = AutoTokenizer.from_pretrained("unsloth/gemma-4-e2b-it-unsloth-bnb-4bit")
Inference Results
Example outputs from the fine-tuned model:
| Input | Output |
|---|---|
| RR=6, pulse=absent, unresponsive | RED - Immediate |
| RR=22, pulse=present, cap_refill=2s, alert | YELLOW - Delayed |
| "What to do during an earthquake?" | DROP, COVER, HOLD ON. |
| "How do I stop severe bleeding?" | Direct pressure. Elevate. Tourniquet as last resort. |
Training Notebook
Project
Gemma-SOS — An offline Android app for disaster response. Runs Gemma 4 E2B on-device via LiteRT-LM. Features:
- START Protocol triage (instant local engine + LLM)
- SOS beacon with GPS coordinates
- QR-based mesh sync for patient data
- Offline maps with resource finder
- Wreckage analyzer (camera-based structural assessment)
Competition
This adapter was created for the Gemma 4 Good Hackathon (Google DeepMind x Kaggle) — Unsloth Special Technology Track.
Environmental Impact
- Hardware: NVIDIA Tesla T4 (Kaggle)
- Training Time: ~14 minutes
- Cloud Provider: Kaggle (Google Cloud)
- Estimated CO₂: <0.1 kg CO₂eq
- Downloads last month
- 102