OfflineAid — Gemma 4 E4B fine-tune (merged fp16 safetensors)

Stage-1 fine-tune of unsloth/gemma-4-E4B-it for the OfflineAid Mac-side pack-builder agent — an offline AI assistant for Australian consumer-safety scenarios (anti-scam, disaster, travel) targeting the Kaggle Gemma 4 Good Hackathon.

This repo contains the fully merged fp16 safetensors (4 shards, ~16 GB) produced by peft.PeftModel.merge_and_unload() from the LoRA adapter at helenk/gemma-4-E4B-lora. For the Q4_K_M GGUF quantization that ships with Ollama, see helenk/gemma-4-E4B-finetune-GGUF.

Tier A held-out eval (vs stock + RAG)

Held-out set: 111 rows, stratified per language (37 EN + 37 ZH + 37 AR), drawn from the 1,113-row helenkwok/offlineaid corpus with seed=3407. The Q4_K_M quantization of this fine-tune was evaluated against the stock baseline gemma4-stock (also Q4_K_M), both served via Ollama, with greedy decoding and an explicit "Answer in {language}" directive in the prompt.
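A minimal sketch of how a per-language stratified hold-out like this can be drawn. It is not the project's actual split code: the function name, the `language` field access, and the synthetic rows are illustrative, and the figure of 371 rows per language is an assumption inferred from 1,113 / 3 and the 37-row held-out groups.

```python
import random
from collections import defaultdict


def stratified_split(rows, held_out_frac=0.10, seed=3407):
    """Split rows into (train, held_out), holding out held_out_frac per language."""
    rng = random.Random(seed)
    by_lang = defaultdict(list)
    for row in rows:
        by_lang[row["language"]].append(row)

    train, held_out = [], []
    for lang in sorted(by_lang):  # sorted so the result does not depend on input order of languages
        group = by_lang[lang][:]
        rng.shuffle(group)
        n_held = round(len(group) * held_out_frac)
        held_out.extend(group[:n_held])
        train.extend(group[n_held:])
    return train, held_out


# Synthetic stand-in for the corpus (assumed 371 rows per language).
rows = [{"language": lang, "id": i} for lang in ("EN", "ZH", "AR") for i in range(371)]
train, held_out = stratified_split(rows)
print(len(train), len(held_out))  # 1002 111
```

Splitting within each language group, rather than over the pooled corpus, is what guarantees the 37/37/37 balance regardless of the seed.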

Language   Metric        stock + RAG   ft + RAG   Δ
EN         ROUGE-L F1    0.688         0.699      +0.011
EN         Format-OK %   91.9%         94.6%      +2.7 pp
ZH         ROUGE-L F1    0.229         0.227      −0.002
ZH         Format-OK %   45.9%         62.2%      +16.3 pp
AR         ROUGE-L F1    0.085         0.139      +0.054 (+63% relative)
AR         Format-OK %   21.6%         54.1%      +32.4 pp (2.5×)
all        ROUGE-L F1    0.334         0.355      +0.021
all        Format-OK %   53.2%         70.3%      +17.1 pp
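For readers unfamiliar with the ROUGE-L F1 rows above, here is a small sketch of the metric based on the longest common subsequence (LCS) of tokens. The whitespace tokenization is an assumption made for illustration; the tokenizer used in the actual eval is not stated in this card, and whitespace splitting is a poor fit for Chinese, which does not separate words with spaces.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(pred, ref):
    """ROUGE-L F1 over whitespace tokens (tokenization is an assumption)."""
    pred_toks, ref_toks = pred.split(), ref.split()
    if not pred_toks or not ref_toks:
        return 0.0
    lcs = lcs_length(pred_toks, ref_toks)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_toks)
    recall = lcs / len(ref_toks)
    return 2 * precision * recall / (precision + recall)


# Two of three tokens shared in order: precision = recall = 2/3.
partial = rouge_l_f1("call 000 now", "call 000 immediately")

# A reasonable paraphrase with no shared tokens scores zero.
paraphrase = rouge_l_f1("hang up and report it", "end the call then notify police")
```

The second call shows why overlap metrics can under-credit an answer that is correct but worded differently from the reference.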

Format-OK = the prediction is non-empty and contains at least one character in the expected script (CJK for ZH, Arabic for AR, ASCII for EN). It is the more honest multilingual signal here, because ROUGE-L measures only lexical overlap: it cannot credit a valid translation of the same English source that is worded differently from the reference.
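The Format-OK check can be sketched in a few lines. The exact character ranges are assumptions, since the card names the scripts but not the code points: this version uses CJK Unified Ideographs plus Extension A for ZH, the basic Arabic block (U+0600 to U+06FF) for AR, and ASCII letters for EN.

```python
import re

# Assumed script ranges; the eval's real ranges are not given in this card.
SCRIPT_PATTERNS = {
    "EN": re.compile(r"[A-Za-z]"),
    "ZH": re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]"),
    "AR": re.compile(r"[\u0600-\u06ff]"),
}


def format_ok(pred, language):
    """True if pred is non-empty and has at least one character in the expected script."""
    if not pred or not pred.strip():
        return False
    return SCRIPT_PATTERNS[language].search(pred) is not None


print(format_ok("请立即挂断电话", "ZH"))  # True
print(format_ok("Hang up now", "ZH"))     # False: answered in the wrong script
```

Note that this is a deliberately weak test: a single character in the right script passes, so it detects wrong-language answers rather than judging answer quality.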

Training

  • Method: Unsloth LoRA on Kaggle T4 (1× T4, ~7 min)
  • Adapter source: helenk/gemma-4-E4B-lora
  • Base: unsloth/gemma-4-E4B-it
  • LoRA config: r=16, α=16, dropout=0, target modules = q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj; vision layers off
  • Chat template: gemma-4-thinking
  • Loss: train_on_responses_only (mask user + evidence turn, train only on answer tokens)
  • Data: helenkwok/offlineaid v3 — 1,002-row train split (90/10 stratified per-language EN/ZH/AR from 1,113-row total, seed=3407). Each row is {instruction, input (= verbatim evidence_quote from .gov.au source), output (grounded answer in target language), language}.
  • Hyperparams: 2 epochs, effective batch 8 (per_device 2 × grad_accum 4), lr 2e-5, 5 warmup steps, fp16, weight decay 0.01, seed 3407
  • Notebook: scripts/render_finetune_variant.py renders one canonical notebook to E2B and E4B variants; canonical at notebooks/_canonical-finetune.ipynb.
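The response-only loss listed above comes down to label masking: every token before the answer gets the label -100, which PyTorch's cross-entropy loss ignores by default, so only answer tokens contribute to the gradient. The sketch below illustrates that idea only. It is not Unsloth's implementation; the token ids and the `answer_start` index are made up, and the real helper locates the boundary from chat-template markers rather than taking it as an argument.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt_tokens(input_ids, answer_start):
    """Return labels that keep only the tokens at or after answer_start."""
    if not 0 <= answer_start <= len(input_ids):
        raise ValueError("answer_start out of range")
    return [IGNORE_INDEX] * answer_start + list(input_ids[answer_start:])


# Hypothetical token ids: the first five cover the user and evidence turns,
# the last three are the answer.
input_ids = [2, 105, 9731, 107, 106, 4368, 7620, 1]
labels = mask_prompt_tokens(input_ids, answer_start=5)
print(labels)  # [-100, -100, -100, -100, -100, 4368, 7620, 1]
```

Masking the evidence turn as well as the user turn means the model is not trained to reproduce the quoted source text, only to answer from it.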

Merge recipe

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model on CPU in fp16 so the merge does not need a GPU.
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/gemma-4-E4B-it", torch_dtype="float16", device_map="cpu"
)
# Unsloth wraps Linear layers in Gemma4ClippableLinear; these must be unwrapped
# before peft injects the adapter. That step is omitted from this excerpt:
# see scripts/merge_e4b_lora.py for the unwrap helper.
model = PeftModel.from_pretrained(base, "helenk/gemma-4-E4B-lora").merge_and_unload()
model.save_pretrained("gemma-4-E4B-offlineaid-merged", safe_serialization=True, max_shard_size="5GB")
AutoTokenizer.from_pretrained("helenk/gemma-4-E4B-lora").save_pretrained("gemma-4-E4B-offlineaid-merged")

Full script: scripts/merge_e4b_lora.py.

Intended use

Stage 3 of the OfflineAid pipeline — Mac-side pack-builder agent loop (Ollama via pydantic-ai). Pixel 7 production deployment uses the stock gemma-4-E2B-it.litertlm plus retrieval, not this fine-tune; see the project writeup for the architectural rationale.

License

Inherits Google's Gemma Terms of Use. Training data (helenkwok/offlineaid) is CC-BY-4.0.

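Sibling repos

  • helenk/gemma-4-E4B-lora: the LoRA adapter this model was merged from
  • helenk/gemma-4-E4B-finetune-GGUF: Q4_K_M GGUF quantization for Ollama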
