Scam Sentinel — Fine-tuned Gemma 4 E2B for Multimodal Scam Risk Detection

LoRA adapter fine-tuned on Gemma 4 E2B-it for the Gemma 4 Good Hackathon (2026), Safety & Trust + Unsloth tracks.

This is not a final forensic deepfake detector. It is a multimodal scam risk assistant that combines phone call transcript analysis, conversation patterns, and verification workflows.

Headline Results (300-sample real evaluation, apples-to-apples)

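All three rows use the same 300-sample real test set, the identical v3 system prompt, and no RAG. The intended variables are base-model size and the presence of the LoRA adapter; the inference runtime and 4-bit quantization format also differ (see the note on comparison fairness below).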

Setup	Size	Accuracy	Precision	Recall	F1	FPR
Gemma 4 E4B base	~8B	53.0%	46.9%	97.6%	63.4%	78.9%
Gemma 4 E2B base	~5B	41.7%	41.4%	96.8%	58.0%	97.7%
Gemma 4 E2B + QLoRA (this adapter)	~5B	89.7%	98.0%	76.8%	86.1%	1.1%

Key findings

Same-size apples-to-apples (E2B base → E2B + QLoRA): F1 jumps +28.1 pt (58.0 → 86.1), FPR collapses 88× (97.7% → 1.1%), Precision more than doubles (41.4% → 98.0%).
The untuned Gemma 4 base models are unusable for this task: both flag the vast majority of normal messages as suspicious (FPR 78.9% and 97.7%). The instruction-tuned base lacks a domain prior for scam vs. normal text.
Fine-tuning beats raw scale: the fine-tuned 5B model outperforms the larger 8B base by +22.7 F1 points (63.4 → 86.1).
Recall trade-off is intentional: 96.8% (E2B base) → 76.8% (fine-tuned). In the production cascade, Stage 1 retains high recall; see "Design Rationale: Precision over Recall" below.
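The headline percentages can be cross-checked with a few lines of arithmetic. This is a hedged reconstruction, not the project's evaluation code. The confusion-matrix counts are back-computed from the reported rates and the 175 safe / 125 non-safe split listed under Testing Data. The sketch assumes every non-safe level counts as a positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts; positive = dangerous (any non-safe level, by assumption)."""
    tp: int
    fp: int
    tn: int
    fn: int

    def metrics(self) -> dict[str, float]:
        precision = self.tp / (self.tp + self.fp)
        recall = self.tp / (self.tp + self.fn)
        return {
            "accuracy": (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn),
            "precision": precision,
            "recall": recall,
            "f1": 2 * precision * recall / (precision + recall),
            "fpr": self.fp / (self.fp + self.tn),  # share of safe messages flagged
        }


# Counts back-computed (assumption) from the reported rates: 125 dangerous, 175 safe.
ROWS = {
    "Gemma 4 E4B base": Confusion(tp=122, fp=138, tn=37, fn=3),
    "Gemma 4 E2B base": Confusion(tp=121, fp=171, tn=4, fn=4),
    "Gemma 4 E2B + QLoRA": Confusion(tp=96, fp=2, tn=173, fn=29),
}

for name, counts in ROWS.items():
    print(name, {k: round(100 * v, 1) for k, v in counts.metrics().items()})
```

Each printed row should match the table above to one decimal place.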

Model Details

Developed by: Alice0914 (Gemma 4 Good Hackathon submission)
Base model: unsloth/gemma-4-E2B-it (~5B params, MatFormer architecture)
Adapter type: LoRA (PEFT) — 28.7M trainable params (0.56% of base)
Training framework: Unsloth + TRL SFTTrainer
Quantization at training: 4-bit NF4 (QLoRA)
License: Apache 2.0
Language: English
Project: Scam Sentinel GitHub repo

Intended Use

Direct Use

Analyze SMS, email, or transcribed phone-call messages and output structured JSON containing:

risk_level: safe / low / medium / high / critical
patterns: Detected scam patterns (urgency, impersonation, secrecy, etc.)
user_message: Plain-language explanation answering "Is this a scam? Why? What to do? How to verify?"
tool_calls: Function calls into 12 protective tools (notify family, suggest callback, block payment, etc.)
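A consumer of this output will usually want to reject malformed responses before acting on them. The sketch below checks the four documented fields. Two shapes in it are assumptions, not the project's documented schema:

- `patterns` is assumed to be a list of strings.
- Each `tool_calls` entry is assumed to be an object with a `name` string and an `arguments` object.

```python
from typing import Any

RISK_LEVELS = ("safe", "low", "medium", "high", "critical")


def validate_output(obj: Any) -> list[str]:
    """Return the problems found in a parsed model output (an empty list means valid).

    Field names follow the model card; the patterns and tool_calls entry shapes are assumptions.
    """
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    if obj.get("risk_level") not in RISK_LEVELS:
        problems.append(f"risk_level must be one of {RISK_LEVELS}")
    patterns = obj.get("patterns")
    if not isinstance(patterns, list) or not all(isinstance(p, str) for p in patterns):
        problems.append("patterns must be a list of strings")
    if not isinstance(obj.get("user_message"), str) or not obj["user_message"].strip():
        problems.append("user_message must be a non-empty string")
    tool_calls = obj.get("tool_calls", [])  # treated as optional here
    if not isinstance(tool_calls, list):
        problems.append("tool_calls must be a list")
    else:
        for i, call in enumerate(tool_calls):
            # Assumed entry shape: {"name": str, "arguments": dict}
            if not isinstance(call, dict) or not isinstance(call.get("name"), str):
                problems.append(f"tool_calls[{i}] needs a string 'name'")
            elif not isinstance(call.get("arguments", {}), dict):
                problems.append(f"tool_calls[{i}].arguments must be an object")
    return problems


example = {
    "risk_level": "high",
    "patterns": ["urgency", "impersonation"],
    "user_message": "This looks like a scam. Call your family back on a number you already know.",
    "tool_calls": [{"name": "suggest_callback", "arguments": {}}],  # hypothetical tool name
}
print(validate_output(example))  # → []
```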

Out-of-Scope Use

Voice authenticity / deepfake audio detection (use a dedicated audio model)
Languages other than English
Real-time telephony interception (requires phone-system integration)
Replacement for human judgment in financial decisions

How to Use

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Alice0914/gemma4-e2b-scam-sentinel",
    max_seq_length=1024,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

# (Load the full system prompt from the project repo)
system_prompt = "..."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "ANALYZE THIS INPUT:\n\nTEXT: Mom, send $500 right now\nMETADATA: {\"channel\": \"sms\"}"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text=text, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.3)  # temperature only applies when sampling is enabled
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
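Responses put a 5-step Chain-of-Thought before the JSON (see Training Data), so the decoded string usually needs post-processing before `json.loads`. The sketch below is a minimal approach and may differ from the project's own parser. It assumes the answer contains one top-level JSON object, possibly preceded by reasoning text. It tries `json.JSONDecoder.raw_decode` at each `{` and keeps the last object that decodes.

```python
import json
from typing import Any, Optional


def extract_last_json_object(text: str) -> Optional[dict[str, Any]]:
    """Return the last top-level JSON object embedded in free-form model output.

    Tries raw_decode at every '{' and keeps the last dict that parses, so any
    reasoning text before it (including stray braces) is ignored.
    """
    decoder = json.JSONDecoder()
    last = None
    idx = text.find("{")
    while idx != -1:
        try:
            obj, length = decoder.raw_decode(text[idx:])
        except json.JSONDecodeError:
            idx = text.find("{", idx + 1)  # not valid JSON here; try the next brace
            continue
        if isinstance(obj, dict):
            last = obj
        idx = text.find("{", idx + length)  # skip past the object just decoded
    return last


sample = 'IDENTIFY: urgency {not json}\nANSWER:\n{"risk_level": "high", "patterns": ["urgency"]}'
print(extract_last_json_object(sample))  # → {'risk_level': 'high', 'patterns': ['urgency']}
```

In the snippet above, the decoded string that is printed is what you would pass to this function. A `None` result corresponds to the parse failures counted in the JSON parsing success metric.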

Training Details

- Training Data

3,100 chat-formatted training samples (system + user + assistant)
Generated by paraphrasing 80 hand-written seeds and 571 real UCI SMS Spam samples into multiple variants with Gemma 4
8 categories: family_impersonation, prosecutor_scam, bec_scam, romance_scam, package_scam, bank_phishing, phishing_link, normal
Assistant responses follow a 5-step Chain-of-Thought (IDENTIFY → ASSESS → EXPLAIN → DECIDE TOOLS → ANSWER) + JSON output format
Train / dev split: 3,100 / 771 (stratified by category)
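A stratified split keeps each of the 8 categories in roughly the same train/dev proportion. The stdlib sketch below shows the idea only. The `category` field name, the dev fraction (about 0.2 here, since 771 / 3,871 ≈ 0.199) and the per-category rounding are assumptions, so it will not reproduce the exact 3,100 / 771 counts.

```python
import random
from collections import defaultdict


def stratified_split(samples, dev_fraction, seed=3407, key="category"):
    """Split samples so each category contributes about dev_fraction of its items to dev.

    `key` is a hypothetical field name; reusing seed 3407 is only for illustration.
    """
    by_cat = defaultdict(list)
    for s in samples:
        by_cat[s[key]].append(s)
    rng = random.Random(seed)
    train, dev = [], []
    for cat in sorted(by_cat):  # sorted so the shuffle sequence is deterministic
        items = by_cat[cat][:]
        rng.shuffle(items)
        n_dev = round(len(items) * dev_fraction)
        dev.extend(items[:n_dev])
        train.extend(items[n_dev:])
    return train, dev


toy = [{"id": i, "category": "normal" if i % 2 else "bank_phishing"} for i in range(200)]
train, dev = stratified_split(toy, dev_fraction=0.2)
print(len(train), len(dev))  # → 160 40
```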

- Training Hyperparameters

Method: QLoRA (4-bit NF4 base + LoRA r=16)
LoRA: r=16, alpha=32, dropout=0.05, target_modules="all-linear"
Batch: 1 × grad_accum 8 (effective batch 8)
Epochs: 2 (~775 steps)
Learning rate: 2e-4, cosine schedule, warmup_ratio=0.03
Optimizer: paged_adamw_8bit
Precision: bf16 (compute) / NF4 (base weights)
Max sequence length: 1024
Random seed: 3407
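The ~775-step figure follows from the batch settings. With a per-device batch of 1 and 8 accumulation steps, each optimizer step consumes 8 samples.

The sketch below reproduces that arithmetic. It also evaluates a linear-warmup, cosine-decay learning rate of the shape Hugging Face's `get_cosine_schedule_with_warmup` implements, with a single half-cosine down to zero. Warmup steps are assumed to be rounded up from `warmup_ratio`.

Whether a partial final accumulation step is counted varies by trainer version, so the total is shown both ways. Treat this as an illustration, not the run's logged schedule.

```python
import math

N_TRAIN = 3_100
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 8
EPOCHS = 2
PEAK_LR = 2e-4
WARMUP_RATIO = 0.03

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM  # 8 samples per optimizer step
steps_floor = (N_TRAIN // effective_batch) * EPOCHS  # partial last step dropped
steps_ceil = math.ceil(N_TRAIN / effective_batch) * EPOCHS  # partial last step kept
total_steps = steps_ceil  # the sketch uses the rounded-up count
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(steps_floor, steps_ceil, warmup_steps)  # → 774 776 24
print(lr_at(warmup_steps), lr_at(total_steps))  # → 0.0002 0.0
```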

- Hardware

GPU: Google Colab Pro L4 (22.5 GB VRAM)
Training time: ~50 minutes for 2 epochs
Framework versions: PEFT 0.19.1, transformers ≥4.50, trl ≥1.4, Unsloth (latest from GitHub)

Evaluation

- Testing Data

Held-out set of 300 hand-labeled real samples
Distribution: 175 safe / 7 low / 79 medium / 26 high / 13 critical
Sources: FTC consumer-fraud reports, UCI SMS Spam Collection (training-disjoint subset), custom edge cases
Disjointness from the training data is enforced with the seeds_real.jsonl filter and verified by a hash check
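A hash check of this kind can be sketched as follows. The sketch assumes JSONL records with a `text` field and a light normalization (lowercasing, collapsing whitespace) before SHA-256. The project's actual field names, normalization and file layout beyond `seeds_real.jsonl` are not documented here, so all function names are hypothetical. Exact hashing only catches verbatim overlap after normalization, not paraphrases.

```python
import hashlib
import json
from pathlib import Path


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting changes hash identically."""
    return " ".join(text.lower().split())


def hashes_from_jsonl_lines(lines, field: str = "text") -> set[str]:
    """SHA-256 digests of the normalized `field` of each non-empty JSONL line."""
    digests = set()
    for line in lines:
        if line.strip():
            record = json.loads(line)
            digests.add(hashlib.sha256(normalize(record[field]).encode("utf-8")).hexdigest())
    return digests


def hashes_from_file(path: Path, field: str = "text") -> set[str]:
    with path.open(encoding="utf-8") as f:
        return hashes_from_jsonl_lines(f, field)


def overlapping(train_lines, eval_lines, field: str = "text") -> set[str]:
    """Digests present in both sets; a non-empty result means the eval set leaks."""
    return hashes_from_jsonl_lines(train_lines, field) & hashes_from_jsonl_lines(eval_lines, field)


train_lines = ['{"text": "Mom, send $500 now"}', '{"text": "See you at 6"}']
eval_lines = ['{"text": "mom,  send $500 NOW"}', '{"text": "Your package is held"}']
print(len(overlapping(train_lines, eval_lines)))  # → 1
```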

- Metrics (this adapter)

Binary danger-vs-safe (matches the project's baseline reporting protocol):

Metric	Value
Accuracy	89.7%
Precision	98.0%
Recall	76.8%
F1	86.1%
FPR	1.1%
JSON parsing success	95.3% (286/300)

Strict 5-class match: 69.0% (the model sometimes over-classifies within the dangerous range, e.g., medium → high, which is the preferable failure mode for a safety-critical app)
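The two reporting views differ only in how labels are compared. In the sketch below, binary scoring treats `safe` as negative and every other level as positive. That mapping is an assumption, though it is consistent with the 175 safe / 125 non-safe split of the test set. This card does not specify how the 14 unparseable outputs are scored, so the sketch takes already-parsed labels.

```python
RISK_LEVELS = ("safe", "low", "medium", "high", "critical")


def is_dangerous(level: str) -> bool:
    """Binary collapse (assumed protocol): anything other than 'safe' is positive."""
    if level not in RISK_LEVELS:
        raise ValueError(f"unknown risk level: {level!r}")
    return level != "safe"


def _check_lengths(gold: list[str], pred: list[str]) -> None:
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")


def strict_match(gold: list[str], pred: list[str]) -> float:
    """Fraction of samples whose predicted 5-class level equals the gold level."""
    _check_lengths(gold, pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def binary_accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of samples on the correct side of the safe / dangerous boundary."""
    _check_lengths(gold, pred)
    return sum(is_dangerous(g) == is_dangerous(p) for g, p in zip(gold, pred)) / len(gold)


# A medium -> high over-classification costs strict accuracy but not binary accuracy.
gold = ["safe", "medium", "high", "critical"]
pred = ["safe", "high", "high", "critical"]
print(strict_match(gold, pred), binary_accuracy(gold, pred))  # → 0.75 1.0
```

This is why strict 5-class match (69.0%) can sit well below binary accuracy (89.7%) without any extra false alarms on safe messages.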

- What Fine-tuning Changed (E2B base → E2B + QLoRA)

Behavior	Base (E2B ~5B)	Fine-tuned (E2B ~5B)	Δ
FPR	97.7%	1.1%	88× reduction
Precision	41.4%	98.0%	+56.6 pt
Accuracy	41.7%	89.7%	+48.0 pt
F1	58.0%	86.1%	+28.1 pt
Recall	96.8%	76.8%	−20.0 pt (intentional trade-off)

The base instruction-tuned model has no in-domain prior for "what does a normal message look like?" — it flags 97.7% of safe messages as suspicious
Fine-tuning re-calibrates the decision boundary using 3,100 in-domain examples
The recall reduction is a deliberate trade-off favoring user trust over raw catch rate

- Fine-tuning vs Raw Scale (E4B base → E2B + QLoRA)

Behavior	E4B base (~8B)	E2B + QLoRA (~5B)	Δ
F1	63.4%	86.1%	+22.7 pt
FPR	78.9%	1.1%	72× reduction
Precision	46.9%	98.0%	+51.1 pt

A fine-tuned smaller model decisively outperforms a larger base model on this task
On this task, domain adaptation mattered more than scale for safety-critical classification under limited training compute
Total cost: one Colab L4 session, ~50 minutes

Note on comparison fairness: All three setups use the same 300-sample test set and the identical v3 system prompt, with no RAG. The base models run with Ollama Q4_K_M quantization, while the fine-tune uses Unsloth NF4 (4-bit). Because both are 4-bit, quantization differences are expected to be small relative to the +28.1 F1 / 88× FPR gap. Base and fine-tune also share the same E2B size, so that gap is attributed to the adapter.

- Design Rationale: Precision over Recall

In the Scam Sentinel production system, this adapter is Stage 2 of a two-stage cascade:

Stage 1: a fast classifier (e.g., gemma3:4b) escalates nearly every potentially dangerous message (recall 99%+)
Stage 2: this fine-tuned adapter provides high-confidence reasoning and tool calls only when action is warranted (precision 98%)
Stage 1's job is to catch everything
Stage 2's job is to justify action: blocking payments, alerting family, demanding callback verification
With a 1.1% FPR, a flag from this model is rarely a false alarm, so users have reason to trust the downstream actions it triggers
Pushing recall higher at this stage would risk drifting back toward the false-alarm flood seen in the base model (FPR 97.7%), which makes a product unusable in real deployment regardless of recall
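The routing described above can be sketched as two gated calls. `stage1_flags` and `stage2_analyze` are hypothetical stand-ins for the Stage 1 classifier and this adapter, and the sketch only illustrates the control flow. In this arrangement an action fires only when both stages flag a message. End-to-end recall for actions is therefore bounded by Stage 2's recall on the messages Stage 1 escalates.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Decision:
    escalated: bool
    risk_level: str = "safe"
    tool_calls: list[dict] = field(default_factory=list)


def run_cascade(
    message: str,
    stage1_flags: Callable[[str], bool],
    stage2_analyze: Callable[[str], dict],
) -> Decision:
    """Two-stage cascade: cheap high-recall screen, then high-precision analysis.

    stage1_flags and stage2_analyze are hypothetical callables, not project APIs.
    """
    if not stage1_flags(message):
        # Stage 1 cleared the message; the expensive Stage 2 model never runs.
        return Decision(escalated=False)
    result = stage2_analyze(message)
    level = result.get("risk_level", "safe")
    # Protective tool calls are forwarded only when Stage 2 itself flags the message.
    calls = result.get("tool_calls", []) if level != "safe" else []
    return Decision(escalated=True, risk_level=level, tool_calls=calls)


def keyword_screen(message: str) -> bool:
    # Toy Stage 1 stand-in: flags money symbols or urgency phrases.
    return "$" in message or "right now" in message.lower()


def fake_stage2(message: str) -> dict:
    # Toy stand-in for this adapter's parsed JSON output (hypothetical tool name).
    return {"risk_level": "high", "tool_calls": [{"name": "suggest_callback", "arguments": {}}]}


print(run_cascade("See you at dinner", keyword_screen, fake_stage2).escalated)  # → False
print(run_cascade("Mom, send $500 right now", keyword_screen, fake_stage2).risk_level)  # → high
```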

Bias, Risks, and Limitations

- Language

English-only: Trained on English text; performance on other languages is not validated

- Classification Behavior

Over-classification bias within the dangerous range: The model leans toward "more dangerous" classifications (e.g., medium → high)
This is intentional: once false positives on safe messages are nearly eliminated (1.1% FPR), the safer error mode within non-safe messages is to over-classify
Downstream tools (wait timer, callback verification) make over-classification cheap to recover from

- Recall Trade-off

Some borderline messages may be missed: recall on the evaluation set is 76.8%, so about 23% of dangerous messages were not flagged
Recommended deployment pairs this adapter with a high-recall first-pass classifier (cascade Stage 1)

- Training Data Provenance

Synthetic origin: 80% of training data was Gemma-paraphrased from hand-written seeds and real UCI SMS spam
Evaluation uses only real held-out data, so any overfitting to the synthetic style would show up as lower test scores

- Tool Calls Are Advisory

The 12 protective tools are recommended actions
Downstream systems must enforce safety policies independently
The model does not execute actions — it returns structured intent
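Because the model only returns structured intent, the enforcing layer decides what actually runs. The sketch below shows one such gate, built on two assumptions: an allowlist of known tools, and a user-confirmation requirement for high-impact actions. The tool names are hypothetical placeholders modeled on the examples in this card (notify family, suggest callback, block payment), not the project's actual 12 identifiers.

```python
from typing import Any

# Hypothetical tool identifiers; the real 12 protective tools are defined in the project repo.
ALLOWED_TOOLS = {"notify_family", "suggest_callback", "block_payment"}
NEEDS_CONFIRMATION = {"block_payment"}  # assumed policy: high-impact actions need the user


def gate_tool_calls(tool_calls: list[dict[str, Any]], user_confirmed: bool) -> list[dict[str, Any]]:
    """Return only the tool calls that policy allows to run right now.

    Unknown tools are always dropped; high-impact tools run only after explicit
    user confirmation. The model's output is treated as a request, never a command.
    """
    approved = []
    for call in tool_calls:
        name = call.get("name")
        if name not in ALLOWED_TOOLS:
            continue
        if name in NEEDS_CONFIRMATION and not user_confirmed:
            continue
        approved.append(call)
    return approved


requested = [
    {"name": "suggest_callback", "arguments": {}},
    {"name": "block_payment", "arguments": {}},
    {"name": "wire_money", "arguments": {}},  # not allowlisted: always dropped
]
print([c["name"] for c in gate_tool_calls(requested, user_confirmed=False)])  # → ['suggest_callback']
```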

Citation

Project: Scam Sentinel — submission for the Gemma 4 Good Hackathon (2026).

@misc{scam-sentinel-2026,
  author = {Alice0914},
  title = {Scam Sentinel: Multimodal Scam Risk Assistant with Fine-tuned Gemma 4 E2B},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Alice0914/gemma4-e2b-scam-sentinel}},
  note = {Submission for the Gemma 4 Good Hackathon}
}



Model tree: google/gemma-4-E2B → google/gemma-4-E2B-it (fine-tuned) → unsloth/gemma-4-E2B-it (fine-tuned) → this adapter