Scam Sentinel — Fine-tuned Gemma 4 E2B for Multimodal Scam Risk Detection

LoRA adapter fine-tuned on Gemma 4 E2B-it for the Gemma 4 Good Hackathon (2026), Safety & Trust + Unsloth tracks.

This is not a final forensic deepfake detector. It is a multimodal scam risk assistant that combines phone call transcript analysis, conversation patterns, and verification workflows.

Headline Results (300-sample real evaluation, apples-to-apples)

All three rows use the same 300-sample real test set, no RAG, and the identical v3 system prompt. The only variables are base-model size and the presence of the LoRA adapter.

| Setup | Size | Accuracy | Precision | Recall | F1 | FPR |
|---|---|---|---|---|---|---|
| Gemma 4 E4B base | ~8B | 53.0% | 46.9% | 97.6% | 63.4% | 78.9% |
| Gemma 4 E2B base | ~5B | 41.7% | 41.4% | 96.8% | 58.0% | 97.7% |
| Gemma 4 E2B + QLoRA (this adapter) | ~5B | 89.7% | 98.0% | 76.8% | 86.1% | 1.1% |

Key findings

  1. Same-size apples-to-apples (E2B base → E2B + QLoRA): F1 jumps +28.1 pt (58.0 → 86.1), FPR falls from 97.7% to 1.1% (171 → 2 of the 175 safe messages, roughly 85× fewer false positives), and precision more than doubles (41.4% → 98.0%).
  2. Untuned Gemma 4 base models are unusable for this task: both flag the vast majority of normal messages as suspicious (FPR 78.9% and 97.7%). The instruction-tuned base has no domain prior for scam vs. normal text.
  3. Fine-tuning beats raw scale: the fine-tuned ~5B model outperforms the larger ~8B base by +22.7 F1 points (63.4 → 86.1).
  4. The recall trade-off is intentional: 96.8% (E2B base) → 76.8% (fine-tuned). See "Design Rationale: Precision over Recall" below; Stage 1 of the production cascade retains high recall.
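The headline rates can be cross-checked against a binary confusion matrix. The sketch below assumes that `low`, `medium`, `high`, and `critical` all count as dangerous, which gives 125 dangerous and 175 safe test messages; that is the split under which every reported rate is reproduced. The per-setup counts are therefore inferred from the published percentages rather than taken from the project's logs, and the `Confusion` class is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts: positive = dangerous, negative = safe."""
    tp: int
    fp: int
    fn: int
    tn: int

    def metrics(self) -> dict:
        precision = self.tp / (self.tp + self.fp)
        recall = self.tp / (self.tp + self.fn)
        total = self.tp + self.fp + self.fn + self.tn
        return {
            "accuracy": 100 * (self.tp + self.tn) / total,
            "precision": 100 * precision,
            "recall": 100 * recall,
            "f1": 100 * 2 * precision * recall / (precision + recall),
            "fpr": 100 * self.fp / (self.fp + self.tn),
        }


# Counts inferred from the reported percentages (125 dangerous / 175 safe), not published figures
SETUPS = {
    "E4B base": Confusion(tp=122, fp=138, fn=3, tn=37),
    "E2B base": Confusion(tp=121, fp=171, fn=4, tn=4),
    "E2B + QLoRA": Confusion(tp=96, fp=2, fn=29, tn=173),
}

for name, cm in SETUPS.items():
    print(name, {k: round(v, 1) for k, v in cm.metrics().items()})

# False-positive reduction computed on raw counts rather than rounded percentages
print("E2B base -> QLoRA:", SETUPS["E2B base"].fp / SETUPS["E2B + QLoRA"].fp)  # 85.5
print("E4B base -> QLoRA:", SETUPS["E4B base"].fp / SETUPS["E2B + QLoRA"].fp)  # 69.0
```

On raw counts the false positives drop 171 → 2 (E2B base) and 138 → 2 (E4B base), about 85× and 69×; dividing the rounded percentages instead (97.7 / 1.1, 78.9 / 1.1) slightly overstates both ratios.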

Model Details

  • Developed by: Alice0914 (Gemma 4 Good Hackathon submission)
  • Base model: unsloth/gemma-4-E2B-it (~5B params, MatFormer architecture)
  • Adapter type: LoRA (PEFT) — 28.7M trainable params (0.56% of base)
  • Training framework: Unsloth + TRL SFTTrainer
  • Quantization at training: 4-bit NF4 (QLoRA)
  • License: Apache 2.0
  • Language: English
  • Project: Scam Sentinel GitHub repo

Intended Use

Direct Use

Analyze SMS, email, or transcribed phone-call messages and output structured JSON containing:

  • risk_level: safe / low / medium / high / critical
  • patterns: Detected scam patterns (urgency, impersonation, secrecy, etc.)
  • user_message: Plain-language explanation answering "Is this a scam? Why? What to do? How to verify?"
  • tool_calls: Function calls into 12 protective tools (notify family, suggest callback, block payment, etc.)
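A downstream consumer will usually want to reject malformed verdicts before acting on them. The following is a minimal validation sketch for the four fields listed above. The field names and the five risk levels come from this card; the checks themselves (required keys, list/string types) are assumptions about the schema, not the project's actual validator, and `validate_verdict` is a hypothetical helper.

```python
import json

RISK_LEVELS = ("safe", "low", "medium", "high", "critical")
REQUIRED_KEYS = ("risk_level", "patterns", "user_message", "tool_calls")


def validate_verdict(raw: str) -> dict:
    """Parse a JSON verdict and check the documented fields.

    Raises ValueError (json.JSONDecodeError is a subclass) on any problem.
    """
    verdict = json.loads(raw)
    if not isinstance(verdict, dict):
        raise ValueError("verdict must be a JSON object")
    missing = [k for k in REQUIRED_KEYS if k not in verdict]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if verdict["risk_level"] not in RISK_LEVELS:
        raise ValueError(f"unknown risk_level: {verdict['risk_level']!r}")
    if not isinstance(verdict["patterns"], list):
        raise ValueError("patterns must be a list")
    if not isinstance(verdict["user_message"], str):
        raise ValueError("user_message must be a string")
    if not isinstance(verdict["tool_calls"], list):
        raise ValueError("tool_calls must be a list")
    return verdict


example = '{"risk_level": "high", "patterns": ["urgency"], "user_message": "Likely a scam.", "tool_calls": []}'
print(validate_verdict(example)["risk_level"])
```

The exact shape of each `tool_calls` entry is defined by the system prompt in the project repo, so this sketch only checks that the field is a list.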

Out-of-Scope Use

  • Voice authenticity / deepfake audio detection (use a dedicated audio model)
  • Languages other than English
  • Real-time telephony interception (requires phone-system integration)
  • Replacement for human judgment in financial decisions

How to Use

```python
from unsloth import FastLanguageModel

# Load the base model plus this LoRA adapter in 4-bit (requires a CUDA GPU)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Alice0914/gemma4-e2b-scam-sentinel",
    max_seq_length=1024,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth into inference mode

# (Load the full system prompt from the project repo)
system_prompt = "..."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "ANALYZE THIS INPUT:\n\nTEXT: Mom, send $500 right now\nMETADATA: {\"channel\": \"sms\"}"},
]
# Render the chat template and append the assistant-turn prefix
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text=text, return_tensors="pt").to("cuda")

# Enable sampling explicitly: temperature is ignored under greedy decoding
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.3)

# Decode only the newly generated tokens (chain-of-thought followed by the JSON verdict)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
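Because assistant responses place a chain-of-thought before the JSON verdict, the decoded text usually needs post-processing before it can be parsed. One possible approach, sketched here with only the standard library, is to scan for `{` and keep the last substring that `json.JSONDecoder.raw_decode` accepts as an object. `extract_last_json_object` is a hypothetical helper, not part of the project's code, and the project's own parser may differ.

```python
import json


def extract_last_json_object(generated: str):
    """Return the last top-level JSON object embedded in free text, or None."""
    decoder = json.JSONDecoder()
    found = None
    idx = generated.find("{")
    while idx != -1:
        try:
            obj, end = decoder.raw_decode(generated, idx)
        except json.JSONDecodeError:
            # Not valid JSON starting here (e.g. a brace inside the reasoning); try the next one
            idx = generated.find("{", idx + 1)
            continue
        if isinstance(obj, dict):
            found = obj
        # Skip past the decoded object so nested objects are not picked up separately
        idx = generated.find("{", end)
    return found


sample = 'IDENTIFY: urgency {see note}\nANSWER:\n{"risk_level": "high", "tool_calls": []}'
print(extract_last_json_object(sample))
```

Returning `None` rather than raising lets the caller count parse failures, which is how a figure like the card's "JSON parsing success" rate would be tallied.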

Training Details

- Training Data

  • 3,100 chat-formatted training samples (system + user + assistant)
  • Built from 80 hand-written seeds and 571 real messages from the UCI SMS Spam Collection, each expanded into paraphrased variants with Gemma 4
  • 8 categories: family_impersonation, prosecutor_scam, bec_scam, romance_scam, package_scam, bank_phishing, phishing_link, normal
  • Assistant responses follow a 5-step Chain-of-Thought (IDENTIFY → ASSESS → EXPLAIN → DECIDE TOOLS → ANSWER), then the JSON output
  • Train / dev split: 3,100 / 771 (stratified by category)
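A 3,100 / 771 split is roughly 80 / 20 (771 of 3,871 ≈ 19.9%). A per-category holdout like the sketch below is one way to produce such a stratified split. It assumes each sample is a dict with a `category` key; `stratified_split` is a hypothetical helper, and reusing seed 3407 here is an illustrative choice the card does not state.

```python
import random
from collections import defaultdict


def stratified_split(samples, dev_fraction=0.2, seed=3407):
    """Split samples into train/dev, holding out ~dev_fraction of each category."""
    by_category = defaultdict(list)
    for sample in samples:
        by_category[sample["category"]].append(sample)

    rng = random.Random(seed)  # local RNG so the split is reproducible
    train, dev = [], []
    for category in sorted(by_category):  # sorted for deterministic iteration order
        group = list(by_category[category])
        rng.shuffle(group)
        n_dev = round(len(group) * dev_fraction)
        dev.extend(group[:n_dev])
        train.extend(group[n_dev:])
    return train, dev


toy = [{"category": c, "text": f"{c}-{i}"} for c in ("normal", "bank_phishing") for i in range(10)]
train, dev = stratified_split(toy)
print(len(train), len(dev))
```

Splitting within each category keeps rare scam types represented in the dev set instead of leaving their presence to chance.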

- Training Hyperparameters

  • Method: QLoRA (4-bit NF4 base + LoRA r=16)
  • LoRA: r=16, alpha=32, dropout=0.05, target_modules="all-linear"
  • Batch: 1 × grad_accum 8 (effective batch 8)
  • Epochs: 2 (~775 steps)
  • Learning rate: 2e-4, cosine schedule, warmup_ratio=0.03
  • Optimizer: paged_adamw_8bit
  • Precision: bf16 (compute) / NF4 (base weights)
  • Max sequence length: 1024
  • Random seed: 3407
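The step count follows from the batch settings: 3,100 samples at an effective batch of 8 give 387.5 optimizer steps per epoch, so about 775 over 2 epochs. The sketch below works through that arithmetic, the warm-up length implied by `warmup_ratio=0.03`, and the standard LoRA scaling factor `alpha / r`. How the trainer rounds the final partial accumulation window is left open (both bounds are shown), and the 2048 × 2048 layer is a hypothetical shape, not Gemma 4's actual dimensions.

```python
import math

N_TRAIN = 3100                 # training samples
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 8
EPOCHS = 2
WARMUP_RATIO = 0.03
LORA_R, LORA_ALPHA = 16, 32

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM                # 8 sequences per optimizer step
nominal_steps = N_TRAIN / effective_batch * EPOCHS             # 387.5 per epoch -> 775.0
steps_if_dropped = (N_TRAIN // effective_batch) * EPOCHS       # final partial window dropped
steps_if_kept = math.ceil(N_TRAIN / effective_batch) * EPOCHS  # final partial window kept
warmup_steps = math.ceil(WARMUP_RATIO * nominal_steps)         # ~24; exact rounding is trainer-specific
lora_scale = LORA_ALPHA / LORA_R                               # LoRA update is scaled by alpha / r


def lora_params(d_in: int, d_out: int, r: int = LORA_R) -> int:
    """Trainable params LoRA adds to one d_in -> d_out linear: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# 28.7M trainable params at 0.56% of the base implies a base of roughly 5.1B params
implied_base = 28.7e6 / 0.0056

print(effective_batch, nominal_steps, steps_if_dropped, steps_if_kept, warmup_steps, lora_scale)
print(lora_params(2048, 2048))  # hypothetical 2048 x 2048 projection
print(f"{implied_base / 1e9:.1f}B")
```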

- Hardware

  • GPU: Google Colab Pro L4 (22.5 GB VRAM)
  • Training time: ~50 minutes for 2 epochs
  • Framework versions: PEFT 0.19.1, transformers ≥4.50, trl ≥1.4, Unsloth (latest from GitHub)

Evaluation

- Testing Data

  • Held-out set of 300 hand-labeled real samples
  • Distribution: 175 safe / 7 low / 79 medium / 26 high / 13 critical
  • Sources: FTC consumer-fraud reports, UCI SMS Spam Collection (training-disjoint subset), custom edge cases
  • Disjointness from training is enforced by the seeds_real.jsonl filter and verified by a hash check
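The card does not describe the hash check in detail. A common version, sketched below, hashes every training and evaluation text and reports collisions. The normalization (lowercasing, collapsing whitespace), the use of SHA-256, and the helper names are all assumptions.

```python
import hashlib
import re


def normalized_hash(text: str) -> str:
    """SHA-256 of text after lowercasing and collapsing whitespace (assumed normalization)."""
    canonical = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def overlap(train_texts, eval_texts):
    """Return the eval texts whose normalized hash also appears in the training set."""
    train_hashes = {normalized_hash(t) for t in train_texts}
    return [t for t in eval_texts if normalized_hash(t) in train_hashes]


leaks = overlap(["Mom, send $500 now"], ["mom,  send $500 now", "Your package is delayed"])
print(leaks)  # a non-empty list means the sets are not disjoint
```

An exact-hash check of this kind only catches verbatim (post-normalization) leaks. It would not flag a paraphrase of an evaluation message, which matters here because most of the training data is Gemma-paraphrased.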

- Metrics (this adapter)

Binary danger-vs-safe (matches the project's baseline reporting protocol):

| Metric | Value |
|---|---|
| Accuracy | 89.7% |
| Precision | 98.0% |
| Recall | 76.8% |
| F1 | 86.1% |
| FPR | 1.1% |
| JSON parsing success | 95.3% (286/300) |
  • Strict 5-class match: 69.0%. The model occasionally over-classifies within the dangerous range (e.g., medium → high), which is the preferable failure mode for a safety-critical app.
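The gap between binary accuracy and strict 5-class match comes from errors that stay on the dangerous side of the boundary: a medium → high prediction is wrong under strict scoring but correct under danger-vs-safe scoring. The sketch below computes both on toy labels. It assumes every non-`safe` level maps to dangerous; how the project scored the 14 unparseable outputs is not stated, so they are not modeled, and both helpers are hypothetical.

```python
RISK_LEVELS = ("safe", "low", "medium", "high", "critical")


def to_binary(level: str) -> int:
    """1 = dangerous (any non-safe level), 0 = safe. The mapping is an assumption."""
    if level not in RISK_LEVELS:
        raise ValueError(f"unknown risk level: {level!r}")
    return int(level != "safe")


def strict_and_binary_accuracy(gold, pred):
    """Strict 5-class accuracy and binary danger-vs-safe accuracy over paired labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    strict = sum(g == p for g, p in pairs) / len(pairs)
    binary = sum(to_binary(g) == to_binary(p) for g, p in pairs) / len(pairs)
    return strict, binary


gold = ["safe", "medium", "high", "critical"]
pred = ["safe", "high", "high", "critical"]  # one medium -> high over-classification
print(strict_and_binary_accuracy(gold, pred))
```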

- What Fine-tuning Changed (E2B base → E2B + QLoRA)

| Metric | Base (E2B, ~5B) | Fine-tuned (E2B, ~5B) | Δ |
|---|---|---|---|
| FPR | 97.7% | 1.1% | ~85× reduction (171 → 2 false positives) |
| Precision | 41.4% | 98.0% | +56.6 pt |
| Accuracy | 41.7% | 89.7% | +48.0 pt |
| F1 | 58.0% | 86.1% | +28.1 pt |
| Recall | 96.8% | 76.8% | −20.0 pt (intentional trade-off) |
  • The base instruction-tuned model has no in-domain prior for "what does a normal message look like?" — it flags 97.7% of safe messages as suspicious
  • Fine-tuning re-calibrates the decision boundary using 3,100 in-domain examples
  • The recall reduction is a deliberate trade-off favoring user trust over raw catch rate

- Fine-tuning vs Raw Scale (E4B base → E2B + QLoRA)

| Metric | E4B base (~8B) | E2B + QLoRA (~5B) | Δ |
|---|---|---|---|
| F1 | 63.4% | 86.1% | +22.7 pt |
| FPR | 78.9% | 1.1% | ~69× reduction (138 → 2 false positives) |
| Precision | 46.9% | 98.0% | +51.1 pt |
  • A fine-tuned smaller model decisively outperforms a larger base model on this task
  • On this task, domain adaptation mattered more than scale for safety-critical classification under a limited training-compute budget
  • Total cost: one Colab L4 session, ~50 minutes

Note on comparison fairness: All three setups use the same 300-sample test set and the identical v3 system prompt, with no RAG. The base models run under Ollama Q4_K_M quantization; the fine-tune uses Unsloth NF4. Both are 4-bit, so quantization differences should contribute only marginally; the +28.1 F1 / ~85× FPR gap between E2B base and E2B + QLoRA (same size) is attributed to the adapter rather than to quantization.

- Design Rationale: Precision over Recall

In the Scam Sentinel production system, this adapter is Stage 2 of a two-stage cascade:

  • Stage 1: a fast classifier (e.g., gemma3:4b) escalates every potentially dangerous message (recall 99%+). This stage handles "catch everything".
  • Stage 2: this fine-tuned adapter supplies high-confidence reasoning and tool calls only when action is warranted (precision 98%). Its job is to justify action: blocking payments, alerting family, demanding callback verification.
  • With a 1.1% FPR, users can trust the downstream actions when this model flags a message.
  • Pushing this stage toward higher recall risks re-introducing the false-positive flood seen in the base model (FPR 97.7%), which would make the product unusable in real deployment regardless of recall.
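The two-stage routing can be sketched as follows. `stage1_flags` and `stage2_analyze` are hypothetical callables standing in for the Stage 1 classifier and this adapter, and the rule that tool calls are surfaced only when Stage 2 returns a non-`safe` verdict is an assumption about how the cascade gates actions.

```python
from typing import Callable


def cascade(message: str,
            stage1_flags: Callable[[str], bool],
            stage2_analyze: Callable[[str], dict]) -> dict:
    """High-recall screen first; the precise analyzer runs only on escalated messages."""
    if not stage1_flags(message):
        # Stage 1 judged the message harmless; Stage 2 is never invoked
        return {"escalated": False, "verdict": None, "actions": []}
    verdict = stage2_analyze(message)
    # Only a non-safe Stage 2 verdict may trigger protective actions
    act = verdict.get("risk_level", "safe") != "safe"
    return {"escalated": True, "verdict": verdict, "actions": verdict.get("tool_calls", []) if act else []}


def screen(m: str) -> bool:  # stub Stage 1: flag anything mentioning money
    return "$" in m


def analyze(m: str) -> dict:  # stub Stage 2 verdict
    return {"risk_level": "high", "tool_calls": [{"name": "suggest_callback"}]}


print(cascade("Mom, send $500 right now", screen, analyze))
print(cascade("See you at dinner", screen, analyze))
```

Because Stage 2 only sees escalated messages, its precision governs how often users are interrupted, while overall recall is bounded by Stage 1.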


Bias, Risks, and Limitations

- Language

  • English-only: Trained on English text; performance on other languages is not validated

- Classification Behavior

  • Over-classification bias within the dangerous range: The model leans toward "more dangerous" classifications (e.g., medium → high)
  • This is intentional — once false positives on safe messages are eliminated, the safer error mode within non-safe messages is to over-classify
  • Downstream tools (wait timer, callback verification) make over-classification cheap to recover from
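To check that errors lean in the intended direction, one can compare predicted and gold severity on messages that are not safe. In this sketch the ordering safe < low < medium < high < critical follows the risk_level list; counting "missed" (predicted safe) separately from milder under-classification is an illustrative choice, and `error_direction` is a hypothetical helper.

```python
SEVERITY = {"safe": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def error_direction(gold, pred):
    """Count over-, under-, and missed classifications among gold non-safe messages."""
    counts = {"over": 0, "under": 0, "missed": 0}
    for g, p in zip(gold, pred):
        if g == "safe":
            continue  # only the dangerous range is of interest here
        if p == "safe":
            counts["missed"] += 1  # a recall loss: the danger was not flagged at all
        elif SEVERITY[p] > SEVERITY[g]:
            counts["over"] += 1
        elif SEVERITY[p] < SEVERITY[g]:
            counts["under"] += 1
    return counts


gold = ["medium", "medium", "high", "safe", "critical"]
pred = ["high", "low", "high", "low", "safe"]
print(error_direction(gold, pred))
```

If the bias described above holds, `over` should dominate `under` on the real evaluation set.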

- Recall Trade-off

  • Some borderline messages may be missed: at 76.8% recall, about 23% of the dangerous messages in the test set were not flagged
  • Recommended deployment pairs this adapter with a high-recall first-pass classifier (cascade Stage 1)

- Training Data Provenance

  • Synthetic origin: 80% of the training data was paraphrased by Gemma from hand-written seeds and real UCI SMS spam
  • Evaluation uses only real, held-out data, so overfitting to the synthetic paraphrase style would show up as lower test scores

- Tool Calls Are Advisory

  • The 12 protective tools are recommended actions
  • Downstream systems must enforce safety policies independently
  • The model does not execute actions — it returns structured intent
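A downstream policy layer might treat the model's `tool_calls` as proposals and decide independently what to run. The sketch assumes each call is a dict with a `name` key. The tool identifiers (`suggest_callback`, `notify_family`, `block_payment`) and the auto-run / confirm / reject policy are hypothetical, since the card does not list the 12 tools' names.

```python
# Hypothetical tool identifiers and policy; the real tool names are defined in the project repo
AUTO_ALLOWED = {"suggest_callback"}
NEEDS_CONFIRMATION = {"notify_family", "block_payment"}


def plan_actions(tool_calls, user_confirmed=frozenset()):
    """Split model-proposed tool calls into run, pending-confirmation, and rejected."""
    plan = {"run": [], "pending": [], "rejected": []}
    for call in tool_calls:
        name = call.get("name") if isinstance(call, dict) else None
        if name in AUTO_ALLOWED:
            plan["run"].append(name)
        elif name in NEEDS_CONFIRMATION:
            # High-impact actions run only after explicit user confirmation
            (plan["run"] if name in user_confirmed else plan["pending"]).append(name)
        else:
            plan["rejected"].append(name)  # unknown or malformed: never executed
    return plan


proposed = [{"name": "suggest_callback"}, {"name": "block_payment"}, {"name": "wire_money"}]
print(plan_actions(proposed))
```

Keeping the allowlist outside the model means a malformed or adversarially induced tool call can at worst be rejected, never executed.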

Citation

Project: Scam Sentinel — submission for the Gemma 4 Good Hackathon (2026).

```bibtex
@misc{scam-sentinel-2026,
  author = {Alice0914},
  title = {Scam Sentinel: Multimodal Scam Risk Assistant with Fine-tuned Gemma 4 E2B},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Alice0914/gemma4-e2b-scam-sentinel}},
  note = {Submission for the Gemma 4 Good Hackathon}
}
```
