Phishing Email Detector - Qwen3.5-2B Fine-tuned

A fine-tuned version of Qwen/Qwen3.5-2B for email phishing detection. Given a raw email (sender, receiver, date, subject, body), the model returns a structured JSON analysis including a phishing verdict, confidence score, threat type, risk level, and reasoning.


Input Format

The model expects a plain text email in the following format:

Sender: <sender email address>
Receiver: <receiver email address>
Date: <email date>
Subject: <email subject>
Body: <email body text>

Example input:

Sender: security@netf1ix-account-alert.com
Receiver: kevin.walsh@hotmail.com
Date: Sun, 01 Jun 2025 22:10:05 +0000
Subject: Netflix: Payment declined - Update your billing information
Body: Dear Kevin, We were unable to process your most recent payment and your Netflix membership is at risk of cancellation. Update billing: http://netf1ix-account-alert.com/billing-update. Act within 48 hours.
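
When the email is available as separate fields rather than as text, the layout above can be assembled with a small helper. This is a minimal sketch: `format_email` and its parameter names are hypothetical and not part of the model repository, and collapsing the body onto a single line is an assumption based on the example input, not a documented requirement.

```python
def format_email(sender: str, receiver: str, date: str, subject: str, body: str) -> str:
    """Render email fields in the five-line layout the model expects (hypothetical helper)."""
    # Assumption: the body is kept on one line, as in the example input,
    # so line breaks and repeated whitespace are collapsed to single spaces.
    flat_body = " ".join(body.split())
    return "\n".join([
        f"Sender: {sender}",
        f"Receiver: {receiver}",
        f"Date: {date}",
        f"Subject: {subject}",
        f"Body: {flat_body}",
    ])


email_text = format_email(
    sender="security@netf1ix-account-alert.com",
    receiver="kevin.walsh@hotmail.com",
    date="Sun, 01 Jun 2025 22:10:05 +0000",
    subject="Netflix: Payment declined - Update your billing information",
    body="Dear Kevin,\nWe were unable to process your most recent payment.",
)
print(email_text)
```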

Output Format

The model responds with a structured JSON object:

{
  "is_phishing": true,
  "confidence_score": 0.98,
  "threat_type": "Credential Harvesting / Financial Fraud",
  "risk_level": "CRITICAL",
  "reasoning": "The sender domain 'netf1ix-account-alert.com' replaces the letter 'l' with '1' to impersonate Netflix..."
}

| Field | Type | Description |
|---|---|---|
| is_phishing | boolean | true if phishing, false if legitimate |
| confidence_score | float (0.0–1.0) | Model's confidence in its verdict |
| threat_type | string or null | Category of threat, or null if legitimate |
| risk_level | string | One of LOW, MEDIUM, HIGH, CRITICAL |
| reasoning | string | Human-readable explanation of the verdict |
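
The field definitions above can be checked in code before a verdict is acted on. The following is a sketch only, using the standard library; `validate_analysis` and `RISK_LEVELS` are hypothetical names, and the checks mirror the documented schema rather than any validator shipped with the model.

```python
from typing import Any, List

RISK_LEVELS = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}


def validate_analysis(result: Any) -> List[str]:
    """Return a list of schema problems; an empty list means the object fits the schema."""
    if not isinstance(result, dict):
        return ["result is not a JSON object"]

    problems = []
    if not isinstance(result.get("is_phishing"), bool):
        problems.append("is_phishing must be a boolean")

    score = result.get("confidence_score")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        problems.append("confidence_score must be a number between 0.0 and 1.0")

    if "threat_type" not in result:
        problems.append("threat_type is missing")
    elif result["threat_type"] is not None and not isinstance(result["threat_type"], str):
        problems.append("threat_type must be a string or null")

    level = result.get("risk_level")
    if not isinstance(level, str) or level not in RISK_LEVELS:
        problems.append("risk_level must be one of LOW, MEDIUM, HIGH, CRITICAL")

    if not isinstance(result.get("reasoning"), str):
        problems.append("reasoning must be a string")

    return problems
```

An empty return value means the object can be used as-is; any other result lists what to reject or log.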

Quick Start

Installation

pip install transformers torch accelerate

Usage

import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_PATH = "Davv4/phishing-qwen3.5-2b"

SYSTEM_PROMPT = (
    "You are an email security analyst. Analyze the provided email and determine "
    "if it is a phishing attempt. Respond ONLY with a valid JSON object using this "
    'exact schema: {"is_phishing": boolean, "confidence_score": number (0.0-1.0), '
    '"threat_type": "string or null", "risk_level": "LOW|MEDIUM|HIGH|CRITICAL", '
    '"reasoning": "string"}'
)

# Load model
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model.eval()

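def analyze_email(email_text: str) -> dict:
    """Run the model on one formatted email and return the parsed JSON verdict."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": email_text}
    ]

    # Render the chat template as text; thinking mode is disabled so the
    # reply contains only the JSON object.
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Low-temperature sampling keeps the output close to deterministic.
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens, skipping the prompt.
    generated = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    )

    # Fall back to an error record if the model emits malformed JSON.
    try:
        result = json.loads(generated.strip())
    except json.JSONDecodeError:
        result = {"error": "Failed to parse model output", "raw": generated}

    return result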


# Example usage
email = """Sender: security@netf1ix-account-alert.com
Receiver: kevin.walsh@hotmail.com
Date: Sun, 01 Jun 2025 22:10:05 +0000
Subject: Netflix: Payment declined - Update your billing information
Body: Dear Kevin, We were unable to process your most recent payment and your Netflix membership is at risk of cancellation. Update billing: http://netf1ix-account-alert.com/billing-update. Act within 48 hours."""

result = analyze_email(email)
print(json.dumps(result, indent=2))

Example output (exact wording may vary between runs because sampling is enabled):

{
  "is_phishing": true,
  "confidence_score": 0.98,
  "threat_type": "Credential Harvesting / Financial Fraud",
  "risk_level": "CRITICAL",
  "reasoning": "The sender domain 'netf1ix-account-alert.com' replaces the letter 'l' with '1' to impersonate Netflix. The link routes to a fraudulent domain rather than netflix.com. Loss aversion and urgency tactics are used to pressure the victim."
}

Training Details

| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.5-2B |
| Fine-tuning method | LoRA (rank 8, alpha 16) |
| Training framework | LLaMA Factory |
| Training epochs | 10 |
| Batch size | 2 (gradient accumulation steps: 8) |
| Learning rate | 3e-4 |
| Quantization (training) | 8-bit (BnB) |
| Compute type | bfloat16 |
| Task | Supervised Fine-Tuning (SFT) |
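
Two derived quantities follow from these settings. The arithmetic below is a sketch under two assumptions that the card does not state: that the listed batch size is per device, and that standard LoRA scaling (alpha divided by rank) was used. The variable names are illustrative only.

```python
# Values taken from the training details above.
per_device_batch_size = 2
gradient_accumulation_steps = 8
lora_rank = 8
lora_alpha = 16

# Sequences contributing to each optimizer step on one device.
# The number of devices is not stated, so this is a per-device figure.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# Standard LoRA multiplies the low-rank update by alpha / rank.
lora_scaling = lora_alpha / lora_rank

print(effective_batch_size)  # 16
print(lora_scaling)          # 2.0
```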

Limitations

  • Trained on a relatively small dataset, so it is best used as a supplementary tool, not as a sole decision maker.
  • May not generalize well to highly novel phishing techniques not represented in training data.
  • Output is always in English regardless of the input email language.
  • Always validate the JSON output programmatically as the model may occasionally produce malformed responses.
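
One common form of malformed response is a valid JSON object surrounded by extra text, such as a Markdown code fence or a leading sentence. The sketch below shows one way to recover the object in that case with the standard library; `extract_json_object` is a hypothetical helper, and whether this model actually produces such wrappers is an assumption rather than something the card reports.

```python
import json
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores anything that follows it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # This brace did not open a valid object; try the next one.
        start = text.find("{", start + 1)
    return None
```

A caller would typically try `json.loads` first and fall back to this helper only when strict parsing fails, treating a `None` result as an unusable response.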

License

This model inherits the Apache 2.0 license from the base Qwen3.5-2B model.
