Phishing Email Detector - Qwen3.5-2B Fine-tuned

A fine-tuned version of Qwen/Qwen3.5-2B for email phishing detection. Given a raw email (sender, receiver, date, subject, body), the model returns a structured JSON analysis containing a phishing verdict, confidence score, threat type, risk level, and reasoning.

Input Format

The model expects a plain text email in the following format:

Sender: <sender email address>
Receiver: <receiver email address>
Date: <email date>
Subject: <email subject>
Body: <email body text>

Example input:

Sender: security@netf1ix-account-alert.com
Receiver: kevin.walsh@hotmail.com
Date: Sun, 01 Jun 2025 22:10:05 +0000
Subject: Netflix: Payment declined - Update your billing information
Body: Dear Kevin, We were unable to process your most recent payment and your Netflix membership is at risk of cancellation. Update billing: http://netf1ix-account-alert.com/billing-update. Act within 48 hours.
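If the email fields are already available separately, the layout above can be assembled with a small helper. This is a minimal sketch: `format_email` is a hypothetical name, not part of the model's tooling, and collapsing whitespace in the header fields is an assumption based on the single-line fields in the example.

```python
def format_email(sender: str, receiver: str, date: str, subject: str, body: str) -> str:
    """Build the five-field plain-text layout shown in the Input Format section."""

    def one_line(value: str) -> str:
        # Assumption: header fields are single-line, so collapse any whitespace runs.
        return " ".join(value.split())

    return "\n".join([
        f"Sender: {one_line(sender)}",
        f"Receiver: {one_line(receiver)}",
        f"Date: {one_line(date)}",
        f"Subject: {one_line(subject)}",
        # The body is only stripped at the ends; its internal layout is left unchanged.
        f"Body: {body.strip()}",
    ])


if __name__ == "__main__":
    print(format_email(
        "security@netf1ix-account-alert.com",
        "kevin.walsh@hotmail.com",
        "Sun, 01 Jun 2025 22:10:05 +0000",
        "Netflix: Payment declined - Update your billing information",
        "Dear Kevin, We were unable to process your most recent payment.",
    ))
```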

Output Format

The model responds with a structured JSON object:

{
  "is_phishing": true,
  "confidence_score": 0.98,
  "threat_type": "Credential Harvesting / Financial Fraud",
  "risk_level": "CRITICAL",
  "reasoning": "The sender domain 'netf1ix-account-alert.com' replaces the letter 'l' with '1' to impersonate Netflix..."
}

| Field | Type | Description |
|---|---|---|
| `is_phishing` | `boolean` | `true` if phishing, `false` if legitimate |
| `confidence_score` | `float` (0.0–1.0) | Model's confidence in its verdict |
| `threat_type` | `string` or `null` | Category of threat, or `null` if legitimate |
| `risk_level` | `string` | One of `LOW`, `MEDIUM`, `HIGH`, `CRITICAL` |
| `reasoning` | `string` | Human-readable explanation of the verdict |
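The field definitions above translate directly into a schema check. The sketch below is one possible way to do it with the standard library only; `validate_analysis` and `RISK_LEVELS` are hypothetical names introduced for illustration.

```python
RISK_LEVELS = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}
REQUIRED_FIELDS = ("is_phishing", "confidence_score", "threat_type", "risk_level", "reasoning")


def validate_analysis(result: dict) -> list:
    """Return a list of schema problems; an empty list means the object looks valid."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in result]
    if errors:
        return errors

    if not isinstance(result["is_phishing"], bool):
        errors.append("is_phishing must be a boolean")

    score = result["confidence_score"]
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        errors.append("confidence_score must be a number between 0.0 and 1.0")

    threat = result["threat_type"]
    if threat is not None and not isinstance(threat, str):
        errors.append("threat_type must be a string or null")

    if result["risk_level"] not in RISK_LEVELS:
        errors.append("risk_level must be one of LOW, MEDIUM, HIGH, CRITICAL")

    if not isinstance(result["reasoning"], str):
        errors.append("reasoning must be a string")

    return errors
```

An empty return value means every field is present with the documented type and range; anything else can be logged alongside the raw model output.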

Quick Start

Installation

pip install transformers torch accelerate

Usage

import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_PATH = "Davv4/phishing-qwen3.5-2b"

SYSTEM_PROMPT = (
    "You are an email security analyst. Analyze the provided email and determine "
    "if it is a phishing attempt. Respond ONLY with a valid JSON object using this "
    'exact schema: {"is_phishing": boolean, "confidence_score": number (0.0-1.0), '
    '"threat_type": "string or null", "risk_level": "LOW|MEDIUM|HIGH|CRITICAL", '
    '"reasoning": "string"}'
)

# Load model
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model.eval()

def analyze_email(email_text: str) -> dict:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": email_text}
    ]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    generated = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    )

    try:
        result = json.loads(generated.strip())
    except json.JSONDecodeError:
        result = {"error": "Failed to parse model output", "raw": generated}

    return result


# Example usage
email = """Sender: security@netf1ix-account-alert.com
Receiver: kevin.walsh@hotmail.com
Date: Sun, 01 Jun 2025 22:10:05 +0000
Subject: Netflix: Payment declined - Update your billing information
Body: Dear Kevin, We were unable to process your most recent payment and your Netflix membership is at risk of cancellation. Update billing: http://netf1ix-account-alert.com/billing-update. Act within 48 hours."""

result = analyze_email(email)
print(json.dumps(result, indent=2))

Expected output:

{
  "is_phishing": true,
  "confidence_score": 0.98,
  "threat_type": "Credential Harvesting / Financial Fraud",
  "risk_level": "CRITICAL",
  "reasoning": "The sender domain 'netf1ix-account-alert.com' replaces the letter 'l' with '1' to impersonate Netflix. The link routes to a fraudulent domain rather than netflix.com. Loss aversion and urgency tactics are used to pressure the victim."
}
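A result like the one above could feed a simple routing step in a mail pipeline. The following is only an illustrative sketch: the `triage` function, the action names, and the 0.9 threshold are assumptions, not values recommended or calibrated by the model author.

```python
def triage(result: dict, threshold: float = 0.9) -> str:
    """Map an analysis dict to a hypothetical action: quarantine, review, or deliver."""
    # analyze_email returns an {"error": ...} dict when parsing fails; send those to a human.
    if "error" in result:
        return "review"

    is_phishing = result.get("is_phishing") is True
    confidence = result.get("confidence_score", 0.0)

    # Only act automatically on confident phishing verdicts (threshold is an assumption).
    if is_phishing and confidence >= threshold:
        return "quarantine"

    # Low-confidence phishing verdicts or elevated risk levels go to manual review.
    if is_phishing or result.get("risk_level") in {"HIGH", "CRITICAL"}:
        return "review"

    return "deliver"
```

Keeping a manual review path matches the intended role of the model as a supplementary signal.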

Training Details

| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.5-2B |
| Fine-tuning method | LoRA (rank 8, alpha 16) |
| Training framework | LLaMA Factory |
| Training epochs | 10 |
| Batch size | 2 (gradient accumulation steps: 8) |
| Learning rate | 3e-4 |
| Quantization (training) | 8-bit (BnB) |
| Compute type | bfloat16 |
| Task | Supervised Fine-Tuning (SFT) |
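Two derived quantities follow from these settings: the effective batch size per optimizer step and the LoRA scaling factor. The arithmetic below assumes a single training device (the card does not state the device count) and the standard LoRA scaling of alpha divided by rank.

```python
# Values taken from the Training Details table.
per_device_batch_size = 2
gradient_accumulation_steps = 8
lora_rank = 8
lora_alpha = 16

# Assumption: one training device; multiply by the real device count if it differs.
num_devices = 1

# Samples contributing to each optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices

# Standard LoRA scales the low-rank update by alpha / rank.
lora_scaling = lora_alpha / lora_rank

print(f"effective batch size: {effective_batch_size}")  # 16
print(f"LoRA scaling factor: {lora_scaling}")           # 2.0
```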

Limitations

Trained on a relatively small dataset, so it is best used as a supplementary tool, not as the sole decision maker.
May not generalize well to highly novel phishing techniques not represented in training data.
Output is always in English regardless of the input email language.
Always validate the JSON output programmatically as the model may occasionally produce malformed responses.
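One common failure mode for generated JSON is extra text or a code fence around the object. The sketch below shows a tolerant extraction step using the standard library's `json.JSONDecoder.raw_decode`; `extract_json` is a hypothetical helper, and it only recovers wrapped objects, not truncated or otherwise malformed ones.

```python
import json
from typing import Optional


def extract_json(raw: str) -> Optional[dict]:
    """Return the first JSON object found in raw text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not a valid object at this position; try the next opening brace.
        start = raw.find("{", start + 1)
    return None
```

A `None` result can be treated like the parse-error case in `analyze_email`, with the raw text kept for inspection.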

License

This model inherits the Apache 2.0 license from the base Qwen3.5-2B model.
