Phishing Email Detector - Qwen3.5-2B Fine-tuned
A fine-tuned version of Qwen/Qwen3.5-2B for phishing email detection. Given a raw email (sender, receiver, date, subject, body), the model returns a structured JSON analysis containing a phishing verdict, confidence score, threat type, risk level, and reasoning.
Input Format
The model expects a plain text email in the following format:
Sender: <sender email address>
Receiver: <receiver email address>
Date: <email date>
Subject: <email subject>
Body: <email body text>
Example input:
Sender: security@netf1ix-account-alert.com
Receiver: kevin.walsh@hotmail.com
Date: Sun, 01 Jun 2025 22:10:05 +0000
Subject: Netflix: Payment declined - Update your billing information
Body: Dear Kevin, We were unable to process your most recent payment and your Netflix membership is at risk of cancellation. Update billing: http://netf1ix-account-alert.com/billing-update. Act within 48 hours.
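When an email arrives as separate fields rather than as pre-formatted text, a small helper can assemble the layout described above. This is a minimal sketch: `format_email` is a hypothetical name introduced here for illustration, and trimming surrounding whitespace from each field is an assumption, not a documented requirement of the model.

```python
def format_email(sender: str, receiver: str, date: str, subject: str, body: str) -> str:
    """Assemble the five labeled fields in the order the model expects."""
    fields = [
        ("Sender", sender),
        ("Receiver", receiver),
        ("Date", date),
        ("Subject", subject),
        ("Body", body),
    ]
    # One "Label: value" line per field, joined with newlines.
    return "\n".join(f"{label}: {value.strip()}" for label, value in fields)
```

The returned string can be passed directly as the user message.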
Output Format
The model responds with a structured JSON object:
{
  "is_phishing": true,
  "confidence_score": 0.98,
  "threat_type": "Credential Harvesting / Financial Fraud",
  "risk_level": "CRITICAL",
  "reasoning": "The sender domain 'netf1ix-account-alert.com' replaces the letter 'l' with '1' to impersonate Netflix..."
}
| Field | Type | Description |
|---|---|---|
| is_phishing | boolean | true if phishing, false if legitimate |
| confidence_score | float (0.0–1.0) | Model's confidence in its verdict |
| threat_type | string or null | Category of threat, or null if legitimate |
| risk_level | string | One of LOW, MEDIUM, HIGH, CRITICAL |
| reasoning | string | Human-readable explanation of the verdict |
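The field types and allowed values in the table can be checked mechanically once the response has been parsed into a dictionary. The following is a sketch only: `validate_result` and `RISK_LEVELS` are hypothetical names, not part of the model's tooling, and the checks mirror the table rather than any official schema file.

```python
RISK_LEVELS = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}

def validate_result(result: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the result is valid."""
    problems = []
    if not isinstance(result.get("is_phishing"), bool):
        problems.append("is_phishing must be a boolean")
    score = result.get("confidence_score")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        problems.append("confidence_score must be a number in [0.0, 1.0]")
    threat = result.get("threat_type")
    if threat is not None and not isinstance(threat, str):
        problems.append("threat_type must be a string or null")
    level = result.get("risk_level")
    if not isinstance(level, str) or level not in RISK_LEVELS:
        problems.append("risk_level must be one of LOW, MEDIUM, HIGH, CRITICAL")
    if not isinstance(result.get("reasoning"), str):
        problems.append("reasoning must be a string")
    return problems
```

Returning a list of problems instead of raising lets the caller log every violation at once.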
Quick Start
Installation
pip install transformers torch accelerate
Usage
import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_PATH = "Davv4/phishing-qwen3.5-2b"
SYSTEM_PROMPT = (
    "You are an email security analyst. Analyze the provided email and determine "
    "if it is a phishing attempt. Respond ONLY with a valid JSON object using this "
    'exact schema: {"is_phishing": boolean, "confidence_score": number (0.0-1.0), '
    '"threat_type": "string or null", "risk_level": "LOW|MEDIUM|HIGH|CRITICAL", '
    '"reasoning": "string"}'
)

# Load model
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model.eval()

def analyze_email(email_text: str) -> dict:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": email_text}
    ]
    # Build the chat prompt as text, with thinking mode disabled
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    # Decode only the newly generated tokens, skipping the prompt
    generated = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    )
    try:
        result = json.loads(generated.strip())
    except json.JSONDecodeError:
        # Keep the raw text so a malformed response can be inspected
        result = {"error": "Failed to parse model output", "raw": generated}
    return result

# Example usage
email = """Sender: security@netf1ix-account-alert.com
Receiver: kevin.walsh@hotmail.com
Date: Sun, 01 Jun 2025 22:10:05 +0000
Subject: Netflix: Payment declined - Update your billing information
Body: Dear Kevin, We were unable to process your most recent payment and your Netflix membership is at risk of cancellation. Update billing: http://netf1ix-account-alert.com/billing-update. Act within 48 hours."""

result = analyze_email(email)
print(json.dumps(result, indent=2))
Expected output (exact wording may vary between runs because sampling is enabled):
{
  "is_phishing": true,
  "confidence_score": 0.98,
  "threat_type": "Credential Harvesting / Financial Fraud",
  "risk_level": "CRITICAL",
  "reasoning": "The sender domain 'netf1ix-account-alert.com' replaces the letter 'l' with '1' to impersonate Netflix. The link routes to a fraudulent domain rather than netflix.com. Loss aversion and urgency tactics are used to pressure the victim."
}
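One possible way to act on the returned dictionary is to route each email to a handling decision. The sketch below is purely illustrative: `triage`, the 0.9 threshold, and the "quarantine" / "review" / "deliver" labels are all hypothetical choices, not values recommended by the model authors. It treats the `error` dictionary returned by `analyze_email` on a parse failure as a case for human review.

```python
def triage(result: dict, review_threshold: float = 0.9) -> str:
    """Map a model result to a hypothetical handling decision."""
    if "error" in result:
        return "review"  # unparseable output: never decide automatically
    score = result.get("confidence_score")
    confident = (
        isinstance(score, (int, float))
        and not isinstance(score, bool)
        and score >= review_threshold
    )
    if result.get("is_phishing"):
        high_risk = result.get("risk_level") in ("HIGH", "CRITICAL")
        # Quarantine only when the verdict is both confident and high risk.
        return "quarantine" if confident and high_risk else "review"
    return "deliver" if confident else "review"
```

Anything that is not a confident verdict falls through to "review", which keeps a human in the loop for borderline cases.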
Training Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.5-2B |
| Fine-tuning method | LoRA (rank 8, alpha 16) |
| Training framework | LLaMA Factory |
| Training epochs | 10 |
| Batch size | 2 (gradient accumulation steps: 8) |
| Learning rate | 3e-4 |
| Quantization (training) | 8-bit (BnB) |
| Compute type | bfloat16 |
| Task | Supervised Fine-Tuning (SFT) |
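Two derived quantities follow from the table: the effective batch size per optimizer step (batch size times gradient accumulation steps times device count) and the LoRA scaling factor (alpha divided by rank in standard LoRA). The snippet below works through the arithmetic. The device count is not stated in the table, so a single device is assumed, and the scaling formula assumes rank-stabilized LoRA was not enabled.

```python
per_device_batch_size = 2
gradient_accumulation_steps = 8
num_devices = 1  # assumption: the table does not state the device count

# Samples contributing to each optimizer step
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices

lora_rank, lora_alpha = 8, 16
# Standard LoRA multiplies the adapter update by alpha / rank
lora_scale = lora_alpha / lora_rank

print(effective_batch_size)  # 16
print(lora_scale)            # 2.0
```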
Limitations
- Trained on a relatively small dataset, so it is best used as a supplementary tool, not as the sole decision maker.
- May not generalize well to highly novel phishing techniques not represented in training data.
- Output is always in English regardless of the input email language.
- Always validate the JSON output programmatically as the model may occasionally produce malformed responses.
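For the last point, a tolerant parser can recover a JSON object even when the response carries extra text around it. This is a sketch under an assumption: that malformed responses sometimes consist of a valid object wrapped in a Markdown code fence or surrounded by stray prose, which the card does not document. `extract_json` is a hypothetical helper name.

```python
import json
from typing import Optional

def extract_json(raw: str) -> Optional[dict]:
    """Return the first JSON object found in raw text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            obj, _ = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Try the next opening brace.
        start = raw.find("{", start + 1)
    return None
```

A `None` return signals that nothing could be recovered, so the email should be flagged for manual review instead of being silently accepted.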
License
This model inherits the Apache 2.0 license from the base Qwen3.5-2B model.