AnomalyThink-Qwen2.5-VL-7B-SFT

Supervised-fine-tuning-only baseline (DS-MVTec 80.16 / VisA 64.78).

A Qwen2.5-VL-7B model fine-tuned for explainable industrial anomaly detection (IAD). Given a product image it produces a structured reasoning trace (<think>), a defect <location> and <type> (for anomalies), and a binary <answer>. Research artefact from the MSc thesis Reasoning-Enhanced Vision-Language Models for Explainable Industrial Anomaly Detection (TU Delft, 2026).

Results (MMAD subsets, balanced accuracy)

Benchmark Balanced accuracy
DS-MVTec (1,670) 80.16%
VisA (2,141) 64.78%

Evaluated under a single common harness on the MMAD DS-MVTec and VisA subsets. This is the SFT-only headline before any RL stage.

Training

Supervised fine-tuning on the AnomalyThink-6K corpus (6,000 Gemini-2.5-Flash traces, vision encoder frozen, epoch 3 of 4).

The "AnomalyThink" reasoning traces were distilled from Gemini 2.5-Flash on Real-IAD images. Training data: aacudad/AnomalyThink.

Usage

import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
model = Qwen2_5_VLForConditionalGeneration.from_pretrained("aacudad/AnomalyThink-Qwen2.5-VL-7B-SFT", torch_dtype=torch.bfloat16, device_map="auto")
processor = AutoProcessor.from_pretrained("aacudad/AnomalyThink-Qwen2.5-VL-7B-SFT")
# Build the structured single-image IAD prompt + image, then generate.

Intended use and limitations

Research on explainable IAD. Known limitations: the model can confidently hallucinate a defect on a normal part (false positive), and GRPO-lineage variants can over-predict the "Missing Parts" type. As the public DS-MVTec/VisA images may appear in VLM pretraining, absolute numbers should be read with that caveat.

Citation

@mastersthesis{acudad2026anomalythink,
  title  = {Reasoning-Enhanced Vision-Language Models for Explainable Industrial Anomaly Detection},
  author = {Acudad, Adnane},
  school = {Delft University of Technology},
  year   = {2026}
}

License

Apache-2.0 (inherits the Qwen2.5-VL-7B base). Trained on Real-IAD (cite Real-IAD separately; images are not redistributed) with traces distilled from Gemini 2.5-Flash.

Downloads last month
-
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for aacudad/AnomalyThink-Qwen2.5-VL-7B-SFT

Finetuned
(1126)
this model