Slide-Examiner 8B (QLoRA adapter)

QLoRA LoRA adapter on Qwen3-VL-8B-Instruct for examining presentation-slide quality. Trained as Part 2 of the Slide-Examiner project.

What it does

A pointwise + pairwise slide examiner: detects semantic slide defects (title/body mismatch, density, narrative order, missing section) and is deliberately trained to abstain on pixel-level geometry (overflow / overlap / alignment / font / color / margin) — those are handled by a symbolic linter, not the VLM. Output is strict contract JSON (PageExamResult / DeckExamResult / PairwiseResult).

Headline results (in-domain held-out, balanced accuracy, modality A = image-only)

S-group semantic	this adapter (8B)	zero-shot 8B	zero-shot 30B
balanced accuracy	1.0	0.639	0.785

The finetuned 8B examiner surpasses the zero-shot 30B model on the S-group while keeping ~0 false-positive rate on geometry (it abstains rather than hallucinating geometry from pixels). eval_loss trajectory: None.

Training

Base: Qwen/Qwen3-VL-8B-Instruct; QLoRA 4-bit (bitsandbytes), LoRA rank 16, alpha 32, 2 epochs, cosine LR 1e-4.
Data: ~5.3K synthetic slides (paired clean/defective), architecture-correct routing (S-group pointwise; geometry restate-from-structure + abstain-under-image; G1/S6 pairwise; S3→linter).
Framework: LLaMA-Factory, template qwen3_vl_nothink.

Usage

from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor
base = "Qwen/Qwen3-VL-8B-Instruct"
model = AutoModelForImageTextToText.from_pretrained(base, torch_dtype="bfloat16", device_map="auto")
model = PeftModel.from_pretrained(model, "michaelrhs/slide-examiner-8b-qlora")
proc = AutoProcessor.from_pretrained(base)

Adapter files: adapter_config.json, adapter_model.safetensors.

Downloads last month: 94

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for michaelrhs/slide-examiner-8b-qlora

Base model

Qwen/Qwen3-VL-8B-Instruct

Adapter

(127)

this model