chartreader-0.8B-OptiQ-4bit

A 0.8B vision-language model that reads charts, fine-tuned on a Mac.

This is the OptiQ 4-bit quant of Qwen3.5-0.8B bundled with ChartReader, an image+text LoRA trained on ChartQA. The vision tower stays frozen; the LoRA adapts the language tower to answer chart questions the way ChartQA wants: short, exact, just the value. The base + sidecar keep full image+text inference, so the same repo works for general VLM tasks too.

Results

80 held-out ChartQA questions, base vs. base + ChartReader, images letterboxed to a 512 px canvas.

Metric	Base 0.8B	+ ChartReader	Δ
Relaxed accuracy	50.0%	55.0%	+5.0 pp
Exact match	26.2%	40.0%	+13.8 pp
Output similarity	0.385	0.598	+0.21

The base reads charts loosely and verbosely ("There are 10 food items shown in the bar graph"); ChartReader answers concisely ("3"). The big win is format and exact-match; the relaxed-accuracy gain is smaller but real.

Files

config.json, model.safetensors        the base Qwen3.5-0.8B OptiQ-4bit quant
optiq_vision.safetensors              vision sidecar (full image+text inference)
mtp.safetensors                       multi-token-prediction draft head
adapters/chartreader/                 the ChartQA LoRA (adapters.safetensors)

The LoRA is not merged into the weights — it rides alongside and applies at serve time, so you keep the plain base quant plus the ChartReader behavior.

Use

pip install mlx-optiq
huggingface-cli download mlx-community/chartreader-0.8B-OptiQ-4bit --local-dir ./chartreader

optiq serve --model ./chartreader --adapter ./chartreader/adapters/chartreader

Then send an image + a chart question to the OpenAI-compatible endpoint on localhost:8080. Without the --adapter, the same repo serves as the plain image+text base model.

How it was made

Image+text LoRA on the language tower, vision tower frozen:

optiq lora train mlx-community/Qwen3.5-0.8B-OptiQ-4bit \
    --vision --data ./chartqa/train.jsonl \
    --rank 8 --iters 800 --learning-rate 5e-5 --output ./chartreader

Trained on a 24 GB Apple Silicon Mac. Every image is letterboxed to a uniform 512 px canvas (uniform shape keeps training memory bounded), gradient checkpointing fits the hybrid gated-delta backward, and gradient clipping + a 5e-5 learning rate prevent the mode collapse short targets otherwise cause. The whole flow — build the dataset, run the LoRA — is also available in the OptiQ Lab.

Full write-up: Fine-tuning a vision model on a Mac.

Built with OptiQ — mixed-precision quantization, LoRA, and an OpenAI-compatible server for LLMs and VLMs on Apple Silicon. pip install mlx-optiq.