FineBooks page-router v0 (BASELINE: provisional labels)

Tiny 6-class page-layout router for the FineBooks "simple books" filter (planning #11). It is a ConvNeXtV2-Pico fine-tune: one cheap CPU forward pass per page replaces a multi-stage hand-built signal suite (pp-ocrv6 boxes + PP-DocLayout + image-gutter + thresholds).
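As a sanity check on the architecture, the parameter count can be reproduced by hand. The sketch below assumes the standard ConvNeXt V2 Pico layout (stage depths 2/2/6/2, widths 64/128/256/512, GRN in every block, biased convolutions, no layer scale) with the ImageNet head replaced by a 6-way linear layer. It is arithmetic only; it does not load the actual checkpoint.

```python
"""Count parameters of a ConvNeXt V2 Pico backbone with a 6-class head.

Assumes the standard ConvNeXt V2-P configuration; arithmetic only,
not a load of the real checkpoint.
"""

DEPTHS = (2, 2, 6, 2)        # blocks per stage (ConvNeXt V2 Pico, assumed)
DIMS = (64, 128, 256, 512)   # channel width per stage (assumed)
NUM_CLASSES = 6              # blank / single-col / multi-col / table / plate / other


def layernorm(d: int) -> int:
    # LayerNorm weight + bias
    return 2 * d


def block(d: int) -> int:
    dwconv = d * 7 * 7 + d          # 7x7 depthwise conv with bias
    norm = layernorm(d)
    fc1 = d * 4 * d + 4 * d         # pointwise expansion to 4d channels
    grn = 2 * 4 * d                 # GRN gamma + beta over 4d channels
    fc2 = 4 * d * d + d             # pointwise projection back to d channels
    return dwconv + norm + fc1 + grn + fc2


def count_params(num_classes: int = NUM_CLASSES) -> int:
    # 4x4 stride-4 patchify stem (3 -> 64 channels) followed by LayerNorm
    total = 3 * DIMS[0] * 4 * 4 + DIMS[0] + layernorm(DIMS[0])
    for i, (depth, d) in enumerate(zip(DEPTHS, DIMS)):
        if i > 0:
            prev = DIMS[i - 1]
            # LayerNorm + 2x2 stride-2 downsampling conv between stages
            total += layernorm(prev) + prev * d * 2 * 2 + d
        total += depth * block(d)
    total += layernorm(DIMS[-1])                    # final norm before the head
    total += DIMS[-1] * num_classes + num_classes   # linear classifier
    return total


if __name__ == "__main__":
    n = count_params()
    print(f"{n:,} params ({n / 1e6:.2f}M), ~{n * 4 / 1e6:.1f} MB as F32")
```

Run as a script, it prints `8,556,358 params (8.56M), ~34.2 MB as F32`, which lines up with the 8.56M-parameter F32 figures listed for this checkpoint. With a 1000-class head the same arithmetic gives 9,066,280 (about 9.07M), the usual ImageNet ConvNeXt V2-P size.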

Classes: blank / single-col / multi-col / table / plate / other.
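At inference time the router only needs an argmax over six logits. The sketch below assumes the classifier's output order matches the class list above (check the checkpoint's own id2label mapping before relying on it) and treats a page as passing the simple-book gate when its top class is single-col. The `route_page` helper and the example logits are hypothetical.

```python
import math

# Assumed label order; verify against the checkpoint's id2label mapping.
LABELS = ("blank", "single-col", "multi-col", "table", "plate", "other")
SIMPLE_BOOK_CLASS = "single-col"


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def route_page(logits):
    """Return (label, confidence, passes_simple_book_gate) for one page."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)  # argmax
    label = LABELS[idx]
    return label, probs[idx], label == SIMPLE_BOOK_CLASS


if __name__ == "__main__":
    # Hypothetical logits for one page image.
    label, conf, simple = route_page([0.1, 3.2, 1.0, -0.5, -1.2, 0.3])
    print(label, round(conf, 3), simple)
```

This is a page-level decision only; any book-level aggregation on top of it is outside what this sketch covers.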

⚠️ Status: baseline on PROVISIONAL labels

Trained on ~264 images labelled by VLM subagents (NOT human-verified). Numbers are indicative, not shippable. Retrain on human-verified / expanded gold before production use.

Held-out test set (66 pages, provisional gold labels)

metric                           value
accuracy                         0.79
macro-F1                         0.70
simple-book gate (single-col)    P 0.81 / R 1.00 / F1 0.90

Per-class F1: blank 0.93, table 0.86, single-col 0.90, multi-col 0.75, plate 0.62, other 0.18 (hard class with few examples).

On the simple-book gate the router beats the hand-built suite, whose best operating points are P 0.83 / R 0.54 (F1 ≈ 0.65) or P 0.49 / R 0.83 (F1 ≈ 0.62), versus F1 0.90 for the router.
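The numbers above are standard multi-class metrics. A minimal stdlib sketch, assuming plain lists of gold and predicted labels: macro-F1 is the unweighted mean of per-class F1 over the classes that occur in either list (mirroring scikit-learn's default label set), and the simple-book figures treat single-col as the positive class and every other class as negative. The `evaluate` helper and the toy labels are illustrative, not the experiment's actual evaluation code.

```python
from collections import Counter

# Class set from the card; macro-F1 averages over classes that occur in gold or pred.
LABELS = ("blank", "single-col", "multi-col", "table", "plate", "other")


def prf(tp, fp, fn):
    """Precision, recall and F1 (harmonic mean), using 0.0 for empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def evaluate(gold, pred, positive="single-col"):
    """Accuracy, per-class F1, macro-F1 and one-vs-rest P/R/F1 for `positive`."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted class gets a false positive
            fn[g] += 1  # the gold class gets a false negative
    seen = set(gold) | set(pred)
    per_class = {c: prf(tp[c], fp[c], fn[c])[2] for c in LABELS if c in seen}
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "per_class_f1": per_class,
        "macro_f1": sum(per_class.values()) / len(per_class),
        "gate_prf": prf(tp[positive], fp[positive], fn[positive]),
    }


if __name__ == "__main__":
    # Hypothetical toy labels, not the held-out set.
    gold = ["single-col", "single-col", "multi-col", "table", "blank", "other"]
    pred = ["single-col", "single-col", "single-col", "table", "blank", "plate"]
    m = evaluate(gold, pred)
    print(f"accuracy {m['accuracy']:.2f}  macro-F1 {m['macro_f1']:.2f}")
    print("gate P/R/F1", [round(x, 2) for x in m["gate_prf"]])
```

On the toy lists this gives accuracy 0.67, macro-F1 ≈ 0.47 and gate P 0.67 / R 1.00 / F1 0.80: a non-single-col page routed to single-col lowers gate precision without touching recall.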

Provenance

Full experiment trail: FineArchive/finebooks PR #4 (experiments/book-signal/).

Model details

Weights: Safetensors, 8.56M params, F32 tensors

Model repo: davanstrien/finebooks-page-router-v0