Model Card for index-card-detector-v5

YOLO26n object detector for archival index cards. Builds on index-card-detector-v3 with a higher-quality navy training set (re-labelled via a v3+v4 model ensemble plus targeted human review).

Model Details

Model Description

Fine-tuned from NationalLibraryOfScotland/archival-index-card-detector/model.pt (YOLO26n) on small-models-for-glam/index-card-detection-v5: 1,425 archival scans across four institutions, with the 25 navy multi-card scans re-labelled via ensemble + review.

Compared to predecessors:

  • vs v3: same training distribution, refined navy bboxes
  • vs v4: single-scale training (we found v4's multi-scale augmentation traded too much in-distribution recall for scale robustness; v4 missed 2 of 9 cards on a navy 3×3 grid where v3 nailed all 9)

Single class: card.

  • Developed by: Daniel van Strien, Machine Learning Librarian, Hugging Face
  • Model type: Object detection (YOLO26n, single class, 2.5M params, ~5.5 MB)
  • Language(s): en (cards are English; model is visually language-agnostic but evaluated on English archives)
  • License: AGPL-3.0 (inherits from upstream Ultralytics / NLS baseline)
  • Finetuned from: NationalLibraryOfScotland/archival-index-card-detector

Model Sources

Uses

Direct Use

Run the model on any archival scan to locate index cards. Returns one bounding box per detected card. Pair with a downstream OCR/VLM model for content extraction.
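
To hand detections to a downstream OCR/VLM model, each float box usually has to become an integer crop rectangle. The following is a minimal sketch of that step, not part of the released code: the helper name `to_crop_rect` and the optional `pad` margin are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def to_crop_rect(
    box: Sequence[float], img_w: int, img_h: int, pad: int = 0
) -> Tuple[int, int, int, int]:
    """Turn a float [x1, y1, x2, y2] box into an integer crop rectangle.

    The box is rounded outwards, expanded by `pad` pixels on every side
    (hypothetical safety margin), and clamped to the image bounds.
    """
    x1, y1, x2, y2 = box
    left = max(0, math.floor(x1) - pad)
    top = max(0, math.floor(y1) - pad)
    right = min(img_w, math.ceil(x2) + pad)
    bottom = min(img_h, math.ceil(y2) + pad)
    return left, top, right, bottom
```

With Pillow, the returned tuple can be passed directly to `Image.crop`, which expects `(left, upper, right, lower)`.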

Out-of-Scope Use

  • OCR / content extraction: detection only.
  • Content classification: single class; for blank/content filtering see small-models-for-glam/index-card-blank-detector.
  • Non-English cards β€” training data is English-only.
  • Card-style traditions outside US/UK archives

Bias, Risks, and Limitations

  • No true non-card negatives in training beyond the 10 NLS background pages. The model may over-predict on newspaper clippings, photographs, or book pages when they appear in mixed archival scans.
  • Multi-card variety concentrated in 25 Navy images. Layouts very different from those (e.g., 12+ card grids, severely overlapping or rotated cards) are out-of-distribution.
  • BPL + Rubenstein training labels are auto-generated (bbox = whole image for pre-cropped sources). Boxes on cropped-card inputs are loose by construction.
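
To make the "bbox = whole image" point concrete, here is a small sketch of what such an auto-generated label would look like, assuming the dataset uses the standard Ultralytics YOLO text format (`class cx cy w h`, all normalised to 0..1) and that `card` is class id 0. Both are assumptions; check the dataset card for the actual layout.

```python
def xyxy_to_yolo_line(class_id, box, img_w, img_h):
    """Format a pixel-space [x1, y1, x2, y2] box as a YOLO label line."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w  # normalised box centre, x
    cy = (y1 + y2) / 2 / img_h  # normalised box centre, y
    w = (x2 - x1) / img_w       # normalised box width
    h = (y2 - y1) / img_h       # normalised box height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def whole_image_label(img_w, img_h, class_id=0):
    """Label for a pre-cropped card: the box simply covers the full image."""
    return xyxy_to_yolo_line(class_id, (0, 0, img_w, img_h), img_w, img_h)
```

Every pre-cropped image gets the same line regardless of its size, which is why any scanner margin around the card ends up inside the box.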

Recommendations

  • For mixed-content archives, apply a downstream content classifier or add explicit negatives in your fine-tune set.
  • For pixel-accurate cropping on pre-cropped inputs, validate before bulk processing.
  • For non-English collections, fine-tune with additional samples from that tradition.
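
For the "explicit negatives" recommendation, one common approach with Ultralytics-style datasets is to include background images that carry no boxes. Ultralytics treats an image with a missing or empty label file as background; writing empty files just makes the intent explicit. The sketch below assumes a flat folder of images you have already confirmed contain no cards; the function name and folder layout are hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def add_background_labels(negatives_dir, labels_dir):
    """Create an empty YOLO label file for each confirmed non-card image.

    Only run this on images known to contain no cards: an empty label
    tells the trainer that every detection on that image is wrong.
    Returns the list of label files that were created.
    """
    negatives_dir, labels_dir = Path(negatives_dir), Path(labels_dir)
    labels_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for image_path in sorted(negatives_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        label_path = labels_dir / (image_path.stem + ".txt")
        if not label_path.exists():  # never overwrite real labels
            label_path.touch()
            created.append(label_path)
    return created
```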

How to Get Started

from huggingface_hub import hf_hub_download
from ultralytics import YOLO

weights = hf_hub_download(
    repo_id="small-models-for-glam/index-card-detector-v5",
    filename="best.pt",
)
model = YOLO(weights)
results = model.predict("your_scan.jpg", conf=0.25, imgsz=1024)[0]
for box in results.boxes.xyxy.cpu().tolist():
    print(box)  # [x1, y1, x2, y2] in pixel coords

ONNX (faster CPU inference)

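from huggingface_hub import hf_hub_download
from ultralytics import YOLO

weights = hf_hub_download(
    repo_id="small-models-for-glam/index-card-detector-v5",
    filename="best.onnx",
)
# task is passed explicitly because it cannot always be inferred from an ONNX file
model = YOLO(weights, task="detect")  # ultralytics auto-uses onnxruntime
results = model.predict("your_scan.jpg", conf=0.25, imgsz=1024)[0]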

The ONNX export uses dynamic spatial axes, so it accepts variable input sizes without re-export. It runs roughly 2–3× faster than the PyTorch .pt checkpoint on CPU.
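
CPU speedups depend heavily on hardware and image size, so it may be worth measuring on your own machine. The following is a generic timing sketch, not the benchmark used for this card; `median_latency` and `speedup` are illustrative helper names.

```python
import statistics
import time


def median_latency(fn, warmup=2, repeats=10):
    """Median wall-clock seconds for one call to `fn`, after warm-up runs."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed (model load, caches)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Example usage (pt_model and onnx_model are hypothetical YOLO objects
# loaded from best.pt and best.onnx respectively):
#   pt = median_latency(lambda: pt_model.predict("your_scan.jpg", imgsz=1024, device="cpu", verbose=False))
#   ox = median_latency(lambda: onnx_model.predict("your_scan.jpg", imgsz=1024, device="cpu", verbose=False))
#   print(f"ONNX speedup: {speedup(pt, ox):.1f}x")
```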

Training Details

Training Data

small-models-for-glam/index-card-detection-v5: 1,425 images across four collections; navy re-labelled via v3+v4 ensemble + human review (see dataset card for full provenance).
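
The hyperparameters for this model list a stratified 80/20 train/val split per collection with a deterministic seed of 42. A minimal sketch of how such a split could be built is shown here; it is an illustration under those stated settings, not the actual training script, and the `(image_id, collection)` input shape is an assumption.

```python
import random
from collections import defaultdict


def stratified_split(items, train_fraction=0.8, seed=42):
    """Split (image_id, collection) pairs into train/val per collection.

    Each collection is shuffled and cut independently, so every
    collection contributes roughly `train_fraction` of its images to
    train and the rest to val.
    """
    by_collection = defaultdict(list)
    for image_id, collection in items:
        by_collection[collection].append(image_id)

    rng = random.Random(seed)  # local RNG keeps the split reproducible
    train, val = [], []
    for collection in sorted(by_collection):  # fixed iteration order
        ids = sorted(by_collection[collection])
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train.extend(ids[:cut])
        val.extend(ids[cut:])
    return train, val
```

Sorting both the collections and the ids before shuffling means the result does not depend on the order in which files were listed.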

Training Procedure

Training Hyperparameters

  • Base architecture: YOLO26n
  • Initial weights: NationalLibraryOfScotland/archival-index-card-detector/model.pt
  • Epochs cap: 80 (early-stopped via patience 20)
  • Image size: 1024
  • multi_scale: False (single-scale; we found multi-scale traded too much recall)
  • Batch: auto (ultralytics batch=-1, picks ~13 on l4x1)
  • Optimizer: AdamW (auto-tuned)
  • LR schedule: cosine (cos_lr=True)
  • Train/val split: stratified 80/20 per collection (deterministic seed 42)
  • Augmentation: mosaic=0.5, fliplr=0.5, hsv_h=0.015, hsv_s=0.4, hsv_v=0.3, degrees=3, translate=0.05, scale=0.25
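
The settings above map onto Ultralytics training arguments roughly as follows. This is a sketch reconstructed from the list, not the original training script: the `data_yaml` path is a placeholder, and `optimizer="auto"` is an assumption based on the "auto-tuned" note (the list reports that AdamW was the optimizer used).

```python
TRAIN_ARGS = {
    "epochs": 80,          # cap; early stopping usually ends sooner
    "patience": 20,        # early-stopping patience in epochs
    "imgsz": 1024,
    "multi_scale": False,  # single-scale training
    "batch": -1,           # let ultralytics pick a batch size for the GPU
    "optimizer": "auto",   # assumption: "auto-tuned" read as optimizer="auto"
    "cos_lr": True,        # cosine learning-rate schedule
    # Augmentation
    "mosaic": 0.5,
    "fliplr": 0.5,
    "hsv_h": 0.015,
    "hsv_s": 0.4,
    "hsv_v": 0.3,
    "degrees": 3,
    "translate": 0.05,
    "scale": 0.25,
}


def train(weights_path, data_yaml):
    """Fine-tune from `weights_path` on the dataset described by `data_yaml`.

    Both arguments are placeholders for local paths.
    """
    from ultralytics import YOLO  # imported lazily so the config is usable alone

    model = YOLO(weights_path)
    return model.train(data=data_yaml, **TRAIN_ARGS)
```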

Compute

  • Hardware: HF Jobs l4x1 (1× NVIDIA L4)
  • Training time: ~15 minutes (early-stopped)
  • Final weights: 5.5 MB (best.pt), 10.7 MB (best.onnx)

Evaluation

Results

(See the Hub model page for the live per-collection val mAP table, auto-populated by the training script's final model card.)

mAP@50 is essentially saturated (~0.995) across all collections, matching v3 and beating v4. mAP@50:95 is also strong because we avoided the multi-scale recall trade-off.

Comparison

| Model | navy mAP50:95 | 9-card grid recall | Multi-scale | Notes |
|---|---|---|---|---|
| v3 | 0.929 | 9/9 | No | Original baseline |
| v4 | 0.764 | 7/9 (at conf 0.25) | Yes | Recall regression on busy multi-card scenes |
| v5 (this) | TBD | TBD | No | v3 + refined navy labels |
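
A grid-recall figure such as "7/9" can be reproduced by matching predicted boxes to hand-labelled ones. The sketch below shows one simple way to do this with greedy IoU matching. The 0.5 IoU threshold and the matching strategy are assumptions, since the card does not state how the counts were obtained.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_recalled(gt_boxes, pred_boxes, iou_threshold=0.5):
    """Count ground-truth boxes matched by a distinct prediction.

    Greedy matching: each ground-truth box claims the unused prediction
    with the highest IoU, provided it reaches `iou_threshold`.
    """
    unused = list(pred_boxes)
    hits = 0
    for gt in gt_boxes:
        best_idx, best_iou = -1, iou_threshold
        for i, pred in enumerate(unused):
            score = iou(gt, pred)
            if score >= best_iou:
                best_idx, best_iou = i, score
        if best_idx >= 0:
            unused.pop(best_idx)  # a prediction can match only one card
            hits += 1
    return hits
```

For a 3×3 scan, `count_recalled(gt, preds)` out of `len(gt)` gives the recall count at whatever confidence threshold produced `preds`.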

Citation

@misc{vanstrien_index_card_detector_v5_2026,
  author    = {van Strien, Daniel},
  title     = {{index-card-detector-v5: YOLO26n for archival index cards, ensemble-refined}},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/small-models-for-glam/index-card-detector-v5}
}

Model Card Authors

Daniel van Strien
