19c Newspaper Column YOLO Detector

This repository contains a YOLO detector fine-tuned to detect and segment columns in 19th-century American newspaper pages.

Extracting individual columns is a critical pre-processing step for historical OCR pipelines. Once a column is isolated, slicing it into smaller, overlapping horizontal strips avoids the text hallucination and repetition that modern Vision-Language Models (like Gemma 2B/27B) produce on full-page layouts.
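The strip slicing described above can be sketched as follows. This is only an illustration, not the pipeline's actual code: the function name `strip_bounds` is hypothetical, and the default strip height and overlap (512 and 64 pixels) are assumed values to tune for your scans.

```python
def strip_bounds(column_height, strip_height=512, overlap=64):
    """Return (top, bottom) pixel ranges that cover a column with overlapping strips.

    strip_height and overlap are illustrative defaults (assumptions),
    not values taken from this repository.
    """
    if strip_height <= 0:
        raise ValueError("strip_height must be positive")
    if not 0 <= overlap < strip_height:
        raise ValueError("overlap must be in [0, strip_height)")
    if column_height <= 0:
        return []

    step = strip_height - overlap  # how far each new strip advances down the column
    bounds = []
    top = 0
    while True:
        bottom = min(top + strip_height, column_height)
        bounds.append((top, bottom))
        if bottom >= column_height:  # the last strip reaches the column's bottom edge
            break
        top += step
    return bounds
```

Each pair can be passed to an image crop on the column (for example, rows `top` to `bottom`), so that a line of text cut at one strip's edge appears whole in the neighbouring strip.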

Model Training & Performance

The model was trained on annotated historical newspaper pages. You can view the training metrics, curves, and validation performance directly below:

1. Training Metrics (results.png)

Shows training/validation loss decay and precision/recall improvements over epochs.

[Figure: Training Curves]

2. Confusion Matrix

Displays normalized classification performance.

[Figure: Confusion Matrix]

3. Model Predictions vs. Ground Truth

Compare the validation batch labels (ground truth annotations) with the actual predictions generated by the trained model:

[Figure: Ground Truth Labels (val_batch0_labels.jpg)]
[Figure: Model Predictions (val_batch0_pred.jpg)]

How to Run This Model

You can download the weights with huggingface_hub and run the model in Python using the ultralytics package.

Installation

pip install ultralytics huggingface_hub

Python Inference Code

from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# 1. Download the weights from Hugging Face
model_path = hf_hub_download(repo_id="ambrosfitz/19c-newspaper-column-yolo", filename="best.pt")

# 2. Load the model
model = YOLO(model_path)

# 3. Perform detection on a newspaper page image
results = model("path_to_newspaper_page.jpg")

# 4. Display or save the page annotated with the detected columns
results[0].show()
# results[0].save(filename="output.jpg")
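
To feed the detections into an OCR pipeline, each column usually has to be cropped out in reading order. The sketch below is one possible approach, not this repository's method. It assumes the boxes are plain `(x1, y1, x2, y2)` lists, such as those returned by `results[0].boxes.xyxy.tolist()`, and that sorting left to right is adequate for the page. The helper name `column_crop_boxes` and the `pad` parameter are hypothetical.

```python
def column_crop_boxes(boxes, page_width, page_height, pad=0):
    """Sort (x1, y1, x2, y2) boxes left to right and clamp them to the page.

    Assumption: left-to-right order approximates reading order. Pages with
    stacked articles or multi-column headlines may need a smarter ordering.
    """
    crops = []
    # Sort by left edge first, then by top edge to break ties
    for x1, y1, x2, y2 in sorted(boxes, key=lambda b: (b[0], b[1])):
        left = max(0, int(x1) - pad)
        top = max(0, int(y1) - pad)
        right = min(page_width, int(x2) + pad)
        bottom = min(page_height, int(y2) + pad)
        if right > left and bottom > top:  # skip degenerate boxes
            crops.append((left, top, right, bottom))
    return crops
```

Each returned tuple uses the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects, so a column image could be obtained with `page.crop(box)` on an opened page image.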