YOLO Date Stamp Detector

Source code, training pipeline, and blog post: 👉 github.com/pike00/yolo-datestamp-detector

Fine-tuned YOLO26-medium that draws a bounding box around the camera date stamp burned into the corner of a scanned photograph. Single class, one box per photo (when present).

Many consumer cameras from the 1980s-2000s imprinted date stamps directly onto film: small orange/red/amber LED digits (typically M D 'YY, e.g. 10 3 '99) burned into the bottom edge of each photo. When these photos are later bulk-scanned, the date stamp is often the only reliable source of temporal metadata. This model locates those stamp regions so a downstream OCR step can extract the actual date and write it back as EXIF metadata.

Example output: the model draws a bounding box around an orange 4 23 '95 date stamp in the bottom-right corner of a scan.

Quick use

from ultralytics import YOLO
from huggingface_hub import hf_hub_download

weights = hf_hub_download("pike00/yolo-date-stamp-detector", "best.pt")
model = YOLO(weights)

results = model("your_scanned_photo.jpg", conf=0.5)
for r in results:
    for box in r.boxes:
        xyxy = box.xyxy[0].tolist()    # [x1, y1, x2, y2]
        conf = float(box.conf[0])
        print(f"stamp @ {xyxy}  conf={conf:.2f}")

No detection means "no date stamp visible" -- about half of real scans fall into this bucket, and the model is trained on negatives so empty results are meaningful.
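Because the box only has to roughly locate the stamp for OCR, one option is to pad it a little before cropping so that a slightly tight box does not clip digits. The following is a minimal sketch under stated assumptions: the helper name `pad_box` and the 25% default margin are illustrative and are not taken from the repository.

```python
import math


def pad_box(xyxy, img_w, img_h, margin=0.25):
    """Expand an [x1, y1, x2, y2] box by `margin` of its own size, clamped to the image.

    `pad_box` and the default margin are hypothetical choices for illustration.
    """
    x1, y1, x2, y2 = xyxy
    dx = (x2 - x1) * margin  # horizontal padding in pixels
    dy = (y2 - y1) * margin  # vertical padding in pixels
    return (
        max(0, math.floor(x1 - dx)),
        max(0, math.floor(y1 - dy)),
        min(img_w, math.ceil(x2 + dx)),
        min(img_h, math.ceil(y2 + dy)),
    )
```

With Pillow, the returned tuple can be passed straight to `Image.crop`, for example `img.crop(pad_box(xyxy, *img.size))`, and the crop handed to the OCR step.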

Metrics

Evaluated on a 663-image held-out validation split:

Metric	Value
Precision	0.959
Recall	0.961
mAP@50	0.962
mAP@50-95	0.754

The lower mAP@50-95 reflects some looseness in tight bounding-box localization, which is fine for this use case -- the box only needs to roughly locate the stamp region for downstream cropping and OCR.

Confusion matrix: 587 true positives, 17 stamps missed (587 of 604 found, recall 97.2% at the matrix's confidence threshold), and 40 background-only images where the model drew a spurious box. No-stamp photos were added as negative training examples to keep the FP rate down.
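The confusion-matrix counts can be turned back into precision, recall, and F1 with a few lines of arithmetic. This sketch assumes the 40 spurious boxes are the only false positives; the function name is illustrative.

```python
def detection_metrics(tp, fn, fp):
    """Return (precision, recall, f1) from raw detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts quoted in the confusion-matrix paragraph above.
precision, recall, f1 = detection_metrics(tp=587, fn=17, fp=40)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Recall comes out at about 0.972, matching the figure quoted with the matrix. Precision from these counts is about 0.936, lower than the 0.959 in the table. One plausible reading is that the matrix and the table are evaluated at different confidence thresholds, but that is an inference rather than something the card states.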

The PR curve hugs the top-right corner. Precision stays above 95% across almost the entire recall range before dropping off past recall 0.97.

Peak F1 of 0.96 at confidence threshold 0.50, with a broad plateau from 0.1 to 0.85 -- the model is robust to threshold selection.

Training

Base: YOLO26-medium (~20M params), pretrained on COCO
Data: ~3,000 hand-labeled scanned 4x6 photos from the ScanMyPhotos service, 1980s-2000s family photos, single class (target = date stamp region)
Config: imgsz=416, batch=16, epochs=40
Hardware: AWS g4dn.xlarge on-demand (single T4 GPU)
Wall time: 33.55 min (50.3 s/epoch)
Cost: ~$0.35 including instance spin-up and data staging
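For orientation, the configuration above maps onto the Ultralytics training call roughly as follows. This is a sketch, not the repository's actual training script: the dataset file name `datestamp.yaml` and the weights name `yolo26m.pt` are assumptions, and all other settings are left at library defaults.

```python
# Hyperparameters taken from the Config line above.
TRAIN_ARGS = {"imgsz": 416, "batch": 16, "epochs": 40}


def train(data_yaml="datestamp.yaml", base_weights="yolo26m.pt"):
    """Fine-tune a pretrained checkpoint on a single-class dataset.

    `data_yaml` and `base_weights` are assumed names, not confirmed by this card.
    """
    # Imported lazily so the config can be inspected without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(base_weights)
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

The GitHub README remains the authoritative description of both training runs.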

A prior CPU training run on a Ryzen 5 5600G reached similar precision/recall in about 9 hours. Both runs are described in detail in the GitHub README.

Drift vs. prior CPU model

Re-running the new GPU weights across the same 6,458 scans the CPU model had already seen, box-for-box:

Category	Count	Share
stable (IoU ≥ 0.5 with old box)	6,229	96.5%
drift (IoU < 0.5, new box in a different place)	197	3.0%
gone (new model finds nothing where old one did)	32	0.5%

Median IoU across stable: 0.92. Mean confidence jumped from 0.70 → 0.85, and the count of predictions clearing conf ≥ 0.7 went from 4,583 to 6,075 -- about 1,500 borderline cases got promoted into "obviously correct" territory.
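The three categories in the table follow from a per-scan IoU comparison between the old and new box. A minimal sketch of that bucketing, assuming one box per scan in `[x1, y1, x2, y2]` form and `None` when the new model finds nothing (function names are illustrative, not from the repository):

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def categorize(old_box, new_box, threshold=0.5):
    """Label a scan as 'stable', 'drift', or 'gone' relative to the old box."""
    if new_box is None:
        return "gone"
    return "stable" if iou(old_box, new_box) >= threshold else "drift"
```

The 0.5 threshold mirrors the table; a stricter value would move some stable scans into the drift bucket.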

Confidence histogram: red bars are the CPU model, green bars the GPU model. The GPU run collapses the distribution into a single dominant peak above 0.85 -- the secondary cluster around 0.3 that was driving most of the manual-review work is gone.

Intended use

Preprocessing step for OCR pipelines extracting dates from bulk-scanned photo archives (e.g. ScanMyPhotos, DigMyPics, local flatbed scans)
Automated EXIF DateTimeOriginal backfill for photo management tools
Any workflow that needs to locate an orange/red LED digit cluster in the corner of a consumer photo scan
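For the EXIF backfill case, the OCR output still has to be converted into the `YYYY:MM:DD HH:MM:SS` form that DateTimeOriginal uses. The sketch below makes several assumptions that are not from the repository: the stamp follows the M D 'YY order described earlier, two-digit years of 80 and above belong to the 1900s, and the unknown time of day is written as midnight. The name `stamp_to_exif` is hypothetical.

```python
import re
from datetime import date

# Matches strings such as "10 3 '99": month, day, optional apostrophe, two-digit year.
STAMP_RE = re.compile(r"^\s*(\d{1,2})\s+(\d{1,2})\s+'?(\d{2})\s*$")


def stamp_to_exif(text, pivot=80):
    """Convert an OCR'd M D 'YY stamp to an EXIF date string, or None if invalid.

    `pivot` is an assumed cutoff: years >= pivot map to 19xx, others to 20xx.
    """
    match = STAMP_RE.match(text)
    if match is None:
        return None
    month, day, yy = (int(g) for g in match.groups())
    year = 1900 + yy if yy >= pivot else 2000 + yy
    try:
        parsed = date(year, month, day)  # rejects impossible dates like month 13
    except ValueError:
        return None
    # Time of day is unknown, so midnight is used as a placeholder.
    return parsed.strftime("%Y:%m:%d 00:00:00")
```

Returning `None` for unparseable or impossible dates lets a pipeline route those scans to manual review instead of writing bad metadata.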

Out of scope / likely to fail:

Handwritten dates on the back of photos
Date stamps rendered in color schemes other than orange/red/amber LED
Polaroid-style white borders with printed dates
Dates on non-photographic documents

Limitations

Trained almost exclusively on 4x6 color photos, 1986-2010 era, North American consumer cameras. Other formats and regions are untested.
The single target class lumps together all digit clusters -- no separate class for "clearly a date" vs "orange blob that looks like a date."
False positives on background (typically orange lettering on signs or toys) show up on about 6% of the validation set (40 of 663 images).

Training code, pipeline, and blog post: https://github.com/pike00/yolo-datestamp-detector
Base model family: Ultralytics YOLO26
Downstream OCR: Claude Haiku for digit reading, Tesseract fallback

License

AGPL-3.0, inherited from the Ultralytics YOLO training framework used to fine-tune this model.
