OCR Preprocessing Triage

Quick Fixes Implemented

  1. Handwritten - Thresholding is disabled; the image is kept as grayscale only.
  2. Newspapers - A larger block size (51) and constant (10) give softer thresholding.
  3. JPEG Artifacts - Compression artifacts are auto-detected and handled with specialized denoising.
  4. Border Issues - Edges are cropped after deskew so that border regions do not cause threshold problems.
  5. Low Resolution - Small text is upscaled for better recognition.
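
The five fixes above can be read as a small decision step that picks preprocessing settings per image. The following is a minimal sketch of that idea, not the project's actual code: `PreprocessPlan`, `plan_preprocessing`, the default block size and constant (11 and 2), the low-resolution cutoff `MIN_SHORT_SIDE`, and the 2x upscale factor are all hypothetical names and values chosen for illustration. Only the newspaper values (51 and 10) come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreprocessPlan:
    """Hypothetical container for the settings chosen by the triage step."""
    threshold: bool          # apply adaptive thresholding at all?
    block_size: int          # neighbourhood size for adaptive thresholding (odd)
    constant: int            # offset subtracted from the local mean
    denoise_jpeg: bool       # run the JPEG-artifact denoising pass?
    crop_after_deskew: bool  # trim edges once the page has been deskewed
    upscale_factor: float    # 1.0 means "leave the size alone"


# Assumption: a shorter side below this many pixels counts as low resolution.
MIN_SHORT_SIDE = 1000


def plan_preprocessing(doc_type: str, width: int, height: int,
                       jpeg_artifacts: bool = False) -> PreprocessPlan:
    """Map document type and image properties to preprocessing settings."""
    # Fix 1: handwritten pages skip thresholding and stay grayscale.
    threshold = doc_type != "handwritten"

    # Fix 2: newspapers get a larger block size and constant (softer result).
    # The fallback values 11 and 2 are assumed defaults, not from the source.
    if doc_type == "newspaper":
        block_size, constant = 51, 10
    else:
        block_size, constant = 11, 2

    # Fix 5: upscale when the image is small (factor 2.0 is an assumption).
    upscale = 2.0 if min(width, height) < MIN_SHORT_SIDE else 1.0

    return PreprocessPlan(
        threshold=threshold,
        block_size=block_size,
        constant=constant,
        denoise_jpeg=jpeg_artifacts,   # Fix 3: only when artifacts were detected
        crop_after_deskew=True,        # Fix 4: always crop edges after deskew
        upscale_factor=upscale,
    )


if __name__ == "__main__":
    print(plan_preprocessing("newspaper", 800, 1200, jpeg_artifacts=True))
```

If the pipeline uses OpenCV (an assumption), `block_size` and `constant` would correspond to the `blockSize` and `C` arguments of `cv2.adaptiveThreshold`, which is why the block size is kept odd.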

Testing

python testing/test_triage_fix.py

Check output/comparison/ for results.