---
license: apache-2.0
language:
- en
- fr
- de
---
**OCRerrcr** is a small language model specialized for the detection of OCR errors.

OCRerrcr was trained by Elliot Jones for PleIAs on a sample of 1000 documents with labelled OCR errors from open data documents (Finance Commons) and cultural heritage sources (Common Corpus).

To date, OCRerrcr provides the most accurate agnostic OCR error rate estimate. PleIAs has also developed an alternative pipeline for this task, [OCRoscope](https://github.com/Pleias/OCRoscope), that scales significantly better but is also significantly less accurate, especially for documents with fewer mistakes.
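To make the notion of an OCR error rate concrete, here is a minimal sketch of how such a rate could be derived from per-word error labels. It is an illustration only: the per-word boolean format and the `ocr_error_rate` helper are assumptions made for this example, not the actual output format or API of OCRerrcr.

```python
from typing import Sequence


def ocr_error_rate(labels: Sequence[bool]) -> float:
    """Return the fraction of words flagged as OCR errors (True = error).

    Hypothetical helper: the per-word boolean labelling is an assumption
    for illustration, not the documented output of OCRerrcr.
    """
    if not labels:
        # An empty document has no words, so report a rate of zero.
        return 0.0
    return sum(labels) / len(labels)


# Hypothetical labels for a ten-word passage with two misread words.
labels = [False, False, True, False, False, False, True, False, False, False]
rate = ocr_error_rate(labels)
print(f"{rate:.1%}")  # prints "20.0%"
```

A document-level rate like this is the kind of quantity that can be compared across detection pipelines, whatever method produces the underlying labels.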

The name OCRerrcr (instead of OCRerror) is a playful allusion to a common OCR misreading.

## Example