GT-Free OCR Metrics – Datasets & Models

Part of the dataset and model collection for *GT-Free OCR Metrics: Reference-Free Visual Similarity Evaluation for OCR Systems*.
A LoRA-adapted CLIP+DINOv2 similarity head trained to measure visual similarity between original document page scans and their OCR reconstructions. Released as part of the OmniDocBench Render-and-Compare project.
Trained on 20,280 triplets (anchor: original page; positive: its OCR reconstruction; negative: a mismatched reconstruction) derived from the OmniDocBench Render-and-Compare dataset.
| Setting | Value |
|---|---|
| Epochs | 3 |
| Batch size | 16 |
| Learning rate | 1e-4 |
| Train triplets | 19,266 |
| Val triplets | 1,014 |
| Best val accuracy | 99.90% |
| Margin (triplet loss) | 0.1 |
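The triplet objective described above can be sketched as follows. This is a minimal illustration assuming L2-normalized embeddings and cosine distance, with the margin of 0.1 from the settings table; the embedding dimension and batch size are illustrative, not taken from the release.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.1):
    # Cosine distance = 1 - cosine similarity on L2-normalized vectors.
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negative = F.normalize(negative, dim=-1)
    d_pos = 1.0 - (anchor * positive).sum(dim=-1)
    d_neg = 1.0 - (anchor * negative).sum(dim=-1)
    # Penalize triplets where the positive is not closer than the
    # negative by at least `margin`.
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()

# Toy batch: 4 triplets of 512-d embeddings standing in for backbone features.
a, p, n = (torch.randn(4, 512) for _ in range(3))
loss = triplet_loss(a, p, n)
```

The margin of 0.1 is small because cosine distances on normalized embeddings live in [0, 2], so even a modest separation between positive and negative pairs satisfies the constraint.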
| File | Description |
|---|---|
| `lora_adapter_best/` | LoRA adapter at best validation accuracy |
| `lora_adapter_final/` | LoRA adapter at end of training |
| `head_state_best.pt` | Projection head weights (best checkpoint) |
| `head_state_final.pt` | Projection head weights (final epoch) |
| `config.json` | Full architecture config |
Use `lora_adapter_best/` + `head_state_best.pt` for inference.
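At inference time, scoring reduces to projecting the two backbone embeddings through the head and comparing them with cosine similarity. The sketch below uses a hypothetical stand-in MLP for the head; the real architecture is defined in `config.json`, and in practice its weights would be restored with `head.load_state_dict(torch.load("head_state_best.pt"))`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for the released projection head (real architecture
# lives in config.json; weights in head_state_best.pt).
head = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 128))

def docsim_score(page_emb, recon_emb):
    # Project both embeddings, L2-normalize, and take cosine similarity.
    z1 = F.normalize(head(page_emb), dim=-1)
    z2 = F.normalize(head(recon_emb), dim=-1)
    return (z1 * z2).sum(dim=-1)

# Toy vectors standing in for LoRA-adapted CLIP/DINOv2 features of the
# original page scan and its rendered OCR reconstruction.
page = torch.randn(1, 768)
recon = torch.randn(1, 768)
score = docsim_score(page, recon)  # in [-1, 1]; higher means more similar
```

Because the score is a cosine similarity, a page compared against itself scores 1.0, which makes the metric easy to sanity-check without ground truth.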
Download via the GT-free-ocr-metrics repo:

```bash
bash download_models.sh
```

Then run any DocSim-based method:

```bash
bash scripts/run_method.sh docsim_lora
```
License: Apache-2.0. The companion datasets (OmniDocBench Render-and-Compare) are CC-BY-NC-4.0.
Base model
openai/clip-vit-base-patch32