manchu-ocr-crnn-step1-syn
CRNN baseline for Manchu script OCR. It outputs Manchu graph only and does not produce romanization. This is the Step-1 baseline, trained on synthetic data only.
Architecture: ResNet-style CNN backbone → adaptive pool to fixed height → 4-layer BiLSTM (hidden=256) → CTC head.
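The pipeline above can be sketched in PyTorch as follows. This is a minimal illustration, not the released model. The channel widths, strides, average pooling to height 1, and blank index 0 are all assumptions, so this class will not load `best_model.pth`.

```python
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """ResNet-style block: two 3x3 convs plus an identity or 1x1-projection shortcut."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.shortcut(x))


class CRNN(nn.Module):
    """CNN backbone -> adaptive pool to fixed height -> BiLSTM -> per-timestep CTC logits."""

    def __init__(self, num_classes, hidden_size=256, num_layers=4, pooled_height=1):
        super().__init__()
        # Assumed layout: only the first stage halves the width; later stages halve height only.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1, bias=False), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            BasicBlock(64, 64),
            BasicBlock(64, 128, stride=(2, 2)),
            BasicBlock(128, 256, stride=(2, 1)),
            BasicBlock(256, 256, stride=(2, 1)),
        )
        # Collapse height to a fixed size; None keeps the width, which becomes the time axis.
        self.pool = nn.AdaptiveAvgPool2d((pooled_height, None))
        self.rnn = nn.LSTM(256 * pooled_height, hidden_size, num_layers=num_layers,
                           bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden_size, num_classes)  # class 0 reserved for CTC blank (assumed)

    def forward(self, x):
        f = self.pool(self.backbone(x))                      # (B, C, H', W')
        b, c, h, w = f.shape
        seq = f.permute(0, 3, 1, 2).reshape(b, w, c * h)     # (B, T=W', C*H')
        out, _ = self.rnn(seq)                               # (B, T, 2*hidden)
        return self.head(out).log_softmax(-1)                # (B, T, num_classes)
```

With a 64-pixel-high, 480-pixel-wide grayscale input, this sketch produces 240 time steps. Note that `nn.CTCLoss` expects the transposed `(T, B, C)` layout.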
Training data: 60k synthetic Manchu OCR images from mic7ch/manchu-2025-0033.
This is the checkpoint with peak `manchu_word_accuracy` on a held-out 1000-sample real validation split (`real_val`). The same selection rule was used for the VLM models in the paper.
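The selection rule is an arg-max over evaluated checkpoints. The following is a minimal sketch. It assumes word accuracy has already been computed on `real_val` for each saved checkpoint. The checkpoint names and scores in the example are hypothetical.

```python
from typing import Dict, Tuple


def select_peak_checkpoint(real_val_word_acc: Dict[str, float]) -> Tuple[str, float]:
    """Pick the checkpoint with the highest word accuracy on real_val.

    On ties, max() keeps the first checkpoint in insertion order
    (the earliest one if checkpoints were added chronologically).
    """
    if not real_val_word_acc:
        raise ValueError("no checkpoints evaluated")
    name = max(real_val_word_acc, key=real_val_word_acc.__getitem__)
    return name, real_val_word_acc[name]


# Hypothetical scores, for illustration only.
scores = {"ckpt_a.pth": 0.70, "ckpt_b.pth": 0.74, "ckpt_c.pth": 0.72}
best_name, best_acc = select_peak_checkpoint(scores)
print(best_name, best_acc)  # ckpt_b.pth 0.74
```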
Best checkpoint
- Step: `checkpoint-251250.pth` (uploaded as `best_model.pth`)
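The step number can be related to epochs with simple arithmetic. The sketch below assumes one step is one optimizer update on a batch of 16, with no gradient accumulation. It also assumes exactly 60,000 training images, since the card says "60k".

```python
import math


def steps_per_epoch(num_images: int, batch_size: int, drop_last: bool = False) -> int:
    """Optimizer steps in one pass over the data (one step per batch, no accumulation)."""
    if drop_last:
        return num_images // batch_size
    return math.ceil(num_images / batch_size)


spe = steps_per_epoch(60_000, 16)
epochs_done, leftover = divmod(251_250, spe)
total_steps = spe * 100
print(spe, epochs_done, leftover, total_steps)  # 3750 67 0 375000
```

Under these assumptions, step 251250 falls exactly at the end of epoch 67 of the 100 planned epochs (375,000 steps in total).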
Evaluation metrics
Word accuracy and character error rate at the selected step:
| Split (samples) | Word accuracy (`manchu_word_accuracy`) | CER (`manchu_cer`) |
|---|---|---|
| synthetic-val (1000) | 99.07% | 0.093% |
| real-val (1000, held-out) | 73.70% | 7.910% |
| real-test (753) | 60.03% | 14.996% |
(Roman transliteration metrics are N/A because the CRNN was trained on Manchu graph only.)
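For reference, these two metrics are commonly defined as exact-match accuracy and Levenshtein distance normalized by reference length. The sketch below makes two assumptions. First, that `manchu_word_accuracy` is exact string match per sample. Second, that `manchu_cer` is corpus-level, meaning total edits divided by total reference characters. The repository may instead average CER per sample.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions, and substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def word_accuracy(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Fraction of samples whose prediction matches the reference exactly."""
    assert len(preds) == len(refs)
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def cer(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Corpus-level CER: total edit distance divided by total reference characters."""
    assert len(preds) == len(refs)
    edits = sum(levenshtein(p, r) for p, r in zip(preds, refs))
    return edits / sum(len(r) for r in refs)


refs = ["abc", "de"]
preds = ["abc", "dx"]
print(word_accuracy(preds, refs), cer(preds, refs))  # 0.5 0.2
```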
Training recipe
- Architecture: CRNN with ResNet-style backbone + 4-layer BiLSTM + CTC
- Hidden size: 256
- Input: 480×64 grayscale
- Optimizer: AdamW, lr=1e-3, weight_decay=0.05, betas=(0.9, 0.999)
- Scheduler: CosineAnnealingWarmRestarts (T_0=10, T_mult=2)
- Batch size: 16
- Epochs: 100
- Mixed precision: enabled
- Gradient clipping: max_norm=1.0
- Selection metric: `manchu_word_accuracy` on held-out `real_val`
Hyperparameters reproduce the prior baseline in `mic7ch/manchu-ocr-crnn-base-3m` (itself based on https://github.com/mic7ch1/ManchuAI-OCR), differing only in training data composition.
Usage
```python
import torch
from huggingface_hub import hf_hub_download
# Requires the CRNN code at https://github.com/<your-fork>/hongtaiji_parallel
# (or use the standalone bundle in `crnn_standalone/`).
from src.CRNN.inference import CRNNInference

# Download the selected checkpoint from the Hub.
ckpt_path = hf_hub_download(repo_id="mic7ch/manchu-ocr-crnn-step1-syn", filename="best_model.pth")
ocr = CRNNInference(ckpt_path)
ocr.load_model()
text = ocr.predict("path/to/image.png")
print(text)
```
The .pth file is self-contained: it stores the model state_dict alongside char2idx, idx2char, and architectural hyperparameters (hidden_size, etc.), so no separate config is required for inference.
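The following sketch shows one way to check what a checkpoint contains and to turn CTC outputs back into text with `idx2char`. Several details are assumptions for illustration: the key names (`state_dict`, `char2idx`, `idx2char`, `hidden_size`), blank index 0, and the toy alphabet. Print the keys of the real `best_model.pth`, for example with `load_checkpoint(ckpt_path)`, before relying on them.

```python
import os
import tempfile

import torch


def load_checkpoint(path: str) -> dict:
    # weights_only=False unpickles arbitrary Python objects: use only on trusted files.
    return torch.load(path, map_location="cpu", weights_only=False)


def ctc_greedy_decode(log_probs: torch.Tensor, idx2char: dict, blank: int = 0) -> list:
    """Best-path CTC decoding for (B, T, C) scores: merge repeats, then drop blanks."""
    texts = []
    for seq in log_probs.argmax(dim=-1).tolist():
        chars, prev = [], None
        for idx in seq:
            if idx != prev and idx != blank:
                chars.append(idx2char[idx])
            prev = idx  # a blank between two equal labels keeps both
        texts.append("".join(chars))
    return texts


# Toy checkpoint with hypothetical key names, mirroring the contents described above.
toy = {
    "state_dict": torch.nn.Linear(4, 4).state_dict(),
    "char2idx": {"a": 1, "b": 2, "c": 3},
    "idx2char": {1: "a", 2: "b", 3: "c"},
    "hidden_size": 256,
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.pth")
    torch.save(toy, path)
    ckpt = load_checkpoint(path)
print(sorted(ckpt))  # ['char2idx', 'hidden_size', 'idx2char', 'state_dict']

# One-hot scores whose best path is a a <blank> b b <blank> b c (index 0 is blank).
scores = torch.nn.functional.one_hot(torch.tensor([[1, 1, 0, 2, 2, 0, 2, 3]]), num_classes=4).float()
print(ctc_greedy_decode(scores, ckpt["idx2char"]))  # ['abbc']
```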
Citation
Paper forthcoming. In the meantime, please cite the repository:
```bibtex
@software{manchu_ocr_2026,
  author = {Chung, H.-M. and collaborators},
  title  = {Vision-language-model OCR for Manchu script},
  year   = {2026},
  url    = {https://huggingface.co/mic7ch/manchu-ocr-crnn-step1-syn}
}
```