MIT 48px CTC OCR ONNX

This repository provides an ONNX conversion of the 48px CTC OCR model used by manga-image-translator.

The ONNX artifact is derived from the upstream PyTorch checkpoint ocr-ctc.ckpt, which is distributed in the ocr-ctc.zip asset of the beta-0.3 release.

Files

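  • mit48pxctc_ocr.onnx: the exported ONNX model
  • alphabet-all-v5.txt: the character dictionary used to decode char_logits
  • metadata.json
  • LICENSE: the GPL-3.0 license text
  • NOTICE: source attribution and redistribution authorization details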

Source

Model Contract

Input:

  • name: image
  • dtype: float32
  • shape: [batch, 3, 48, width]
  • color order: BGR
  • normalization: (uint8_pixel - 127.5) / 127.5, which maps [0, 255] to [-1.0, 1.0]
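A minimal preprocessing sketch in Python with NumPy, assuming the caller already has a cropped text line as an HxWx3 uint8 BGR array (the layout OpenCV's cv2.imread returns). To stay dependency-free it resizes to height 48 with nearest-neighbour sampling, whereas a real pipeline would more likely use cv2.resize. The helper names preprocess_bgr and stack_batch, and the padding value used when batching crops of different widths, are assumptions for illustration and not part of this repository.

```python
import numpy as np

TARGET_HEIGHT = 48  # fixed input height from the model contract


def preprocess_bgr(crop: np.ndarray) -> np.ndarray:
    """Turn one HxWx3 uint8 BGR text-line crop into a [1, 3, 48, W] float32 tensor.

    Nearest-neighbour resizing is a simplification chosen for this sketch.
    """
    if crop.dtype != np.uint8 or crop.ndim != 3 or crop.shape[2] != 3:
        raise ValueError("expected an HxWx3 uint8 BGR image")
    h, w, _ = crop.shape
    # Keep the aspect ratio: the width scales by 48 / h (at least 1 pixel).
    new_w = max(1, round(w * TARGET_HEIGHT / h))
    rows = np.arange(TARGET_HEIGHT) * h // TARGET_HEIGHT
    cols = np.arange(new_w) * w // new_w
    resized = crop[rows[:, None], cols[None, :]]  # (48, new_w, 3), still BGR
    # Normalise as the contract states: (pixel - 127.5) / 127.5 -> [-1, 1].
    x = (resized.astype(np.float32) - 127.5) / 127.5
    # HWC -> CHW, then add the batch axis.
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None])


def stack_batch(tensors: list[np.ndarray], pad_value: float = -1.0) -> np.ndarray:
    """Right-pad [1, 3, 48, W_i] tensors to the widest W and stack them on axis 0.

    pad_value=-1.0 (a black pixel after normalisation) is an assumption,
    not something stated by the model contract.
    """
    max_w = max(t.shape[3] for t in tensors)
    out = np.full((len(tensors), 3, TARGET_HEIGHT, max_w), pad_value, dtype=np.float32)
    for i, t in enumerate(tensors):
        out[i, :, :, : t.shape[3]] = t[0]
    return out
```

Padding narrower crops on the right lets several lines share one batch tensor; because the contract leaves the width dimension open, a single crop can also be passed on its own.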

Outputs:

  • char_logits: [batch, time, vocab_size]
  • color_values: [batch, time, 6]

char_logits contains raw scores (no softmax is applied), and color_values is not clamped to any range. The first entry of the dictionary (index 0) is the CTC blank token, and the special token <SP> represents an ordinary space.
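These conventions are enough for a best-path (greedy) CTC decoder: take the most likely entry at each time step, collapse consecutive repeats, drop the blank at index 0, and map <SP> to a space. The sketch below assumes alphabet-all-v5.txt holds one dictionary entry per line; the helper names and the mean-probability confidence score are illustrative choices, not taken from the upstream project. Beam search is a possible alternative to greedy decoding.

```python
import numpy as np


def load_alphabet(path: str) -> list[str]:
    # Assumption: one entry per line; index 0 is the CTC blank.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f]


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    # char_logits are raw scores; subtract the max for numerical stability.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def ctc_greedy_decode(char_logits: np.ndarray, alphabet: list[str]) -> list[tuple[str, float]]:
    """Greedy CTC decoding of char_logits shaped [batch, time, vocab_size].

    Returns (text, mean probability of the kept characters) per batch item.
    """
    probs = softmax(char_logits.astype(np.float64))
    best = probs.argmax(axis=-1)   # [batch, time] most likely index
    best_p = probs.max(axis=-1)    # [batch, time] its probability
    results = []
    for ids, ps in zip(best, best_p):
        chars, kept = [], []
        prev = -1
        for t, idx in enumerate(ids):
            idx = int(idx)
            # Emit only when the index changes and is not the blank (0).
            if idx != prev and idx != 0:
                token = alphabet[idx]
                chars.append(" " if token == "<SP>" else token)
                kept.append(ps[t])
            prev = idx
        conf = float(np.mean(kept)) if kept else 0.0
        results.append(("".join(chars), conf))
    return results
```

Because a blank separates the two runs of index 1 in a sequence such as [a, a, blank, a], the decoder keeps both characters ("aa"), while [a, a, a] collapses to a single "a".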

Validation

The ONNX export was checked with onnx.checker and compared against the PyTorch checkpoint with ONNX Runtime CPU execution.
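A similar CPU setup can be reproduced with the onnx and onnxruntime Python packages. The sketch below validates the file with onnx.checker, opens a session on CPUExecutionProvider, and feeds the image input by name; make_session and run_ocr are hypothetical helper names, and run_ocr only checks the dtype and fixed dimensions from the model contract before calling session.run.

```python
import numpy as np


def make_session(model_path: str = "mit48pxctc_ocr.onnx"):
    """Validate the model with onnx.checker and open a CPU ONNX Runtime session."""
    # Imported lazily so run_ocr can be used (e.g. with a stub) without these packages.
    import onnx
    import onnxruntime as ort

    onnx.checker.check_model(onnx.load(model_path))
    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def run_ocr(session, batch: np.ndarray):
    """Run a [batch, 3, 48, width] float32 tensor; returns (char_logits, color_values)."""
    if batch.dtype != np.float32 or batch.ndim != 4 or batch.shape[1:3] != (3, 48):
        raise ValueError("input must be float32 with shape [batch, 3, 48, width]")
    # Request both outputs by name, in the order listed in the model contract.
    char_logits, color_values = session.run(
        ["char_logits", "color_values"], {"image": batch}
    )
    return char_logits, color_values
```

Taking the session as a parameter keeps run_ocr independent of how it was created, so the same helper works with other execution providers.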

width=512: logits diff=0.000839233; colors diff=7.86781e-05
width=1024: logits diff=0.000980377; colors diff=6.19292e-05
width=1536: logits diff=0.000984192; colors diff=2.74777e-05

Export

The model was exported with:

uv run --extra export python scripts/export.py \
  --checkpoint origin_model/ocr-ctc.ckpt \
  --alphabet origin_model/alphabet-all-v5.txt \
  --output dist/mit48pxctc_ocr.onnx

License

This ONNX conversion and the accompanying files are distributed under GPL-3.0-only. See LICENSE.

The upstream project is GPL-3.0 licensed. Upstream authorship and copyright remain with the original authors and contributors of manga-image-translator and the model authors. See NOTICE for source attribution and redistribution authorization details.
