manga-ocr-attn-onnx
manga-ocr (DeiT-tiny vision encoder + 2-layer BERT decoder) in ONNX, with the
decoder re-exported to additionally emit `cross_attention`: the
decoder→encoder attention averaged over layers and heads. That extra output lets a
downstream consumer recover each emitted character's spatial position on the
encoder patch grid (for example, for column-aware text overlays). The recognized text is
byte-identical to the stock export; only a second output tensor was added to
the graph.
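As a rough illustration of how a consumer might turn one decoding step's attention into a grid position, here is a minimal sketch. The function name `attention_to_grid_position` is hypothetical, and the layout it assumes is not confirmed by this model card: a 14×14 patch grid (224×224 input with 16×16 patches), patch tokens in row-major order, and one leading special token. Check the actual shape of `cross_attention` and adjust `num_prefix_tokens` and the grid size before relying on it.

```python
from typing import Sequence, Tuple


def attention_to_grid_position(
    weights: Sequence[float],
    grid_h: int = 14,            # assumption: 224 / 16 patches per side
    grid_w: int = 14,            # assumption: 224 / 16 patches per side
    num_prefix_tokens: int = 1,  # assumption: one special token before the patches
) -> Tuple[float, float]:
    """Return the attention-weighted (row, col) centroid on the patch grid.

    `weights` is one decoding step's attention over all encoder positions.
    Assumed layout: `num_prefix_tokens` special tokens first, followed by
    grid_h * grid_w patch tokens in row-major order.
    """
    # Drop the special tokens so that index 0 is the top-left patch.
    patch_weights = list(weights)[num_prefix_tokens:]
    if len(patch_weights) != grid_h * grid_w:
        raise ValueError(
            f"expected {grid_h * grid_w} patch weights, got {len(patch_weights)}"
        )
    total = sum(patch_weights)
    if total <= 0:
        raise ValueError("attention over patches sums to zero")
    # Row-major index i maps to row i // grid_w and column i % grid_w.
    row = sum(w * (i // grid_w) for i, w in enumerate(patch_weights)) / total
    col = sum(w * (i % grid_w) for i, w in enumerate(patch_weights)) / total
    return row, col


# Synthetic example: all attention on the patch at row 3, column 5.
weights = [0.0] * (1 + 14 * 14)
weights[1 + 3 * 14 + 5] = 1.0
print(attention_to_grid_position(weights))  # (3.0, 5.0)
```

A weighted centroid is only one option; taking the argmax patch instead is simpler but more sensitive to attention that is spread over several patches. Multiplying the result by the patch size in pixels (16 under the assumption above) gives an approximate position in the resized 224×224 input.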
Derived from
- l0wgear/manga-ocr-2025-onnx: encoder + tokenizer/config, used unmodified.
- kha-white/manga-ocr: the original manga-ocr model (Apache-2.0), the basis of the ONNX export above.
Only the decoder's ONNX graph was re-exported (one added output); no weights were changed.
Files
| File | Description |
|---|---|
| `encoder_model.onnx` | DeiT-tiny vision encoder, 224×224 grayscale-from-RGB input. |
| `decoder_model.onnx` | 2-layer BERT decoder. Outputs: `logits` + `cross_attention`. |
| `tokenizer.json`, `vocab.txt`, `special_tokens_map.json` | Tokenizer. |
| `config.json`, `generation_config.json`, `preprocessor_config.json` | Model / generation / preprocessing config. |
Credit
manga-ocr by kha-white (Apache-2.0); ONNX encoder export by l0wgear. Redistributed here under Apache-2.0.