manga-ocr-attn-onnx

manga-ocr (DeiT-tiny vision encoder + 2-layer BERT decoder) in ONNX, with the decoder re-exported to emit one additional output, cross_attention: the decoder-to-encoder attention averaged over layers and heads. That extra output lets a downstream consumer recover each emitted character's spatial position on the encoder patch grid (for example, for column-aware text overlays). The recognized text is byte-identical to the stock export; the only change is the second output tensor added to the graph.
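As a rough illustration of how a consumer might turn one decoding step's cross_attention row into a grid position, here is a minimal sketch. It is not part of this repository. The function names `patch_position` and `patch_centroid` are hypothetical, and the defaults are assumptions to check against the actual tensor shape: a 14×14 patch grid (224-pixel input with 16-pixel patches), row-major patch order, and one leading non-patch token such as [CLS].

```python
from typing import Sequence, Tuple


def _patch_weights(
    attn_row: Sequence[float], grid_size: int, num_prefix_tokens: int
) -> list:
    """Drop the assumed leading non-patch tokens and check the length."""
    patches = list(attn_row[num_prefix_tokens:])
    if len(patches) != grid_size * grid_size:
        raise ValueError(
            f"expected {grid_size * grid_size} patch weights, got {len(patches)}"
        )
    return patches


def patch_position(
    attn_row: Sequence[float],
    grid_size: int = 14,        # assumption: 224 / 16 patches per side
    num_prefix_tokens: int = 1,  # assumption: one [CLS]-style token first
) -> Tuple[int, int]:
    """Return (row, col) of the most-attended patch for one decoding step."""
    patches = _patch_weights(attn_row, grid_size, num_prefix_tokens)
    best = max(range(len(patches)), key=patches.__getitem__)
    # Assumption: patches are laid out row-major on the grid.
    return divmod(best, grid_size)


def patch_centroid(
    attn_row: Sequence[float],
    grid_size: int = 14,
    num_prefix_tokens: int = 1,
) -> Tuple[float, float]:
    """Return the attention-weighted mean (row, col), in patch units."""
    patches = _patch_weights(attn_row, grid_size, num_prefix_tokens)
    total = sum(patches)
    if total <= 0:
        raise ValueError("attention weights sum to zero")
    row = sum(w * (i // grid_size) for i, w in enumerate(patches)) / total
    col = sum(w * (i % grid_size) for i, w in enumerate(patches)) / total
    return row, col


# Synthetic example: all attention mass on grid cell (row 3, col 5).
example = [0.0] * (1 + 14 * 14)
example[1 + 3 * 14 + 5] = 1.0
print(patch_position(example))  # (3, 5)
```

The argmax is the simpler choice when attention is sharply peaked; the centroid may be steadier when the weight is spread over neighbouring patches. Multiplying a patch index by 16 would give approximate pixel coordinates in the 224×224 input under the same assumptions.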

Derived from

Only the decoder's ONNX graph was re-exported (one added output); no weights were changed.

Files

| File | Description |
| --- | --- |
| encoder_model.onnx | DeiT-tiny vision encoder, 224×224 grayscale-from-RGB input. |
| decoder_model.onnx | 2-layer BERT decoder. Outputs: logits + cross_attention. |
| tokenizer.json, vocab.txt, special_tokens_map.json | Tokenizer. |
| config.json, generation_config.json, preprocessor_config.json | Model / generation / preprocessing config. |

Credit

manga-ocr by kha-white (Apache-2.0); ONNX encoder export by l0wgear. Redistributed here under Apache-2.0.
