manga-ocr-attn-onnx

manga-ocr (DeiT-tiny vision encoder + 2-layer BERT decoder) in ONNX, with the decoder re-exported to additionally emit `cross_attention`: the decoder-to-encoder attention averaged over layers and heads. That extra output lets a downstream consumer recover each emitted character's spatial position on the encoder patch grid (for column-aware text overlays). The recognized text is byte-identical to the stock export; only a second output tensor was added to the graph.
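
As a rough illustration of how the `cross_attention` output might be used, the sketch below maps each decoding step to a cell on the encoder patch grid by taking the most-attended patch. Several things here are assumptions rather than facts from this repository: the helper name `char_positions` is hypothetical, the attention is assumed to arrive as a `(steps, encoder_tokens)` array once the batch dimension is removed, the grid is assumed to be 14×14 (16-pixel patches on the 224×224 input), and any non-patch tokens such as a class token are assumed to come first in the encoder sequence. Check the actual tensor shapes of the exported graph before relying on it.

```python
import numpy as np


def char_positions(cross_attention, grid=14):
    """Return the (row, col) grid cell each decoding step attends to most.

    cross_attention: array-like of shape (steps, encoder_tokens). This layout
    is an assumption; squeeze or index the batch dimension beforehand.
    grid: patches per side (assumed 14 for a 224x224 input, 16-pixel patches).
    """
    attn = np.asarray(cross_attention, dtype=np.float64)
    if attn.ndim != 2:
        raise ValueError("expected a 2-D array of shape (steps, encoder_tokens)")

    n_patches = grid * grid
    # Any tokens beyond grid*grid are treated as prefix (non-patch) tokens.
    n_special = attn.shape[1] - n_patches
    if n_special < 0:
        raise ValueError("fewer encoder tokens than grid cells")

    # Drop the prefix tokens (assumed to sit at the start of the sequence).
    patch_attn = attn[:, n_special:]

    # Most-attended patch per step, converted from a flat index to (row, col)
    # under the assumption of row-major patch order.
    peak = patch_attn.argmax(axis=1)
    rows, cols = np.divmod(peak, grid)
    return np.stack([rows, cols], axis=1)


# Synthetic example: two decoding steps, two prefix tokens + 196 patches.
demo = np.zeros((2, 2 + 14 * 14))
demo[0, 2 + 3 * 14 + 5] = 1.0   # step 0 attends to row 3, col 5
demo[1, 2 + 0 * 14 + 13] = 1.0  # step 1 attends to row 0, col 13
print(char_positions(demo))
```

Taking the argmax is the simplest choice; an attention-weighted centroid over the grid would give smoother positions at the cost of being pulled toward diffuse attention mass.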

Derived from

l0wgear/manga-ocr-2025-onnx — encoder + tokenizer/config, used unmodified.
kha-white/manga-ocr — the original manga-ocr model (Apache-2.0), the basis of the ONNX export above.

Only the decoder's ONNX graph was re-exported (one added output); no weights were changed.

Files

File	Description
`encoder_model.onnx`	DeiT-tiny vision encoder, 224×224 grayscale-from-RGB input.
`decoder_model.onnx`	2-layer BERT decoder. Outputs: `logits` + `cross_attention`.
`tokenizer.json`, `vocab.txt`, `special_tokens_map.json`	Tokenizer.
`config.json`, `generation_config.json`, `preprocessor_config.json`	Model / generation / preprocessing config.

Credit

manga-ocr by kha-white (Apache-2.0); ONNX encoder export by l0wgear. Redistributed here under Apache-2.0.
