I've released hayai-ocr (https://github.com/NopeNopeGuy/hayai-ocr). Please use that instead, as it performs much better. That model is still undertrained, since I ran out of Kaggle hours, so it may be updated to perform much better.

manga-ocr-finetuned

This model is a fine-tuned version of jzhang533/manga-ocr-base-2025.

Character error rate (CER) of the manga-ocr models on the full evaluation set (lower is better):

Model Name                                       Full Eval Set CER
kha-white/manga-ocr-base                         37.45%
jzhang533/manga-ocr-base-2025                    37.38%
JustANormalTinkerer/manga-ocr-finetuned (old)    26.25%
JustANormalTinkerer/manga-ocr-finetuned (new)    15%
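
The card does not say how CER was computed. A common definition is the character-level edit distance between the prediction and the reference, divided by the reference length. The following is a minimal sketch under that assumption, in pure Python, using corpus-level aggregation (total edits over total reference characters). The function names and the example strings are made up for illustration and are not taken from the evaluation set.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference chars into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def corpus_cer(references, hypotheses) -> float:
    """CER in percent: total edits divided by total reference characters."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return 100.0 * total_edits / total_chars


# Hypothetical ground-truth strings and model outputs.
refs = ["ドキドキ", "BANG!"]
hyps = ["ドキドキ", "BANC"]
print(f"CER: {corpus_cer(refs, hyps):.2f}%")
```

Averaging per-crop CER instead of pooling edits would weight short SFX crops more heavily, so the two conventions can give noticeably different numbers on the same predictions.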

Intended uses & limitations

For manga OCR with better English and SFX support (it's still bad, but less bad).
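
A minimal usage sketch, assuming the checkpoint keeps the same layout as its base model and therefore loads through the `manga-ocr` Python package via its `pretrained_model_name_or_path` argument. The card does not confirm this, and the helper names `load_ocr` and `read_crop` as well as the file name in the comment are hypothetical.

```python
from pathlib import Path

# Model ID as it appears on this card.
MODEL_ID = "JustANormalTinkerer/manga-ocr-finetuned"


def load_ocr(model_id: str = MODEL_ID):
    """Load the checkpoint through the manga-ocr package (downloads weights on first use)."""
    # Imported lazily so this module can be imported without the package installed.
    from manga_ocr import MangaOcr  # pip install manga-ocr

    return MangaOcr(pretrained_model_name_or_path=model_id)


def read_crop(ocr, image_path) -> str:
    """Run OCR on a single cropped text region and return the recognized string."""
    return ocr(str(Path(image_path)))


# Example (requires network access and a real image file):
#   ocr = load_ocr()
#   print(read_crop(ocr, "speech_bubble.png"))  # hypothetical file name
```

The model expects a crop of a single text region (a speech bubble or SFX), not a full page, so a text detector or manual cropping is needed upstream.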

Training and evaluation data

The model was trained on a private dataset of ~6k image crops from modern English-translated manga such as Kaguya-sama - Love Is War, Komi Can't Communicate, Nichijou, Akuyaku Reijou no Naka no Hito, Solo Leveling, and Witch Hat Atelier. It was also trained on a subset of the AnimeText dataset (~100k Japanese image crops and 10k English image crops) and on 45k image crops from Manga109-s COO.

It was further trained on a private dataset of ~80k image crops that were pseudo-labeled by gemini-3.1-flash-lite. These crops come from some volumes of Shonen Jump, Young Jump, Made In Abyss, Stand By Me, Choukadou Girl, Tsurezure Children, La Vie en Doll, Shadows House, Komi-san, Yuri Hime, Isekai Ojisan, and Kaguya-sama, plus some indie manga from Pixiv.

It was trained for 3 epochs on this data.
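
To give a rough sense of the training mixture, the sketch below adds up the crop counts quoted above. It assumes the rounded figures are exact, that the sources do not overlap, and that the 3 epochs cover the full mixture; none of this is stated explicitly, so treat the output as a back-of-the-envelope estimate.

```python
# Approximate crop counts as quoted in the text (values given with "~" are rounded).
SOURCES = {
    "English-translated manga (private)": 6_000,
    "AnimeText subset, Japanese": 100_000,
    "AnimeText subset, English": 10_000,
    "Manga109-s COO": 45_000,
    "Pseudo-labeled manga volumes (private)": 80_000,
}
EPOCHS = 3  # assumption: every source is seen once per epoch

total = sum(SOURCES.values())
samples_seen = total * EPOCHS
shares = {name: 100 * n / total for name, n in SOURCES.items()}

for name, n in SOURCES.items():
    print(f"{name:<40} {n:>8,} crops  {shares[name]:5.1f}%")
print(f"Total: {total:,} crops, about {samples_seen:,} samples over {EPOCHS} epochs")
```

Under these assumptions, about a third of the mixture carries machine-generated labels, so label noise from the pseudo-labeling step is worth keeping in mind when reading the CER figures.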

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2

Citations

@inproceedings{baek2026mangav26,
  author    = {Baek, Jeonghun and Miyai, Atsuyuki and Onohara, Shota and Ikuta, Hikaru and Aizawa, Kiyoharu},
  title     = {Revisiting Manga109 Annotations for Modern Manga Understanding},
  booktitle = {Culture × AI Workshop at ICML 2026},
  year      = {2026}
}

@article{aizawa2020building,
  author  = {Aizawa, Kiyoharu and Fujimoto, Azuma and Otsubo, Atsushi and Ogawa, Toru and Matsui, Yusuke and Tsubota, Koki and Ikuta, Hikaru},
  title   = {Building a Manga Dataset ``Manga109'' with Annotations for Multimedia Applications},
  journal = {IEEE MultiMedia},
  year    = {2020},
  volume  = {27},
  number  = {2},
  pages   = {8--18},
  doi     = {10.1109/mmul.2020.2987895}
}

@article{matsui2017sketch,
  author  = {Matsui, Yusuke and Ito, Kota and Aramaki, Yuji and Fujimoto, Azuma and Ogawa, Toru and Yamasaki, Toshihiko and Aizawa, Kiyoharu},
  title   = {Sketch-based Manga Retrieval using Manga109 Dataset},
  journal = {Multimedia Tools and Applications},
  year    = {2017},
  volume  = {76},
  number  = {20},
  pages   = {21811--21838},
  doi     = {10.1007/s11042-016-4020-z}
}

@inproceedings{baek2022coo,
  author    = {Baek, Jeonghun and Matsui, Yusuke and Aizawa, Kiyoharu},
  title     = {COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022}
}

Model size: 30.3M params (Safetensors, tensor type F32)