AlienKevin/canto_ocr


Pretrained model

#1

by fengsong - opened Apr 19, 2023

Apr 19, 2023

Which pretrained model did you use? Is it the TrOCR pretrained model? Is there a larger model available? Thanks!

Owner May 8, 2023


edited May 8, 2023

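It's based on the Simplified Chinese pretrained model from https://github.com/chineseocr/trocr-chinese. Its README has a Baidu Netdisk link for downloading the weights. I haven't found a larger Chinese model yet. However, all of my Cantonese data is automatically generated, so if you have a particular use case in mind, you can find your own corpus and fonts and generate a training set. The LIHKG data can be downloaded from my repo: https://huggingface.co/datasets/AlienKevin/LIHKG . Another source of Cantonese data is the Cantonese Wikipedia dump: https://github.com/AlienKevin/cantonese_wikipedia_dump .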
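As a rough illustration of the "find a corpus and fonts, then generate a training set" step, here is a minimal sketch of only the labeling half: splitting corpus text into short lines and pairing each line with a font. It is an assumption, not the pipeline used for canto_ocr. The function `make_line_samples`, the `Sample` type, the `max_chars` limit, and the font file names are all hypothetical, and the actual rendering of each line into an image (for example with a font-capable imaging library) is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    text: str  # ground-truth label the OCR model should predict
    font: str  # hypothetical font file the line would be rendered with


def make_line_samples(corpus, fonts, max_chars=20, seed=0):
    """Split each corpus paragraph into lines of at most max_chars characters
    and pair every line with a randomly chosen font (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    samples = []
    for paragraph in corpus:
        text = " ".join(paragraph.split())  # collapse runs of whitespace/newlines
        for start in range(0, len(text), max_chars):
            chunk = text[start:start + max_chars].strip()
            if chunk:  # skip chunks that were only whitespace
                samples.append(Sample(text=chunk, font=rng.choice(fonts)))
    return samples


if __name__ == "__main__":
    corpus = ["我哋今日去飲茶", "呢間舖頭幾好食"]  # stand-in for LIHKG / Wikipedia text
    fonts = ["hypothetical-font-a.ttf", "hypothetical-font-b.ttf"]
    for s in make_line_samples(corpus, fonts, max_chars=5):
        print(s.text, s.font)
```

Each `Sample` would then be drawn onto an image in its font, giving (image, text) pairs for fine-tuning.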
