---
language:
  - zh
tags:
  - ner
  - punctuation
  - 古文
  - 文言文
  - ancient
  - classical
widget:
  - text: 伐薪烧炭南山中满面灰尘烟火色
---

Classical Chinese Punctuation

欢迎前往我的 GitHub 文言诗词项目页面探讨、加⭐️ , Please visit the GitHub repository of my classical Chinese poetry project to discuss or learn more about the model, and hit 🌟 if you like it.

  • This model punctuates Classical (ancient) Chinese. The task may seem strange, but many of my ancestors thought that writing articles without punctuation was a brilliant idea 🧐. What we have here are texts from books, letters, or stone carvings that contain no punctuation at all, just one long string of characters. As you might guess, NLP is a good tool for this problem, and the entire pipeline can be borrowed from the usual NER task.

  • Many articles have already been punctuated, so a few regex operations yield more than enough labeled data 📚. That is why this problem is pretty much low-hanging fruit.

  • I guess anyone interested in this problem can read at least modern Chinese, so... let me continue the documentation in Chinese.
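
The regex-based labeling mentioned above can be sketched roughly as follows. This is a minimal illustration, not the author's actual preprocessing: the punctuation set `PUNCT`, the function name `make_labels`, and the scheme of tagging each character with the mark that follows it (`"O"` when none follows) are all assumptions, and the punctuation in the sample sentence was added by hand for the example.

```python
import re

# Assumed punctuation set for illustration only; the model's real label
# set (twenty-odd marks) is not listed in this document.
PUNCT = ",。!?;:、"

# One non-punctuation character, optionally followed by one punctuation mark.
PAIR_RE = re.compile(rf"([^{PUNCT}])([{PUNCT}]?)")


def make_labels(punctuated: str):
    """Split punctuated text into characters and NER-style labels.

    Each character is labeled with the mark that directly follows it,
    or "O" if no mark follows. Leading marks and the second of two
    consecutive marks are dropped in this simplified sketch.
    """
    chars, labels = [], []
    for match in PAIR_RE.finditer(punctuated):
        chars.append(match.group(1))
        labels.append(match.group(2) or "O")
    return chars, labels


# Hand-punctuated version of the widget text, used only as sample input.
chars, labels = make_labels("伐薪烧炭南山中,满面灰尘烟火色。")
print("".join(chars))
print(labels)
```

The resulting character and label sequences have the same shape as token-classification training data, which is why an ordinary NER pipeline can be reused.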

文言文(古文) 断句模型 , Classical Chinese (文言文 / 古文) sentence segmentation model

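输入一串未断句文言文, 可以断句, 目前支持二十多种标点符号 , Given a string of unpunctuated Classical Chinese, the model segments it into sentences; more than twenty punctuation marks are currently supported.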
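
To show how per-character predictions can be turned back into punctuated text, here is a minimal sketch. It assumes each character receives one label, either a punctuation mark to place after it or `"O"` for none; this scheme and the helper name `restore_punctuation` are hypothetical, and the labels below are written by hand to stand in for model output.

```python
def restore_punctuation(text: str, labels: list) -> str:
    """Insert predicted punctuation after each character.

    `labels[i]` is assumed to be the mark that follows `text[i]`,
    or "O" when nothing should be inserted.
    """
    if len(text) != len(labels):
        raise ValueError("text and labels must have the same length")
    return "".join(
        ch + ("" if label == "O" else label)
        for ch, label in zip(text, labels)
    )


# Hand-written labels standing in for model predictions on the widget text.
text = "伐薪烧炭南山中满面灰尘烟火色"
labels = ["O"] * 6 + [","] + ["O"] * 6 + ["。"]
result = restore_punctuation(text, labels)
print(result)
```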

其他拙劣的模型, 也捧个场 , Please also check out my other humble models