This model punctuates Classical (ancient) Chinese. The task may seem strange, but many of my ancestors thought that writing without punctuation was a brilliant idea 🧐. What we have here are texts from books, letters, or stone carvings with no punctuation at all, just one long string of characters. As you can guess, NLP is a good tool for this problem, and the entire pipeline can be borrowed from a standard NER task.
Many other texts have already been punctuated, so with a few regex operations, labeled data is more than abundant 📚. That's why this problem is pretty much low-hanging fruit.
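To make the labeling idea concrete, here is a minimal sketch of how punctuated text could be turned into NER-style training pairs. The punctuation set, the `"O"` label, and the function name `to_labeled_pairs` are assumptions for illustration, not the exact preprocessing used for this model.

```python
import re

# Assumed punctuation set for illustration only; the real label set is larger.
PUNCT = "，。？！；：、"
PUNCT_RE = re.compile(f"[{re.escape(PUNCT)}]")


def to_labeled_pairs(text):
    """Split punctuated text into characters and per-character labels.

    Each label is the punctuation mark that follows the character,
    or "O" when no mark follows it.
    """
    chars, labels = [], []
    for ch in text:
        if PUNCT_RE.fullmatch(ch):
            # Attach the mark to the preceding character.
            # A mark at the very start of the text has nothing to attach to
            # and is dropped; of several consecutive marks, the last one wins.
            if labels:
                labels[-1] = ch
        else:
            chars.append(ch)
            labels.append("O")
    return chars, labels


chars, labels = to_labeled_pairs("學而時習之，不亦說乎？")
```

The unpunctuated `chars` become the model input and `labels` become the per-character targets, which is the same shape of data a token-classification (NER) setup expects.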
Anyone interested in this problem set can probably read at least modern Chinese.
Given a string of unpunctuated Classical Chinese, the model breaks it into sentences. It currently supports more than twenty kinds of punctuation marks.
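As a rough illustration of the output step, the sketch below rebuilds punctuated text from per-character labels. It assumes a token-classification style output in which each character gets either a punctuation mark or a "no mark" label; the function name `restore_punctuation` and the `"O"` label are hypothetical and not part of this model's API.

```python
def restore_punctuation(chars, labels, no_mark="O"):
    """Rebuild punctuated text from characters and per-character labels.

    A label other than `no_mark` is treated as the punctuation mark
    to insert right after the corresponding character.
    """
    if len(chars) != len(labels):
        raise ValueError("chars and labels must have the same length")
    out = []
    for ch, label in zip(chars, labels):
        out.append(ch)
        if label != no_mark:
            out.append(label)
    return "".join(out)


# Hand-written labels standing in for model predictions.
text = restore_punctuation(
    list("學而時習之不亦說乎"),
    ["O", "O", "O", "O", "，", "O", "O", "O", "？"],
)
```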