---
license: cc-by-nc-4.0
---

A Chinese tokenizer trained on Baike data using the byte-pair encoding (BPE) algorithm.
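
For readers unfamiliar with BPE, the following is a minimal, character-level sketch of the general technique in plain Python. It is illustrative only and is not the training code used for this tokenizer: the function names (`train_bpe`, `merge_pair`, `tokenize`), the toy corpus, and the number of merges are all assumptions made for the example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of strings (hypothetical helper)."""
    # Start from single characters; identical texts are stored once with a count.
    words = Counter(tuple(text) for text in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each text occurs.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # nothing left to merge
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge rule to every text before the next round.
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[merge_pair(symbols, best)] += freq
        words = merged_words
    return merges


def tokenize(text, merges):
    """Split `text` into characters, then apply the learned merges in order."""
    symbols = tuple(text)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus (an assumption for illustration, not the Baike training data).
toy_corpus = ["北京大学", "北京市", "大学生"]
toy_merges = train_bpe(toy_corpus, num_merges=2)
print(tokenize("北京大学生", toy_merges))  # ['北京', '大学', '生']
```

Each round merges the most frequent adjacent pair, so frequent multi-character sequences such as "北京" gradually become single tokens. A production tokenizer would typically rely on an optimized library and a far larger corpus and vocabulary.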