ziqingyang committed on
Commit
5c8fbe1
1 Parent(s): 9373373

Upload README.md

  1. README.md +40 -3
README.md CHANGED
@@ -1,3 +1,40 @@
- ---
- license: apache-2.0
- ---
+ ---
+ language:
+ - zh
+ - bo
+ - kk
+ - ko
+ - mn
+ - ug
+ - yue
+ license: "apache-2.0"
+ ---
+
+ ## CINO: Pre-trained Language Models for Chinese Minority Languages (中国少数民族预训练模型)
+
+ Multilingual pre-trained language models (PLMs), such as mBERT and XLM-R, provide multilingual and cross-lingual language understanding ability.
+ Multilingual PLMs have progressed rapidly in recent years.
+ However, little work has been done on PLMs for Chinese minority languages, which hinders researchers from building powerful NLP systems for them.
+
+ To address this gap, the Joint Laboratory of HIT and iFLYTEK Research (HFL) proposes CINO (Chinese-miNOrity pre-trained language model), which is built on XLM-R with additional pre-training on Chinese minority-language corpora, covering languages such as:
+ - Chinese, 中文 (zh)
+ - Tibetan, 藏语 (bo)
+ - Mongolian (Uighur form), 蒙语 (mn)
+ - Uyghur, 维吾尔语 (ug)
+ - Kazakh (Arabic form), 哈萨克语 (kk)
+ - Korean, 朝鲜语 (ko)
+ - Zhuang, 壮语
+ - Cantonese, 粤语 (yue)
+
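Since CINO is described above as XLM-R with additional pre-training, it can presumably be loaded like any XLM-R-style encoder. The following is a minimal sketch under that assumption, not the authors' documented usage: the helper names (`CINO_LANGUAGES`, `is_listed`, `load_cino`, `embed`) are hypothetical, and the Hub model ID is not stated in this README, so the caller must supply it. The loading functions need the `transformers` and `torch` packages plus network access.

```python
# Language codes taken from the `language:` metadata above, paired with the
# English names from the README list. Zhuang is listed without a code in the
# README, so it is deliberately left out of this mapping.
CINO_LANGUAGES = {
    "zh": "Chinese",
    "bo": "Tibetan",
    "mn": "Mongolian (Uighur form)",
    "ug": "Uyghur",
    "kk": "Kazakh (Arabic form)",
    "ko": "Korean",
    "yue": "Cantonese",
}


def is_listed(code: str) -> bool:
    """Return True if the language code appears in the model card metadata."""
    return code.lower() in CINO_LANGUAGES


def load_cino(model_id: str):
    """Load a tokenizer and encoder from the Hugging Face Hub.

    `model_id` is an assumption supplied by the caller; this README does not
    name the Hub ID. The import is deferred so the helpers above stay usable
    without `transformers` installed.
    """
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model


def embed(text: str, tokenizer, model):
    """Encode one sentence and return the hidden state of its first token."""
    import torch

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Shape: (1, hidden_size); the first position is the sentence-start token.
    return outputs.last_hidden_state[:, 0]
```

A call might look like `tokenizer, model = load_cino("hfl/cino-large")` followed by `embed("some text", tokenizer, model)`. The ID `hfl/cino-large` is an assumed example; check the GitHub repository for the released model names.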
+ See our GitHub repository for more details (in Chinese): https://github.com/ymcui/Chinese-Minority-PLM
+
+ You may also be interested in:
+
+ - Chinese MacBERT: https://github.com/ymcui/MacBERT
+ - Chinese BERT series: https://github.com/ymcui/Chinese-BERT-wwm
+ - Chinese ELECTRA: https://github.com/ymcui/Chinese-ELECTRA
+ - Chinese XLNet: https://github.com/ymcui/Chinese-XLNet
+ - Knowledge Distillation Toolkit - TextBrewer: https://github.com/airaria/TextBrewer
+
+ More resources by HFL: https://github.com/ymcui/HFL-Anthology
+