seanghay committed on
Commit 1eca14a
1 Parent(s): 5f7e131

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -13,6 +13,12 @@ Training from scratch using **Masked Language Modeling** task on 5M Khmer senten
 Training data is created by crawling publicly available news sites and Wikipedia.
 
 
+## Why?
+
+1. [xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base) is big (279M parameters, while this model has only 49M).
+2. [xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base) is not optimized for the Khmer language.
+3. [xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base) has a larger vocabulary (250,002 tokens), while this model uses 8,000.
+
 ## Usage
 
 
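The snippet below is a rough sketch of the arithmetic behind points 1 and 3 of the new "Why?" section. A token embedding matrix holds `vocab_size × hidden_size` weights. It assumes xlm-roberta-base's published hidden size of 768. The README does not state this model's hidden size, so reusing 768 for the 8,000-token vocabulary is a hypothetical illustration only. It does not describe this model's actual configuration.

```python
# Rough token-embedding arithmetic for the "Why?" list.
# Assumption: hidden size 768 (xlm-roberta-base's config). This model's hidden
# size is NOT given in the README; 768 is reused below only for illustration.

HIDDEN_SIZE = 768          # xlm-roberta-base hidden size
XLMR_VOCAB = 250_002       # vocab size quoted in the README
KHMER_VOCAB = 8_000        # vocab size quoted in the README
XLMR_TOTAL = 279_000_000   # approximate total parameters quoted in the README


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Weights in a vocab_size x hidden_size token embedding matrix."""
    return vocab_size * hidden_size


xlmr_emb = embedding_params(XLMR_VOCAB, HIDDEN_SIZE)
khmer_emb = embedding_params(KHMER_VOCAB, HIDDEN_SIZE)

print(f"xlm-roberta-base token embeddings: {xlmr_emb:,}")           # 192,001,536
print(f"share of ~279M total: {xlmr_emb / XLMR_TOTAL:.0%}")         # 69%
print(f"8,000-token vocab at the same hidden size: {khmer_emb:,}")  # 6,144,000
```

Under these assumptions, roughly two thirds of xlm-roberta-base's parameters sit in its token embeddings. That suggests the smaller vocabulary alone could account for a large part of the size gap. The exact split depends on this model's configuration.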