Update README.md
README.md
CHANGED
@@ -13,6 +13,12 @@ Training from scratch using **Masked Language Modeling** task on 5M Khmer senten
Training data is created by crawling publicly available news sites and Wikipedia.

+## Why?
+
+1. [xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base) is big (279M parameters, while this model has only 49M parameters).
+2. [xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base) is not optimized for the Khmer language.
+3. [xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base) has a much larger vocabulary (250,002 tokens), while this model uses a vocabulary of 8,000 tokens.
+
## Usage