seanghay/xlm-roberta-khmer-small


seanghay committed on Jul 22

Commit 1eca14a • 1 Parent(s): 5f7e131

Update README.md

Files changed (1)

README.md +6 -0


@@ -13,6 +13,12 @@ Training from scratch using **Masked Language Modeling** task on 5M Khmer senten
 Training data is created by crawling publicly available news sites and Wikipedia.
+## Why?
+1. [xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base) is big (279M parameters, while this model has only 49M parameters).
+2. [xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base) is not optimized for the Khmer language.
+3. [xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base) has a much larger vocabulary (250,002 tokens), while this model uses a vocabulary of only 8,000 tokens.
 ## Usage
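
Points 1 and 3 of the added "Why?" section are related: a large vocabulary means a large token-embedding matrix. The sketch below is a rough back-of-the-envelope check, not something taken from the README. It assumes the hidden size of xlm-roberta-base is 768 (from its published configuration, not stated in this commit) and treats the README's "279M" as a rounded total. The helper name `embedding_params` is hypothetical. The hidden size of xlm-roberta-khmer-small is not given here, so no figure is computed for it.

```python
def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Number of weights in a token-embedding matrix of shape (vocab_size, hidden_size)."""
    return vocab_size * hidden_size


# Figures quoted in the README for xlm-roberta-base.
XLMR_BASE_VOCAB = 250_002
XLMR_BASE_TOTAL_PARAMS = 279_000_000  # "279M", rounded as in the README

# Assumption: hidden size of xlm-roberta-base (not stated in the README).
XLMR_BASE_HIDDEN = 768

xlmr_embed = embedding_params(XLMR_BASE_VOCAB, XLMR_BASE_HIDDEN)
xlmr_share = xlmr_embed / XLMR_BASE_TOTAL_PARAMS

print(f"token-embedding weights: {xlmr_embed:,}")
print(f"approximate share of all parameters: {xlmr_share:.0%}")
```

Under these assumptions, the token embeddings alone account for roughly two thirds of xlm-roberta-base, which is why reducing the vocabulary to 8,000 tokens shrinks the model so much. The same helper can be applied to the small model once its hidden size is read from its configuration.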