---
language: mn
---

# ALBERT-Mongolian
[pretraining repo link](https://github.com/bayartsogt-ya/albert-mongolian)

## Model description
Here we provide a pretrained ALBERT model and a trained SentencePiece model for Mongolian text.
The training data is the Mongolian Wikipedia corpus from Wikipedia Downloads and the Mongolian News corpus.

## Evaluation Result:
```
loss = 1.7478163
masked_lm_accuracy = 0.6838185
masked_lm_loss = 1.6687671
sentence_order_accuracy = 0.998125
sentence_order_loss = 0.007942731
```

## Fine-tuning Result on Eduge Dataset:
```
              precision    recall  f1-score   support

байгал орчин       0.85      0.83      0.84       999
   боловсрол       0.80      0.80      0.80       873
       спорт       0.98      0.98      0.98      2736
   технологи       0.88      0.93      0.91      1102
     улс төр       0.92      0.85      0.89      2647
  урлаг соёл       0.93      0.94      0.94      1457
       хууль       0.89      0.87      0.88      1651
 эдийн засаг       0.83      0.88      0.86      2509
  эрүүл мэнд       0.89      0.92      0.90      1159

    accuracy                           0.90     15133
   macro avg       0.89      0.89      0.89     15133
weighted avg       0.90      0.90      0.90     15133
```

## Reference
1. [ALBERT - official repo](https://github.com/google-research/albert)
2. [WikiExtractor](https://github.com/attardi/wikiextractor)
3. [Mongolian BERT](https://github.com/tugstugi/mongolian-bert)
4. [ALBERT - Japanese](https://github.com/alinear-corp/albert-japanese)
5. [Mongolian Text Classification](https://github.com/sharavsambuu/mongolian-text-classification)
6. [You's paper](https://arxiv.org/abs/1904.00962)

## Citation
```
@misc{albert-mongolian,
  author = {Bayartsogt Yadamsuren},
  title = {ALBERT Pretrained Model on Mongolian Datasets},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/bayartsogt-ya/albert-mongolian/}}
}
```

## For More Information
Please contact bayartsogtyadamsuren@icloud.com
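
## Checking the Eduge Averages
The `macro avg` and `weighted avg` rows of the Eduge fine-tuning report above can be reproduced from the per-class rows. The sketch below is only a reading aid, not part of the original training or evaluation code. The names `REPORT`, `macro_avg`, and `weighted_avg` are hypothetical, and the English glosses in the comments are informal translations of the class labels. Because the inputs are the two-decimal values printed in the report, the results should match only after rounding.

```python
# Per-class rows copied from the Eduge report:
# label -> (precision, recall, f1-score, support)
REPORT = {
    "байгал орчин": (0.85, 0.83, 0.84, 999),   # environment
    "боловсрол":    (0.80, 0.80, 0.80, 873),   # education
    "спорт":        (0.98, 0.98, 0.98, 2736),  # sports
    "технологи":    (0.88, 0.93, 0.91, 1102),  # technology
    "улс төр":      (0.92, 0.85, 0.89, 2647),  # politics
    "урлаг соёл":   (0.93, 0.94, 0.94, 1457),  # arts and culture
    "хууль":        (0.89, 0.87, 0.88, 1651),  # law
    "эдийн засаг":  (0.83, 0.88, 0.86, 2509),  # economy
    "эрүүл мэнд":   (0.89, 0.92, 0.90, 1159),  # health
}

SUPPORT = 3  # index of the support column in each row


def macro_avg(report, col):
    """Unweighted mean of one metric column across all classes."""
    values = [row[col] for row in report.values()]
    return sum(values) / len(values)


def weighted_avg(report, col):
    """Mean of one metric column, weighted by each class's support."""
    total = sum(row[SUPPORT] for row in report.values())
    return sum(row[col] * row[SUPPORT] for row in report.values()) / total


total_support = sum(row[SUPPORT] for row in REPORT.values())

for name, col in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(
        f"{name:>9}  macro={macro_avg(REPORT, col):.2f}  "
        f"weighted={weighted_avg(REPORT, col):.2f}"
    )
print("total support:", total_support)
```

The macro average treats every class equally, so smaller classes such as боловсрол (873 examples) count as much as спорт (2736 examples). The weighted average scales each class by its support, which is why it sits closer to the overall accuracy.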