---
language: mn
---

# ALBERT-Mongolian
[pretraining repo link](https://github.com/bayartsogt-ya/albert-mongolian)
## Model description
This repository provides a pretrained ALBERT model and a trained SentencePiece model for Mongolian text. The training data consists of the Mongolian Wikipedia corpus from Wikipedia Downloads and a Mongolian news corpus.
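
The following is a minimal sketch of how the pretrained model and its SentencePiece tokenizer might be loaded for masked-word prediction with the Hugging Face `transformers` library. The hub identifier `bayartsogt/albert-mongolian`, the helper name `load_fill_mask`, and the example sentence are assumptions made for illustration; adjust the identifier to wherever the weights are actually hosted.

```python
# Assumed hub identifier; not stated in this model card.
MODEL_ID = "bayartsogt/albert-mongolian"


def load_fill_mask(model_id: str = MODEL_ID):
    """Build a fill-mask pipeline from the pretrained ALBERT checkpoint.

    Hypothetical helper: transformers is imported lazily so that the module
    can be imported without the library installed or any network access.
    """
    from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return pipeline("fill-mask", model=model, tokenizer=tokenizer)


if __name__ == "__main__":
    fill_mask = load_fill_mask()
    mask = fill_mask.tokenizer.mask_token
    # Illustrative sentence: "The capital of Mongolia is the city of <mask>."
    text = f"Монгол улсын нийслэл {mask} хот юм."
    for candidate in fill_mask(text):
        print(candidate["token_str"], round(candidate["score"], 4))
```

Running the script downloads the checkpoint on first use, so it needs network access and an installed `transformers` with SentencePiece support.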

## Evaluation Results
```
loss = 1.7478163
masked_lm_accuracy = 0.6838185
masked_lm_loss = 1.6687671
sentence_order_accuracy = 0.998125
sentence_order_loss = 0.007942731
```

## Fine-tuning Results on the Eduge Dataset
```
                          precision    recall  f1-score   support

            байгал орчин       0.85      0.83      0.84       999
               боловсрол       0.80      0.80      0.80       873
                   спорт       0.98      0.98      0.98      2736
               технологи       0.88      0.93      0.91      1102
                 улс төр       0.92      0.85      0.89      2647
              урлаг соёл       0.93      0.94      0.94      1457
                   хууль       0.89      0.87      0.88      1651
             эдийн засаг       0.83      0.88      0.86      2509
              эрүүл мэнд       0.89      0.92      0.90      1159

                accuracy                           0.90     15133
               macro avg       0.89      0.89      0.89     15133
            weighted avg       0.90      0.90      0.90     15133
```
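
As a quick consistency check, the summary rows of the report can be recomputed from the per-class rows. The sketch below copies the f1-score and support columns from the table above; because those inputs are already rounded to two decimals, the recomputed averages are approximations that should agree only after rounding. The variable names are illustrative.

```python
# Per-class (f1-score, support) pairs copied from the classification report.
report = {
    "байгал орчин": (0.84, 999),
    "боловсрол": (0.80, 873),
    "спорт": (0.98, 2736),
    "технологи": (0.91, 1102),
    "улс төр": (0.89, 2647),
    "урлаг соёл": (0.94, 1457),
    "хууль": (0.88, 1651),
    "эдийн засаг": (0.86, 2509),
    "эрүүл мэнд": (0.90, 1159),
}

# Total number of evaluated examples across all classes.
total_support = sum(support for _, support in report.values())

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

# Weighted average: per-class F1 weighted by the class support.
weighted_f1 = sum(f1 * support for f1, support in report.values()) / total_support

print(f"support={total_support} macro_f1={macro_f1:.2f} weighted_f1={weighted_f1:.2f}")
```

The macro average treats every class equally, while the weighted average lets large classes such as спорт and улс төр dominate, which is why the two summary rows differ slightly.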
## References
1. [ALBERT - official repo](https://github.com/google-research/albert)
2. [WikiExtractor](https://github.com/attardi/wikiextractor)
3. [Mongolian BERT](https://github.com/tugstugi/mongolian-bert)
4. [ALBERT - Japanese](https://github.com/alinear-corp/albert-japanese)
5. [Mongolian Text Classification](https://github.com/sharavsambuu/mongolian-text-classification)
6. [You et al., "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes"](https://arxiv.org/abs/1904.00962)

## Citation
```
@misc{albert-mongolian,
  author = {Bayartsogt Yadamsuren},
  title = {ALBERT Pretrained Model on Mongolian Datasets},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/bayartsogt-ya/albert-mongolian/}}
}
```

## For More Information
Please contact bayartsogtyadamsuren@icloud.com.