---
language: my
tags:
- MyanBERTa
- Myanmar
- BERT
- RoBERTa
license: apache-2.0
datasets:
- MyCorpus
- Web
---

## Model description

This model is a BERT-based pre-trained language model for Myanmar.
MyanBERTa was pre-trained for 528K steps on a word-segmented Myanmar dataset consisting of 5,992,299 sentences (136M words).
The tokenizer is a byte-level BPE tokenizer with 30,522 subword units, learned after word segmentation was applied.
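
A minimal usage sketch with the Hugging Face `transformers` library is shown below. The repo id `UCSYNLP/MyanBERTa` is an assumption; substitute the actual Hub identifier. Input text should be word-segmented before tokenization, since the BPE vocabulary was learned on word-segmented Myanmar text.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Repo id is an assumption; replace with the actual Hub identifier.
tokenizer = AutoTokenizer.from_pretrained("UCSYNLP/MyanBERTa")
model = AutoModelForMaskedLM.from_pretrained("UCSYNLP/MyanBERTa")

# Apply word segmentation to the input before tokenizing,
# matching the pre-training data.
text = "..."  # a word-segmented Myanmar sentence
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
```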

Cite this work as:

```
Aye Mya Hlaing, Win Pa Pa, "MyanBERTa: A Pre-trained Language Model For Myanmar",
In Proceedings of 2022 International Conference on Communication and Computer Research (ICCR2022),
November 2022, Seoul, Republic of Korea
```

[Download Paper](https://journal-home.s3.ap-northeast-2.amazonaws.com/site/iccr2022/abs/QOHFI-0004.pdf)