|
---
language: my
tags:
- MyanBERTa
- Myanmar
- BERT
- RoBERTa
license: apache-2.0
datasets:
- MyCorpus
- Web
---
|
|
|
## Model description |
|
|
|
MyanBERTa is a BERT-based pre-trained language model for Myanmar.
It was pre-trained for 528K steps on a word-segmented Myanmar dataset consisting of 5,992,299 sentences (136M words).
The tokenizer is a byte-level BPE tokenizer with a vocabulary of 30,522 subword units, learned after word segmentation was applied.
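
Below is a minimal usage sketch with the Hugging Face `transformers` library. The repository id `UCSYNLP/MyanBERTa` is an assumption and should be replaced with the actual model id; note that input text should be word-segmented to match the pre-training data.

```python
# A minimal sketch: load MyanBERTa for masked-token prediction.
# The repo id "UCSYNLP/MyanBERTa" is assumed; replace it if the model lives elsewhere.
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

model_id = "UCSYNLP/MyanBERTa"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Input should be word-segmented Myanmar text, as in pre-training.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask(f"word-segmented Myanmar text with a {tokenizer.mask_token} token"))
```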
|
|
|
Cite this work as: |
|
|
|
```
Aye Mya Hlaing, Win Pa Pa, "MyanBERTa: A Pre-trained Language Model For Myanmar", In Proceedings of 2022 International Conference on Communication and Computer Research (ICCR2022), November 2022, Seoul, Republic of Korea
```
|
|
|
[Download Paper](https://journal-home.s3.ap-northeast-2.amazonaws.com/site/iccr2022/abs/QOHFI-0004.pdf) |
|
|
|
|
|
|
|
|