---
language: my
tags:
- MyanBERTa
- Myanmar
- BERT
- RoBERTa
license: apache-2.0
datasets:
- MyCorpus
- publicly available blogs and websites
---

## Model description

This model is a BERT-based Myanmar pre-trained language model. MyanBERTa was pre-trained for 528K steps on a word-segmented Myanmar dataset consisting of 5,992,299 sentences (136M words). The tokenizer is a byte-level BPE tokenizer with 30,522 subword units, learned after word segmentation was applied.

```
Contributed by:
Aye Mya Hlaing
Win Pa Pa
```
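## How to use

A minimal usage sketch with the Hugging Face `transformers` library is given below. The repository id `UCSYNLP/MyanBERTa` and the example input are assumptions for illustration, not confirmed by this card; inputs should be word-segmented the same way as the pre-training data before tokenization.

```python
# A minimal sketch, assuming the model is hosted on the Hugging Face Hub
# under the repo id "UCSYNLP/MyanBERTa" (an assumption, not confirmed above).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("UCSYNLP/MyanBERTa")
model = AutoModel.from_pretrained("UCSYNLP/MyanBERTa")

# The model was pre-trained on word-segmented Myanmar text, so apply the same
# word segmentation to raw input first (segmenter not included in this sketch).
text = "placeholder word-segmented Myanmar sentence"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Contextual embeddings: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```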