Edit model card

Model description

This model is a BERT based Myanmar pre-trained language model. MyanBERTa was pre-trained for 528K steps on a word segmented Myanmar dataset consisting of 5,992,299 sentences (136M words). As the tokenizer, byte-leve BPE tokenizer of 30,522 subword units which is learned after word segmentation is applied.

Cite this work as:

Aye Mya Hlaing, Win Pa Pa, "MyanBERTa: A Pre-trained Language Model For
Myanmar", In Proceedings of 2022 International Conference on Communication and Computer Research (ICCR2022), November 2022, Seoul, Republic of Korea

Download Paper

Downloads last month
Hosted inference API
Mask token: <mask>
This model can be loaded on the Inference API on-demand.