---
language: my
tags:
- MyanBERTa
- Myanmar
- BERT
- RoBERTa
license: apache-2.0
datasets:
- MyCorpus
- publicly available blogs and websites
---

## Model description

This model is a BERT-based Myanmar pre-trained language model. MyanBERTa was pre-trained for 528K steps on a word-segmented Myanmar dataset consisting of 5,992,299 sentences (136M words). The tokenizer is a byte-level BPE tokenizer with 30,522 subword units, learned after word segmentation was applied.

```
Contributed by:
Aye Mya Hlaing
Win Pa Pa
```
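## How to use

A minimal usage sketch with the Hugging Face `transformers` library is given below. The repository id `UCSYNLP/MyanBERTa` and the example input are assumptions for illustration, not confirmed by this card; inputs should be word-segmented the same way as the pre-training data before tokenization.

```python
# A minimal sketch, assuming the model is hosted on the Hugging Face Hub
# under the repo id "UCSYNLP/MyanBERTa" (an assumption, not confirmed above).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("UCSYNLP/MyanBERTa")
model = AutoModel.from_pretrained("UCSYNLP/MyanBERTa")

# The model was pre-trained on word-segmented Myanmar text, so apply the same
# word segmentation to raw input first (segmenter not included in this sketch).
text = "placeholder word-segmented Myanmar sentence"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Contextual embeddings: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```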