---
language: my
tags:
  - MyanBERTa
  - Myanmar
  - BERT
  - RoBERTa
license: apache-2.0
datasets:
  - MyCorpus
  - publicly available blogs and websites
---

## Model description

This model is a BERT-based Myanmar pre-trained language model. MyanBERTa was pre-trained for 528K steps on a word-segmented Myanmar dataset consisting of 5,992,299 sentences (136M words). The tokenizer is a byte-level BPE tokenizer with 30,522 subword units, learned after word segmentation was applied.
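A minimal usage sketch with 🤗 Transformers is shown below. It assumes the checkpoint is published on the Hub as `UCSYNLP/MyanBERTa` with the standard RoBERTa-style masked-LM head; the example sentence is a hypothetical placeholder, and input text should be word-segmented Myanmar, matching pre-training.

```python
# Minimal sketch: load MyanBERTa and fill a masked token.
# Assumptions: the checkpoint lives on the Hugging Face Hub as
# "UCSYNLP/MyanBERTa" and exposes a RoBERTa-style masked-LM head.
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("UCSYNLP/MyanBERTa")
model = AutoModelForMaskedLM.from_pretrained("UCSYNLP/MyanBERTa")

# Hypothetical example; real input should be word-segmented Myanmar text
# (spaces between words), as in the pre-training data.
text = f"မြန်မာ {tokenizer.mask_token} ဘာသာစကား"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring token at the mask position.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

The same checkpoint can also be loaded with `AutoModel` to obtain contextual embeddings for downstream Myanmar NLP tasks.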

Contributed by:
- Aye Mya Hlaing
- Win Pa Pa