UCSYNLP/MyanBERTa

	---
	language: my
	tags:
	- MyanBERTa
	- Myanmar
	- BERT
	- RoBERTa
	license: apache-2.0
	datasets:
	- MyCorpus
	- Web
	---

	## Model description

	MyanBERTa is a BERT-based pre-trained language model for Myanmar.
	It was pre-trained for 528K steps on a word-segmented Myanmar dataset of 5,992,299 sentences (136M words).
	The tokenizer is a byte-level BPE tokenizer with a vocabulary of 30,522 subword units, learned after word segmentation is applied to the text.
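
Because the tokenizer was learned on word-segmented text, input at inference time presumably needs to be segmented the same way. The following is a minimal sketch of masked-word prediction with the Hugging Face `transformers` library, not an official usage recipe. It makes several assumptions: that the Hub repository ID is `UCSYNLP/MyanBERTa`, that the repository loads with the standard `fill-mask` pipeline, and that words are separated by single spaces after segmentation. The helper names `mask_word` and `top_predictions` and the sample sentence are illustrative only.

```python
from typing import List, Sequence, Tuple

# Assumption: the model is published on the Hugging Face Hub under this ID.
MODEL_ID = "UCSYNLP/MyanBERTa"


def mask_word(segmented_words: Sequence[str], index: int, mask_token: str) -> str:
    """Replace one word of an already word-segmented sentence with the mask token.

    Words are joined with single spaces, which is an assumed input format.
    """
    words = list(segmented_words)
    words[index] = mask_token
    return " ".join(words)


def top_predictions(
    segmented_words: Sequence[str], index: int, k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k most likely fillers for the word at position `index`.

    Downloads the model on first use, so it needs network access and the
    `transformers` package (with a backend such as PyTorch) installed.
    """
    from transformers import pipeline  # imported lazily; heavy dependency

    fill = pipeline("fill-mask", model=MODEL_ID, tokenizer=MODEL_ID)
    # Read the mask token from the tokenizer instead of hard-coding it.
    masked = mask_word(segmented_words, index, fill.tokenizer.mask_token)
    return [(r["token_str"], r["score"]) for r in fill(masked, top_k=k)]


# Example call (illustrative sentence, segmented by hand):
# top_predictions(["ကျွန်တော်", "ကျောင်း", "သွား", "သည်"], index=2)
```

In practice the words would come from a Myanmar word segmenter rather than a hand-written list. The model card does not name the segmenter used for pre-training, so that step is left to the caller.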

	Cite this work as:

	```
	Aye Mya Hlaing, Win Pa Pa, "MyanBERTa: A Pre-trained Language Model For Myanmar", In Proceedings of 2022 International Conference on Communication and Computer Research (ICCR2022), November 2022, Seoul, Republic of Korea
	```

	[Download Paper](https://journal-home.s3.ap-northeast-2.amazonaws.com/site/iccr2022/abs/QOHFI-0004.pdf)