Myanmarsar-GPT (simbolo-ai/Myanmarsar-GPT)

metadata

license: mit
language:
  - my
pipeline_tag: text-generation
metrics:
  - code_eval
library_name: transformers
tags:
  - burmese
  - gpt2
  - pre-trained

Simbolo's Myanmarsar-GPT is pre-trained on a dataset of 1 million Burmese sentences using the GPT-2 architecture. It is intended to serve as a foundational pre-trained model for the Burmese language that can be fine-tuned for specific tasks such as creative writing, chatbots, and machine translation.

How to use

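# In a notebook cell; drop the leading "!" when installing from a shell.
!pip install transformers

from transformers import pipeline

# Load the model and its tokenizer into a text-generation pipeline.
pipe = pipeline('text-generation', model='Simbolo-Servicio/myanmar-burmese-gpt', tokenizer='Simbolo-Servicio/myanmar-burmese-gpt')

# Generation settings such as max_length are passed when calling the pipeline.
pipe('မြန်မာဘာသာစကား', max_length=500)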
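For more control than the pipeline call above offers, the tokenizer and model can also be loaded explicitly. The following is a minimal sketch, not the authors' reference code: it assumes PyTorch is installed alongside transformers, and the sampling values (`do_sample`, `top_k`, `top_p`) and the helper names `generation_kwargs` and `generate_text` are illustrative assumptions. Only the model ID and `max_length=500` come from the snippet above.

```python
from typing import Any, Dict

# Model ID copied from the usage snippet above.
MODEL_ID = "Simbolo-Servicio/myanmar-burmese-gpt"


def generation_kwargs(max_length: int = 500, **overrides: Any) -> Dict[str, Any]:
    """Build keyword arguments for model.generate().

    The sampling values are illustrative assumptions, not tuned settings.
    """
    kwargs: Dict[str, Any] = {
        "max_length": max_length,  # taken from the usage snippet
        "do_sample": True,         # assumption: sample instead of greedy decoding
        "top_k": 50,               # assumption
        "top_p": 0.95,             # assumption
    }
    kwargs.update(overrides)
    return kwargs


def generate_text(prompt: str, model_id: str = MODEL_ID, **overrides: Any) -> str:
    """Generate a continuation of `prompt` with the given model."""
    # Imported lazily so generation_kwargs() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **generation_kwargs(**overrides))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text('မြန်မာဘာသာစကား', max_length=100)` downloads the model on first use and returns the prompt followed by the sampled continuation.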

Data

The training data consists of 1 million sentences sourced from Wikipedia.

Contributors

Main Contributor: Sa Phyo Thu Htet (https://github.com/SaPhyoThuHtet)
Wikipedia Data Crawling: Kaung Kaung Ko Ko, Phuu Pwint Thinzar Kyaing
Releasing the Model: Eithandaraung, Ye Yint Htut, Thet Chit Su, Naing Phyo Aung

Limitations and bias

We have not yet thoroughly investigated the potential biases of this model. For transparency, note that the model is trained primarily on Unicode-encoded Burmese (Myanmar) text.
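Because the model was trained mainly on Burmese text, it can help to check that a prompt is mostly written in Myanmar script before sending it to the model. The sketch below is one simple way to do that, assuming the main Myanmar Unicode block (U+1000 to U+109F) is a reasonable proxy. The function names and the 0.5 threshold are hypothetical. The check looks only at code point ranges, so it is not a language identifier and does not validate how the text is encoded.

```python
def myanmar_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters in U+1000..U+109F."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    in_block = sum(1 for ch in chars if 0x1000 <= ord(ch) <= 0x109F)
    return in_block / len(chars)


def looks_burmese(text: str, threshold: float = 0.5) -> bool:
    """Heuristic: True when at least `threshold` of the characters are Myanmar script.

    The default threshold is an arbitrary illustrative value.
    """
    return myanmar_ratio(text) >= threshold
```

For example, `looks_burmese('မြန်မာဘာသာစကား')` is true because every character falls in the Myanmar block, while `looks_burmese('hello world')` is false.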

References

Jiang, S., Huang, X., Cai, X., & Lin, N. (2021). Pre-trained Models and Evaluation Data for the Myanmar Language. doi:10.1007/978-3-030-92310-5_52
Lin, N., Fu, Y., Chen, C., Yang, Z., & Jiang, S. (2021). LaoPLM: Pre-trained Language Models for Lao. arXiv:2110.05896
Min Si Thu, MyanmarGPT, https://huggingface.co/jojo-ai-mst/MyanmarGPT, 1.1-SweptWood
Dr. Wai Yan Nyein Naing, WYNN747/Burmese-GPT, https://huggingface.co/WYNN747/Burmese-GPT
Sai Htaung Kham, saihtaungkham/BurmeseRoBERTaCLM