---
license: creativeml-openrail-m
language:
- my
tags:
- Myanmar
- Burmese
- GPT2
- MyanmarGPT
- Natural Language Processing
widget:
- text: "အီတလီ"
  example_title: "Example 1"
- text: "အနုပညာ"
  example_title: "Example 2"
- text: "တရုတ်"
  example_title: "Example 3"
- text: "ကျောက်ခေတ်"
  example_title: "Example 4"
- text: "မြန်မာနိုင်ငံ"
  example_title: "Example 5"
---
# Myanmar-GPT
A GPT that understands the Myanmar (Burmese) language - Myanmar GPT
Myanmar GPT is a model trained on a private Myanmar language dataset made by MinSiThu.
The project aims to make the Myanmar language available in the GPT2 Model.
Fine-tuning MyanmarGPT makes it easier to build a custom Myanmar language model than starting from other language models.
Reports on training the MyanmarGPT model are visualized at [MyanmarGPT Report](https://api.wandb.ai/links/minsithu/wn8yul90).
Variants of the Burmese Language-Enabled Models can be found at [https://github.com/MinSiThu/MyanmarGPT](https://github.com/MinSiThu/MyanmarGPT).
There is also a 1.42-billion-parameter MyanmarGPT-Big model with multilingual support.
You can find it at [MyanmarGPT-Big](https://huggingface.co/jojo-ai-mst/MyanmarGPT-Big).
Currently, MyanmarGPT has four main variants:
- [MyanmarGPT](https://huggingface.co/jojo-ai-mst/MyanmarGPT)
- [MyanmarGPT-Big](https://huggingface.co/jojo-ai-mst/MyanmarGPT-Big)
- [MyanmarGPT-Chat](https://huggingface.co/jojo-ai-mst/MyanmarGPT-Chat)
- [MyanmarGPTX](https://huggingface.co/jojo-ai-mst/MyanmarGPTX)
## How to use in your project
```shell
# In a Jupyter or Colab notebook, prefix the command with "!".
pip install transformers torch
```
```python
from transformers import pipeline

generator = pipeline("text-generation", model="jojo-ai-mst/MyanmarGPT")
outputs = generator("အီတလီ", do_sample=False)
print(outputs)
# [{'generated_text': 'အီတလီနိုင်ငံသည် ဥရောပတိုက်၏ တောင်ဘက်တွင် မြေထဲပင်လယ်ထဲသို့ ထိုးထွက်နေသော ကျွန်းဆွယ်ကြီးတစ်ခုဖြစ်၍ ပုံသဏ္ဌာန်အားဖြင့် မြင်းစီးဖိနပ်နှင့် တူလေသည်။ မြောက်ဘက်မှ တောင်ဘက်အငူစွန်းအထိ မိုင်ပေါင်း ၇၅ဝ ခန့် ရှည်လျား၍၊ ပျမ်းမျှမိုင် ၁ဝဝ မှ ၁၂ဝ ခန့်ကျယ်သည်။ အီတလီနိုင်ငံ၏ အကျယ်အဝန်းမှာ ဆာဒင်းနီးယားကျွန်း၊ စစ္စလီကျွန်းနှင့် အနီးပတ်ဝန်းကျင်ရှိ ကျွန်းကလေးများ အပါအဝင် ၁၁၆,၃၅၀ စတုရန်းမိုင်ရှိသည်။ '}]
```
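The pipeline returns a list of dictionaries keyed by `generated_text`, as the printed output above shows; when a list of prompts is passed instead of a single string, each prompt gets its own inner list. The sketch below shows one way to pull out the plain strings. `extract_texts` is a hypothetical helper written for this example, not part of `transformers`, and the sample data is a stand-in rather than real model output.

```python
def extract_texts(outputs):
    """Flatten text-generation pipeline output into a list of strings.

    Handles a single prompt (list of dicts) as well as a batch of
    prompts (list of lists of dicts).
    """
    texts = []
    for item in outputs:
        if isinstance(item, dict):
            texts.append(item["generated_text"])
        else:
            texts.extend(entry["generated_text"] for entry in item)
    return texts


# Stand-in data shaped like the pipeline output (not real generations).
single = [{"generated_text": "first sample"}]
batch = [[{"generated_text": "a"}], [{"generated_text": "b"}]]

print(extract_texts(single))
print(extract_texts(batch))
```

With the real pipeline, `extract_texts(generator("အီတလီ", do_sample=False))` would give the generated string without the surrounding dictionary.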
### Alternative way
```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = GPT2LMHeadModel.from_pretrained("jojo-ai-mst/MyanmarGPT").to(device)
tokenizer = GPT2Tokenizer.from_pretrained("jojo-ai-mst/MyanmarGPT")


def generate_text(prompt, max_length=300, temperature=0.8, top_k=50):
    # Encode the prompt and place it on the same device as the model.
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    output = model.generate(
        input_ids,
        max_length=max_length,
        temperature=temperature,
        top_k=top_k,
        pad_token_id=tokenizer.eos_token_id,
        do_sample=True,
    )
    # Decode and print every returned sequence.
    for result in output:
        generated_text = tokenizer.decode(result, skip_special_tokens=True)
        print(generated_text)


generate_text("အီတလီ ")
```
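The `temperature` and `top_k` arguments control how the next token is sampled when `do_sample=True`. The following is a conceptual sketch in plain Python of what those two settings do to a list of raw token scores. It is not the `transformers` implementation; `sample_top_k` and the toy scores are assumptions made up for illustration.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=50, rng=random):
    """Sample one token index from raw scores using temperature and top-k."""
    # Keep only the top_k highest-scoring token indices.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = ranked[:top_k]
    # Divide by the temperature: values below 1 sharpen the distribution.
    scaled = [logits[i] / temperature for i in kept]
    # Softmax weights over the kept scores (subtract the max for stability).
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # Draw one index in proportion to its weight.
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
rng = random.Random(0)
picks = [
    sample_top_k(toy_logits, temperature=0.8, top_k=2, rng=rng)
    for _ in range(1000)
]
print(sorted(set(picks)))  # with top_k=2, only indices 0 and 1 can appear
```

A lower `temperature` makes the highest-scoring token more likely to be picked, while a smaller `top_k` removes low-scoring tokens from consideration entirely.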
## Roadmap for Burmese Language and Artificial Intelligence
Since I started MyanmarGPT, it has had a huge impact on Myanmar, so I am continuing the project as a movement called the [MyanmarGPT Movement](https://github.com/MyanmarGPT-Movement).
The MyanmarGPT Movement is open to everyone who wants to start AI projects in Myanmar.
## Guidelines for using the MyanmarGPT license
- MyanmarGPT is free to use for everyone.
- **Must Do**
  - Any project that is derived or fine-tuned from MyanmarGPT, uses MyanmarGPT internally, modifies MyanmarGPT, or is otherwise related to MyanmarGPT **must mention the citation below** on the corresponding project's page.
- The citation:
```bibtex
@software{MyanmarGPT,
  author = {{MinSiThu}},
  title = {MyanmarGPT},
  version = {1.1-SweptWood},
  url = {https://huggingface.co/jojo-ai-mst/MyanmarGPT},
  urldate = {2023-12-14},
  date = {2023-12-14},
}
```
To get in touch, reach me on LinkedIn: [https://www.linkedin.com/in/min-si-thu/](https://www.linkedin.com/in/min-si-thu/)