---
license: creativeml-openrail-m
language:
- my
tags:
- Myanmar
- Burmese
- GPT2
- MyanmarGPT
- Natural Language Processing
widget:
- text: "အီတလီ"
  example_title: "Example 1"
- text: "အနုပညာ"
  example_title: "Example 2"
- text: "တရုတ်"
  example_title: "Example 3"
- text: "ကျောက်ခေတ်"
  example_title: "Example 4"
- text: "မြန်မာနိုင်ငံ"
  example_title: "Example 5"
---

# Myanmar-GPT
မြန်မာ(ဗမာ)လိုနားလည်သော GPT (a GPT that understands Myanmar/Burmese) - Myanmar GPT

Myanmar GPT is a model trained on a private Myanmar language dataset made by MinSiThu.
The project aims to make the Myanmar language available in the GPT2 model.
Fine-tuning MyanmarGPT makes it easier to build a custom Myanmar language model than starting from alternative language models.

Training reports for the MyanmarGPT model are visualized at [MyanmarGPT Report](https://api.wandb.ai/links/minsithu/wn8yul90).

Variants of the Burmese language-enabled models can be found at [https://github.com/MinSiThu/MyanmarGPT](https://github.com/MinSiThu/MyanmarGPT).

There is also a 1.42-billion-parameter MyanmarGPT-Big model with multilingual support. You can find [MyanmarGPT-Big here](https://huggingface.co/jojo-ai-mst/MyanmarGPT-Big).

Currently, Myanmar GPT has four main variants:
- [MyanmarGPT](https://huggingface.co/jojo-ai-mst/MyanmarGPT)
- [MyanmarGPT-Big](https://huggingface.co/jojo-ai-mst/MyanmarGPT-Big)
- [MyanmarGPT-Chat](https://huggingface.co/jojo-ai-mst/MyanmarGPT-Chat)
- [MyanmarGPTX](https://huggingface.co/jojo-ai-mst/MyanmarGPTX)

## How to use in your project

```
!pip install transformers
```

```python
from transformers import pipeline

# Load MyanmarGPT; the pipeline task is inferred from the model
generator = pipeline(model="jojo-ai-mst/MyanmarGPT")

# Greedy decoding (no sampling) from a Burmese prompt
outputs = generator("အီတလီ", do_sample=False)
print(outputs)
# [{'generated_text': 'အီတလီနိုင်ငံသည် ဥရောပတိုက်၏ တောင်ဘက်တွင် မြေထဲပင်လယ်ထဲသို့ ထိုးထွက်နေသော ကျွန်းဆွယ်ကြီးတစ်ခုဖြစ်၍ ပုံသဏ္ဌာန်အားဖြင့် မြင်းစီးဖိနပ်နှင့် တူလေသည်။ မြောက်ဘက်မှ တောင်ဘက်အငူစွန်းအထိ မိုင်ပေါင်း ၇၅ဝ ခန့် ရှည်လျား၍၊ ပျမ်းမျှမိုင် ၁ဝဝ မှ ၁၂ဝ ခန့်ကျယ်သည်။ အီတလီနိုင်ငံ၏ အကျယ်အဝန်းမှာ ဆာဒင်းနီးယားကျွန်း၊ စစ္စလီကျွန်းနှင့် အနီးပတ်ဝန်းကျင်ရှိ ကျွန်းကလေးများ အပါအဝင် ၁၁၆,၃၅၀ စတုရန်းမိုင်ရှိသည်။ '}]
```

### Alternative ways

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

model = GPT2LMHeadModel.from_pretrained("jojo-ai-mst/MyanmarGPT").to(device)
tokenizer = GPT2Tokenizer.from_pretrained("jojo-ai-mst/MyanmarGPT")

def generate_text(prompt, max_length=300, temperature=0.8, top_k=50):
    # Tokenize the prompt and move it to the same device as the model
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    output = model.generate(
        input_ids,
        max_length=max_length,
        temperature=temperature,
        top_k=top_k,
        pad_token_id=tokenizer.eos_token_id,
        do_sample=True,
    )
    # Decode and print each generated sequence
    for result in output:
        generated_text = tokenizer.decode(result, skip_special_tokens=True)
        print(generated_text)

generate_text("အီတလီ ")
```

## Roadmap for Burmese Language and Artificial Intelligence

I started MyanmarGPT, and it has had a huge impact in Myanmar, so I am continuing the project as a movement called the [MyanmarGPT Movement](https://github.com/MyanmarGPT-Movement).
The MyanmarGPT Movement is for everyone who wants to initiate AI projects in Myanmar.
## Guidelines for using the MyanmarGPT license

- MyanmarGPT is free for everyone to use.
- **Must do:** any project that is derived or fine-tuned from MyanmarGPT, uses MyanmarGPT internally, modifies MyanmarGPT, or is otherwise related to MyanmarGPT **must include the citation below** on the corresponding project's page.
- The citation:

```bibtex
@software{MyanmarGPT,
  author = {{MinSiThu}},
  title = {MyanmarGPT},
  version = {1.1-SweptWood},
  url = {https://huggingface.co/jojo-ai-mst/MyanmarGPT},
  urldate = {2023-12-14},
  date = {2023-12-14},
}
```

For contact, reach me via [https://www.linkedin.com/in/min-si-thu/](https://www.linkedin.com/in/min-si-thu/)