---
license: creativeml-openrail-m
language:
- my
tags:
- Myanmar
- Burmese
- GPT2
- MyanmarGPT
- Natural Language Processing
---

# Myanmar-GPT

A GPT that understands Myanmar (Burmese): Myanmar GPT

Myanmar GPT is a model trained on a private Myanmar language dataset made by MinSiThu.
The project aims to bring Myanmar language support to the GPT-2 model.

Fine-tuning MyanmarGPT is an easier way to build a custom Myanmar language model than starting from an alternative language model.
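
This model card does not document a fine-tuning procedure, so the following is only a minimal sketch of one common approach: concatenate your texts into one token stream, cut it into fixed-length blocks, and train with the `transformers` `Trainer`. The helper names (`group_into_blocks`, `to_examples`, `fine_tune`), the output directory, and every hyperparameter are illustrative assumptions, not settings used by the author.

```python
from typing import Dict, List


def group_into_blocks(token_ids: List[int], block_size: int) -> List[List[int]]:
    """Split one long token stream into equal-length blocks.

    A trailing remainder shorter than block_size is dropped.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    usable = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, usable, block_size)]


def to_examples(blocks: List[List[int]]) -> List[Dict[str, List[int]]]:
    """Build causal-LM examples; labels equal inputs (the model shifts them internally)."""
    return [{"input_ids": list(b), "labels": list(b)} for b in blocks]


def fine_tune(texts: List[str],
              output_dir: str = "myanmargpt-finetuned",  # hypothetical path
              block_size: int = 128) -> None:            # assumed value
    # Imported lazily so the pure helpers above work without transformers installed.
    from transformers import (GPT2LMHeadModel, GPT2Tokenizer, Trainer,
                              TrainingArguments, default_data_collator)

    tokenizer = GPT2Tokenizer.from_pretrained("jojo-ai-mst/MyanmarGPT")
    model = GPT2LMHeadModel.from_pretrained("jojo-ai-mst/MyanmarGPT")

    # Concatenate all texts, separating documents with the end-of-text token.
    token_ids: List[int] = []
    for text in texts:
        token_ids.extend(tokenizer.encode(text) + [tokenizer.eos_token_id])

    train_dataset = to_examples(group_into_blocks(token_ids, block_size))
    if not train_dataset:
        raise ValueError("Not enough text to fill a single training block")

    # Illustrative hyperparameters only; tune them for your data and hardware.
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,
        per_device_train_batch_size=2,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        data_collator=default_data_collator,
    )
    trainer.train()
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
```

Calling `fine_tune` with a list of your own Myanmar texts downloads the model and starts training. `Trainer` needs PyTorch, and recent versions of `transformers` also require the `accelerate` package.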

Reports on training the MyanmarGPT model are visualized at [MyanmarGPT Report](https://api.wandb.ai/links/minsithu/wn8yul90).

There is also a 1.42-billion-parameter MyanmarGPT-Big model with multilingual support.
You can find [MyanmarGPT-Big here](https://huggingface.co/jojo-ai-mst/MyanmarGPT-Big).

## How to use in your project

```
!pip install transformers
```

```python
from transformers import pipeline

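# Load the model into a text-generation pipeline (weights are downloaded on first use).
generator = pipeline("text-generation", model="jojo-ai-mst/MyanmarGPT")
# do_sample=False selects greedy decoding, so the output is deterministic.
outputs = generator("အီတလီ", do_sample=False)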

print(outputs)
# [{'generated_text': 'အီတလီနိုင်ငံသည် ဥရောပတိုက်၏ အမျိုးသားရေးရာ ကိစ္စများကို ရပ်ဖက်အာဏာရှိသော စီချလျက်ရှိနေခဲ့ရာ မှတ်တမ်းများပါဝင်ကြသည်။ ထိုခေတ် အခါက ရောမနိုင်ငံတော်၏ အမွေအနှစ်နေရာများတွင် ဥရောပတိုက်တွင် ဥရောပတိုက်တွင် ဥပဒေစနစ်နှစ်ခု အဖြစ် စေလွှတ်သော ပြဋ္ဌာန်းသတ်ရန် ဥပဒေစနစ်ကို ပြန်လည်ပြုစုခြင်းကို '}]
```
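
As the sample output above shows, each result's `generated_text` begins with the prompt itself. If only the continuation is needed, a small helper can strip the echoed prompt. This is a minimal sketch; `extract_continuations` is a hypothetical helper, not part of `transformers`.

```python
from typing import Dict, List


def extract_continuations(prompt: str, outputs: List[Dict[str, str]]) -> List[str]:
    """Return the text generated after the prompt for each pipeline result."""
    continuations = []
    for item in outputs:
        text = item["generated_text"]
        # Remove the echoed prompt only when the text really starts with it.
        if text.startswith(prompt):
            text = text[len(prompt):]
        continuations.append(text)
    return continuations


# Hand-written stand-in with the same shape as the pipeline's return value.
sample_outputs = [{"generated_text": "prompt and its continuation"}]
print(extract_continuations("prompt", sample_outputs))
```

With the real pipeline, pass the same prompt string you gave to `generator` along with the `outputs` list it returned.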

### Alternative way

```python
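import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = GPT2LMHeadModel.from_pretrained("jojo-ai-mst/MyanmarGPT").to(device)
tokenizer = GPT2Tokenizer.from_pretrained("jojo-ai-mst/MyanmarGPT")

def generate_text(prompt, max_length=300, temperature=0.8, top_k=50):
    # Keep the input tensor on the same device as the model.
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    output = model.generate(
        input_ids,
        max_length=max_length,
        temperature=temperature,
        top_k=top_k,
        pad_token_id=tokenizer.eos_token_id,
        do_sample=True,  # sample from the distribution instead of decoding greedily
    )
    # generate() returns one sequence per input row; decode and print each.
    for result in output:
        generated_text = tokenizer.decode(result, skip_special_tokens=True)
        print(generated_text)

generate_text("အီတလီ ")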
```
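
To give a feel for what the `temperature` and `top_k` arguments do, here is a conceptual sketch in plain Python. It is not the `transformers` implementation: it only illustrates that top-k keeps the `k` highest-scoring tokens and that temperature rescales the logits before the softmax. The function names and the toy logits are assumptions made for illustration.

```python
import math
import random
from typing import List, Optional


def top_k_temperature_probs(logits: List[float],
                            temperature: float = 0.8,
                            top_k: int = 50) -> List[float]:
    """Return sampling probabilities after top-k filtering and temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    k = min(top_k, len(logits))
    # Indices of the k largest logits; every other token gets probability 0.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Dividing by a temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = {i: logits[i] / temperature for i in keep}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {i: math.exp(v - peak) for i, v in scaled.items()}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


def sample_token(logits: List[float],
                 temperature: float = 0.8,
                 top_k: int = 50,
                 rng: Optional[random.Random] = None) -> int:
    """Draw one token index from the filtered, rescaled distribution."""
    rng = rng or random.Random()
    probs = top_k_temperature_probs(logits, temperature, top_k)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Toy vocabulary of four tokens; only the two best survive top_k=2.
toy_logits = [2.0, 1.0, 0.0, -1.0]
print(top_k_temperature_probs(toy_logits, temperature=0.8, top_k=2))
print(sample_token(toy_logits, temperature=0.8, top_k=2))
```

Lower `temperature` values make the output more predictable, while a larger `top_k` lets rarer tokens through.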


## Guidelines for using the MyanmarGPT license
- MyanmarGPT is free for everyone to use.

- **Must Do**
  - Any project that is derived or fine-tuned from MyanmarGPT, uses MyanmarGPT internally,
    modifies MyanmarGPT, or is otherwise related to MyanmarGPT **must mention the citation below** on the corresponding project's page.
- The citation:
```latex
@software{MyanmarGPT,
  author = {{MinSiThu}},
  title = {MyanmarGPT},
  version = {1.1-SweptWood},
  url = {https://huggingface.co/jojo-ai-mst/MyanmarGPT},
  urldate = {2023-12-14},
  date = {2023-12-14},
}
```

For contact, reach me via [https://www.linkedin.com/in/min-si-thu/](https://www.linkedin.com/in/min-si-thu/)