---
license: mit
language:
- my
pipeline_tag: text-generation
---

Simbolo's Myanmar SAR GPT is pre-trained on a dataset of 1 million Burmese text samples using the GPT-2 architecture. It is intended to serve as a foundational pre-trained model for the Burmese language, facilitating fine-tuning for downstream tasks such as creative writing, chatbots, and machine translation.

### How to use

```python
# Install dependencies first:
#   pip install transformers torch

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Simbolo-Servicio/myanmar-sar-gpt")
model = AutoModelForCausalLM.from_pretrained("Simbolo-Servicio/myanmar-sar-gpt")

# Encode a Burmese prompt and generate a continuation
input_text = "မင်္ဂလာပါ"  # example prompt ("hello"); any Unicode Burmese text works
input_ids = tokenizer.encode(input_text, return_tensors='pt')
output = model.generate(input_ids, max_length=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
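
Greedy decoding (the default above) can produce repetitive text. A sampling-based variant is sketched below; the parameter values are illustrative assumptions, not settings tuned for this model.

```python
# Sampling-based decoding; reuses `input_ids` from the snippet above.
# The parameter values below are illustrative, not tuned for this model.
output = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,   # sample from the predicted distribution instead of taking the argmax
    top_k=50,         # restrict sampling to the 50 most likely next tokens
    top_p=0.95,       # nucleus sampling: keep the smallest token set covering 95% of the mass
    temperature=0.8,  # values below 1.0 sharpen the distribution and reduce rambling
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```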

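### Fine-tuning

Since the model is intended as a base for fine-tuning, here is a minimal sketch of continued training on a custom Burmese corpus with the Hugging Face `Trainer`. The `train_texts` corpus and the hyperparameters are placeholders, not recommended values.

```python
# Requires: pip install transformers datasets torch
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("Simbolo-Servicio/myanmar-sar-gpt")
model = AutoModelForCausalLM.from_pretrained("Simbolo-Servicio/myanmar-sar-gpt")
# GPT-2 tokenizers ship without a pad token; reuse EOS so batches can be padded
tokenizer.pad_token = tokenizer.eos_token

# Hypothetical placeholder corpus: replace with your own list of Burmese strings
train_texts = ["..."]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

train_dataset = (Dataset.from_dict({"text": train_texts})
                 .map(tokenize, batched=True, remove_columns=["text"]))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="myanmar-sar-gpt-finetuned",
                           num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=train_dataset,
    # mlm=False produces causal-LM labels (inputs shifted by one position)
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```
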
### Limitations and bias
We have not yet thoroughly investigated the potential biases inherent in this model. For transparency, note that the model is trained primarily on Unicode-encoded Burmese (Myanmar) text, so it may handle text in other encodings, such as Zawgyi, unreliably.