Simbolo committed
Commit ce6e15b
1 parent: 6d3bc6d

Update README.md

Files changed (1):
  1. README.md (+2 −2)
README.md CHANGED
@@ -12,7 +12,7 @@ tags:
 - pre-trained
 ---
 
-The Simbolo's Myanmarsar-GPT symbol is trained on a dataset of 100,000 Burmese data and pre-trained using the GPT-2 architecture. Its purpose is to serve as a foundational pre-trained model for the Burmese language, facilitating fine-tuning for specific applications of different tasks such as creative writing, chatbot, machine translation etc.
+The Simbolo's Myanmarsar-GPT symbol is trained on a dataset of 20,000 Burmese data and pre-trained using the GPT-2 architecture. Its purpose is to serve as a foundational pre-trained model for the Burmese language, facilitating fine-tuning for specific applications of different tasks such as creative writing, chatbot, machine translation etc.
 
 ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6598b82502c4796342239a35/rFId3-xyzWW-juDq_er9k.jpeg)
 
@@ -33,7 +33,7 @@ output = model.generate(input_ids, max_length=50)
 print(tokenizer.decode(output[0], skip_special_tokens=True))
 ```
 ### Data
-The [data](https://huggingface.co/datasets/Simbolo-Servicio/wiki-burmese-sentences) utilized comprises 100,000 sentences sourced from Wikipedia.
+We use 20,000 Burmese sentences from our open source [data](https://huggingface.co/datasets/Simbolo-Servicio/wiki-burmese-sentences) which contains 100,000 sentences sourced from Wikipedia.
 
 ### Contributors
 Main Contributor: [Sa Phyo Thu Htet](https://github.com/SaPhyoThuHtet)
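The updated Data section says 20,000 sentences were used out of the dataset's 100,000 Wikipedia sentences, but the commit does not state how that subset was chosen. A seeded random sample is one plausible sketch; the `seed` value and the use of `random.sample` here are assumptions for illustration, not the authors' documented method:

```python
import random

def sample_subset(sentences, k=20_000, seed=0):
    """Draw a reproducible random subset of k sentences from the full corpus."""
    rng = random.Random(seed)  # fixed seed so the same subset can be re-created
    return rng.sample(sentences, k)

# Stand-in list for the 100,000-sentence wiki-burmese-sentences corpus.
corpus = [f"sentence {i}" for i in range(100_000)]
subset = sample_subset(corpus)
print(len(subset))  # 20000
```

Seeding the generator keeps the 20,000-sentence subset reproducible across runs, which matters if the pre-training data ever needs to be reconstructed from the public dataset.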