Simbolo committed on
Commit
6341760
1 Parent(s): 38eaf08

Update README.md

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -23,13 +23,16 @@ input_text = ""
  input_ids = tokenizer.encode(input_text, return_tensors='pt')
  output = model.generate(input_ids, max_length=100)
  print(tokenizer.decode(output[0], skip_special_tokens=True))
+
  ```
+ ### Data
+ The data utilized comprises 1 million sentences sourced from Wikipedia.

  ### Limitations and bias
  We have yet to thoroughly investigate the potential bias inherent in this model. Regarding transparency, it's important to note that the model is primarily trained on data from the Unicode Burmese(Myanmar) language.

-
-
+ ### References and Citations
+ Jiang, Shengyi & Huang, Xiuwen & Cai, Xiaonan & Lin, Nankai. (2021). Pre-trained Models and Evaluation Data for the Myanmar Language. 10.1007/978-3-030-92310-5_52.