jed351 committed on
Commit
391f513
1 Parent(s): 903e918

Update README.md

  1. README.md +14 -3
README.md CHANGED
@@ -35,7 +35,19 @@ The tool can be found [here](https://github.com/ayaka14732/lihkg-scraper).
35
  Please also check out the [Bart model](https://huggingface.co/Ayaka/bart-base-cantonese) created by her.
36
 
37
 
38
- ## Training procedure
39
 
40
  Please refer to the [script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling)
41
  provided by Hugging Face.
@@ -44,8 +56,6 @@ provided by Huggingface.
44
  The model was trained for 400,000 steps with a batch size of 5 (~2 epochs) on 2 NVIDIA Quadro RTX 6000 GPUs for around 40 hours at the Research Computing Services of Imperial College London.
45
 
46
 
47
-
48
-
49
  ### How to use it?
50
  ```
51
  from transformers import AutoTokenizer
@@ -62,6 +72,7 @@ string = output[0]['generated_text'].replace(' ', '')
62
  print(string)
63
  ```
64
 
 
65
  ### Framework versions
66
 
67
  - Transformers 4.26.0.dev0
 
35
  Please also check out the [Bart model](https://huggingface.co/Ayaka/bart-base-cantonese) created by her.
36
 
37
 
38
+
39
+ ### Limitations
40
+ The model was trained on ~10 GB of data scraped from LIHKG.
41
+ The data might contain violent and rude language, and so might the text generated by the model.
42
+ Please do not use it for anything other than research or entertainment.
43
+
44
+
45
+ The comments on LIHKG also tend to be very short.
46
+ Thus, the model cannot generate more than a single line of text. In many cases, it might not generate any new tokens at all.
47
+
48
+
49
+
50
+ ### Training procedure
51
 
52
  Please refer to the [script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling)
53
  provided by Hugging Face.
 
56
  The model was trained for 400,000 steps with a batch size of 5 (~2 epochs) on 2 NVIDIA Quadro RTX 6000 GPUs for around 40 hours at the Research Computing Services of Imperial College London.
57
 
58
 
 
 
59
  ### How to use it?
60
  ```
61
  from transformers import AutoTokenizer
 
72
  print(string)
73
  ```
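
The usage snippet is truncated in this diff view, but the hunk header shows the generated text being post-processed with `output[0]['generated_text'].replace(' ', '')`. The sketch below illustrates only that step, and additionally checks whether the model produced any new tokens (see the Limitations note). It assumes `output` has the usual text-generation pipeline shape (a list of dicts with a `generated_text` key). The helper name `postprocess` and the sample strings are hypothetical and are not part of the README.

```python
from typing import Dict, List, Tuple


def postprocess(prompt: str, output: List[Dict[str, str]]) -> Tuple[str, str]:
    """Return (full_text, new_part), both with spaces removed.

    `output` is assumed to look like a text-generation pipeline result:
    [{"generated_text": "..."}]. This helper is illustrative only.
    """
    # Same clean-up as in the README: drop the spaces between tokens.
    full_text = output[0]["generated_text"].replace(" ", "")
    prompt_clean = prompt.replace(" ", "")
    # Anything after the prompt is text the model actually added.
    if full_text.startswith(prompt_clean):
        new_part = full_text[len(prompt_clean):]
    else:
        new_part = full_text
    return full_text, new_part


# Made-up example data; no model is loaded here.
sample_prompt = "今日天氣"
sample_output = [{"generated_text": "今 日 天 氣 好 好"}]

full, new = postprocess(sample_prompt, sample_output)
print(full)
print(new if new else "(no new tokens generated)")
```

An empty `new_part` corresponds to the case described under Limitations, where the model returns the prompt without generating anything further.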
74
 
75
+
76
  ### Framework versions
77
 
78
  - Transformers 4.26.0.dev0