tianyuz committed on
Commit 1fae383
1 Parent(s): 8e2be84

Update README.md

Files changed (1)
  1. README.md +16 -3
README.md CHANGED
@@ -17,9 +17,9 @@ datasets:
 
 ![rinna-icon](./rinna.png)
 
-This repository provides a medium-sized Japanese GPT-2 model trained on [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz). The model is provided by [rinna](https://corp.rinna.co.jp/).
+This repository provides a medium-sized Japanese GPT-2 model. The model is provided by [rinna](https://corp.rinna.co.jp/).
 
-# Use the model
+# How to use the model
 
 *NOTE:* Use `T5Tokenizer` to instantiate the tokenizer.
 
@@ -27,6 +27,19 @@ This repository provides a medium-sized Japanese GPT-2 model trained on [Japanes
 from transformers import T5Tokenizer, AutoModelForCausalLM
 
 tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-medium")
+tokenizer.do_lower_case = True  # workaround for a bug in tokenizer config loading
 
 model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")
-~~~~
+~~~~
+
+# Model architecture
+A 24-layer, 1024-hidden-size transformer-based language model.
+
+# Training
+The model was trained on [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz) to optimize a traditional language modelling objective, on 8\*V100 GPUs for around 30 days. It reaches a perplexity of around 18 on a validation set held out from the same data.
+
+# Tokenization
+The model uses a [sentencepiece](https://github.com/google/sentencepiece)-based tokenizer; the vocabulary is adopted directly from the pre-trained tokenizer available at that [link](https://github.com/google/sentencepiece).
+
+# License
+[The MIT license](https://opensource.org/licenses/MIT)
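
The updated snippet loads the tokenizer and model but stops short of generating text. A minimal generation sketch building on it; the prompt string and the sampling parameters are illustrative assumptions, not part of the README:

~~~~
import torch
from transformers import T5Tokenizer, AutoModelForCausalLM

tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-medium")
tokenizer.do_lower_case = True  # same workaround as in the README

model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")
model.eval()

# Encode an illustrative Japanese prompt and sample a continuation.
input_ids = tokenizer.encode("こんにちは、", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_length=50,
        do_sample=True,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
~~~~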
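The "Model architecture" section states 24 layers and a hidden size of 1024. Assuming the checkpoint uses the standard GPT-2 config in transformers, those numbers can be read off the config object:

~~~~
from transformers import AutoConfig

config = AutoConfig.from_pretrained("rinna/japanese-gpt2-medium")
print(config.n_layer)  # number of transformer layers; 24 per the README
print(config.n_embd)   # hidden size; 1024 per the README
~~~~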
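The "Training" section reports a validation perplexity of around 18. A sketch of how perplexity is computed for a causal LM, i.e. the exponential of the mean token-level cross-entropy; the sample text below is a stand-in, not the actual CC-100 validation set:

~~~~
import math

import torch
from transformers import T5Tokenizer, AutoModelForCausalLM

tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-medium")
tokenizer.do_lower_case = True
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")
model.eval()

text = "吾輩は猫である。名前はまだ無い。"  # stand-in validation text
input_ids = tokenizer.encode(text, return_tensors="pt")
with torch.no_grad():
    # Passing labels=input_ids makes transformers shift the targets internally
    # and return the mean cross-entropy over the predicted tokens.
    loss = model(input_ids, labels=input_ids).loss
print(math.exp(loss.item()))  # perplexity on this single text
~~~~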
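The "Tokenization" section says the tokenizer is sentencepiece-based, and the snippet sets `do_lower_case = True` as a workaround. A quick way to see both in action; the sample sentence is illustrative, and the exact subword split depends on the vocabulary:

~~~~
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-medium")
tokenizer.do_lower_case = True  # workaround noted in the README

print(tokenizer.tokenize("こんにちは、世界"))  # sentencepiece subword pieces
print(tokenizer.encode("こんにちは、世界"))    # corresponding vocabulary ids
~~~~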