Update README.md
README.md
![rinna-icon](./rinna.png)

This repository provides a medium-sized Japanese GPT-2 model. The model is provided by [rinna](https://corp.rinna.co.jp/).

# How to use the model

*NOTE:* Use `T5Tokenizer` to instantiate the tokenizer.

~~~~
from transformers import T5Tokenizer, AutoModelForCausalLM

tokenizer = T5Tokenizer.from_pretrained("rinna/japanese-gpt2-medium")
tokenizer.do_lower_case = True  # due to a bug in tokenizer config loading

model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")
~~~~
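Once the tokenizer and model are loaded, text can be generated with the standard `generate` method of `transformers`. The prompt and sampling settings below are illustrative assumptions, not values prescribed by this repository:

~~~~
# A minimal generation sketch; the prompt and sampling parameters
# are arbitrary examples.
input_ids = tokenizer.encode("こんにちは、", return_tensors="pt")
output_ids = model.generate(
    input_ids,
    max_length=50,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
~~~~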

# Model architecture
A 24-layer, 1024-hidden-size transformer-based language model.
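These sizes can be double-checked on the loaded model, assuming a standard GPT-2 config with the usual `n_layer`/`n_embd` field names:

~~~~
# Inspect the architecture via the model config (GPT-2 field names assumed).
print(model.config.n_layer)  # number of transformer layers: 24
print(model.config.n_embd)   # hidden size: 1024
~~~~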

# Training
The model was trained on [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz) to optimize a traditional language modelling objective on 8 V100 GPUs for around 30 days. It reaches around 18 perplexity on a validation set drawn from the same data.
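Perplexity here is presumably the exponential of the mean token-level cross-entropy. A minimal sketch of computing that quantity for a single text with `torch` (the sample text is a placeholder; the actual validation set is not released here):

~~~~
import torch

# Perplexity = exp(mean cross-entropy loss over tokens).
text = "日本語の検証用サンプルテキスト。"  # placeholder, not the real validation data
input_ids = tokenizer.encode(text, return_tensors="pt")
with torch.no_grad():
    loss = model(input_ids, labels=input_ids).loss
print(torch.exp(loss).item())
~~~~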

# Tokenization
The model uses a [sentencepiece](https://github.com/google/sentencepiece)-based tokenizer; the vocabulary is adopted directly from the pre-trained tokenizer available at that [link](https://github.com/google/sentencepiece).
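A quick illustration of the tokenizer on an arbitrary Japanese sentence:

~~~~
# Split a sentence into sentencepiece subword tokens and map them to ids.
tokens = tokenizer.tokenize("こんにちは、世界。")
print(tokens)
print(tokenizer.convert_tokens_to_ids(tokens))
~~~~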

# License
[The MIT license](https://opensource.org/licenses/MIT)