tianyuz committed on
Commit
2140541
1 Parent(s): 60ed51e

update readme

  1. README.md +9 -1
README.md CHANGED
@@ -46,11 +46,19 @@ This repository provides a Japanese GPT-NeoX model of 3.6 billion parameters. Th
  The RL data is the subset of the following dataset and has been translated into Japanese.
  * [Anthropic HH RLHF data](https://huggingface.co/datasets/Anthropic/hh-rlhf)

+ * **Model Series**
+
+ | Variant | Link |
+ | :-- | :-- |
+ | 3.6B PPO | https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-ppo |
+ | 3.6B SFT-v2 | https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-sft-v2 |
+ | 3.6B SFT | https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-sft |
+ | 3.6B pretrained | https://huggingface.co/rinna/japanese-gpt-neox-3.6b |
+
  * **Authors**

  [Tianyu Zhao](https://huggingface.co/tianyuz) and [Kei Sawada](https://huggingface.co/keisawada)

-
  # Limitations
  * We found that this version of the PPO model tends to generate repeated text more often than its SFT counterpart, and thus we set `repetition_penalty=1.1` for better generation performance. (*The same generation hyper-parameters are applied to the SFT model in the aforementioned evaluation experiments.*) You can also explore other hyperparameter combinations that yield higher generation randomness/diversity for better generation quality, e.g. `temperature=0.9, repetition_penalty=1.0`.

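
The two hyperparameter settings mentioned under Limitations could be wired into a generation call roughly as follows. This is a minimal sketch, not the authors' script: the helper names `generation_kwargs` and `generate`, the preset labels, and the `do_sample=True` and `max_new_tokens=128` values are assumptions introduced for illustration. Only `repetition_penalty=1.1` and `temperature=0.9, repetition_penalty=1.0` come from the README text, and the prompt format the model expects is not covered here.

```python
# Hypothetical presets; only the repetition_penalty / temperature values
# are taken from the README's Limitations section.
PRESETS = {
    # Setting the README says was used to reduce repeated text.
    "default": {"repetition_penalty": 1.1},
    # Alternative the README suggests for more randomness/diversity.
    "diverse": {"temperature": 0.9, "repetition_penalty": 1.0},
}


def generation_kwargs(preset="default"):
    """Return keyword arguments for `model.generate` for a named preset."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    kwargs = {
        "do_sample": True,       # assumption: sampling-based decoding
        "max_new_tokens": 128,   # assumption: illustrative length cap
    }
    kwargs.update(PRESETS[preset])
    return kwargs


def generate(prompt, preset="default",
             model_id="rinna/japanese-gpt-neox-3.6b-instruction-ppo"):
    """Generate a continuation of `prompt` (downloads the model on first use)."""
    # Heavy imports are kept local so the presets can be inspected without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    token_ids = tokenizer.encode(
        prompt, add_special_tokens=False, return_tensors="pt"
    )
    with torch.no_grad():
        output_ids = model.generate(token_ids, **generation_kwargs(preset))
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = output_ids[0][token_ids.size(1):]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate(prompt, preset="diverse")` would swap in the higher-diversity setting; comparing both presets on the same prompt is one way to check how much repetition each produces.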