tianyuz committed on
Commit
086c08d
1 Parent(s): b826dbc
Files changed (7)
  1. README.md +159 -0
  2. config.json +30 -0
  3. pytorch_model.bin +3 -0
  4. rinna.png +0 -0
  5. spiece.model +3 -0
  6. spiece.vocab +0 -0
  7. tokenizer_config.json +1 -0
README.md CHANGED
@@ -1,3 +1,162 @@
  ---
+ thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
  license: mit
+ datasets:
+ - mc4
+ - cc100
+ - wikipedia
+ - EleutherAI/pile
+ - togethercomputer/RedPajama-Data-1T
+ language:
+ - ja
+ - en
+ inference: false
  ---
+
+ # bilingual-gpt-neox-4b-8k
+
+ ![rinna-icon](./rinna.png)
+
+ # Overview
+
+ **Notice: This model requires `transformers>=4.31.0` to work properly.**
+
+ This repository provides an English-Japanese bilingual GPT-NeoX model of 3.8 billion parameters.
+
+ We extend [`rinna/bilingual-gpt-neox-4b`](https://huggingface.co/rinna/bilingual-gpt-neox-4b)'s context length from 2048 to 8192 by fine-tuning on 1.5B extra tokens using [RoPE positional interpolation](https://arxiv.org/abs/2306.15595).
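+
+ As a rough sketch of what linear RoPE interpolation does (illustrative only, not the training code): position indices are divided by the extension factor, `8192 / 2048 = 4`, before the rotary angles are computed, so extended positions fall back into the position range the base model saw during pre-training. This corresponds to the `rope_scaling: {"type": "linear", "factor": 4.0}` entry in the `config.json` below.
+
+ ~~~~python
+ import torch
+
+ def interpolated_rope_angles(seq_len, dim, base=10000.0, scaling_factor=4.0):
+     # Linear RoPE interpolation: rescale position indices by 1/scaling_factor
+     # so an 8192-token sequence maps onto the original 2048-position range.
+     positions = torch.arange(seq_len, dtype=torch.float32) / scaling_factor
+     inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
+     angles = torch.outer(positions, inv_freq)  # shape: (seq_len, dim // 2)
+     return angles.cos(), angles.sin()
+ ~~~~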
+
+ * **Library**
+
+     The model was trained using code based on [EleutherAI/gpt-neox](https://github.com/EleutherAI/gpt-neox).
+
+ * **Model architecture**
+
+     A 36-layer, 2816-hidden-size transformer-based language model. A rough parameter-count check appears in the sketch after this list.
+
+ * **Fine-tuning**
+
+     The model was fine-tuned on long sequences (longer than 4000 tokens) sampled from its pre-training corpora, listed below. The fine-tuning data contains **1.5B** tokens in total.
+
+     - [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz)
+     - [Japanese C4](https://huggingface.co/datasets/mc4)
+     - [The Pile](https://huggingface.co/datasets/EleutherAI/pile)
+     - [Redpajama](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T)
+     - [Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
+
+ * **Model Series**
+
+     | Variant | Link |
+     | :-- | :-- |
+     | Bilingual 4B MiniGPT4 | https://huggingface.co/rinna/bilingual-gpt-neox-4b-minigpt4 |
+     | Bilingual 4B SFT | https://huggingface.co/rinna/bilingual-gpt-neox-4b-instruction-sft |
+     | Bilingual 4B 8K | https://huggingface.co/rinna/bilingual-gpt-neox-4b-8k |
+     | Bilingual 4B | https://huggingface.co/rinna/bilingual-gpt-neox-4b |
+     | Japanese 3.6B PPO | https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-ppo |
+     | Japanese 3.6B SFT-v2 | https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-sft-v2 |
+     | Japanese 3.6B SFT | https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-sft |
+     | Japanese 3.6B | https://huggingface.co/rinna/japanese-gpt-neox-3.6b |
+
59
+ * **Authors**
60
+
61
+ [Tianyu Zhao](https://huggingface.co/tianyuz) and [Kei Sawada](https://huggingface.co/keisawada)
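+
+ As referenced in the architecture note above, a back-of-the-envelope parameter count from the values in `config.json` (a sketch that ignores biases and LayerNorm weights) lands close to the advertised 3.8B:
+
+ ~~~~python
+ # Rough parameter count from config.json values (biases and LayerNorms omitted).
+ hidden, inter, layers, vocab = 2816, 11264, 36, 65536
+
+ embed = 2 * vocab * hidden   # input embeddings + untied output head
+ attn  = 4 * hidden * hidden  # Q, K, V, and output projections per layer
+ mlp   = 2 * hidden * inter   # up and down projections per layer
+
+ print(f"{(embed + layers * (attn + mlp)) / 1e9:.2f}B")  # ~3.79B
+ ~~~~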
+
+
+ # How to use the model
+
+ **Notice:** Since the model is **sensitive to decoding hyper-parameters** (e.g. `temperature`, `top_p`, `top_k`, `repetition_penalty`), we suggest exploring the best settings for your task.
+
+ ~~~~python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # use_fast=False is required for the tokenizer's whitespace handling (see Tokenization below)
+ tokenizer = AutoTokenizer.from_pretrained("rinna/bilingual-gpt-neox-4b-8k", use_fast=False)
+ model = AutoModelForCausalLM.from_pretrained("rinna/bilingual-gpt-neox-4b-8k")
+
+ if torch.cuda.is_available():
+     model = model.to("cuda")
+
+ text = "Socrates says"
+ token_ids = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt")
+
+ with torch.no_grad():
+     output_ids = model.generate(
+         token_ids.to(model.device),
+         # force a 4000-token continuation to exercise the extended 8k context
+         max_new_tokens=4000,
+         min_new_tokens=4000,
+         do_sample=True,
+         temperature=1.0,
+         top_p=0.95,
+         pad_token_id=tokenizer.pad_token_id,
+         bos_token_id=tokenizer.bos_token_id,
+         eos_token_id=tokenizer.eos_token_id
+     )
+
+ output = tokenizer.decode(output_ids.tolist()[0])
+ print(output)
+ """
+ Socrates says that he is not a bad man because the people of his city-state want to kill him. For a just man, if someone gives them their life over, they will lose it by violence. If this happens at the hands of another, that person will be as bad as Plato's Socratic slave, and Socrates will suffer accordingly (B 134 ff).
+
+ The Stranger's final remark concerns the distinction between knowledge and wisdom. While the Stranger seems to claim that all people can learn moral lessons through observation of how the world actually works, Socrates responds by saying:
+
+ "What? Am I a skilful painter?" I replied [to his question] (499). "No, indeed I am not, Socrates; but you are one who knows how to paint. You have painted a little picture and I know nothing about art. In this respect what do I know or can learn from you?" (D 1015)
+
+ Socrates suggests that it makes sense to define the knowledge required of a good person as any skill which we can acquire by observing real behavior. However, there appears to be a problem in this definition: it seems unlikely that everyone can have such a skill. Certainly, if he were able to see the actions of other people, he would understand how we should act, even though his own response to these actions would not necessarily satisfy moral rules. Even less sophisticated people might reasonably conclude that their own actions must conform with accepted moral standards of behavior. Hence, it seems that all people, at least some of us, need _some_ form of training.
+
+ ## **The nature of education and character**
+
+ Having set out our ideas of justice and virtue, and the ways in which they relate to political society, Socrates now brings the story of his pupil Phaedrus to a close. He tells Phaedrus that "my teaching you is as simple as that of your own body. If you were to lay it out for yourself, you would not discover its form" (B 287–8). The two men had originally been introduced as students undertaking an exercise called pedagogy. Now, however, Socrates has presented Phaedrus with the idea that his task involves making predictions concerning events yet to come (B 169). A better understanding of these events will be more useful than mere memorization. To achieve this purpose, the young philosopher must be careful not to waste his time doing the unnecessary things that ordinary humans tend to do.
+
+ Socrates asks Phaedrus whether a good philosopher really needs to do no work. The answer given is "yes", meaning that he does not need to study the classics and develop a philosophical tradition in order to make himself a good person, nor to go through a lengthy course of philosophy and other education. Rather, he should simply practice being an active, creative, and imaginative thinker ( _eikasōma_ ). Such persons are well qualified to judge situations on their own terms, rather than on those provided by tradition (B 296). Once again, Socrates emphasizes the difference between the intellectual achievements which follow naturally from education and those which require intellectual effort alone.
+
+ When asked whether this sort of education can produce a good man, Socrates replies in the affirmative:
+
+ "Surely it would appear impossible that someone could attain the most important parts of wisdom, unless he was a student of human affairs" (B 364). Socrates also points out that having been educated properly helps a person to make good choices when faced with difficult decisions:
+
+ So for this same reason, if you did not take up your craft with me, that is, your profession, when you were young, you would not be a fit person to judge how you ought to vote; because you would not consider each thing in accordance with its true nature" (B 366).
+
+ As Plato often asserts throughout the _Apology_, Socrates regards learning as essential to the acquisition of wisdom but education can never substitute for the inborn capacities of a child. This is not to say that children lack wisdom or that they cannot mature. Indeed, Socrates explains that education is sometimes needed even by individuals who can solve problems for themselves (B 343–67), and Socrates later refers to this activity (C 738 ff) as _technēsēs_. However, there is always something special about childhood initiating certain capacities. We usually give up the right to participate in education at puberty so as to prepare us for adult life, for example, without being informed that our bodies and intelligence can also grow old (B 1165–70).
+
+ ## **Socrates's defence of democracy and Socratic method**
+
+ Following a lengthy description of Socrates's educational programme, Plato moves directly into the matter of democratic politics and citizenship in Book III. On the first day of the trial, Socrates takes up the theme of democracy once again:
+
+ "For you are looking for this thing, my friends, that is to say, the good citizenship to which every person stands entitled" (389).
+
+ Before continuing, Socrates introduces three principles that he believes form the very heart of good citizenship: the good gods, respect for nature, and love of beauty. Socrates describes these principles in various ways:
+
+ 1. All citizens of a democracy are expected to behave honourably (390). The citizen should avoid doing anything harmful (to others or to himself) and everything good. There is therefore no way to avoid acting dishonourably (391); but no one can avoid harming himself, for his actions will harm the community as a whole (392–5).
+
+ 2. Each individual is equally in a position of power and authority, and this means that the citizens must share responsibility for the good government of the state (395).
+
+ 3. Respect for nature means that citizens will observe that both laws of nature and the opinions of other people control their actions, so that they must choose between the best available alternatives. Anyone who fails to adopt reasoned opinion will be wrong in principle (399). This entails that citizens will have to choose among the best policies that prevail within the community (ibid.).
+
+ So, while the citizens will have authority and power, this only exists so long as the laws and opinions of which they approve prevail in general over those of which they disapprove. The only way they can get any power at all over their fellow-citizens is either through punishment, or through elections. These provide the means by which citizens can express their approval of a policy or disapproval of a policy. The latter occurs when citizens elect the individuals responsible for making the laws.
+
+ While democracy may be described as a'mixed' government, it is not possible for citizens to choose those whom they wish to vote for (399). Instead, they decide who should have a voice. Those elected speak for themselves, they do not listen to the advice of their colleagues, and ultimately the result will be chosen by the people themselves (399–401). Once again, Socrates is clearly trying to convince his interrogators that the best interests of the city-state depend on giving a larger voice to the public in running its affairs.
+
+ ## **Plato's reply to Socrates**
+
+ Plato's rejoinder shows his great skill in dialogue. He presents the argument in familiar forms: analogy, discussion, and so on. Although Socrates makes some valid points at times along the way, he usually finds reasons for disagreeing with the arguments that he offers to support his claims. As he repeatedly does throughout Book II, the Stranger then uses Socrates's own words against him. To begin with, the Stranger dismisses the claim that each person
+
+ ...
+
+ """
+ ~~~~
+
+ ---
+
+ # Tokenization
+ The model uses a [sentencepiece](https://github.com/google/sentencepiece)-based tokenizer.
+ * The tokenizer has a vocabulary size of 65,536.
+ * It uses *byte fallback* to decompose unknown text pieces into UTF-8 byte pieces to avoid producing `<UNK>` tokens.
+ * It can recognize *consecutive whitespaces*, *newlines*, and *tabs* to handle structured texts better.
+ * We turned off the default behaviour of prepending a leading whitespace because it is not beneficial for processing Japanese.
+     * Specifically, a single whitespace is always processed as one token, so an English word never carries a preceding whitespace as it does in many other tokenizers (e.g. `_Hello`).
+     * This decision trades English processing efficiency for a unified way to treat whitespaces.
+     * It leads to a significantly lower next-token-prediction loss on English data because whitespaces are easy to predict.
+ * **Don't forget to set `use_fast=False` to make the above features function correctly**, as in the quick check below.
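+
+ As a quick sanity check of the whitespace handling, a minimal sketch (the exact token strings printed are illustrative and may differ):
+
+ ~~~~python
+ from transformers import AutoTokenizer
+
+ # use_fast=False selects the slow sentencepiece tokenizer that implements
+ # the whitespace behaviour described above.
+ tokenizer = AutoTokenizer.from_pretrained("rinna/bilingual-gpt-neox-4b-8k", use_fast=False)
+
+ # A single whitespace is its own token, so "Hello" is not merged into "_Hello".
+ print(tokenizer.tokenize("Hello world"))
+
+ # Consecutive whitespaces, newlines, and tabs are preserved.
+ print(tokenizer.tokenize("a  b\n\tc"))
+ ~~~~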
+
+ ---
+
+ # License
+ [The MIT license](https://opensource.org/licenses/MIT)
config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "architectures": [
+     "GPTNeoXForCausalLM"
+   ],
+   "attention_dropout": 0.1,
+   "bos_token_id": 2,
+   "classifier_dropout": 0.1,
+   "eos_token_id": 3,
+   "hidden_act": "gelu",
+   "hidden_dropout": 0.1,
+   "hidden_size": 2816,
+   "initializer_range": 0.02,
+   "intermediate_size": 11264,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 2048,
+   "model_type": "gpt_neox",
+   "num_attention_heads": 22,
+   "num_hidden_layers": 36,
+   "rope_scaling": {
+     "type": "linear",
+     "factor": 4.0
+   },
+   "rotary_emb_base": 10000,
+   "rotary_pct": 1.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float16",
+   "use_cache": true,
+   "use_parallel_residual": false,
+   "vocab_size": 65536
+ }
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3008a1182a61bd628c0c5f656e426f31d4e6820fc567eab07e5c197acc3bd557
+ size 7743419069
rinna.png ADDED
spiece.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:85a0205d37a98bb3b97cf4ca3f507c78873cf8f6cefa3b51d8d6a15006dc889d
+ size 1341798
spiece.vocab ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"eos_token": "</s>", "unk_token": "[UNK]", "pad_token": "[PAD]", "extra_ids": 0, "additional_special_tokens": [], "sp_model_kwargs": {}, "bos_token": "<s>", "cls_token": "[CLS]", "sep_token": "[SEP]", "mask_token": "[MASK]", "do_lower_case": false, "tokenizer_class": "T5Tokenizer"}