Quantization made by Richard Erkhov.

[GitHub](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

youri-7b - GGUF
- Model creator: https://huggingface.co/rinna/
- Original model: https://huggingface.co/rinna/youri-7b/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [youri-7b.Q2_K.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q2_K.gguf) | Q2_K | 2.36GB |
| [youri-7b.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.IQ3_XS.gguf) | IQ3_XS | 2.6GB |
| [youri-7b.IQ3_S.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.IQ3_S.gguf) | IQ3_S | 2.75GB |
| [youri-7b.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q3_K_S.gguf) | Q3_K_S | 2.75GB |
| [youri-7b.IQ3_M.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.IQ3_M.gguf) | IQ3_M | 2.9GB |
| [youri-7b.Q3_K.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q3_K.gguf) | Q3_K | 3.07GB |
| [youri-7b.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q3_K_M.gguf) | Q3_K_M | 3.07GB |
| [youri-7b.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q3_K_L.gguf) | Q3_K_L | 3.35GB |
| [youri-7b.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.IQ4_XS.gguf) | IQ4_XS | 3.4GB |
| [youri-7b.Q4_0.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q4_0.gguf) | Q4_0 | 3.56GB |
| [youri-7b.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.IQ4_NL.gguf) | IQ4_NL | 3.58GB |
| [youri-7b.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q4_K_S.gguf) | Q4_K_S | 3.59GB |
| [youri-7b.Q4_K.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q4_K.gguf) | Q4_K | 3.8GB |
| [youri-7b.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q4_K_M.gguf) | Q4_K_M | 3.8GB |
| [youri-7b.Q4_1.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q4_1.gguf) | Q4_1 | 3.95GB |
| [youri-7b.Q5_0.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q5_0.gguf) | Q5_0 | 4.33GB |
| [youri-7b.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q5_K_S.gguf) | Q5_K_S | 4.33GB |
| [youri-7b.Q5_K.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q5_K.gguf) | Q5_K | 4.45GB |
| [youri-7b.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q5_K_M.gguf) | Q5_K_M | 4.45GB |
| [youri-7b.Q5_1.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q5_1.gguf) | Q5_1 | 4.72GB |
| [youri-7b.Q6_K.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q6_K.gguf) | Q6_K | 5.15GB |
| [youri-7b.Q8_0.gguf](https://huggingface.co/RichardErkhov/rinna_-_youri-7b-gguf/blob/main/youri-7b.Q8_0.gguf) | Q8_0 | 6.67GB |

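To fetch one of these files and run it locally, the sketch below uses the `huggingface_hub` and `llama-cpp-python` packages; both package choices and the Q4_K_M pick are illustrative assumptions, not part of this repository.

```python
# Minimal sketch: download a single quant from this repo and run it locally.
# Assumes `pip install huggingface_hub llama-cpp-python`; Q4_K_M is just a
# common size/quality trade-off, and any file from the table above works.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="RichardErkhov/rinna_-_youri-7b-gguf",
    filename="youri-7b.Q4_K_M.gguf",
)

# Load the GGUF file and generate a short completion.
llm = Llama(model_path=model_path, n_ctx=2048)
result = llm("西田幾多郎は、", max_tokens=64)
print(result["choices"][0]["text"])
```
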

Original model description:
---
language:
- ja
- en
license: llama2
datasets:
- mc4
- wikipedia
- EleutherAI/pile
- oscar-corpus/colossal-oscar-1.0
- cc100
thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
inference: false
model-index:
- name: youri-7b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 49.06
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 74.89
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 42.22
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 36.03
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 71.82
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 8.64
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rinna/youri-7b
      name: Open LLM Leaderboard
---

# `rinna/youri-7b`

![rinna-icon](./rinna.png)

# Overview
We conduct continual pre-training of [llama2-7b](https://huggingface.co/meta-llama/Llama-2-7b-hf) on **40B** tokens from a mixture of Japanese and English datasets. The continual pre-training significantly improves the model's performance on Japanese tasks.

The name `youri` comes from the Japanese word [`妖狸/ようり/Youri`](https://ja.wikipedia.org/wiki/%E5%8C%96%E3%81%91%E7%8B%B8), a kind of Japanese mythical creature ([`妖怪/ようかい/Youkai`](https://ja.wikipedia.org/wiki/%E5%A6%96%E6%80%AA)).

* **Library**

    The model was trained using code based on [EleutherAI/gpt-neox](https://github.com/EleutherAI/gpt-neox).

* **Model architecture**

    A 32-layer, 4096-hidden-size transformer-based language model. Refer to the [llama2 paper](https://arxiv.org/abs/2307.09288) for architecture details; a quick check of these numbers against the published config is sketched after this list.

* **Continual pre-training**

    The model was initialized with the [llama2-7b](https://huggingface.co/meta-llama/Llama-2-7b-hf) model and continually trained on around **40B** tokens from a mixture of the following corpora:
    - [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz)
    - [Japanese C4](https://huggingface.co/datasets/mc4)
    - [Japanese OSCAR](https://huggingface.co/datasets/oscar-corpus/colossal-oscar-1.0)
    - [The Pile](https://huggingface.co/datasets/EleutherAI/pile)
    - [Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
    - rinna curated Japanese dataset

* **Contributors**

    - [Tianyu Zhao](https://huggingface.co/tianyuz)
    - [Akio Kaga](https://huggingface.co/rakaga)
    - [Kei Sawada](https://huggingface.co/keisawada)

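As a sanity check on the architecture figures above, the following minimal sketch (assuming the `transformers` package is installed) reads the published model config without downloading the weights:

```python
# Minimal sketch: confirm the stated layer count and hidden size by
# inspecting the config published alongside the model.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("rinna/youri-7b")
print(config.num_hidden_layers)  # expected: 32
print(config.hidden_size)        # expected: 4096
```
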
---

# Benchmarking

Please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).

---

# How to use the model

~~~~python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("rinna/youri-7b")
model = AutoModelForCausalLM.from_pretrained("rinna/youri-7b")

# Move the model to GPU when one is available.
if torch.cuda.is_available():
    model = model.to("cuda")

text = "西田幾多郎は、"
token_ids = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt")

# Sample exactly 200 new tokens with nucleus sampling.
with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=200,
        min_new_tokens=200,
        do_sample=True,
        temperature=1.0,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

output = tokenizer.decode(output_ids.tolist()[0])
print(output)
"""
西田幾多郎は、プラトンの復権を主張し、対する従来の西洋哲学は、近代の合理主義哲学に委ね、「従来の哲学は破壊されてしまった」と述べている。 西田幾多郎は、西洋近代哲学の「徹底的な検討」を拒んだ。それは、「現代的理解の脆弱性を補う筈の、従来のヨーロッパに伝わる哲学的な方法では到底それができなかったからである」とい
"""
~~~~
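
If GPU memory is tight, loading the weights in half precision roughly halves the footprint. This variant is an assumption layered on the example above, not part of the original card:

```python
# Hedged variant of the load step above: fp16 weights, placed automatically
# across available devices (device_map="auto" requires the accelerate package).
model = AutoModelForCausalLM.from_pretrained(
    "rinna/youri-7b",
    torch_dtype=torch.float16,
    device_map="auto",
)
```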

---

# Tokenization
The model uses the original llama-2 tokenizer.

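A quick way to see what that tokenizer does with Japanese text (a minimal sketch, reusing the model ID from the usage example above):

```python
# Minimal sketch: inspect how the llama-2 SentencePiece tokenizer splits
# Japanese text; characters outside its vocabulary fall back to bytes.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("rinna/youri-7b")
print(tokenizer.tokenize("西田幾多郎は、"))
print(tokenizer.encode("西田幾多郎は、", add_special_tokens=False))
```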

---

# How to cite
```bibtex
@misc{rinna-youri-7b,
    title = {rinna/youri-7b},
    author = {Zhao, Tianyu and Kaga, Akio and Sawada, Kei},
    url = {https://huggingface.co/rinna/youri-7b}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}
```
---

# References
```bibtex
@software{gpt-neox-library,
    title = {{GPT}-{N}eo{X}: Large Scale Autoregressive Language Modeling in {P}y{T}orch},
    author = {Andonian, Alex and Anthony, Quentin and Biderman, Stella and Black, Sid and Gali, Preetham and Gao, Leo and Hallahan, Eric and Levy-Kramer, Josh and Leahy, Connor and Nestler, Lucas and Parker, Kip and Pieler, Michael and Purohit, Shivanshu and Songz, Tri and Phil, Wang and Weinbach, Samuel},
    doi = {10.5281/zenodo.5879544},
    month = {8},
    year = {2021},
    version = {0.0.1},
    url = {https://www.github.com/eleutherai/gpt-neox}
}
278
+ ---
279
+
280
+ # License
281
+ [The llama2 license](https://ai.meta.com/llama/license/)
282
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
283
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_rinna__youri-7b)
284
+
285
+ | Metric |Value|
286
+ |---------------------------------|----:|
287
+ |Avg. |47.11|
288
+ |AI2 Reasoning Challenge (25-Shot)|49.06|
289
+ |HellaSwag (10-Shot) |74.89|
290
+ |MMLU (5-Shot) |42.22|
291
+ |TruthfulQA (0-shot) |36.03|
292
+ |Winogrande (5-shot) |71.82|
293
+ |GSM8k (5-shot) | 8.64|
294
+
295
+
296
+