Upload README.md

README.md CHANGED

@@ -7,7 +7,7 @@ license_name: yi-license
 model_creator: 01-ai
 model_name: Yi 34B
 model_type: yi
-prompt_template: '{prompt}
+prompt_template: 'Human: {prompt} Assistant:
 
 '
 quantized_by: TheBloke
@@ -70,13 +70,13 @@ Here is an incomplete list of clients and libraries that are known to support GG
 <!-- repositories-available end -->
 
 <!-- prompt-template start -->
-## Prompt template:
+## Prompt template: Yi
 
 ```
 Human: {prompt} Assistant:
 
 ```
-
+
 <!-- prompt-template end -->
 
 
@@ -192,7 +192,7 @@ Windows Command Line users: You can set the environment variable by running `set
 Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
 
 ```shell
-./main -ngl 32 -m yi-34b.Q4_K_M.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "{prompt}"
+./main -ngl 32 -m yi-34b.Q4_K_M.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Human: {prompt} Assistant:"
 ```
 
 Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
@@ -295,13 +295,19 @@ And thank you again to a16z for their generous grant.
 
 The **Yi** series models are large language models trained from scratch by
 developers at [01.AI](https://01.ai/). The first public release contains two
-bilingual(English/Chinese) base models with the parameter sizes of 6B
-Both of them are trained
-during inference time.
+bilingual(English/Chinese) base models with the parameter sizes of 6B([`Yi-6B`](https://huggingface.co/01-ai/Yi-6B))
+and 34B([`Yi-34B`](https://huggingface.co/01-ai/Yi-34B)). Both of them are trained
+with 4K sequence length and can be extended to 32K during inference time.
+The [`Yi-6B-200K`](https://huggingface.co/01-ai/Yi-6B-200K)
+and [`Yi-34B-200K`](https://huggingface.co/01-ai/Yi-34B-200K) are base model with
+200K context length.
 
 ## News
 
-- 🎯 **2023/11/
+- 🎯 **2023/11/06**: The base model of [`Yi-6B-200K`](https://huggingface.co/01-ai/Yi-6B-200K)
+and [`Yi-34B-200K`](https://huggingface.co/01-ai/Yi-34B-200K) with 200K context length.
+- 🎯 **2023/11/02**: The base model of [`Yi-6B`](https://huggingface.co/01-ai/Yi-6B) and
+[`Yi-34B`](https://huggingface.co/01-ai/Yi-34B).
 
 
 ## Model Performance
@@ -318,8 +324,9 @@ during inference time.
 | Aquila-34B | 67.8 | 71.4 | 63.1 | - | - | - | - | - |
 | Falcon-180B | 70.4 | 58.0 | 57.8 | 59.0 | 54.0 | 77.3 | 68.8 | 34.0 |
 | Yi-6B | 63.2 | 75.5 | 72.0 | 72.2 | 42.8 | 72.3 | 68.7 | 19.8 |
-
-
+| Yi-6B-200K | 64.0 | 75.3 | 73.5 | 73.9 | 42.0 | 72.0 | 69.1 | 19.0 |
+| **Yi-34B** | **76.3** | **83.7** | 81.4 | 82.8 | **54.3** | **80.1** | 76.4 | 37.1 |
+| Yi-34B-200K | 76.1 | 83.6 | **81.9** | **83.4** | 52.7 | 79.7 | **76.6** | 36.3 |
 
 While benchmarking open-source models, we have observed a disparity between the
 results generated by our pipeline and those reported in public sources (e.g.