---
license: apache-2.0
---

This model was built by merging `decapoda-research/llama-7b-hf` with the `ziqingyang/chinese-alpaca-plus-lora-7b` LoRA, then converting and quantizing the result for use with `ggerganov/llama.cpp`.

The conversion and quantization were done on Google Colab, following the wiki of `ymcui/Chinese-LLaMA-Alpaca`.

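For reference, the merge-and-convert steps look roughly like the sketch below. The script name, flags, and paths are illustrative reconstructions from that wiki and from llama.cpp, not an exact record of the Colab run; consult the wiki for the current commands (the Plus LoRA in particular may also require merging the matching Chinese-LLaMA Plus LoRA). The quantization step is shown further below.

```
# Sketch only: merge the Chinese Alpaca Plus LoRA into the base LLaMA weights.
# Script and flag names follow ymcui/Chinese-LLaMA-Alpaca and may differ in newer versions.
python merge_llama_with_chinese_lora.py \
    --base_model decapoda-research/llama-7b-hf \
    --lora_model ziqingyang/chinese-alpaca-plus-lora-7b \
    --output_dir ./zh-alpaca-plus-7b-merged

# Convert the merged checkpoint to an f16 GGML file with llama.cpp's converter.
python convert.py ./zh-alpaca-plus-7b-merged \
    --outtype f16 --outfile ./models/chinese-Alpaca-7b-plus-ggml-f16.bin
```
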
The quantization methods in llama.cpp have been updated, so please clone the latest repo and re-compile it before loading the model.

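On a Unix-like system with a C/C++ toolchain, that typically looks like this:

```
# Fetch the latest llama.cpp and build the main and quantize binaries.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
```
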
The q8_0 and q5_1 suffixes indicate different quantization methods: the former occupies more space and should theoretically produce better responses.

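Both variants can be produced from the same f16 GGML file with llama.cpp's `quantize` tool (recent builds accept the quantization type by name); the file names here are illustrative:

```
# Larger file, closer to the original f16 weights.
./quantize ./models/chinese-Alpaca-7b-plus-ggml-f16.bin \
    ./models/chinese-Alpaca-7b-plus-ggml-q8_0.bin q8_0

# Smaller file, slightly lower fidelity.
./quantize ./models/chinese-Alpaca-7b-plus-ggml-f16.bin \
    ./models/chinese-Alpaca-7b-plus-ggml-q5_1.bin q5_1
```
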
Example of the q8_0 model running on llama.cpp (the responses are acceptable but very short):

```
> ./main -m ./models/chinese-Alpaca-7b-plus-ggml-q8_0.bin \
-t 8 \
-c 2048 \
-n 2048 \
--color \
--interactive-first \
--reverse-prompt '## 人类:' \
-f ./prompts/chat-with-vicuna-chs.txt
main: build = 0 (unknown)
main: seed  = 1683883289
llama.cpp: loading model from ./models/chinese-Alpaca-7b-plus-ggml-q8_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 49954
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 7 (mostly Q8_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 68.20 KB
llama_model_load_internal: mem required  = 9180.13 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  = 1024.00 MB

system_info: n_threads = 8 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '## 人类:'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = 2048, n_keep = 0


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

这是一段人类和人工智能助理的对话。人工智能助理会对人类的问题给出详细、准确、礼貌的回答。

## 人类:你好,助理。
## 助理:你好,有什么需要帮助的吗?
## 人类:天空为什么是蓝色的?
助理:天空之所以呈现蓝色,是因为太阳光在大气层中发生散射作用。
## 人类:请再详细地解释一下。
助理:太阳光穿过大气层时,一部分被散射到不同波长的角度上,其中蓝色光的波长比其他颜色短。
## 人类:你真是惜字如金啊。
助理:谢谢你的夸奖,但我只是根据我学到的知识和算法来回答你的问题。
## 人类:

llama_print_timings: load time = 9418.31 ms
llama_print_timings: sample time = 107.95 ms / 73 runs ( 1.48 ms per run)
llama_print_timings: prompt eval time = 8645.76 ms / 85 tokens ( 101.71 ms per token)
llama_print_timings: eval time = 16303.43 ms / 73 runs ( 223.33 ms per run)
llama_print_timings: total time = 987546.29 ms
```