---
license: apache-2.0
---

This model was built by merging `decapoda-research/llama-7b-hf` with the `ziqingyang/chinese-alpaca-plus-lora-7b` LoRA, then converting and quantizing the result for use with `ggerganov/llama.cpp`.

The conversion and quantization were done on Google Colab, following the wiki of `ymcui/Chinese-LLaMA-Alpaca`.

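For reference, the merge-and-convert steps look roughly like the sketch below. The script name, flags, and paths are illustrative reconstructions from that wiki and from llama.cpp, not an exact record of the Colab run; consult the wiki for the current commands (the Plus LoRA in particular may also require merging the matching Chinese-LLaMA Plus LoRA). The quantization step is shown further below.

```
# Sketch only: merge the Chinese Alpaca Plus LoRA into the base LLaMA weights.
# Script and flag names follow ymcui/Chinese-LLaMA-Alpaca and may differ in newer versions.
python merge_llama_with_chinese_lora.py \
    --base_model decapoda-research/llama-7b-hf \
    --lora_model ziqingyang/chinese-alpaca-plus-lora-7b \
    --output_dir ./zh-alpaca-plus-7b-merged

# Convert the merged checkpoint to an f16 GGML file with llama.cpp's converter.
python convert.py ./zh-alpaca-plus-7b-merged \
    --outtype f16 --outfile ./models/chinese-Alpaca-7b-plus-ggml-f16.bin
```
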
The quantization methods in llama.cpp have been updated, so please clone the latest repo and re-compile it before loading the model.

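On a Unix-like system with a C/C++ toolchain, that typically looks like this:

```
# Fetch the latest llama.cpp and build the main and quantize binaries.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
```
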
The q8_0 and q5_1 suffixes indicate different quantization methods: the former occupies more space and should theoretically produce better responses.

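Both variants can be produced from the same f16 GGML file with llama.cpp's `quantize` tool (recent builds accept the quantization type by name); the file names here are illustrative:

```
# Larger file, closer to the original f16 weights.
./quantize ./models/chinese-Alpaca-7b-plus-ggml-f16.bin \
    ./models/chinese-Alpaca-7b-plus-ggml-q8_0.bin q8_0

# Smaller file, slightly lower fidelity.
./quantize ./models/chinese-Alpaca-7b-plus-ggml-f16.bin \
    ./models/chinese-Alpaca-7b-plus-ggml-q5_1.bin q5_1
```
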
Example of the q8_0 model running on llama.cpp (the responses are acceptable but very short):

```
> ./main -m ./models/chinese-Alpaca-7b-plus-ggml-q8_0.bin \
-t 8 \
-c 2048 \
-n 2048 \
--color \
--interactive-first \
--reverse-prompt '## 人类:' \
-f ./prompts/chat-with-vicuna-chs.txt
main: build = 0 (unknown)
main: seed  = 1683883289
llama.cpp: loading model from ./models/chinese-Alpaca-7b-plus-ggml-q8_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 49954
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 7 (mostly Q8_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 68.20 KB
llama_model_load_internal: mem required  = 9180.13 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  = 1024.00 MB

system_info: n_threads = 8 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '## 人类:'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = 2048, n_keep = 0


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

这是一段人类和人工智能助理的对话。人工智能助理会对人类的问题给出详细、准确、礼貌的回答。

## 人类:你好,助理。
## 助理:你好,有什么需要帮助的吗?
## 人类:天空为什么是蓝色的?
助理:天空之所以呈现蓝色,是因为太阳光在大气层中发生散射作用。
## 人类:请再详细地解释一下。
助理:太阳光穿过大气层时,一部分被散射到不同波长的角度上,其中蓝色光的波长比其他颜色短。
## 人类:你真是惜字如金啊。
助理:谢谢你的夸奖,但我只是根据我学到的知识和算法来回答你的问题。
## 人类:

llama_print_timings: load time = 9418.31 ms
llama_print_timings: sample time = 107.95 ms / 73 runs ( 1.48 ms per run)
llama_print_timings: prompt eval time = 8645.76 ms / 85 tokens ( 101.71 ms per token)
llama_print_timings: eval time = 16303.43 ms / 73 runs ( 223.33 ms per run)
llama_print_timings: total time = 987546.29 ms
```