---
language:
- en
- ko
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- llama-2-chat
license: apache-2.0
library_name: peft
---
# komt-Llama-2-13b-hf-ggml

https://github.com/davidkim205/komt

This model is a quantized version of [Korean Llama 2 13B](https://huggingface.co/davidkim205/komt-Llama-2-13b-hf), converted to GGML with [llama.cpp](https://github.com/ggerganov/llama.cpp) in 4-bit and other quantization levels (see Model Details below).

Because our files use the same GGML format as TheBloke's releases, they work with the libraries and UIs listed below.

The following overview is adapted from [TheBloke/Llama-2-13B-chat-GGML](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML#metas-llama-2-13b-chat-ggml).

GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp) and libraries and UIs which support this format, such as:
* [KoboldCpp](https://github.com/LostRuins/koboldcpp), a powerful GGML web UI with full GPU acceleration out of the box. Especially good for storytelling.
* [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), a great web UI with GPU acceleration via the ctransformers backend.
* [LM Studio](https://lmstudio.ai/), a fully featured local GUI. Supports full GPU acceleration on macOS. Also supports Windows, without GPU acceleration.
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most popular web UI. Requires extra steps to enable GPU acceleration via the llama.cpp backend.
* [ctransformers](https://github.com/marella/ctransformers), a Python library with LangChain support and an OpenAI-compatible API server (see the sketch after this list).
* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with an OpenAI-compatible API server.
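
For example, one way to load one of these GGML files from Python is through ctransformers. This is a minimal, illustrative sketch rather than an official snippet from the komt repository; the repo id and the `ggml-model-q8_0.bin` file name are assumed from the Usage section below, so adjust them to the file you actually download.

```python
# Minimal sketch (assumptions: the repo hosts a file named ggml-model-q8_0.bin
# as referenced in the Usage section; pip install ctransformers first).
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "davidkim205/komt-Llama-2-13b-hf-ggml",  # or a local directory containing the .bin
    model_file="ggml-model-q8_0.bin",        # pick the quantization you downloaded
    model_type="llama",
)

# Wrap the question in the prompt template described below.
prompt = "### instruction: 자동차 종합(정기)검사 의무기간은 얼마인가요?\n\n### Response:"
print(llm(prompt, max_new_tokens=256))
```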

## Model Details

* **Model Developers**: davidkim (Changyeon Kim)
* **Repository**: https://github.com/davidkim205/komt
* **Quant methods**: q4_0, q4_1, q5_0, q5_1, q2_k, q3_k, q3_k_m, q3_k_l, q4_k, q4_k_s, q4_k_m, q5_k, q5_k_s, q5_k_m, q8_0
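
If you only need one of these quantizations, it can be fetched directly from the Hub. The snippet below is a hypothetical convenience sketch: the `ggml-model-<method>.bin` naming is assumed from the Usage section further down, so check the repository's file listing for the exact file names.

```python
# Hypothetical download helper; assumes files are named ggml-model-<method>.bin
# as in the Usage section below (pip install huggingface_hub first).
from huggingface_hub import hf_hub_download

method = "q4_0"  # any entry from the quant methods list above
path = hf_hub_download(
    repo_id="davidkim205/komt-Llama-2-13b-hf-ggml",
    filename=f"ggml-model-{method}.bin",
)
print(path)  # local cache path of the downloaded GGML file
```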

## Prompt Template
```
### instruction: {prompt}

### Response:
```
Examples (the instruction asks for the mandatory interval of the comprehensive/periodic vehicle inspection):
```
### instruction: 자동차 종합(정기)검사 의무기간은 얼마인가요?

### Response:

```
response:
```
### instruction: 자동차 종합(정기)검사 의무기간은 얼마인가요?

### Response:자동차 종합(정기)검사는 2년
1991년 7월 1일에 고시된 '자동차 보험료 조정기준'에서 취리로부터 제정된 기준 상 경량 살수차를 제외한 자동차 모든 승용자동차는 2년마다 필요하다. 이 법은 차량에 관계없이 2년마다 정기검사를 해야한다고 규제했다.
```
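
Programmatically, wrapping a question in this template is a one-liner. The helper below is illustrative only; the function name is ours, not part of the komt code base.

```python
# Illustrative helper that applies the prompt template shown above.
PROMPT_TEMPLATE = "### instruction: {prompt}\n\n### Response:"

def build_prompt(question: str) -> str:
    """Wrap a user question in the komt instruction/response template."""
    return PROMPT_TEMPLATE.format(prompt=question)

print(build_prompt("자동차 종합(정기)검사 의무기간은 얼마인가요?"))
```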

## Usage

When using the original [llama.cpp](https://github.com/ggerganov/llama.cpp), pass the full prompt template (the example prompt asks which company distributes the Harry Potter film series):
```
make -j && ./main -m models/komt-Llama-2-13b-hf-ggml/ggml-model-q8_0.bin -p "### instruction: 영화 해리포터 시리즈 배급사가 어디야\n\n### Response:"
```
When using the modified llama.cpp for Korean multi-task (recommended), refer to https://github.com/davidkim205/komt/tree/main/llama.cpp:
```
make -j && ./main -m models/komt-Llama-2-13b-hf-ggml/ggml-model-q8_0.bin -p "영화 해리포터 시리즈 배급사가 어디야"
```
response:
```
$ make -j && ./main -m models/komt-Llama-2-13b-hf-ggml/ggml-model-q8_0.bin -p "영화 해리포터 시리즈 배급사가 어디야"
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS
I LDFLAGS:
I CC: cc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
I CXX: g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

make: Nothing to be done for 'default'.
main: build = 6 (01a61bf)
main: seed = 1692190774
llama.cpp: loading model from models/komt-Llama-2-13b-hf-ggml/ggml-model-q8_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 6912
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_head_kv = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 5.0e-06
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 7 (mostly Q8_0)
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.11 MB
llama_model_load_internal: mem required = 13152.13 MB (+ 400.00 MB per state)
llama_new_context_with_model: kv self size = 400.00 MB
llama_new_context_with_model: compute buffer total size = 75.35 MB

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


### instruction: 영화 해리포터 시리즈 배급사가 어디야

### Response:워너 브라더스
해리포터(Harry Potter)는 J. K. 롤링이 쓴 판타지 소설이다. 1997년부터 2007년까지 총 7권으로 발행되었고, 전 세계적으로 많은 인기를 끌었다. 영국에서는 블룸버그(Bloomsbury), 미국에서는 워너 브라더스(Warner Brothers)가 각각 출판하였다. 현재 전 세계적으로 2억 4,000만 부 이상의 판매고를 올리고 있으며, 전 세계 대부분의 문학가들에게 영향을 주었다. ### check_end_of_text [end of text]

llama_print_timings: load time = 801.73 ms
llama_print_timings: sample time = 108.54 ms / 308 runs ( 0.35 ms per token, 2837.66 tokens per second)
llama_print_timings: prompt eval time = 2651.47 ms / 43 tokens ( 61.66 ms per token, 16.22 tokens per second)
llama_print_timings: eval time = 120629.25 ms / 307 runs ( 392.93 ms per token, 2.54 tokens per second)
llama_print_timings: total time = 123440.86 ms

```
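
As an alternative to the `./main` binary, the same call can be made from Python with llama-cpp-python, one of the libraries listed above. This is an illustrative sketch rather than an official recipe: GGML support was removed from llama-cpp-python after its move to GGUF, so a GGML-era release of the library is assumed, along with the file path from the Usage section.

```python
# Illustrative sketch; assumes a GGML-era release of llama-cpp-python
# (newer releases only read GGUF files) and the model path used above.
from llama_cpp import Llama

llm = Llama(
    model_path="models/komt-Llama-2-13b-hf-ggml/ggml-model-q8_0.bin",
    n_ctx=512,
)

prompt = "### instruction: 영화 해리포터 시리즈 배급사가 어디야\n\n### Response:"
out = llm(prompt, max_tokens=256, stop=["###"])
print(out["choices"][0]["text"])
```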