StarFox7 committed
Commit 9879b2c
1 Parent(s): 4605e13

Update README.md

Files changed (1): README.md +9 -2
README.md CHANGED
@@ -4,13 +4,17 @@ language:
   - ko
   ---
   # Llama-2-ko-7b-ggml

  Llama-2-ko-7b-ggml is the **GGML**-format model of [beomi/llama-2-ko-7b](https://huggingface.co/beomi/llama-2-ko-7b).

  - It was created by applying the Korean additional tokens used in [beomi/llama-2-ko-7b](https://huggingface.co/beomi/llama-2-ko-7b) to the Llama2 tokenizer.
  - **GGML**-format models run inference in C/C++ via [llama.cpp](https://github.com/ggerganov/llama.cpp).
  - With [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), the Python binding package for [llama.cpp](https://github.com/ggerganov/llama.cpp), inference is also possible in a Python environment.

- For reference, [Llama-2-ko-7b-chat-ggml](https://huggingface.co/StarFox7/Llama-2-ko-7B-ggml) provides the **GGML**-format model of [Llama-2-ko-7b-chat](https://huggingface.co/kfkas/Llama-2-ko-7b-Chat), which further trains [beomi/llama-2-ko-7b](https://huggingface.co/beomi/llama-2-ko-7b) on [nlpai-lab/kullm-v2](https://huggingface.co/datasets/nlpai-lab/kullm-v2).

  ---
  # Quantization
@@ -24,10 +28,13 @@ Llama-2-ko-7b-ggml is the **GGML**-format model of [beomi/llama-
  # Inference Code Example (Python)
  The following is simple example code for inference. It requires [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) and the Llama-2-ko-7b-ggml-q4_0.bin file from this repository.
  ```python
  from llama_cpp import Llama

  llm = Llama(model_path = 'Llama-2-ko-7b-ggml-q4_0.bin',
-             n_ctx=1024)

  output = llm("Q: 인생에 대해서 설명하시오. A: ", max_tokens=1024, stop=["Q:", "\n"], echo=True)
  ```
 
  - ko
  ---
  # Llama-2-ko-7b-ggml
+ <img src=https://huggingface.co/StarFox7/Llama-2-ko-7B-ggml/resolve/main/cute.png style="max-width: 200px; width: 100%" />
+
  Llama-2-ko-7b-ggml is the **GGML**-format model of [beomi/llama-2-ko-7b](https://huggingface.co/beomi/llama-2-ko-7b).

  - It was created by applying the Korean additional tokens used in [beomi/llama-2-ko-7b](https://huggingface.co/beomi/llama-2-ko-7b) to the Llama2 tokenizer.
  - **GGML**-format models run inference in C/C++ via [llama.cpp](https://github.com/ggerganov/llama.cpp).
+ - **GGML**-format models can run inference even on relatively low-spec computing resources (e.g., a 4-bit quantized model (q4) can run on a CPU with 7-8 GB of RAM).
  - With [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), the Python binding package for [llama.cpp](https://github.com/ggerganov/llama.cpp), inference is also possible in a Python environment.
+
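As a rough sanity check on the RAM figure above: llama.cpp's q4_0 format stores weights in blocks of 32 four-bit values plus one fp16 scale, i.e. 18 bytes per 32 weights (~4.5 bits per weight). A back-of-envelope estimate, assuming a round 7B parameter count:

```python
# Back-of-envelope size estimate for a q4_0 GGML model.
# q4_0 block: 32 weights * 4 bits + one fp16 scale = 16 + 2 = 18 bytes per 32 weights.
PARAMS = 7_000_000_000   # assumed parameter count for a "7B" model
WEIGHTS_PER_BLOCK = 32
BYTES_PER_BLOCK = 18

size_bytes = PARAMS / WEIGHTS_PER_BLOCK * BYTES_PER_BLOCK
size_gib = size_bytes / 1024**3
print(f"~{size_gib:.1f} GiB of weights")  # ≈ 3.7 GiB
```

With the context buffer and runtime overhead added on top of the ~3.7 GiB of weights, this is consistent with the quoted 7-8 GB figure.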

+ For reference, [Llama-2-ko-7b-chat-ggml](https://huggingface.co/StarFox7/Llama-2-ko-7B-chat-ggml) provides the **GGML**-format model of [kfkas/Llama-2-ko-7b-chat](https://huggingface.co/kfkas/Llama-2-ko-7b-Chat), which further trains [beomi/llama-2-ko-7b](https://huggingface.co/beomi/llama-2-ko-7b) on [nlpai-lab/kullm-v2](https://huggingface.co/datasets/nlpai-lab/kullm-v2).

  ---
  # Quantization
 
  # Inference Code Example (Python)
  The following is simple example code for inference. It requires [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) and the Llama-2-ko-7b-ggml-q4_0.bin file from this repository.
  ```python
+ # !pip install llama-cpp-python  # uncomment to install llama-cpp-python if it is not installed yet
  from llama_cpp import Llama

  llm = Llama(model_path = 'Llama-2-ko-7b-ggml-q4_0.bin',
+             n_ctx=1024,
+             # n_gpu_layers=1,  # for GPU acceleration, uncomment; use 1 for Metal (Apple M1), or a value suited to your video RAM size for CUDA (Nvidia)
+             )

  output = llm("Q: 인생에 대해서 설명하시오. A: ", max_tokens=1024, stop=["Q:", "\n"], echo=True)
  ```
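The `llm(...)` call above returns an OpenAI-style completion dict rather than a bare string. A minimal sketch of reading the generated text out of it (the sample dict below only mimics that shape; its text is made up, not real model output):

```python
# llama-cpp-python returns completions as an OpenAI-style dict.
# This sample imitates that shape; the contents are illustrative only.
sample_output = {
    "object": "text_completion",
    "choices": [
        {"text": "Q: ... A: ...", "index": 0, "finish_reason": "stop"},
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 30, "total_tokens": 42},
}

def extract_text(output: dict) -> str:
    # The generated text (including the echoed prompt when echo=True)
    # lives under choices[0]["text"].
    return output["choices"][0]["text"]

print(extract_text(sample_output))
```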