limhyeonseok committed on
Commit f1a2785
Parent: da6eee8

Update README.md

Files changed (1)
  1. README.md +40 -26
README.md CHANGED
@@ -85,34 +85,48 @@ Refer to the [original model card](https://huggingface.co/Bllossom/llama-3-Korea

  ## Example code


- ## Use with llama.cpp
-
- Install llama.cpp through brew.
-
- ```bash
- brew install ggerganov/ggerganov/llama.cpp
- ```
- Invoke the llama.cpp server or the CLI.
-
- CLI:
-
- ```bash
- llama-cli --hf-repo Bllossom/llama-3-Korean-Bllossom-70B-gguf-Q4_K_M --model bllossom_llama3_70b.Q4_K_M.gguf -p "μ„œμšΈκ³Όν•™κΈ°μˆ λŒ€ν•™κ΅ μž„κ²½νƒœ κ΅μˆ˜λŠ” 어떀연ꡬλ₯Όν•˜λ‹ˆ?"
- ```
-
- Server:
-
- ```bash
- llama-server --hf-repo Bllossom/llama-3-Korean-Bllossom-70B-gguf-Q4_K_M --model bllossom_llama3_70b.Q4_K_M.gguf -c 2048
- ```
-
- Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.
-
  ```
- git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make && ./main -m bllossom_llama3_70b.Q4_K_M.gguf -n 128
- ```
-
 

  ## Example code

+ ```python
+ !CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
+ !huggingface-cli download Bllossom/llama-3-Korean-Bllossom-70B-gguf-Q4_K_M --local-dir='YOUR-LOCAL-FOLDER-PATH'
+
+ from llama_cpp import Llama
+ from transformers import AutoTokenizer
+
+ model_id = 'Bllossom/llama-3-Korean-Bllossom-70B-gguf-Q4_K_M'
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = Llama(
+     model_path='YOUR-LOCAL-FOLDER-PATH/llama-3-Korean-Bllossom-70B-RAG-gguf-Q4_K_M.gguf',
+     n_ctx=512,
+     n_gpu_layers=-1 # Number of model layers to offload to GPU (-1 offloads all layers)
+ )
+
+ PROMPT = \
+ '''당신은 μœ μš©ν•œ AI μ–΄μ‹œμŠ€ν„΄νŠΈμž…λ‹ˆλ‹€. μ‚¬μš©μžμ˜ μ§ˆμ˜μ— λŒ€ν•΄ μΉœμ ˆν•˜κ³  μ •ν™•ν•˜κ²Œ λ‹΅λ³€ν•΄μ•Ό ν•©λ‹ˆλ‹€.
+ You are a helpful AI assistant, you'll need to answer users' queries in a friendly and accurate manner.'''
+
+ instruction = 'Your Instruction'
+
+ messages = [
+     {"role": "system", "content": f"{PROMPT}"},
+     {"role": "user", "content": f"{instruction}"}
+ ]
+
+ prompt = tokenizer.apply_chat_template(
+     messages,
+     tokenize = False,
+     add_generation_prompt=True
+ )
+
+ generation_kwargs = {
+     "max_tokens":512,
+     "stop":["<|eot_id|>"],
+     "echo":True, # Echo the prompt in the output
+     "top_k":1 # This is essentially greedy decoding, since the model will always return the highest-probability token. Set this value > 1 for sampling decoding
+ }

+ response_msg = model(prompt, **generation_kwargs)
+ print(response_msg['choices'][0]['text'][len(prompt):])
  ```
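
For comparison, recent llama-cpp-python releases can apply the chat template themselves through `Llama.create_chat_completion`, so the `transformers` tokenizer used in the added example becomes optional. The sketch below is for reference only and not part of the commit: it reuses the same local GGUF path placeholder and the Korean prompt from the removed CLI example, and it assumes your build accepts `chat_format="llama-3"` (newer builds can otherwise read the template from the GGUF metadata).

```python
# Hypothetical variant of the added README example: let llama-cpp-python
# format the Llama 3 chat prompt itself instead of using AutoTokenizer.
from llama_cpp import Llama

model = Llama(
    model_path='YOUR-LOCAL-FOLDER-PATH/llama-3-Korean-Bllossom-70B-RAG-gguf-Q4_K_M.gguf',
    n_ctx=512,
    n_gpu_layers=-1,       # offload every layer to the GPU
    chat_format="llama-3"  # assumed; newer builds can also pick the template up from GGUF metadata
)

response = model.create_chat_completion(
    messages=[
        {"role": "system", "content": "당신은 μœ μš©ν•œ AI μ–΄μ‹œμŠ€ν„΄νŠΈμž…λ‹ˆλ‹€. μ‚¬μš©μžμ˜ μ§ˆμ˜μ— λŒ€ν•΄ μΉœμ ˆν•˜κ³  μ •ν™•ν•˜κ²Œ λ‹΅λ³€ν•΄μ•Ό ν•©λ‹ˆλ‹€."},
        {"role": "user", "content": "μ„œμšΈκ³Όν•™κΈ°μˆ λŒ€ν•™κ΅ μž„κ²½νƒœ κ΅μˆ˜λŠ” 어떀연ꡬλ₯Όν•˜λ‹ˆ?"}
    ],
    max_tokens=512,
    stop=["<|eot_id|>"]
)
print(response['choices'][0]['message']['content'])
```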