newsletter committed
Commit: aebcda0
Parent(s): acda84d
Upload README.md with huggingface_hub

README.md CHANGED
@@ -1,4 +1,5 @@
 ---
+base_model: Intel/neural-chat-7b-v3-3
 license: apache-2.0
 tags:
 - LLMs
@@ -7,7 +8,6 @@ tags:
 - Intel
 - llama-cpp
 - gguf-my-repo
-base_model: Intel/neural-chat-7b-v3-1
 model-index:
 - name: neural-chat-7b-v3-3
   results:
@@ -147,29 +147,43 @@
 # newsletter/neural-chat-7b-v3-3-Q6_K-GGUF
 This model was converted to GGUF format from [`Intel/neural-chat-7b-v3-3`](https://huggingface.co/Intel/neural-chat-7b-v3-3) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
 Refer to the [original model card](https://huggingface.co/Intel/neural-chat-7b-v3-3) for more details on the model.
-## Use with llama.cpp
 
-
+## Use with llama.cpp
+Install llama.cpp through brew (works on Mac and Linux)
 
 ```bash
-brew install
+brew install llama.cpp
+
 ```
 Invoke the llama.cpp server or the CLI.
 
-CLI:
-
+### CLI:
 ```bash
-llama-cli --hf-repo newsletter/neural-chat-7b-v3-3-Q6_K-GGUF --
+llama-cli --hf-repo newsletter/neural-chat-7b-v3-3-Q6_K-GGUF --hf-file neural-chat-7b-v3-3-q6_k.gguf -p "The meaning to life and the universe is"
 ```
 
-Server:
-
+### Server:
 ```bash
-llama-server --hf-repo newsletter/neural-chat-7b-v3-3-Q6_K-GGUF --
+llama-server --hf-repo newsletter/neural-chat-7b-v3-3-Q6_K-GGUF --hf-file neural-chat-7b-v3-3-q6_k.gguf -c 2048
 ```
 
 Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.
 
+Step 1: Clone llama.cpp from GitHub.
+```
+git clone https://github.com/ggerganov/llama.cpp
+```
+
+Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).
+```
+cd llama.cpp && LLAMA_CURL=1 make
+```
+
+Step 3: Run inference through the main binary.
+```
+./llama-cli --hf-repo newsletter/neural-chat-7b-v3-3-Q6_K-GGUF --hf-file neural-chat-7b-v3-3-q6_k.gguf -p "The meaning to life and the universe is"
+```
+or
 ```
-
+./llama-server --hf-repo newsletter/neural-chat-7b-v3-3-Q6_K-GGUF --hf-file neural-chat-7b-v3-3-q6_k.gguf -c 2048
 ```
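
The `--hf-repo`/`--hf-file` flags in the updated commands let llama.cpp fetch the quantized file on first use. As a minimal sketch of pre-downloading the same file instead, assuming a recent `huggingface_hub` CLI that provides the `download` subcommand (the repo id and filename are taken from the commands above):

```bash
# Pre-download the Q6_K GGUF file referenced by --hf-repo/--hf-file above
# into the current directory (assumes `pip install -U huggingface_hub`).
huggingface-cli download newsletter/neural-chat-7b-v3-3-Q6_K-GGUF \
  neural-chat-7b-v3-3-q6_k.gguf --local-dir .
```

The downloaded file can then be passed to `llama-cli` or `llama-server` via the model-path flag (`-m`) instead of the `--hf-repo`/`--hf-file` pair.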
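
Once `llama-server` is running as in the server command above, it can be queried over HTTP. A minimal sketch, assuming the server's default bind address of `http://localhost:8080` and its OpenAI-compatible chat completions endpoint:

```bash
# Send a chat request to the locally running llama-server
# (default address assumed: http://localhost:8080).
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "The meaning to life and the universe is"}
        ],
        "max_tokens": 128
      }'
```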
|