Upload README.md with huggingface_hub
README.md
CHANGED
@@ -73,7 +73,7 @@ The following clients/libraries will automatically download models for you, prov
  * Faraday.dev

  - **Option A** - Downloading in `text-generation-webui`:
- - **Step 1**: Under Download Model, you can enter the model repo:
+ - **Step 1**: Under Download Model, you can enter the model repo: nvidia-Llama3-ChatQA-1.5-8B-GGUF-smashed and below it, a specific filename to download, such as: phi-2.IQ3_M.gguf.
  - **Step 2**: Then click Download.

  - **Option B** - Downloading on the command line (including multiple files at once):
@@ -83,14 +83,14 @@ pip3 install huggingface-hub
  ```
  - **Step 2**: Then you can download any individual model file to the current directory, at high speed, with a command like this:
  ```shell
- huggingface-cli download
+ huggingface-cli download nvidia-Llama3-ChatQA-1.5-8B-GGUF-smashed Llama3-ChatQA-1.5-8B.IQ3_M.gguf --local-dir . --local-dir-use-symlinks False
  ```
  <details>
  <summary>More advanced huggingface-cli download usage (click to read)</summary>
  Alternatively, you can also download multiple files at once with a pattern:

  ```shell
- huggingface-cli download
+ huggingface-cli download nvidia-Llama3-ChatQA-1.5-8B-GGUF-smashed --local-dir . --local-dir-use-symlinks False --include='*Q4_K*gguf'
  ```

  For more documentation on downloading with `huggingface-cli`, please see: [HF -> Hub Python Library -> Download files -> Download from the CLI](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli).
@@ -104,7 +104,7 @@ pip3 install hf_transfer
  And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:

  ```shell
- HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download
+ HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download nvidia-Llama3-ChatQA-1.5-8B-GGUF-smashed Llama3-ChatQA-1.5-8B.IQ3_M.gguf --local-dir . --local-dir-use-symlinks False
  ```

  Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
@@ -119,7 +119,7 @@ Windows Command Line users: You can set the environment variable by running `set
  Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.

  ```shell
- ./main -ngl 35 -m Llama3-ChatQA-1.5-8B.IQ3_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<s>[INST] {prompt\} [/INST]"
+ ./main -ngl 35 -m Llama3-ChatQA-1.5-8B.IQ3_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<s>[INST] {{prompt\}} [/INST]"
  ```

  Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
@@ -180,7 +180,7 @@ You can use GGUF models from Python using the [llama-cpp-python](https://github.

  # Simple inference example
  output = llm(
- "<s>[INST] {prompt} [/INST]", # Prompt
+ "<s>[INST] {{prompt}} [/INST]", # Prompt
  max_tokens=512, # Generate up to 512 tokens
  stop=["</s>"], # Example stop token - not necessarily correct for this specific model! Please check before using.
  echo=True # Whether to echo the prompt
@@ -191,11 +191,11 @@ You can use GGUF models from Python using the [llama-cpp-python](https://github.
  llm = Llama(model_path="./Llama3-ChatQA-1.5-8B.IQ3_M.gguf", chat_format="llama-2") # Set chat_format according to the model you are using
  llm.create_chat_completion(
  messages = [
- {"role": "system", "content": "You are a story writing assistant."},
- {
+ {{"role": "system", "content": "You are a story writing assistant."}},
+ {{
  "role": "user",
  "content": "Write a story about llamas."
- }
+ }}
  ]
  )
  ```
@@ -218,4 +218,4 @@ The license of the smashed model follows the license of the original model. Plea
  ## Want to compress other models?

  - Contact us and tell us which model to compress next [here](https://www.pruna.ai/contact).
- - Request access to easily compress your own AI models [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).
+ - Request access to easily compress your own AI models [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).
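The download hunks above fill in the repo and file arguments of `huggingface-cli download` and show the `HF_HUB_ENABLE_HF_TRANSFER` variable being set differently per shell. As a minimal sketch only, the same invocation could be assembled from Python so that the environment variable is passed the same way on every platform. The helper names `build_download_command` and `env_with_hf_transfer` are hypothetical, and the repo id is copied exactly as written in the diff; a Hub repo id usually also carries an owner prefix, which the diff does not show.

```python
import os
import shlex


def build_download_command(repo_id, filename, local_dir="."):
    """Return the argument list for the single-file download shown in the diff."""
    return [
        "huggingface-cli", "download", repo_id, filename,
        "--local-dir", local_dir,
        "--local-dir-use-symlinks", "False",
    ]


def env_with_hf_transfer(base_env=None):
    """Copy an environment mapping and enable hf_transfer in the copy only."""
    env = dict(os.environ if base_env is None else base_env)
    env["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
    return env


# Repo and file names are taken verbatim from the diff (assumed to be valid).
command = build_download_command(
    "nvidia-Llama3-ChatQA-1.5-8B-GGUF-smashed",
    "Llama3-ChatQA-1.5-8B.IQ3_M.gguf",
)
env = env_with_hf_transfer({})  # empty base mapping, for illustration only

# Print the command instead of running it; nothing is downloaded here.
print(shlex.join(command))
```

To actually run the download, the list could be handed to `subprocess.run(command, env=env_with_hf_transfer(), check=True)`, which assumes `huggingface-cli` and `hf_transfer` are installed as the README describes.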