Commit 26bcb44 by sharpenb · verified · 1 Parent(s): 88be3b7

Upload README.md with huggingface_hub

  1. README.md +10 -10
README.md CHANGED
@@ -73,7 +73,7 @@ The following clients/libraries will automatically download models for you, prov
73
  * Faraday.dev
74
 
75
  - **Option A** - Downloading in `text-generation-webui`:
76
- - **Step 1**: Under Download Model, you can enter the model repo: PrunaAI/Llama3-ChatQA-1.5-8B-GGUF-smashed and below it, a specific filename to download, such as: phi-2.IQ3_M.gguf.
77
  - **Step 2**: Then click Download.
78
 
79
  - **Option B** - Downloading on the command line (including multiple files at once):
@@ -83,14 +83,14 @@ pip3 install huggingface-hub
83
  ```
84
  - **Step 2**: Then you can download any individual model file to the current directory, at high speed, with a command like this:
85
  ```shell
86
- huggingface-cli download PrunaAI/Llama3-ChatQA-1.5-8B-GGUF-smashed Llama3-ChatQA-1.5-8B.IQ3_M.gguf --local-dir . --local-dir-use-symlinks False
87
  ```
88
  <details>
89
  <summary>More advanced huggingface-cli download usage (click to read)</summary>
90
  Alternatively, you can also download multiple files at once with a pattern:
91
 
92
  ```shell
93
- huggingface-cli download PrunaAI/Llama3-ChatQA-1.5-8B-GGUF-smashed --local-dir . --local-dir-use-symlinks False --include='*Q4_K*gguf'
94
  ```
95
 
96
  For more documentation on downloading with `huggingface-cli`, please see: [HF -> Hub Python Library -> Download files -> Download from the CLI](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli).
@@ -104,7 +104,7 @@ pip3 install hf_transfer
104
  And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
105
 
106
  ```shell
107
- HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download PrunaAI/Llama3-ChatQA-1.5-8B-GGUF-smashed Llama3-ChatQA-1.5-8B.IQ3_M.gguf --local-dir . --local-dir-use-symlinks False
108
  ```
109
 
110
  Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
@@ -119,7 +119,7 @@ Windows Command Line users: You can set the environment variable by running `set
119
  Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
120
 
121
  ```shell
122
- ./main -ngl 35 -m Llama3-ChatQA-1.5-8B.IQ3_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<s>[INST] {prompt\} [/INST]"
123
  ```
124
 
125
  Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
@@ -180,7 +180,7 @@ You can use GGUF models from Python using the [llama-cpp-python](https://github.
180
 
181
  # Simple inference example
182
  output = llm(
183
- "<s>[INST] {prompt} [/INST]", # Prompt
184
  max_tokens=512, # Generate up to 512 tokens
185
  stop=["</s>"], # Example stop token - not necessarily correct for this specific model! Please check before using.
186
  echo=True # Whether to echo the prompt
@@ -191,11 +191,11 @@ You can use GGUF models from Python using the [llama-cpp-python](https://github.
191
  llm = Llama(model_path="./Llama3-ChatQA-1.5-8B.IQ3_M.gguf", chat_format="llama-2") # Set chat_format according to the model you are using
192
  llm.create_chat_completion(
193
  messages = [
194
- {"role": "system", "content": "You are a story writing assistant."},
195
- {
196
  "role": "user",
197
  "content": "Write a story about llamas."
198
- }
199
  ]
200
  )
201
  ```
@@ -218,4 +218,4 @@ The license of the smashed model follows the license of the original model. Plea
218
  ## Want to compress other models?
219
 
220
  - Contact us and tell us which model to compress next [here](https://www.pruna.ai/contact).
221
- - Request access to easily compress your own AI models [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).
 
73
  * Faraday.dev
74
 
75
  - **Option A** - Downloading in `text-generation-webui`:
76
+ - **Step 1**: Under Download Model, you can enter the model repo: nvidia-Llama3-ChatQA-1.5-8B-GGUF-smashed and below it, a specific filename to download, such as: phi-2.IQ3_M.gguf.
77
  - **Step 2**: Then click Download.
78
 
79
  - **Option B** - Downloading on the command line (including multiple files at once):
 
83
  ```
84
  - **Step 2**: Then you can download any individual model file to the current directory, at high speed, with a command like this:
85
  ```shell
86
+ huggingface-cli download nvidia-Llama3-ChatQA-1.5-8B-GGUF-smashed Llama3-ChatQA-1.5-8B.IQ3_M.gguf --local-dir . --local-dir-use-symlinks False
87
  ```
88
  <details>
89
  <summary>More advanced huggingface-cli download usage (click to read)</summary>
90
  Alternatively, you can also download multiple files at once with a pattern:
91
 
92
  ```shell
93
+ huggingface-cli download nvidia-Llama3-ChatQA-1.5-8B-GGUF-smashed --local-dir . --local-dir-use-symlinks False --include='*Q4_K*gguf'
94
  ```
95
 
96
  For more documentation on downloading with `huggingface-cli`, please see: [HF -> Hub Python Library -> Download files -> Download from the CLI](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli).
 
104
  And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
105
 
106
  ```shell
107
+ HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download nvidia-Llama3-ChatQA-1.5-8B-GGUF-smashed Llama3-ChatQA-1.5-8B.IQ3_M.gguf --local-dir . --local-dir-use-symlinks False
108
  ```
109
 
110
  Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
 
119
  Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
120
 
121
  ```shell
122
+ ./main -ngl 35 -m Llama3-ChatQA-1.5-8B.IQ3_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<s>[INST] {{prompt\}} [/INST]"
123
  ```
124
 
125
  Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
 
180
 
181
  # Simple inference example
182
  output = llm(
183
+ "<s>[INST] {{prompt}} [/INST]", # Prompt
184
  max_tokens=512, # Generate up to 512 tokens
185
  stop=["</s>"], # Example stop token - not necessarily correct for this specific model! Please check before using.
186
  echo=True # Whether to echo the prompt
 
191
  llm = Llama(model_path="./Llama3-ChatQA-1.5-8B.IQ3_M.gguf", chat_format="llama-2") # Set chat_format according to the model you are using
192
  llm.create_chat_completion(
193
  messages = [
194
+ {{"role": "system", "content": "You are a story writing assistant."}},
195
+ {{
196
  "role": "user",
197
  "content": "Write a story about llamas."
198
+ }}
199
  ]
200
  )
201
  ```
 
218
  ## Want to compress other models?
219
 
220
  - Contact us and tell us which model to compress next [here](https://www.pruna.ai/contact).
221
+ - Request access to easily compress your own AI models [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).
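
Review note: the `+` lines in this commit double the braces in the README's code samples (`{{prompt}}`, `{{"role": ...}}`). Doubled braces are how Python's `str.format` represents literal braces, so one possible explanation is that the README is generated from a template that was escaped for `str.format` but written out without being formatted. This is an assumption, since the generating script is not part of this commit. A minimal sketch of the behaviour, using a hypothetical template string and placeholder name:

```python
# Hypothetical template line (not taken from the repository), escaped the way
# str.format requires when literal braces must survive formatting.
template = '"<s>[INST] {{prompt}} [/INST]" for {model_name}'

# Formatting collapses each doubled brace to a single one and fills the placeholder.
rendered = template.format(model_name="Llama3-ChatQA-1.5-8B")

# Writing the template out unformatted leaves the doubled braces in place,
# which is the shape the "+" lines of this diff show.
unrendered = template

print(rendered)
print(unrendered)
```

If that is what happened, the `{{ ... }}` dictionaries in the `create_chat_completion` example would not run as shown: in plain Python, `{{"role": ...}}` is a set containing a dict, which raises `TypeError` because dicts are unhashable.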