TheBloke committed
Commit 7ff4326
1 Parent(s): fa82664

Upload README.md

Files changed (1)
  1. README.md +19 -17
README.md CHANGED
@@ -56,7 +56,7 @@ This repo contains GGUF format model files for [medalpaca's Medalpaca 13B](https
  <!-- README_GGUF.md-about-gguf start -->
  ### About GGUF

- GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp. GGUF offers numerous advantages over GGML, such as better tokenisation, and support for special tokens. It also supports metadata, and is designed to be extensible.
+ GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.

  Here is an incomplete list of clients and libraries that are known to support GGUF:

@@ -99,7 +99,7 @@ Below is an instruction that describes a task. Write a response that appropriate
  <!-- compatibility_gguf start -->
  ## Compatibility

- These quantised GGUFv2 files are compatible with llama.cpp from August 27th onwards, as of commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221)
+ These quantised GGUFv2 files are compatible with llama.cpp from August 27th onwards, as of commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221)

  They are also compatible with many third party UIs and libraries - please see the list at the top of this README.

@@ -192,9 +192,9 @@ pip3 install hf_transfer
  And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:

  ```shell
- HUGGINGFACE_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download TheBloke/medalpaca-13B-GGUF medalpaca-13b.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
+ HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download TheBloke/medalpaca-13B-GGUF medalpaca-13b.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
  ```

- Windows CLI users: Use `set HUGGINGFACE_HUB_ENABLE_HF_TRANSFER=1` before running the download command.
+ Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
  </details>
  <!-- README_GGUF.md-how-to-download end -->
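
The same download can also be scripted. Below is a minimal sketch using the `huggingface_hub` Python API, which `huggingface-cli` wraps; the repo and filename match the command above, and `local_dir` is illustrative:

```python
# Minimal sketch: download one GGUF file via the huggingface_hub Python API.
# Assumes `pip3 install huggingface_hub`; repo_id and filename match the CLI
# command above, and local_dir is illustrative.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/medalpaca-13B-GGUF",
    filename="medalpaca-13b.Q4_K_M.gguf",
    local_dir=".",
)
print(model_path)  # path to the downloaded .gguf file
```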
 
@@ -201,16 +201,16 @@

  <!-- README_GGUF.md-how-to-run start -->
  ## Example `llama.cpp` command

- Make sure you are using `llama.cpp` from commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
+ Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.

  ```shell
- ./main -ngl 32 -m medalpaca-13b.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{prompt}\n\n### Response:"
+ ./main -ngl 32 -m medalpaca-13b.Q4_K_M.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{prompt}\n\n### Response:"
  ```

  Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.

- Change `-c 4096` to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.
+ Change `-c 2048` to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.

  If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`

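
For reference, here is a minimal sketch of the equivalent generation from Python with llama-cpp-python (assuming `pip install llama-cpp-python`; the parameter values mirror the `./main` flags above, and the instruction text is illustrative):

```python
# Minimal sketch: the same generation via llama-cpp-python.
# Assumes `pip install llama-cpp-python`; values mirror the ./main flags above.
from llama_cpp import Llama

llm = Llama(
    model_path="medalpaca-13b.Q4_K_M.gguf",
    n_ctx=2048,       # mirrors -c 2048
    n_gpu_layers=32,  # mirrors -ngl 32; use 0 without GPU acceleration
)

# The instruction below is illustrative; substitute your own prompt.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWhat are the common symptoms of anaemia?\n\n### Response:"
)
output = llm(prompt, max_tokens=256, temperature=0.7, repeat_penalty=1.1)
print(output["choices"][0]["text"])
```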
 
@@ -224,22 +224,24 @@ Further instructions here: [text-generation-webui/docs/llama.cpp.md](https://git

  You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries.

- ### How to load this model from Python using ctransformers
+ ### How to load this model in Python code, using ctransformers

  #### First install the package

- ```bash
+ Run one of the following commands, according to your system:
+
+ ```shell
  # Base ctransformers with no GPU acceleration
- pip install ctransformers>=0.2.24
+ pip install ctransformers
  # Or with CUDA GPU acceleration
- pip install ctransformers[cuda]>=0.2.24
- # Or with ROCm GPU acceleration
- CT_HIPBLAS=1 pip install ctransformers>=0.2.24 --no-binary ctransformers
- # Or with Metal GPU acceleration for macOS systems
- CT_METAL=1 pip install ctransformers>=0.2.24 --no-binary ctransformers
+ pip install ctransformers[cuda]
+ # Or with AMD ROCm GPU acceleration (Linux only)
+ CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
+ # Or with Metal GPU acceleration for macOS systems only
+ CT_METAL=1 pip install ctransformers --no-binary ctransformers
  ```

- #### Simple example code to load one of these GGUF models
+ #### Simple ctransformers example code

  ```python
  from ctransformers import AutoModelForCausalLM
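
The diff shows only the first line of the Python example; the body between this hunk and the next is not displayed. Based on the visible import and the `print(llm("AI is going to"))` call in the next hunk header, a minimal sketch of the full snippet looks like this (the `from_pretrained()` arguments are illustrative):

```python
# Minimal sketch of the example the diff truncates: only the import and the
# final print() call are visible in the diff; the from_pretrained() arguments
# are illustrative.
from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to GPU.
# Use 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/medalpaca-13B-GGUF",
    model_file="medalpaca-13b.Q4_K_M.gguf",
    model_type="llama",
    gpu_layers=50,
)

print(llm("AI is going to"))
```
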
@@ -252,7 +254,7 @@ print(llm("AI is going to"))

  ## How to use with LangChain

- Here's guides on using llama-cpp-python or ctransformers with LangChain:
+ Here are guides on using llama-cpp-python and ctransformers with LangChain:

  * [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp)
  * [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers)
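
As a concrete starting point for the guides linked above, here is a minimal sketch of the ctransformers route through LangChain (assuming `pip install langchain ctransformers`; class and parameter names follow LangChain's CTransformers integration, and the prompt is illustrative):

```python
# Minimal sketch: loading this GGUF model through LangChain's CTransformers
# wrapper. Assumes `pip install langchain ctransformers`; the prompt is
# illustrative only.
from langchain.llms import CTransformers

llm = CTransformers(
    model="TheBloke/medalpaca-13B-GGUF",
    model_file="medalpaca-13b.Q4_K_M.gguf",
    model_type="llama",
)

print(llm("What are the first-line treatments for type 2 diabetes?"))
```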
 