TheBloke committed on
Commit 81d3857 • 1 Parent(s): 3663290

Upload README.md

Files changed (1):
  1. README.md +100 -30
README.md CHANGED
@@ -58,25 +58,22 @@ GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is

The key benefit of GGUF is that it is an extensible, future-proof format which stores more information about the model as metadata. It also includes significantly improved tokenization code, including for the first time full support for special tokens. This should improve performance, especially with models that use new special tokens and implement custom prompt templates.

- As of August 25th, here is a list of clients and libraries that are known to support GGUF:
- * [llama.cpp](https://github.com/ggerganov/llama.cpp)
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI. Supports GGUF with GPU acceleration via the ctransformers backend - llama-cpp-python backend should work soon too.
* [KoboldCpp](https://github.com/LostRuins/koboldcpp), now supports GGUF as of release 1.41! A powerful GGML web UI, with full GPU accel. Especially good for storytelling.
* [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), should now work, choose the `c_transformers` backend. A great web UI with many interesting features. Supports CUDA GPU acceleration.
* [ctransformers](https://github.com/marella/ctransformers), now supports GGUF as of version 0.2.24! A Python library with GPU accel, LangChain support, and OpenAI-compatible AI server.
* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), supports GGUF as of version 0.1.79. A Python library with GPU accel, LangChain support, and OpenAI-compatible API server.
* [candle](https://github.com/huggingface/candle), added GGUF support on August 22nd. Candle is a Rust ML framework with a focus on performance, including GPU support, and ease of use.

- The clients and libraries below are expected to add GGUF support shortly:
- * [LM Studio](https://lmstudio.ai/), should be updated by end of August 25th.
<!-- README_GGUF.md-about-gguf end -->
-
<!-- repositories-available start -->
## Repositories available

* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GPTQ)
* [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF)
- * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference (deprecated)](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGML)
* [WizardLM's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0)
<!-- repositories-available end -->
@@ -90,6 +87,7 @@ Below is an instruction that describes a task. Write a response that appropriate

{prompt}

### Response:
```

<!-- prompt-template end -->
@@ -98,9 +96,7 @@ Below is an instruction that describes a task. Write a response that appropriate

These quantised GGUF files are compatible with llama.cpp from August 21st 2023 onwards, as of commit [6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9](https://github.com/ggerganov/llama.cpp/commit/6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9)

- As of August 24th 2023 they are now compatible with KoboldCpp, release 1.41 and later.
-
- They are not yet compatible with any other third-party UIs, libraries or utilities but this is expected to change very soon.

## Explanation of quantisation methods
<details>
@@ -122,31 +118,36 @@ Refer to the Provided Files table below to see what files use which methods, and

| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
- | [wizardcoder-python-34b-v1.0.Q2_K.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q2_K.gguf) | Q2_K | 2 | 14.58 GB | 17.08 GB | smallest, significant quality loss - not recommended for most purposes |
- | [wizardcoder-python-34b-v1.0.Q3_K_S.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q3_K_S.gguf) | Q3_K_S | 3 | 14.95 GB | 17.45 GB | very small, high quality loss |
- | [wizardcoder-python-34b-v1.0.Q3_K_M.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q3_K_M.gguf) | Q3_K_M | 3 | 16.63 GB | 19.13 GB | very small, high quality loss |
- | [wizardcoder-python-34b-v1.0.Q3_K_L.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q3_K_L.gguf) | Q3_K_L | 3 | 18.12 GB | 20.62 GB | small, substantial quality loss |
- | [wizardcoder-python-34b-v1.0.Q4_K_S.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q4_K_S.gguf) | Q4_K_S | 4 | 19.46 GB | 21.96 GB | small, greater quality loss |
- | [wizardcoder-python-34b-v1.0.Q4_K_M.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q4_K_M.gguf) | Q4_K_M | 4 | 20.53 GB | 23.03 GB | medium, balanced quality - recommended |
- | [wizardcoder-python-34b-v1.0.Q5_K_S.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q5_K_S.gguf) | Q5_K_S | 5 | 23.51 GB | 26.01 GB | large, low quality loss - recommended |
- | [wizardcoder-python-34b-v1.0.Q5_K_M.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q5_K_M.gguf) | Q5_K_M | 5 | 24.12 GB | 26.62 GB | large, very low quality loss - recommended |
- | [wizardcoder-python-34b-v1.0.Q6_K.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q6_K.gguf) | Q6_K | 6 | 27.93 GB | 30.43 GB | very large, extremely low quality loss |
| [wizardcoder-python-34b-v1.0.Q8_0.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q8_0.gguf) | Q8_0 | 8 | 35.86 GB | 38.36 GB | very large, extremely low quality loss - not recommended |

**Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.

<!-- README_GGUF.md-provided-files end -->

<!-- README_GGUF.md-how-to-run start -->
- ## How to run in `llama.cpp`

Make sure you are using `llama.cpp` from commit [6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9](https://github.com/ggerganov/llama.cpp/commit/6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9) or later.

- For compatibility with older versions of llama.cpp, or for use with third-party clients and libraries, please use GGML files instead.

```
- ./main -t 10 -ngl 32 -m wizardcoder-python-34b-v1.0.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"
```
- Change `-t 10` to the number of physical CPU cores you have. For example if your system has 8 cores/16 threads, use `-t 8`.

Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
@@ -159,6 +160,44 @@ For other parameters and how to use them, please refer to [the llama.cpp documen
## How to run in `text-generation-webui`

Further instructions here: [text-generation-webui/docs/llama.cpp.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp.md).
<!-- README_GGUF.md-how-to-run end -->

<!-- footer start -->
@@ -184,7 +223,7 @@ Donaters will get priority support on any and all AI/LLM/model questions and req

**Special thanks to**: Aemon Algiz.

- **Patreon special mentions**: Kacper Wikieł, knownsqashed, Leonard Tan, Asp the Wyvern, Daniel P. Andersen, Luke Pendergrass, Stanislav Ovsiannikov, RoA, Dave, Ai Maven, Kalila, Will Dee, Imad Khwaja, Nitin Borwankar, Joseph William Delisle, Tony Hughes, Cory Kujawski, Rishabh Srivastava, Russ Johnson, Stephen Murray, Lone Striker, Johann-Peter Hartmann, Elle, J, Deep Realms, SuperWojo, Raven Klaugh, Sebastain Graf, ReadyPlayerEmma, Alps Aficionado, Mano Prime, Derek Yates, Gabriel Puliatti, Mesiah Bishop, Magnesian, Sean Connelly, biorpg, Iucharbius, Olakabola, Fen Risland, Space Cruiser, theTransient, Illia Dulskyi, Thomas Belote, Spencer Kim, Pieter, John Detwiler, Fred von Graf, Michael Davis, Swaroop Kallakuri, subjectnull, Clay Pascal, Subspace Studios, Chris Smitley, Enrico Ros, usrbinkat, Steven Wood, alfie_i, David Ziegler, Willem Michiel, Matthew Berman, Andrey, Pyrater, Jeffrey Morgan, vamX, LangChain4j, Luke @flexchar, Trenton Dambrowitz, Pierre Kircher, Alex, Sam, James Bentley, Edmond Seymore, Eugene Pentland, Pedro Madruga, Rainer Wilmers, Dan Guido, Nathan LeClaire, Spiking Neurons AB, Talal Aujan, zynix, Artur Olbinski, Michael Levine, 阿明, K, John Villwock, Nikolai Manek, Femi Adebogun, senxiiz, Deo Leter, NimbleBox.ai, Viktor Bowallius, Geoffrey Montalvo, Mandus, Ajan Kanaga, ya boyyy, Jonathan Leane, webtim, Brandon Frisco, danny, Alexandros Triantafyllidis, Gabriel Tamborski, Randy H, terasurfer, Vadim, Junyu Yang, Vitor Caleffi, Chadd, transmissions 11

Thank you to all my generous patrons and donaters!
@@ -197,6 +236,13 @@ And thank you again to a16z for their generous grant.
# Original model card: WizardLM's WizardCoder Python 34B V1.0

## News

- 🔥🔥🔥[2023/08/26] We released **WizardCoder-Python-34B-V1.0**, which achieves **73.2 pass@1** and surpasses **GPT4 (2023/03/15)**, **ChatGPT-3.5**, and **Claude2** on the [HumanEval Benchmarks](https://github.com/openai/human-eval).
@@ -206,16 +252,19 @@ And thank you again to a16z for their generous grant.

| Model | Checkpoint | Paper | HumanEval | MBPP | Demo | License |
- | ----- |------| ---- |------|-------| ----- | ----- |
- | WizardCoder-Python-34B-V1.0 | 🤗 <a href="" target="_blank">HF Link</a> | 📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> | 73.2 | 61.2 | [Demo](http://47.103.63.15:50085/) | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama2</a> |
| WizardCoder-15B-V1.0 | 🤗 <a href="https://huggingface.co/WizardLM/WizardCoder-15B-V1.0" target="_blank">HF Link</a> | 📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> | 59.8 | 50.6 | -- | <a href="https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a> |

- Our **WizardMath-70B-V1.0** model slightly outperforms some closed-source LLMs on the GSM8K, including **ChatGPT 3.5**, **Claude Instant 1** and **PaLM 2 540B**.
- Our **WizardMath-70B-V1.0** model achieves **81.6 pass@1** on the [GSM8k Benchmarks](https://github.com/openai/grade-school-math), which is **24.8** points higher than the SOTA open-source LLM, and achieves **22.7 pass@1** on the [MATH Benchmarks](https://github.com/hendrycks/math), which is **9.2** points higher than the SOTA open-source LLM.

<font size=4>
-
| Model | Checkpoint | Paper | GSM8k | MATH | Online Demo | License |
| ----- |------| ---- |------|-------| ----- | ----- |
| WizardMath-70B-V1.0 | 🤗 <a href="https://huggingface.co/WizardLM/WizardMath-70B-V1.0" target="_blank">HF Link</a> | 📃 <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a> | **81.6** | **22.7** | [Demo](http://47.103.63.15:50083/) | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2</a> |
@@ -224,13 +273,13 @@ And thank you again to a16z for their generous grant.
</font>

- - [08/09/2023] We released **WizardLM-70B-V1.0** model. Here is [Full Model Weight](https://huggingface.co/WizardLM/WizardLM-70B-V1.0).

<font size=4>
-
-
| <sup>Model</sup> | <sup>Checkpoint</sup> | <sup>Paper</sup> |<sup>MT-Bench</sup> | <sup>AlpacaEval</sup> | <sup>GSM8k</sup> | <sup>HumanEval</sup> | <sup>License</sup>|
- | ----- |------| ---- |------|-------| ----- | ----- | ----- |
| <sup>**WizardLM-70B-V1.0**</sup> | <sup>🤗 <a href="https://huggingface.co/WizardLM/WizardLM-70B-V1.0" target="_blank">HF Link</a> </sup>|<sup>📃**Coming Soon**</sup>| <sup>**7.78**</sup> | <sup>**92.91%**</sup> |<sup>**77.6%**</sup> | <sup> **50.6**</sup>|<sup> <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License </a></sup> |
| <sup>WizardLM-13B-V1.2</sup> | <sup>🤗 <a href="https://huggingface.co/WizardLM/WizardLM-13B-V1.2" target="_blank">HF Link</a> </sup>| | <sup>7.06</sup> | <sup>89.17%</sup> |<sup>55.3%</sup> | <sup>36.6 </sup>|<sup> <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License </a></sup> |
| <sup>WizardLM-13B-V1.1</sup> |<sup> 🤗 <a href="https://huggingface.co/WizardLM/WizardLM-13B-V1.1" target="_blank">HF Link</a> </sup> | | <sup>6.76</sup> |<sup>86.32%</sup> | | <sup>25.0 </sup>| <sup>Non-commercial</sup>|
@@ -248,4 +297,25 @@ And thank you again to a16z for their generous grant.
<a ><img src="https://raw.githubusercontent.com/nlpxucan/WizardLM/main/WizardCoder/imgs/compare_sota.png" alt="WizardCoder" style="width: 96%; min-width: 300px; display: block; margin: auto;"></a>
</p>

<!-- original-model-card end -->
 
The key benefit of GGUF is that it is an extensible, future-proof format which stores more information about the model as metadata. It also includes significantly improved tokenization code, including for the first time full support for special tokens. This should improve performance, especially with models that use new special tokens and implement custom prompt templates.

+ Here is a list of clients and libraries that are known to support GGUF:
+ * [llama.cpp](https://github.com/ggerganov/llama.cpp).
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI. Supports GGUF with GPU acceleration via the ctransformers backend - llama-cpp-python backend should work soon too.
* [KoboldCpp](https://github.com/LostRuins/koboldcpp), now supports GGUF as of release 1.41! A powerful GGML web UI, with full GPU accel. Especially good for storytelling.
+ * [LM Studio](https://lmstudio.ai/), version 0.2.2 and later support GGUF. A fully featured local GUI with GPU acceleration on both Windows (NVIDIA and AMD) and macOS.
* [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), should now work, choose the `c_transformers` backend. A great web UI with many interesting features. Supports CUDA GPU acceleration.
* [ctransformers](https://github.com/marella/ctransformers), now supports GGUF as of version 0.2.24! A Python library with GPU accel, LangChain support, and OpenAI-compatible AI server.
* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), supports GGUF as of version 0.1.79. A Python library with GPU accel, LangChain support, and OpenAI-compatible API server.
* [candle](https://github.com/huggingface/candle), added GGUF support on August 22nd. Candle is a Rust ML framework with a focus on performance, including GPU support, and ease of use.

<!-- README_GGUF.md-about-gguf end -->
 
<!-- repositories-available start -->
## Repositories available

* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GPTQ)
* [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF)
* [WizardLM's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0)
<!-- repositories-available end -->
 
{prompt}

### Response:
+
```

<!-- prompt-template end -->
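
For scripted use, the template above can be filled with Python's `str.format`; a minimal sketch (the example instruction is illustrative):

```python
# Alpaca-style prompt template used by WizardCoder, as shown above.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{prompt}\n\n### Response:"
)

prompt = PROMPT_TEMPLATE.format(prompt="Write a Python function that reverses a string.")
print(prompt)
```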
 

These quantised GGUF files are compatible with llama.cpp from August 21st 2023 onwards, as of commit [6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9](https://github.com/ggerganov/llama.cpp/commit/6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9)

+ They are now also compatible with many third-party UIs and libraries - please see the list at the top of this README.
 
 
## Explanation of quantisation methods
<details>

| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
+ | [wizardcoder-python-34b-v1.0.Q2_K.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q2_K.gguf) | Q2_K | 2 | 14.21 GB | 16.71 GB | smallest, significant quality loss - not recommended for most purposes |
+ | [wizardcoder-python-34b-v1.0.Q3_K_S.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q3_K_S.gguf) | Q3_K_S | 3 | 14.61 GB | 17.11 GB | very small, high quality loss |
+ | [wizardcoder-python-34b-v1.0.Q3_K_M.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q3_K_M.gguf) | Q3_K_M | 3 | 16.28 GB | 18.78 GB | very small, high quality loss |
+ | [wizardcoder-python-34b-v1.0.Q3_K_L.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q3_K_L.gguf) | Q3_K_L | 3 | 17.77 GB | 20.27 GB | small, substantial quality loss |
+ | [wizardcoder-python-34b-v1.0.Q4_0.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q4_0.gguf) | Q4_0 | 4 | 19.05 GB | 21.55 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
+ | [wizardcoder-python-34b-v1.0.Q4_K_S.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q4_K_S.gguf) | Q4_K_S | 4 | 19.15 GB | 21.65 GB | small, greater quality loss |
+ | [wizardcoder-python-34b-v1.0.Q4_K_M.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q4_K_M.gguf) | Q4_K_M | 4 | 20.22 GB | 22.72 GB | medium, balanced quality - recommended |
+ | [wizardcoder-python-34b-v1.0.Q5_0.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q5_0.gguf) | Q5_0 | 5 | 23.24 GB | 25.74 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
+ | [wizardcoder-python-34b-v1.0.Q5_K_S.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q5_K_S.gguf) | Q5_K_S | 5 | 23.24 GB | 25.74 GB | large, low quality loss - recommended |
+ | [wizardcoder-python-34b-v1.0.Q5_K_M.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q5_K_M.gguf) | Q5_K_M | 5 | 23.84 GB | 26.34 GB | large, very low quality loss - recommended |
+ | [wizardcoder-python-34b-v1.0.Q6_K.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q6_K.gguf) | Q6_K | 6 | 27.68 GB | 30.18 GB | very large, extremely low quality loss |
| [wizardcoder-python-34b-v1.0.Q8_0.gguf](https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-GGUF/blob/main/wizardcoder-python-34b-v1.0.Q8_0.gguf) | Q8_0 | 8 | 35.86 GB | 38.36 GB | very large, extremely low quality loss - not recommended |

**Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
+
<!-- README_GGUF.md-provided-files end -->

<!-- README_GGUF.md-how-to-run start -->
+ ## Example `llama.cpp` command

Make sure you are using `llama.cpp` from commit [6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9](https://github.com/ggerganov/llama.cpp/commit/6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9) or later.

+ For compatibility with older versions of llama.cpp, or for any third-party libraries or clients that haven't yet updated for GGUF, please use GGML files instead.

```
+ ./main -t 10 -ngl 32 -m wizardcoder-python-34b-v1.0.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nWrite a story about llamas\n\n### Response:"
```
+ Change `-t 10` to the number of physical CPU cores you have. For example if your system has 8 cores/16 threads, use `-t 8`. If offloading all layers to GPU, set `-t 1`.

Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
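
If you prefer a chat-style interactive session over a one-shot completion, one option (assuming your llama.cpp build includes instruct mode) is to replace the `-p` argument with `-i -ins`:

```
./main -t 10 -ngl 32 -m wizardcoder-python-34b-v1.0.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -i -ins
```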

## How to run in `text-generation-webui`

Further instructions here: [text-generation-webui/docs/llama.cpp.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp.md).
+
+ ## How to run from Python code
+
+ You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries.
+
+ ### How to load this model from Python using ctransformers
+
+ #### First install the package
+
+ ```bash
+ # Base ctransformers with no GPU acceleration
+ pip install "ctransformers>=0.2.24"
+ # Or with CUDA GPU acceleration
+ pip install "ctransformers[cuda]>=0.2.24"
+ # Or with ROCm GPU acceleration
+ CT_HIPBLAS=1 pip install "ctransformers>=0.2.24" --no-binary ctransformers
+ # Or with Metal GPU acceleration for macOS systems
+ CT_METAL=1 pip install "ctransformers>=0.2.24" --no-binary ctransformers
+ ```
+
+ #### Simple example code to load one of these GGUF models
+
+ ```python
+ from ctransformers import AutoModelForCausalLM
+
+ # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
+ llm = AutoModelForCausalLM.from_pretrained("TheBloke/WizardCoder-Python-34B-V1.0-GGUF", model_file="wizardcoder-python-34b-v1.0.Q4_K_M.gguf", model_type="llama", gpu_layers=50)
+
+ print(llm("AI is going to"))
+ ```
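
### How to load this model from Python using llama-cpp-python

llama-cpp-python exposes a similar high-level API. A minimal sketch, assuming version 0.1.79 or later and a locally downloaded GGUF file (the parameter values are illustrative, not prescriptive):

```python
from llama_cpp import Llama

# n_gpu_layers is the number of layers to offload to GPU; set to 0 for CPU-only.
llm = Llama(
    model_path="wizardcoder-python-34b-v1.0.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=50,
)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a story about llamas\n\n### Response:"
)
output = llm(prompt, max_tokens=512, temperature=0.7)
print(output["choices"][0]["text"])
```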
+
+ ## How to use with LangChain
+
+ Here are guides on using llama-cpp-python or ctransformers with LangChain (a minimal sketch follows the links):
+
+ * [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp)
+ * [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers)
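
For orientation, a minimal sketch of the ctransformers integration, assuming `langchain` and `ctransformers` are installed (see the guides above for the authoritative usage; the model file name matches the Provided Files table):

```python
from langchain.llms import CTransformers

# Wrap a GGUF model behind LangChain's standard LLM interface.
llm = CTransformers(
    model="TheBloke/WizardCoder-Python-34B-V1.0-GGUF",
    model_file="wizardcoder-python-34b-v1.0.Q4_K_M.gguf",
    model_type="llama",
)

print(llm("AI is going to"))
```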
+
<!-- README_GGUF.md-how-to-run end -->

<!-- footer start -->

**Special thanks to**: Aemon Algiz.

+ **Patreon special mentions**: Russ Johnson, J, alfie_i, Alex, NimbleBox.ai, Chadd, Mandus, Nikolai Manek, Ken Nordquist, ya boyyy, Illia Dulskyi, Viktor Bowallius, vamX, Iucharbius, zynix, Magnesian, Clay Pascal, Pierre Kircher, Enrico Ros, Tony Hughes, Elle, Andrey, knownsqashed, Deep Realms, Jerry Meng, Lone Striker, Derek Yates, Pyrater, Mesiah Bishop, James Bentley, Femi Adebogun, Brandon Frisco, SuperWojo, Alps Aficionado, Michael Dempsey, Vitor Caleffi, Will Dee, Edmond Seymore, usrbinkat, LangChain4j, Kacper Wikieł, Luke Pendergrass, John Detwiler, theTransient, Nathan LeClaire, Tiffany J. Kim, biorpg, Eugene Pentland, Stanislav Ovsiannikov, Fred von Graf, terasurfer, Kalila, Dan Guido, Nitin Borwankar, 阿明, Ai Maven, John Villwock, Gabriel Puliatti, Stephen Murray, Asp the Wyvern, danny, Chris Smitley, ReadyPlayerEmma, S_X, Daniel P. Andersen, Olakabola, Jeffrey Morgan, Imad Khwaja, Caitlyn Gatomon, webtim, Alicia Loh, Trenton Dambrowitz, Swaroop Kallakuri, Erik Bjäreholt, Leonard Tan, Spiking Neurons AB, Luke @flexchar, Ajan Kanaga, Thomas Belote, Deo Leter, RoA, Willem Michiel, transmissions 11, subjectnull, Matthew Berman, Joseph William Delisle, David Ziegler, Michael Davis, Johann-Peter Hartmann, Talal Aujan, senxiiz, Artur Olbinski, Rainer Wilmers, Spencer Kim, Fen Risland, Cap'n Zoog, Rishabh Srivastava, Michael Levine, Geoffrey Montalvo, Sean Connelly, Alexandros Triantafyllidis, Pieter, Gabriel Tamborski, Sam, Subspace Studios, Junyu Yang, Pedro Madruga, Vadim, Cory Kujawski, K, Raven Klaugh, Randy H, Mano Prime, Sebastain Graf, Space Cruiser

Thank you to all my generous patrons and donaters!

# Original model card: WizardLM's WizardCoder Python 34B V1.0

+ <p align="center">
+ 🤗 <a href="https://huggingface.co/WizardLM" target="_blank">HF Repo</a> • 🐱 <a href="https://github.com/nlpxucan/WizardLM" target="_blank">Github Repo</a> • 🐦 <a href="https://twitter.com/WizardLM_AI" target="_blank">Twitter</a> • 📃 <a href="https://arxiv.org/abs/2304.12244" target="_blank">[WizardLM]</a> • 📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> • 📃 <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a> <br>
+ </p>
+ <p align="center">
+ 👋 Join our <a href="https://discord.gg/VZjjHtWrKs" target="_blank">Discord</a>
+ </p>
+
## News

- 🔥🔥🔥[2023/08/26] We released **WizardCoder-Python-34B-V1.0**, which achieves **73.2 pass@1** and surpasses **GPT4 (2023/03/15)**, **ChatGPT-3.5**, and **Claude2** on the [HumanEval Benchmarks](https://github.com/openai/human-eval).
 
| Model | Checkpoint | Paper | HumanEval | MBPP | Demo | License |
+ | ----- |------| ---- |------|-------| ----- | ----- |
+ | WizardCoder-Python-34B-V1.0 | 🤗 <a href="https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0" target="_blank">HF Link</a> | 📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> | 73.2 | 61.2 | [Demo](http://47.103.63.15:50085/) | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama2</a> |
| WizardCoder-15B-V1.0 | 🤗 <a href="https://huggingface.co/WizardLM/WizardCoder-15B-V1.0" target="_blank">HF Link</a> | 📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> | 59.8 | 50.6 | -- | <a href="https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a> |
+ | WizardCoder-Python-13B-V1.0 | 🤗 <a href="https://huggingface.co/WizardLM/WizardCoder-Python-13B-V1.0" target="_blank">HF Link</a> | 📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> | 64.0 | 55.6 | -- | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama2</a> |
+ | WizardCoder-3B-V1.0 | 🤗 <a href="https://huggingface.co/WizardLM/WizardCoder-3B-V1.0" target="_blank">HF Link</a> | 📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> | 34.8 | 37.4 | [Demo](http://47.103.63.15:50086/) | <a href="https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a> |
+ | WizardCoder-1B-V1.0 | 🤗 <a href="https://huggingface.co/WizardLM/WizardCoder-1B-V1.0" target="_blank">HF Link</a> | 📃 <a href="https://arxiv.org/abs/2306.08568" target="_blank">[WizardCoder]</a> | 23.8 | 28.6 | -- | <a href="https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement" target="_blank">OpenRAIL-M</a> |

- Our **WizardMath-70B-V1.0** model slightly outperforms some closed-source LLMs on the GSM8K, including **ChatGPT 3.5**, **Claude Instant 1** and **PaLM 2 540B**.
- Our **WizardMath-70B-V1.0** model achieves **81.6 pass@1** on the [GSM8k Benchmarks](https://github.com/openai/grade-school-math), which is **24.8** points higher than the SOTA open-source LLM, and achieves **22.7 pass@1** on the [MATH Benchmarks](https://github.com/hendrycks/math), which is **9.2** points higher than the SOTA open-source LLM.

<font size=4>
+
| Model | Checkpoint | Paper | GSM8k | MATH | Online Demo | License |
| ----- |------| ---- |------|-------| ----- | ----- |
| WizardMath-70B-V1.0 | 🤗 <a href="https://huggingface.co/WizardLM/WizardMath-70B-V1.0" target="_blank">HF Link</a> | 📃 <a href="https://arxiv.org/abs/2308.09583" target="_blank">[WizardMath]</a> | **81.6** | **22.7** | [Demo](http://47.103.63.15:50083/) | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2</a> |

</font>

+ - [08/09/2023] We released the **WizardLM-70B-V1.0** model. Here is the [Full Model Weight](https://huggingface.co/WizardLM/WizardLM-70B-V1.0).

<font size=4>
+
+
| <sup>Model</sup> | <sup>Checkpoint</sup> | <sup>Paper</sup> |<sup>MT-Bench</sup> | <sup>AlpacaEval</sup> | <sup>GSM8k</sup> | <sup>HumanEval</sup> | <sup>License</sup>|
+ | ----- |------| ---- |------|-------| ----- | ----- | ----- |
| <sup>**WizardLM-70B-V1.0**</sup> | <sup>🤗 <a href="https://huggingface.co/WizardLM/WizardLM-70B-V1.0" target="_blank">HF Link</a> </sup>|<sup>📃**Coming Soon**</sup>| <sup>**7.78**</sup> | <sup>**92.91%**</sup> |<sup>**77.6%**</sup> | <sup> **50.6**</sup>|<sup> <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License </a></sup> |
| <sup>WizardLM-13B-V1.2</sup> | <sup>🤗 <a href="https://huggingface.co/WizardLM/WizardLM-13B-V1.2" target="_blank">HF Link</a> </sup>| | <sup>7.06</sup> | <sup>89.17%</sup> |<sup>55.3%</sup> | <sup>36.6 </sup>|<sup> <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License </a></sup> |
| <sup>WizardLM-13B-V1.1</sup> |<sup> 🤗 <a href="https://huggingface.co/WizardLM/WizardLM-13B-V1.1" target="_blank">HF Link</a> </sup> | | <sup>6.76</sup> |<sup>86.32%</sup> | | <sup>25.0 </sup>| <sup>Non-commercial</sup>|
 
<a ><img src="https://raw.githubusercontent.com/nlpxucan/WizardLM/main/WizardCoder/imgs/compare_sota.png" alt="WizardCoder" style="width: 96%; min-width: 300px; display: block; margin: auto;"></a>
</p>

+ ## Prompt Format
+ ```
+ "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
+ ```
+
+ ## Inference Demo Script
+
+ We provide the inference demo code [here](https://github.com/nlpxucan/WizardLM/tree/main/demo).
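
For a quick local test without the demo script, a minimal Hugging Face `transformers` sketch (this is not the official demo code; it assumes enough GPU memory for the fp16 34B weights, and the generation parameters are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WizardLM/WizardCoder-Python-34B-V1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Fill the prompt format shown above.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a Python function that reverses a string.\n\n### Response:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```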
+
+ ## Citation
+
+ Please cite the repo if you use the data, method or code in this repo.
+
+ ```
+ @misc{luo2023wizardcoder,
+     title={WizardCoder: Empowering Code Large Language Models with Evol-Instruct},
+     author={Ziyang Luo and Can Xu and Pu Zhao and Qingfeng Sun and Xiubo Geng and Wenxiang Hu and Chongyang Tao and Jing Ma and Qingwei Lin and Daxin Jiang},
+     year={2023},
+ }
+ ```
+
<!-- original-model-card end -->