Update README.md
README.md CHANGED
These files are **experimental** GGML format model files for [Eric Hartford's WizardLM Uncensored Falcon 40B](https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-40b).

They can be used from:
* [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui).
* The ctransformers Python library, which includes LangChain support: [ctransformers](https://github.com/marella/ctransformers).
* A new fork of llama.cpp that introduced this new Falcon GGML support: [cmp-nct/ggllm.cpp](https://github.com/cmp-nct/ggllm.cpp).

## Repositories available

* [3-bit GPTQ model for GPU inference](https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-3bit-GPTQ).
* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-GGML)
* [Eric's unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-40b)

<!-- compatibility_ggml start -->
## Compatibility

The recommended UI for these GGMLs is [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui). Preliminary CUDA GPU acceleration is provided.

For use from Python code, use [ctransformers](https://github.com/marella/ctransformers), again with preliminary CUDA GPU acceleration.
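As a rough sketch of what that looks like (the GGML file name below is a placeholder; use one of the quantised files actually provided in this repo, and set `gpu_layers=0` for CPU-only inference):

```
from ctransformers import AutoModelForCausalLM

# Load one of the GGML files from this repo via ctransformers.
# model_file is illustrative only; pick the quantisation you downloaded.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/WizardLM-Uncensored-Falcon-40B-GGML",
    model_file="wizardlm-uncensored-falcon-40b.ggmlv3.q4_0.bin",
    model_type="falcon",
    gpu_layers=50,  # number of layers to offload to the GPU, if available
)

print(llm("What is a falcon?\n### Response:", max_new_tokens=128))
```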
Or to build cmp-nct's fork of llama.cpp with Falcon 7B support plus preliminary CUDA acceleration, please try the following steps:
```
git clone https://github.com/cmp-nct/ggllm.cpp
```
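The remaining build steps follow the usual CMake pattern, roughly as below; the exact flag names are an assumption here, so check the ggllm.cpp README for the current ones. (On Windows, developer cmp-nct notes that he personally compiles it using Visual Studio.)

```
cd ggllm.cpp
mkdir build && cd build
cmake -DLLAMA_CUBLAS=1 ..          # enable the CUDA (cuBLAS) backend
cmake --build . --config Release   # builds falcon_main under bin/
```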
Once compiled, you can then use `bin/falcon_main` just as you would use llama.cpp. For example:
```
bin/falcon_main -t 8 -ngl 100 -b 1 -m falcon7b-instruct.ggmlv3.q4_0.bin -p "What is a falcon?\n### Response:"
```
You can specify `-ngl 100` regardless of your VRAM, as it will automatically detect how much VRAM is available to be used.
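If you would rather cap GPU usage and keep the rest of the model on the CPU, you can pass a smaller layer count instead; the value and model file below are just illustrative:

```
bin/falcon_main -t 8 -ngl 40 -b 1 -m falcon7b-instruct.ggmlv3.q4_0.bin -p "What is a falcon?\n### Response:"
```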