TheBloke committed
Commit dd948ed · 1 Parent(s): bf5a5d8

Update README.md

Files changed (1):
  1. README.md +11 -6
README.md CHANGED
@@ -21,9 +21,10 @@ license: other
 
 These files are **experimental** GGML format model files for [Eric Hartford's WizardLM Uncensored Falcon 40B](https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-40b).
 
-These GGML files will **not** work in llama.cpp, and at the time of writing they will not work with any UI or library. They cannot be used from Python code.
-
-They can be used with a new fork of llama.cpp that adds Falcon GGML support: [cmp-nct/ggllm.cpp](https://github.com/cmp-nct/ggllm.cpp)
+They can be used from:
+* [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui).
+* The ctransformers Python library, which includes LangChain support: [ctransformers](https://github.com/marella/ctransformers).
+* A new fork of llama.cpp that introduced this Falcon GGML support: [cmp-nct/ggllm.cpp](https://github.com/cmp-nct/ggllm.cpp).
 
 ## Repositories available
 
 
@@ -31,11 +32,15 @@ They can be used with a new fork of llama.cpp that adds Falcon GGML support: [cm
 * [3-bit GPTQ model for GPU inference](https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-3bit-GPTQ).
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-GGML)
 * [Eric's unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-40b)
-
+
 <!-- compatibility_ggml start -->
 ## Compatibility
 
-To build cmp-nct's fork of llama.cpp with Falcon 40B support plus preliminary CUDA acceleration, please try the following steps:
+The recommended UI for these GGMLs is [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui). Preliminary CUDA GPU acceleration is provided.
+
+For use from Python code, use [ctransformers](https://github.com/marella/ctransformers), again with preliminary CUDA GPU acceleration.
+
+Or to build cmp-nct's fork of llama.cpp with Falcon 40B support plus preliminary CUDA acceleration, please try the following steps:
 
 ```
 git clone https://github.com/cmp-nct/ggllm.cpp
@@ -47,7 +52,7 @@ Compiling on Windows: developer cmp-nct notes: 'I personally compile it using VS
 
 Once compiled you can then use `bin/falcon_main` just like you would use llama.cpp. For example:
 ```
-bin/falcon_main -t 8 -ngl 100 -b 1 -m wizard-falcon40b.ggmlv3.q3_K_S.bin -p "What is a falcon?\n### Response:"
+bin/falcon_main -t 8 -ngl 100 -b 1 -m falcon7b-instruct.ggmlv3.q4_0.bin -p "What is a falcon?\n### Response:"
 ```
 
 You can specify `-ngl 100` regardless of your VRAM, as it will automatically detect how much VRAM is available to be used.
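
The updated README points Python users at ctransformers. Below is a minimal sketch of what loading one of these GGML files from Python could look like, assuming ctransformers' `AutoModelForCausalLM.from_pretrained` interface with Falcon support; the file name and `gpu_layers` value are illustrative (use the quantised file you actually downloaded; `gpu_layers` plays roughly the role of `-ngl` above).

```
# Sketch only: assumes `pip install ctransformers` with Falcon GGML support.
# The repo id is the GGML repo listed above; the file name and gpu_layers
# value are illustrative, not a statement of what this repo ships.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/WizardLM-Uncensored-Falcon-40B-GGML",
    model_file="wizard-falcon40b.ggmlv3.q3_K_S.bin",  # match your downloaded file
    model_type="falcon",       # tell ctransformers this is a Falcon-architecture GGML
    gpu_layers=100,            # rough analogue of -ngl 100; set 0 for CPU-only
)

# Same prompt format as the falcon_main example above.
print(llm("What is a falcon?\n### Response:", max_new_tokens=128))
```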
 
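
The README also notes that ctransformers includes LangChain support. A hedged sketch of that route, assuming LangChain's `CTransformers` wrapper as it existed at the time; the config keys and file name are illustrative:

```
# Sketch only: assumes `pip install langchain ctransformers` and the
# langchain.llms.CTransformers wrapper; file name and config are illustrative.
from langchain.llms import CTransformers

llm = CTransformers(
    model="TheBloke/WizardLM-Uncensored-Falcon-40B-GGML",
    model_file="wizard-falcon40b.ggmlv3.q3_K_S.bin",
    model_type="falcon",
    config={"max_new_tokens": 128, "gpu_layers": 100},
)

print(llm("What is a falcon?\n### Response:"))
```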