TheBloke/WizardLM-Uncensored-Falcon-40B-GGML


TheBloke committed on Jun 18, 2023

Commit 65f47c0 • 1 Parent(s): 7af28d0

Update README.md

Files changed (1)

README.md +7 -3


@@ -35,7 +35,7 @@ They can be used with a new fork of llama.cpp that adds Falcon GGML support: [cm
 <!-- compatibility_ggml start -->
 ## Compatibility
-To build cmp-nct's fork of llama.cpp with Falcon 40B support plus preliminary CUDA acceleration, please follow the following steps:
 ```
 git clone https://github.com/cmp-nct/ggllm.cpp
@@ -44,12 +44,16 @@ git checkout cuda-integration
 rm -rf build && mkdir build && cd build && cmake -DGGML_CUBLAS=1 .. && cmake --build . --config Release
 ```
 You can then use `bin/falcon_main` just like you would use llama.cpp. For example:
 ```
-bin/falcon_main -t 1 -ngl 100 -m /workspace/wizard-falcon40b.ggmlv3.q3_K_S.bin -p "What is a falcon?\n### Response:"
 ```
-As with llama.cpp, if you can fully offload the model to VRAM you should use `-t 1` for maximum performance.  If not, use more threads, eg `-t 8`.
 <!-- compatibility_ggml end -->

 <!-- compatibility_ggml start -->
 ## Compatibility
+To build cmp-nct's fork of llama.cpp with Falcon 40B support plus preliminary CUDA acceleration, please try the following steps:
 ```
 git clone https://github.com/cmp-nct/ggllm.cpp
 rm -rf build && mkdir build && cd build && cmake -DGGML_CUBLAS=1 .. && cmake --build . --config Release
 ```
+Note that I have only tested compilation on Linux.
 You can then use `bin/falcon_main` just like you would use llama.cpp. For example:
 ```
+bin/falcon_main -t 8 -ngl 100 -m /workspace/wizard-falcon40b.ggmlv3.q3_K_S.bin -p "What is a falcon?\n### Response:"
 ```
+Using `-ngl 100` will offload all layers to GPU. If you do not have enough VRAM for this, either lower the number or try a smaller quant size as otherwise performance will be severely affected.
+Adjust `-t 8` according to what performs best on your system. Do not exceed the number of physical CPU cores you have.
 <!-- compatibility_ggml end -->
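
As a rough illustration of the thread advice in the updated README, the sketch below picks a starting `-t` value from the physical core count on Linux. It is not part of the commit. The `physical_cores` helper is a hypothetical name, and the script assumes `lscpu` (util-linux) is available, falling back to the logical CPU count otherwise. It only prints the `falcon_main` command rather than running it, and the model path is the one from the README example.

```shell
#!/bin/sh
# Hypothetical helper: count unique (core, socket) pairs, i.e. physical cores.
physical_cores() {
  if command -v lscpu >/dev/null 2>&1; then
    # lscpu -p prints one CSV row per logical CPU; '#' lines are comments.
    lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l | tr -d ' '
  else
    # Fallback: logical CPU count, which may overcount with SMT/hyper-threading.
    getconf _NPROCESSORS_ONLN
  fi
}

threads=$(physical_cores)
# Guard against an empty or zero result from either branch.
[ "${threads:-0}" -ge 1 ] 2>/dev/null || threads=1

# Print the command instead of executing it, so it can be reviewed first.
echo "bin/falcon_main -t $threads -ngl 100 -m /workspace/wizard-falcon40b.ggmlv3.q3_K_S.bin -p \"What is a falcon?\\n### Response:\""
```

Treat the printed value as an upper bound to start from. The README's guidance is to adjust `-t` to whatever performs best on your system, which may be lower than the physical core count.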