TheBloke committed on
Commit
65f47c0
1 Parent(s): 7af28d0

Update README.md

Files changed (1)
  1. README.md +7 -3
README.md CHANGED
@@ -35,7 +35,7 @@ They can be used with a new fork of llama.cpp that adds Falcon GGML support: [cm
 <!-- compatibility_ggml start -->
 ## Compatibility
 
-To build cmp-nct's fork of llama.cpp with Falcon 40B support plus preliminary CUDA acceleration, please follow the following steps:
+To build cmp-nct's fork of llama.cpp with Falcon 40B support plus preliminary CUDA acceleration, please try the following steps:
 
 ```
 git clone https://github.com/cmp-nct/ggllm.cpp
@@ -44,12 +44,16 @@ git checkout cuda-integration
 rm -rf build && mkdir build && cd build && cmake -DGGML_CUBLAS=1 .. && cmake --build . --config Release
 ```
 
+Note that I have only tested compilation on Linux.
+
 You can then use `bin/falcon_main` just like you would use llama.cpp. For example:
 ```
-bin/falcon_main -t 1 -ngl 100 -m /workspace/wizard-falcon40b.ggmlv3.q3_K_S.bin -p "What is a falcon?\n### Response:"
+bin/falcon_main -t 8 -ngl 100 -m /workspace/wizard-falcon40b.ggmlv3.q3_K_S.bin -p "What is a falcon?\n### Response:"
 ```
 
-As with llama.cpp, if you can fully offload the model to VRAM you should use `-t 1` for maximum performance. If not, use more threads, eg `-t 8`.
+Using `-ngl 100` will offload all layers to GPU. If you do not have enough VRAM for this, either lower the number or try a smaller quant size as otherwise performance will be severely affected.
+
+Adjust `-t 8` according to what performs best on your system. Do not exceed the number of physical CPU cores you have.
 
 <!-- compatibility_ggml end -->
 
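The updated README advises keeping `-t` at or below the number of physical CPU cores. A minimal sketch of one way to find that upper bound on Linux follows. It is not part of this commit: the function name `physical_cores` and the variable `THREADS` are hypothetical, and it assumes `lscpu` (util-linux) is available, falling back to `nproc`, which counts logical CPUs and may therefore overestimate on systems with SMT.

```shell
#!/bin/sh
# Hypothetical helper: count unique (core, socket) pairs reported by lscpu.
# Falls back to nproc (logical CPUs) if lscpu is missing or prints nothing.
physical_cores() {
  count=$(lscpu -p=Core,Socket 2>/dev/null | grep -v '^#' | sort -u | wc -l)
  if [ "${count:-0}" -gt 0 ]; then
    echo "$count"
  else
    nproc
  fi
}

# Treat this as an upper bound for -t, not as the best-performing value.
THREADS=$(physical_cores)
echo "Suggested upper bound: -t $THREADS"
```

The result is only a ceiling. The best value for `-t` still has to be found by trying a few settings at or below it, as the README says.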