Update README.md
README.md CHANGED
<!-- compatibility_ggml start -->
## Compatibility

To build cmp-nct's fork of llama.cpp with Falcon 40B support plus preliminary CUDA acceleration, please try the following steps:

```
git clone https://github.com/cmp-nct/ggllm.cpp
cd ggllm.cpp
git checkout cuda-integration
rm -rf build && mkdir build && cd build && cmake -DGGML_CUBLAS=1 .. && cmake --build . --config Release
```
Note that I have only tested compilation on Linux.

You can then use `bin/falcon_main` just like you would use llama.cpp. For example:
```
bin/falcon_main -t 8 -ngl 100 -m /workspace/wizard-falcon40b.ggmlv3.q3_K_S.bin -p "What is a falcon?\n### Response:"
```
Using `-ngl 100` will offload all layers to the GPU. If you do not have enough VRAM for this, either lower the number or try a smaller quant size; otherwise performance will be severely affected.
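As a purely illustrative sketch of a partial offload on a smaller GPU (the value 30 below is an arbitrary example, not a recommendation from this README), the same command with fewer GPU layers might look like:

```
# Offload only 30 layers to the GPU and keep the remaining layers on the CPU (example value)
bin/falcon_main -t 8 -ngl 30 -m /workspace/wizard-falcon40b.ggmlv3.q3_K_S.bin -p "What is a falcon?\n### Response:"
```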
Adjust `-t 8` according to what performs best on your system. Do not exceed the number of physical CPU cores you have.
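If you are unsure how many physical cores that is, standard Linux tools (not part of ggllm.cpp) can report it; for example:

```
# Count unique physical cores across all sockets (SMT/hyper-threaded logical CPUs are excluded)
lscpu -p=CORE,SOCKET | grep -v '^#' | sort -u | wc -l
```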
<!-- compatibility_ggml end -->