TheBloke/falcon-40b-instruct-GGML


TheBloke committed on Jun 19, 2023

Commit bd282c1 • 1 Parent(s): b58cc00

Update README.md

Files changed (1)

README.md +2 -2


@@ -54,9 +54,9 @@ Once compiled you can then use `bin/falcon_main` just like you would use llama.c
 bin/falcon_main -t 8 -ngl 100 -b 1 -m falcon40b-instruct.ggmlv3.q3_K_S.bin -p "What is a falcon?\n### Response:"
 ```
-You can specify `-ngl 100` regardles of your VRAM, as it will automatically detect how much VRAM is available can be used.
-Adjust `-t 8` according to what performs best on your system. Do not exceed the number of physical CPU cores you have.
+You can specify `-ngl 100` regardles of your VRAM, as it will automatically detect how much VRAM is available to be used.
+Adjust `-t 8` (the number of CPU cores to use) according to what performs best on your system. Do not exceed the number of physical CPU cores you have.
 `-b 1` reduces batch size to 1. This slightly lowers prompt evaluation time, but frees up VRAM to load more of the model on to your GPU.  If you find prompt evaluation too slow and have enough spare VRAM, you can remove this parameter.
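
The README advises keeping `-t` at or below the number of physical CPU cores. As a rough sketch of how that count might be found, the snippet below assumes a Linux system with `lscpu` from util-linux; `physical_cores` is a hypothetical helper name, not part of `falcon_main`. It counts unique (core, socket) pairs and falls back to the logical CPU count when `lscpu` is unavailable, in which case the result may overstate the physical core count on hyperthreaded machines.

```shell
#!/bin/sh
# Hypothetical helper: estimate the number of physical CPU cores on Linux.
physical_cores() {
    n=""
    if command -v lscpu >/dev/null 2>&1; then
        # Each unique "core,socket" pair is one physical core.
        n=$(lscpu -p=Core,Socket 2>/dev/null | grep -v '^#' | sort -u | wc -l | tr -d ' ')
    fi
    # Fallback: logical CPU count (may include hyperthreads).
    if [ -z "$n" ] || [ "$n" -lt 1 ]; then
        n=$(getconf _NPROCESSORS_ONLN)
    fi
    echo "$n"
}

THREADS=$(physical_cores)
echo "Suggested upper bound: -t $THREADS"
# Example use with the command from the README:
# bin/falcon_main -t "$THREADS" -ngl 100 -b 1 -m falcon40b-instruct.ggmlv3.q3_K_S.bin -p "What is a falcon?\n### Response:"
```

This gives only an upper bound; as the README notes, the best value is whatever performs best on your system, so it is worth benchmarking a few lower values too.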