TheBloke committed
Commit 9e90084
1 Parent(s): 19c1f3a

Update README.md

Files changed (1):
  1. README.md +8 -2

README.md CHANGED
@@ -40,7 +40,13 @@ Currently these files will also not work with code that previously supported Fal
 
 * [2, 3, 4, 5, 6, 8-bit GGCC models for CPU+GPU inference](https://huggingface.co/TheBloke/falcon-40b-sft-top1-560-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/OpenAssistant/falcon-40b-sft-top1-560)
-
+
+## Prompt template
+
+```
+<|prompter|>prompt<|endoftext|><|assistant|>
+```
+
 <!-- compatibility_ggml start -->
 ## Compatibility
 
@@ -56,7 +62,7 @@ Compiling on Windows: developer cmp-nct notes: 'I personally compile it using VS
 
 Once compiled you can then use `bin/falcon_main` just like you would use llama.cpp. For example:
 ```
-bin/falcon_main -t 8 -ngl 100 -b 1 -m falcon7b-instruct.ggmlv3.q4_0.bin -p "What is a falcon?\n### Response:"
+bin/falcon_main -t 8 -ngl 100 -b 1 -m falcon-40b-top1-560.ggccv1.q4_K.bin -p "<|prompter|>write a story about llamas<|endoftext|><|assistant|>"
 ```
 
 You can specify `-ngl 100` regardless of your VRAM, as it will automatically detect how much VRAM is available to be used.
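
The template added in the first hunk is the OpenAssistant chat format this SFT model was trained on: the user turn, the end-of-text token, then the assistant tag the model completes. As a minimal shell sketch of wrapping a message in it (the variable names here are illustrative, not part of the commit):

```
# OpenAssistant format: user turn, end-of-text token, then the
# assistant tag that the model continues from.
USER_MSG="What is a falcon?"
PROMPT="<|prompter|>${USER_MSG}<|endoftext|><|assistant|>"
```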
 
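The second hunk swaps the old example command for one using the GGCC filename and the new prompt template. A hedged, annotated version of that invocation (the model path and flags are taken from the diff; the comments reflect the README's own description of `-ngl`):

```
# ggllm.cpp's falcon_main, used like llama.cpp's main binary:
#   -t 8     use 8 CPU threads
#   -b 1     batch size 1
#   -ngl 100 offload up to 100 layers to GPU; per the README this is
#            safe to set high, as available VRAM is detected automatically
bin/falcon_main -t 8 -ngl 100 -b 1 \
  -m falcon-40b-top1-560.ggccv1.q4_K.bin \
  -p "<|prompter|>write a story about llamas<|endoftext|><|assistant|>"
```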