TheBloke committed on
Commit a5b5814
1 Parent(s): 73353ff

Update README.md

Files changed (1)
  1. README.md +28 -2
README.md CHANGED
@@ -1,6 +1,13 @@
 ---
 inference: false
 license: other
+datasets:
+- bavest/fin-llama-dataset
+tags:
+- finance
+- llm
+- llama
+- trading
 ---
 
 <!-- header start -->
@@ -34,6 +41,26 @@ GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/gger
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/fin-llama-33B-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/bavest/fin-llama-33b-merged)
 
+## Prompt template
+
+Standard Alpaca, meaning:
+
+```
+A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's question.
+### Instruction: prompt
+
+### Response:
+```
+or
+```
+A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's question.
+### Instruction: prompt
+
+### Input:
+
+### Response:
+```
+
 <!-- compatibility_ggml start -->
 ## Compatibility
 
@@ -80,7 +107,6 @@ Refer to the Provided Files table below to see what files use which methods, and
 | fin-llama-33b.ggmlv3.q6_K.bin | q6_K | 6 | 26.69 GB | 29.19 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors |
 | fin-llama-33b.ggmlv3.q8_0.bin | q8_0 | 8 | 34.56 GB | 37.06 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. |
 
-
 **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
 
 ## How to run in `llama.cpp`
@@ -88,7 +114,7 @@ Refer to the Provided Files table below to see what files use which methods, and
 I use the following command line; adjust for your tastes and needs:
 
 ```
-./main -t 10 -ngl 32 -m fin-llama-33b.ggmlv3.q5_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"
+./main -t 10 -ngl 32 -m fin-llama-33b.ggmlv3.q5_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: I want you to act as an accountant and come up with creative ways to manage finances. You'll need to consider budgeting, investment strategies and risk management when creating a financial plan for your client. In some cases, you may also need to provide advice on taxation laws and regulations in order to help them maximize their profits. My first suggestion request is “Create a financial plan for a small business that focuses on cost savings and long-term investments".\n### Response:"
 ```
 Change `-t 10` to the number of physical CPU cores you have. For example if your system has 8 cores/16 threads, use `-t 8`.
 
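The commit adds an Alpaca-style prompt template with two variants (with and without an `### Input:` block). For readers scripting prompts against this model, the template can be sketched as a small formatter; the `build_prompt` helper below is a hypothetical name for illustration, not something shipped with the model or the README:

```python
# Minimal sketch of the "Standard Alpaca" prompt template described in the
# diff above. The helper name and structure are illustrative assumptions.

SYSTEM = ("A chat between a curious human and an artificial intelligence assistant. "
          "The assistant gives helpful, detailed, and polite answers to the user's question.")

def build_prompt(instruction, input_text=None):
    """Format a prompt using the instruction-only or instruction+input variant."""
    if input_text:
        return (f"{SYSTEM}\n### Instruction: {instruction}\n\n"
                f"### Input: {input_text}\n\n### Response:")
    return f"{SYSTEM}\n### Instruction: {instruction}\n\n### Response:"

print(build_prompt("Write a story about llamas"))
```

The second variant simply inserts the `### Input:` block between the instruction and the `### Response:` marker, which is where the model is expected to continue generating.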
120