TheBloke committed
Commit ea07ac9
1 Parent(s): f941f38

Initial GGML model commit

Files changed (1)
README.md: +6 -10
README.md CHANGED
@@ -56,15 +56,9 @@ ASSISTANT:
 <!-- compatibility_ggml start -->
 ## Compatibility
 
-### Original llama.cpp quant methods: `q4_0, q4_1, q5_0, q5_1, q8_0`
+These quantised GGML files are compatible with llama.cpp as of June 6th, commit `2d43387`.
 
-These are guaranteed to be compatible with any UIs, tools and libraries released since late May. They may be phased out soon, as they are largely superseded by the new k-quant methods.
-
-### New k-quant methods: `q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K`
-
-These new quantisation methods are compatible with llama.cpp as of June 6th, commit `2d43387`.
-
-They are now also compatible with recent releases of text-generation-webui, KoboldCpp, llama-cpp-python, ctransformers, rustformers and most others. For compatibility with other tools and libraries, please check their documentation.
+They should also be compatible with all UIs, libraries and utilities which use GGML.
 
 ## Explanation of the new k-quant methods
 <details>
@@ -114,8 +108,12 @@ Change `-t 10` to the number of physical CPU cores you have. For example if your
 
 Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
 
+Change `-c 2048` to the desired sequence length for this model. For example, `-c 4096` for a Llama 2 model. For models that use RoPE, add `--rope-freq-base 10000 --rope-freq-scale 0.5` for doubled context, or `--rope-freq-base 10000 --rope-freq-scale 0.25` for 4x context.
+
 If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`
 
+For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md)
+
 ## How to run in `text-generation-webui`
 
 Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).
@@ -152,8 +150,6 @@ Thank you to all my generous patrons and donaters!
 # Original model card: lmsys's Vicuna 13B v1.5 16K
 
 
-**Note:** This is a preview version. A slightly better checkpoint will be uploaded soon.
-
 # Vicuna Model Card
 
 ## Model Details
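
Taken together, the flags documented in the second hunk describe an invocation along the following lines. This is a minimal sketch, not part of the commit: the model filename and the prompt are placeholders, and `-c 16384 --rope-freq-scale 0.25` applies the "4x context" RoPE rule from the added text to this 16K model (16384 is 4x the Llama 2 base context of 4096).

```bash
# Sketch of the invocation the updated README instructions describe.
# The filename and prompt are placeholders, not from the commit.
./main -m vicuna-13b-v1.5-16k.ggmlv3.q4_K_M.bin \
  -t 10 \
  -ngl 32 \
  -c 16384 --rope-freq-base 10000 --rope-freq-scale 0.25 \
  -p "USER: Write a story about llamas ASSISTANT:"
# -t 10: physical CPU cores; adjust to your machine.
# -ngl 32: layers offloaded to GPU; remove if you have no GPU acceleration.
# -c / --rope-freq-scale: 4x context (16384) over the 4096 base, per the RoPE guidance above.
```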
 
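And the chat-style variant, per the instruction to replace `-p <PROMPT>` with `-i -ins` (same placeholder filename and assumptions as above):

```bash
# Interactive instruction mode instead of a one-shot prompt.
./main -m vicuna-13b-v1.5-16k.ggmlv3.q4_K_M.bin \
  -t 10 -ngl 32 -c 16384 --rope-freq-base 10000 --rope-freq-scale 0.25 \
  -i -ins
```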