TheBloke committed on
Commit 84cdc56
1 parent: e5c21bc

Initial GGML model commit

Files changed (1): README.md (+6, -8)
README.md CHANGED
@@ -68,15 +68,9 @@ Vigogne strictly avoids discussing sensitive, offensive, illegal, ethical, or po
  <!-- compatibility_ggml start -->
  ## Compatibility

- ### Original llama.cpp quant methods: `q4_0, q4_1, q5_0, q5_1, q8_0`
-
- These are guaranteed to be compatible with any UIs, tools and libraries released since late May. They may be phased out soon, as they are largely superseded by the new k-quant methods.
-
- ### New k-quant methods: `q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K`
-
- These new quantisation methods are compatible with llama.cpp as of June 6th, commit `2d43387`.
-
- They are now also compatible with recent releases of text-generation-webui, KoboldCpp, llama-cpp-python, ctransformers, rustformers and most others. For compatibility with other tools and libraries, please check their documentation.
+ These quantised GGML files are compatible with llama.cpp as of June 6th, commit `2d43387`.
+
+ They should also be compatible with all UIs, libraries and utilities which use GGML.

  ## Explanation of the new k-quant methods
  <details>
@@ -126,8 +120,12 @@ Change `-t 10` to the number of physical CPU cores you have. For example if your

  Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.

+ Change `-c 2048` to the desired sequence length for this model. For example, `-c 4096` for a Llama 2 model. For models that use RoPE, add `--rope-freq-base 10000 --rope-freq-scale 0.5` for doubled context, or `--rope-freq-base 10000 --rope-freq-scale 0.25` for 4x context.
+
  If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`

+ For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md)
+
  ## How to run in `text-generation-webui`

  Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).
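For quick reference, here is a minimal sketch of a llama.cpp `main` invocation combining the parameters discussed in the updated README. The model filename, prompt, and sampling values are illustrative assumptions, not part of this commit; substitute the GGML file you actually downloaded.

```bash
# Minimal sketch of running a GGML model with llama.cpp (filename is an assumed example).
# -t 10    number of physical CPU cores to use
# -ngl 32  layers to offload to GPU (only effective if llama.cpp was built with GPU support, e.g. cuBLAS/Metal)
# -c 2048  sequence length; use -c 4096 for a Llama 2 model
./main -t 10 -ngl 32 -m vigogne-model.ggmlv3.q4_K_M.bin -c 2048 \
  --temp 0.7 --repeat_penalty 1.1 -n -1 \
  -p "Your prompt here"

# For a chat-style conversation, replace `-p <PROMPT>` with `-i -ins`:
./main -t 10 -ngl 32 -m vigogne-model.ggmlv3.q4_K_M.bin -c 2048 -i -ins

# Per the README's added line, a RoPE-scaled model at 4x context would take:
#   --rope-freq-base 10000 --rope-freq-scale 0.25 (with -c raised accordingly)
```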