Initial GGML model commit
README.md CHANGED
@@ -68,15 +68,9 @@ Vigogne strictly avoids discussing sensitive, offensive, illegal, ethical, or po
 <!-- compatibility_ggml start -->
 ## Compatibility
 
-### New k-quant methods: `q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K`
-
-These new quantisation methods are compatible with llama.cpp as of June 6th, commit `2d43387`.
-
-They are now also compatible with recent releases of text-generation-webui, KoboldCpp, llama-cpp-python, ctransformers, rustformers and most others. For compatibility with other tools and libraries, please check their documentation.
+These quantised GGML files are compatible with llama.cpp as of June 6th, commit `2d43387`.
+
+They should also be compatible with all UIs, libraries and utilities which use GGML.
 
 ## Explanation of the new k-quant methods
 <details>
@@ -126,8 +120,12 @@ Change `-t 10` to the number of physical CPU cores you have. For example if your
 
 Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
 
+Change `-c 2048` to the desired sequence length for this model. For example, `-c 4096` for a Llama 2 model. For models that use RoPE, add `--rope-freq-base 10000 --rope-freq-scale 0.5` for doubled context, or `--rope-freq-base 10000 --rope-freq-scale 0.25` for 4x context.
+
 If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`
 
+For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md)
+
 ## How to run in `text-generation-webui`
 
 Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).
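As a worked illustration of the flags the updated README discusses, a full `./main` invocation might look like the sketch below. This is not taken from the repo: the model filename, core count, layer count and prompt are placeholders to substitute for your own setup, and only the flags named in the README text are used.

```bash
# One-shot generation (hypothetical filename and prompt -- substitute your own):
#   -t 10    : number of physical CPU cores to use
#   -ngl 32  : layers to offload to GPU (remove without GPU acceleration)
#   -c 2048  : sequence length for this model
./main -t 10 -ngl 32 -m vigogne-13b.ggmlv3.q4_K_M.bin -c 2048 \
  -p "Écrivez une histoire au sujet des lamas."

# Chat-style conversation: drop -p and run in interactive instruct mode instead
./main -t 10 -ngl 32 -m vigogne-13b.ggmlv3.q4_K_M.bin -c 2048 -i -ins

# Doubled context via RoPE scaling, only for models set up for it
./main -t 10 -ngl 32 -m vigogne-13b.ggmlv3.q4_K_M.bin \
  -c 4096 --rope-freq-base 10000 --rope-freq-scale 0.5 -i -ins
```

If outputs look off in instruct mode, check the model card's prompt template, since `-i -ins` applies llama.cpp's Alpaca-style instruction wrapping rather than a model-specific one.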