Update README.md
README.md CHANGED
```diff
@@ -31,11 +31,17 @@ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com
 
 **This is an experimental new GPTQ which offers up to 8K context size**
 
-The increased context is
+The increased context is tested to work with [ExLlama](https://github.com/turboderp/exllama), via the latest release of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
+
+It has also been tested from Python code using AutoGPTQ, and `trust_remote_code=True`.
+
+Code credits:
+- Original concept and code for increasing context length: [kaiokendev](https://huggingface.co/kaiokendev)
+- Updated Llama modelling code that includes this automatically via trust_remote_code: [emozilla](https://huggingface.co/emozilla).
 
 Please read carefully below to see how to use it.
 
-**NOTE**: Using the full 8K context will exceed 24GB VRAM.
+**NOTE**: Using the full 8K context on a 30B model will exceed 24GB VRAM.
 
 GGML versions are not yet provided, as there is not yet support for SuperHOT in llama.cpp. This is being investigated and will hopefully come soon.
 
```
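The updated README text mentions loading the model from Python with AutoGPTQ and `trust_remote_code=True`. A minimal sketch of what such a load could look like, assuming a recent AutoGPTQ release; the model ID below is a placeholder, not a repository named in this commit:

```python
# Minimal sketch: loading a SuperHOT 8K GPTQ model from Python with AutoGPTQ.
# The model ID below is a placeholder for illustration only.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/example-SuperHOT-8K-GPTQ"  # placeholder repo name

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# trust_remote_code=True allows the repo's custom Llama modelling code
# (which applies the extended-context handling automatically) to be used.
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    use_safetensors=True,
    trust_remote_code=True,
    device="cuda:0",
)

prompt = "Tell me about AI"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(
    input_ids=input_ids,
    do_sample=True,
    temperature=0.7,
    max_new_tokens=128,
)
print(tokenizer.decode(output[0]))
```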