Update README.md
README.md CHANGED
```diff
@@ -31,11 +31,17 @@ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com
 
 **This is an experimental new GPTQ which offers up to 8K context size**
 
-The increased context is
+The increased context is tested to work with [ExLlama](https://github.com/turboderp/exllama), via the latest release of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
+
+It has also been tested from Python code using AutoGPTQ, and `trust_remote_code=True`.
+
+Code credits:
+- Original concept and code for increasing context length: [kaiokendev](https://huggingface.co/kaiokendev)
+- Updated Llama modelling code that includes this automatically via trust_remote_code: [emozilla](https://huggingface.co/emozilla).
 
 Please read carefully below to see how to use it.
 
-**NOTE**: Using the full 8K context will exceed 24GB VRAM.
+**NOTE**: Using the full 8K context on a 30B model will exceed 24GB VRAM.
 
 GGML versions are not yet provided, as there is not yet support for SuperHOT in llama.cpp. This is being investigated and will hopefully come soon.
 
```
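The updated README text mentions loading the model from Python with AutoGPTQ and `trust_remote_code=True`. A minimal sketch of what such a load could look like, assuming a recent AutoGPTQ release; the model ID below is a placeholder, not a repository named in this commit:

```python
# Minimal sketch: loading a SuperHOT 8K GPTQ model from Python with AutoGPTQ.
# The model ID below is a placeholder for illustration only.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/example-SuperHOT-8K-GPTQ"  # placeholder repo name

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# trust_remote_code=True allows the repo's custom Llama modelling code
# (which applies the extended-context handling automatically) to be used.
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    use_safetensors=True,
    trust_remote_code=True,
    device="cuda:0",
)

prompt = "Tell me about AI"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(
    input_ids=input_ids,
    do_sample=True,
    temperature=0.7,
    max_new_tokens=128,
)
print(tokenizer.decode(output[0]))
```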