flashvenom
/

Airoboros-13B-SuperHOT-8K-4bit-GPTQ

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

flashvenom commited on Jun 22, 2023

Commit

6826deb

•

1 Parent(s): dd53423

Update README.md

Files changed (1) hide show

README.md +1 -0

README.md CHANGED Viewed

@@ -1,4 +1,5 @@
 Model upload in 4-bit GPTQ version, converted using GPTQ-for-LLaMa; Source model from https://huggingface.co/Peeepy/Airoboros-13b-SuperHOT-8k.
 You will need a monkey-patch at inference to use the 8k context, please see patch file present, if you are using a different inference engine (like llama.cpp / exllama) you will need to add the monkey patch there.
 Patch file present in repo or can be accessed here: https://huggingface.co/kaiokendev/superhot-13b-8k-no-rlhf-test/raw/main/llama_rope_scaled_monkey_patch.py

 Model upload in 4-bit GPTQ version, converted using GPTQ-for-LLaMa; Source model from https://huggingface.co/Peeepy/Airoboros-13b-SuperHOT-8k.
 You will need a monkey-patch at inference to use the 8k context, please see patch file present, if you are using a different inference engine (like llama.cpp / exllama) you will need to add the monkey patch there.
 Patch file present in repo or can be accessed here: https://huggingface.co/kaiokendev/superhot-13b-8k-no-rlhf-test/raw/main/llama_rope_scaled_monkey_patch.py