Update README.md
README.md (CHANGED)
@@ -5,6 +5,8 @@ datasets:
 
 Mostly untested!
 
+Find GPTQ quantized weights here: https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-GPTQ
+
 # RoPE Scaled QLoRA Fine-tune of Llama-33b on airoboros-gpt4-1.4.1 (fp16)
 
 ## Overview
@@ -20,7 +22,9 @@ Pretraining took 10 hours. Finetuning took ~41 hours on 1x RTX 6000 Ada.
 
 ## How to Use
 
-
+The easiest way is to use the GPTQ weights (linked above) with [oobabooga text-generation-webui](https://github.com/oobabooga/text-generation-webui) and ExLlama. You'll need to set `max_seq_len` to 8192 and `compress_pos_emb` to 4. Otherwise, use the `transformers` module.
+
+**IMPORTANT: To use these weights you'll need to patch in the appropriate RoPE scaling module. See: [replace_llama_rope_with_scaled_rope](https://github.com/bhenrym14/qlora-airoboros-longcontext/blob/main/scaledllama/llama_rope_scaled_monkey_patch-16k.py)**
 
 ## Motivation
 Recent advancements in extending context by RoPE scaling ([kaiokendev](https://kaiokendev.github.io/til#extending-context-to-8k) and [Meta AI](https://arxiv.org/abs/2306.15595)) demonstrate the ability to extend the context window without (total) retraining. My prior experiments have found the following:
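For the `transformers` path in the added "How to Use" text, a minimal sketch of what loading might look like. Assumptions not stated in the diff: the linked patch file has been saved locally as `llama_rope_scaled_monkey_patch.py`, `replace_llama_rope_with_scaled_rope` takes no arguments, and the model id is left as a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: llama_rope_scaled_monkey_patch-16k.py from the linked repo has
# been saved locally as llama_rope_scaled_monkey_patch.py (the hyphenated
# filename is not importable as-is). Check the repo for any scale arguments.
from llama_rope_scaled_monkey_patch import replace_llama_rope_with_scaled_rope

# Patch Llama's rotary embeddings BEFORE instantiating the model, so the
# scaled (position-interpolated) RoPE is in place when the weights load.
replace_llama_rope_with_scaled_rope()

model_id = "..."  # placeholder: the fp16 weights from this repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "..."  # your airoboros-style prompt here
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```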