Update README.md
README.md
CHANGED
@@ -25,6 +25,8 @@ REQUIRED: you'll need to patch in the appropriate RoPE scaling module. see: [rep
 
 Hopefully there is a quick fix to exllama that can make >8k work soon.
 
+Otherwise, for context <8k, use exllama. Set `max_seq_len` to 16384 and `compress_pos_emb` to 8.
+
 ## Motivation
 
 Recent advancements in extending context by RoPE scaling ([kaiokendev](https://kaiokendev.github.io/til#extending-context-to-8k) and [Meta AI](https://arxiv.org/abs/2306.15595)) demonstrate the ability to extend the context window without (total) retraining. Finetuning has been shown to be necessary to properly leverage the longer context. Here I attempt to take a smaller model and extend the context to 16k tokens. This, however, proved problematic, as stability suffered in the 8-10k+ range. The Meta paper demonstrated that decreasing perplexities can still be achieved at these context lengths; however, their approach involved tuning all variables on the maximum sequence length after incorporating the RoPE scaling adjustment.
 
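The added line only names the loader settings. As a hedged illustration (not part of this commit), the same `max_seq_len` and `compress_pos_emb` values could be applied through exllama's Python API roughly as sketched below, assuming you are running from a checkout of the exllama repo so its `model`, `tokenizer`, and `generator` modules are importable; all file paths are placeholders, and attribute names should be checked against your exllama version.

```python
# Sketch only: load a 16k-context model with exllama using the settings from the diff above.
# Assumes exllama's repo modules are on the import path; every path below is a placeholder.
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

config = ExLlamaConfig("/path/to/model/config.json")
config.model_path = "/path/to/model/model.safetensors"
config.max_seq_len = 16384      # extended context window
config.compress_pos_emb = 8.0   # linear RoPE scaling factor (16384 / 2048)

model = ExLlama(config)
tokenizer = ExLlamaTokenizer("/path/to/model/tokenizer.model")
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

print(generator.generate_simple("The quick brown fox", max_new_tokens=32))
```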