fp16 weights can be found here: https://huggingface.co/bhenrym14/airophin-13b-pntk-16k-fp16

## Overview

This is a finetune of Llama-2-13b, intended to extend the useful context window to 16384 tokens. There are two training phases:
1. The model was first trained on a long-context (7000-8192 token) subset of [dolphin](https://huggingface.co/datasets/ehartford/dolphin), an Orca-like dataset (GPT4 split only). This amounts to roughly 110M tokens. An Airoboros-like training prompt was used, with partial NTK scaling applied. This took ~20 hours.
2. The model was then finetuned on [Jon Durbin's Airoboros GPT4 1.4.1](https://huggingface.co/datasets/jondurbin/airoboros-gpt4-1.4.1) for 3 epochs. This took ~17 hours.

**This is a QLoRA fine-tune**.

All training was performed with 1x RTX 6000 Ada.

## How to Use
This model employs [Partial NTK Rope Scaling](https://github.com/jquesnelle/scaled-rope/pull/1). This methodology is not yet implemented natively in Transformers or Exllama (as of 7/21). There are two options to run this, each of which requires replacing the `LlamaEmbedding` with `LlamaPartNTKScaledRotaryEmbedding`, with `max_position_embeddings=16384` and `original_max_position_embeddings=4096`. A monkeypatch can be found here:
1. Transformers (use bnb for quantization). Use [fp16 weights](https://huggingface.co/bhenrym14/airophin-13b-pntk-16k-fp16). A minimal loading sketch is shown after this list.
2. Autogptq/GPTQ-for-Llama. Use these quantized weights.
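
For option 1, here is a minimal loading sketch, not the author's exact setup: it assumes the Partial NTK monkeypatch linked above has already been applied (the patch import and function name in the comments below are hypothetical placeholders, not a real package), and otherwise uses standard Transformers + bitsandbytes 4-bit loading. For option 2, AutoGPTQ's `AutoGPTQForCausalLM.from_quantized` would take the place of `from_pretrained` after the same patch.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# The Partial NTK RoPE patch must be applied BEFORE the model is instantiated, so that
# LlamaPartNTKScaledRotaryEmbedding replaces the stock rotary embedding.
# The module/function names below are hypothetical placeholders for the linked monkeypatch:
# from part_ntk_patch import apply_part_ntk_patch
# apply_part_ntk_patch(max_position_embeddings=16384, original_max_position_embeddings=4096)

model_id = "bhenrym14/airophin-13b-pntk-16k-fp16"

# Option 1: quantize the fp16 weights on load with bitsandbytes (4-bit).
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Illustrative prompt only; see Airoboros 1.4.1 for the full prompt format.
prompt = "USER: Summarize the idea behind NTK-aware RoPE scaling.\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```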
**Note: Due to an erroneous `max_position_embeddings` value in the base model's config file, the RoPE scaling factor was computed with `original_max_position_embeddings=2048` (for Llama-2 it should be 4096). This resulted in a scaling factor of 8 instead of 4, despite passing a new `max_position_embeddings=16384`. This could have a negative-to-neutral performance impact. I intend to retrain this model with the proper scaling factor; if and when I do, I will replace the weights in this repo and note the change at the top of this model card.**
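
For concreteness, the scaling-factor arithmetic described in the note above, assuming the factor is simply the ratio of the target context length to `original_max_position_embeddings`:

```python
max_position_embeddings = 16384

factor_used = max_position_embeddings / 2048      # erroneous base-config value -> 8.0
factor_intended = max_position_embeddings / 4096  # correct Llama-2 value -> 4.0

print(factor_used, factor_intended)  # 8.0 4.0
```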
## Motivation
Methods of extending the useful context window of LLMs have gained significant traction. Several methods requiring little to no finetuning/retraining have emerged. Among these are linear position interpolation ([kaiokendev](https://kaiokendev.github.io/til#extending-context-to-8k) and [Meta AI](https://arxiv.org/abs/2306.15595)) and [NTK-aware scaling](https://github.com/jquesnelle/scaled-rope). My prior experiments demonstrate significant performance improvements both from finetuning with these scaling adjustments applied **and** from finetuning on longer sequences.