bhenrym14 committed on
Commit 41e0890 · 1 Parent(s): 2631d5b

Update README.md

Files changed (1):
  1. README.md +9 -9
README.md CHANGED
@@ -27,18 +27,18 @@ Ooba use: Be sure to increase the `Truncate the prompt up to this length` parame
 
 ## Motivation
 
-[Yet another RoPE extensioN method (YaRN)](https://github.com/jquesnelle/yarn/blob/master/paper/yarn.pdf) is a novel method of extending the useful context of pretrained LLMs with architectures employing RoPE, with minimal additional training requirements. This method is the consequence of efforts to mitigate the shortcomings of other methods such as Position Interpolation (PI) and NTK-Aware scaling. Since I am unaware of any existing instruction-tuned models which employ YaRN, I finetuned using Jon Durbin's latest airoboros dataset.
+[Yet another RoPE extensioN method (YaRN)](https://github.com/jquesnelle/yarn/blob/master/paper/yarn.pdf) is a novel method of extending the useful context of pretrained LLMs with architectures employing RoPE, with minimal additional training requirements. This method is the consequence of efforts to mitigate the shortcomings of other methods such as Position Interpolation (PI) and NTK-Aware scaling. This model is an attempt to enable the community to assess the capabilities of this extension method in real-world applications.
 
 ## Relative Performance (wikitext perplexity)
 
-| Context (tokens) | **bhenrym14/airoboros-l2-13b-PI-16k-fp16** | bhenrym14/airophin-v2-13b-PI-8k-fp16 | bhenrym14/airophin-13b-pntk-16k-fp16 | bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-fp16 | bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | jondurbin/airoboros-l2-13b-gpt4-1.4.1 |
-| --- | --- | --- | --- | --- | --- | --- |
-| 512 | 7.67 | 7.38 | 7.62 | 8.24 | 7.90 | **7.23** |
-| 1024 | 6.15 | 5.99 | 6.20 | 6.71 | 6.17 | **5.85** |
-| 2048 | 5.29 | 5.22 | 5.38 | 5.87 | 5.23 | **5.07** |
-| 4096 | 4.94 | 4.90 | 5.08 | 5.50 | 4.91 | **4.77** |
-| 8192 | **4.71** | **4.71** | 4.90 | 5.32 | Not Tested | 57.1 |
-| 12000 | **4.54** | 55 | 4.82 | 56.1 | Not Tested | Not Tested |
+| Context (tokens) | **bhenrym14/airoboros-l2-13b-2.1-YaRN-64k** | bhenrym14/airoboros-l2-13b-PI-16k-fp16 | bhenrym14/airophin-v2-13b-PI-8k-fp16 | bhenrym14/airophin-13b-pntk-16k-fp16 | bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-fp16 | bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | jondurbin/airoboros-l2-13b-gpt4-1.4.1 |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| 512 | | 7.67 | 7.38 | 7.62 | 8.24 | 7.90 | **7.23** |
+| 1024 | | 6.15 | 5.99 | 6.20 | 6.71 | 6.17 | **5.85** |
+| 2048 | | 5.29 | 5.22 | 5.38 | 5.87 | 5.23 | **5.07** |
+| 4096 | | 4.94 | 4.90 | 5.08 | 5.50 | 4.91 | **4.77** |
+| 8192 | | **4.71** | **4.71** | 4.90 | 5.32 | Not Tested | 57.1 |
+| 12000 | | **4.54** | 55 | 4.82 | 56.1 | Not Tested | Not Tested |
 
 
 ## Prompting:
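
For readers unfamiliar with the metric in the table above: wikitext perplexity is the exponential of the model's mean per-token negative log-likelihood, typically evaluated in windows of each listed context length. A minimal sketch of the metric itself (the function name and the example values are illustrative, not the author's evaluation script):

```python
import math
from typing import Sequence

def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity is exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Illustrative: a model that assigns probability 1/8 to every token
# has a perplexity of 8, regardless of sequence length.
print(perplexity([math.log(8)] * 100))
```

Lower is better, which is why longer contexts (more conditioning evidence per token) generally drive the numbers in the table down until the model runs past its usable context window.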