bhenrym14 committed on
Commit 41e0890 · 1 Parent(s): 2631d5b

Update README.md

Files changed (1):
  1. README.md +9 -9
README.md CHANGED
@@ -27,18 +27,18 @@ Ooba use: Be sure to increase the `Truncate the prompt up to this length` parame
 
 ## Motivation
 
-[Yet another RoPE extensioN method (YaRN)](https://github.com/jquesnelle/yarn/blob/master/paper/yarn.pdf) is a novel method of extending the useful context of pretrained LLMs with architectures employing RoPE, with minimal additional training requirements. This method is the consequence of efforts to mitigate the shortcomings of other methods such as Position Interpolation (PI) and NTK-Aware scaling. Since I am unaware of any existing instruction-tuned models which employ YaRN, I finetuned using Jon Durbin's latest airoboros dataset.
+[Yet another RoPE extensioN method (YaRN)](https://github.com/jquesnelle/yarn/blob/master/paper/yarn.pdf) is a novel method of extending the useful context of pretrained LLMs with architectures employing RoPE, with minimal additional training requirements. This method is the consequence of efforts to mitigate the shortcomings of other methods such as Position Interpolation (PI) and NTK-Aware scaling. This model is an attempt to enable the community to assess the capabilities of this extension method in real-world applications.
 
 ## Relative Performance (wikitext perplexity)
 
-| Context (tokens) | **bhenrym14/airoboros-l2-13b-PI-16k-fp16** | bhenrym14/airophin-v2-13b-PI-8k-fp16 | bhenrym14/airophin-13b-pntk-16k-fp16 | bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-fp16 | bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | jondurbin/airoboros-l2-13b-gpt4-1.4.1 |
-| --- | --- | --- | --- | --- | --- | --- |
-| 512 | 7.67 | 7.38 | 7.62 | 8.24 | 7.90 | **7.23** |
-| 1024 | 6.15 | 5.99 | 6.20 | 6.71 | 6.17 | **5.85** |
-| 2048 | 5.29 | 5.22 | 5.38 | 5.87 | 5.23 | **5.07** |
-| 4096 | 4.94 | 4.90 | 5.08 | 5.50 | 4.91 | **4.77** |
-| 8192 | **4.71** | **4.71** | 4.90 | 5.32 | Not Tested | 57.1 |
-| 12000 | **4.54** | 55 | 4.82 | 56.1 | Not Tested | Not Tested |
+| Context (tokens) | **bhenrym14/airoboros-l2-13b-2.1-YaRN-64k** | bhenrym14/airoboros-l2-13b-PI-16k-fp16 | bhenrym14/airophin-v2-13b-PI-8k-fp16 | bhenrym14/airophin-13b-pntk-16k-fp16 | bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-fp16 | bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | jondurbin/airoboros-l2-13b-gpt4-1.4.1 |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| 512 | | 7.67 | 7.38 | 7.62 | 8.24 | 7.90 | **7.23** |
+| 1024 | | 6.15 | 5.99 | 6.20 | 6.71 | 6.17 | **5.85** |
+| 2048 | | 5.29 | 5.22 | 5.38 | 5.87 | 5.23 | **5.07** |
+| 4096 | | 4.94 | 4.90 | 5.08 | 5.50 | 4.91 | **4.77** |
+| 8192 | | **4.71** | **4.71** | 4.90 | 5.32 | Not Tested | 57.1 |
+| 12000 | | **4.54** | 55 | 4.82 | 56.1 | Not Tested | Not Tested |
 
 
 ## Prompting:
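
For readers unfamiliar with the metric in the table above: wikitext perplexity is the exponential of the model's mean per-token negative log-likelihood, typically evaluated in windows of each listed context length. A minimal sketch of the metric itself (the function name and the example values are illustrative, not the author's evaluation script):

```python
import math
from typing import Sequence

def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity is exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Illustrative: a model that assigns probability 1/8 to every token
# has a perplexity of 8, regardless of sequence length.
print(perplexity([math.log(8)] * 100))
```

Lower is better, which is why longer contexts (more conditioning evidence per token) generally drive the numbers in the table down until the model runs past its usable context window.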