bhenrym14 committed
Commit 2631d5b · Parent: 66195ea

Update README.md

Files changed (1)
  1. README.md +1 -7
README.md CHANGED
@@ -1,11 +1,8 @@
  ---
  datasets:
  - jondurbin/airoboros-2.1
- - kmfoda/booksum
  ---

-
-
  # Extended Context (via YaRN) Llama-2-13b with airoboros-2.1 (fp16)


@@ -30,7 +27,7 @@ Ooba use: Be sure to increase the `Truncate the prompt up to this length` parame

  ## Motivation

- Y
+ [Yet another RoPE extensioN method (YaRN)](https://github.com/jquesnelle/yarn/blob/master/paper/yarn.pdf) is a novel method of extending the useful context of pretrained LLMs whose architectures employ RoPE, with minimal additional training requirements. This method is the consequence of efforts to mitigate the shortcomings of other methods such as Position Interpolation (PI) and NTK-Aware scaling. Since I am unaware of any existing instruction-tuned models which employ YaRN, I finetuned one using Jon Durbin's latest airoboros dataset.

  ## Relative Performance (wikitext perplexity)

@@ -43,9 +40,6 @@ Y
  | 8192 | **4.71** | **4.71** | 4.90 | 5.32 | Not Tested | 57.1 |
  | 12000 | **4.54** | 55 | 4.82 | 56.1 | Not Tested | Not Tested |

- - Larger PI scaling factors increase short-context performance degradation. If you don't require 16k context, you're better off using a model with a different context extension method, or a smaller (or no) PI scaling factor. Given this, don't expect anything special from this model on the HF leaderboard. Whether or not this is relevant to you will depend on your intended use case.
- - Beyond 8k, this model has lower perplexity than all other models tested here.
- - I'm actively exploring/implementing other context extension methods that may ameliorate the tendency of PI methods to impair the ability of the model to attend to the context space equally.

  ## Prompting:
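The Motivation paragraph added in this commit contrasts YaRN with Position Interpolation (PI) and NTK-aware scaling, both of which adjust RoPE to reach longer contexts. Below is a minimal, illustrative sketch of how PI and NTK-aware scaling modify the RoPE rotation angles; the PyTorch usage, the head dimension of 128, and the 4x extension factor are assumptions for illustration, not code from this commit, and YaRN itself goes further with per-frequency interpolation plus an attention temperature term (see the linked paper).

```python
# Illustrative sketch only: contrasts vanilla RoPE, Position Interpolation (PI),
# and NTK-aware scaling. Not the implementation used by this model.
import torch

def rope_inv_freq(dim: int = 128, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies for one attention head of size `dim`."""
    return 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

def angles_vanilla(positions: torch.Tensor, dim: int = 128) -> torch.Tensor:
    # Original RoPE: angle[m, i] = m * theta_i, trained only up to 4096 positions.
    return torch.outer(positions.to(torch.float32), rope_inv_freq(dim))

def angles_position_interpolation(positions: torch.Tensor, scale: float = 4.0,
                                  dim: int = 128) -> torch.Tensor:
    # PI: squeeze every position index by the extension factor, keeping angles in
    # the trained range but uniformly compressing short-range (high-frequency) detail.
    return torch.outer(positions.to(torch.float32) / scale, rope_inv_freq(dim))

def angles_ntk_aware(positions: torch.Tensor, scale: float = 4.0,
                     dim: int = 128, base: float = 10000.0) -> torch.Tensor:
    # NTK-aware: enlarge the RoPE base instead, so low frequencies stretch while the
    # highest frequencies stay close to their pretrained values.
    adjusted_base = base * scale ** (dim / (dim - 2))
    return torch.outer(positions.to(torch.float32), rope_inv_freq(dim, adjusted_base))

positions = torch.arange(16384)  # 4x the original 4096-token context
print(angles_vanilla(positions).shape)                  # torch.Size([16384, 64])
print(angles_position_interpolation(positions).shape)   # same shape, compressed positions
print(angles_ntk_aware(positions).shape)                # same shape, stretched base
```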