Update README.md
README.md CHANGED
@@ -34,8 +34,8 @@ Given the excellent performance of llama-2 13b finetunes relative to llama 33b,
 
 ## Relative Performance (wikitext perplexity)
 
-| Context (tokens) | bhenrym14/airoboros-l2-13b-PI-16k-fp16 | bhenrym14/airophin-v2-13b-PI-8k-fp16 | bhenrym14/airophin-13b-pntk-16k-fp16 | bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-fp16 | bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | jondurbin/airoboros-l2-13b-gpt4-1.4.1 |
-| --- | --- | --- | --- | --- | --- |
+| Context (tokens) | **bhenrym14/airoboros-l2-13b-PI-16k-fp16** | bhenrym14/airophin-v2-13b-PI-8k-fp16 | bhenrym14/airophin-13b-pntk-16k-fp16 | bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-fp16 | bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | jondurbin/airoboros-l2-13b-gpt4-1.4.1 |
+| --- | --- | --- | --- | --- | --- | --- |
 | 512 | 7.67 | 7.38 | 7.62 | 8.24 | 7.90 | **7.23** |
 | 1024 | 6.15 | 5.99 | 6.20 | 6.71 | 6.17 | **5.85** |
 | 2048 | 5.29 | 5.22 | 5.38 | 5.87 | 5.23 | **5.07** |
@@ -43,7 +43,7 @@ Given the excellent performance of llama-2 13b finetunes relative to llama 33b,
 | 8192 | **4.71** | **4.71** | 4.90 | 5.32 | Not Tested | 57.1 |
 | 12000 | **4.54** | 55 | 4.82 | 56.1 | Not Tested | Not Tested |
 
-- Larger PI scaling factors increase short context performance degradation. If you don't require 16k context, you're better off using a model with a different context extension method, or a smaller (or no) PI scaling factor.
+- Larger PI scaling factors increase short context performance degradation. If you don't require 16k context, you're better off using a model with a different context extension method, or a smaller (or no) PI scaling factor. Given this, don't expect anything special on the HF leaderboard.
 - Beyond 8k, this model has lower perplexity than all other models tested here.
 - I'm actively exploring/implementing other context extension methods that may ameliorate the tendency of PI methods to impair the ability of the model to attend to the context space equally.
 
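For context on the "PI scaling factor" trade-off discussed in the notes above: linear position interpolation (PI) extends a RoPE model's context window by compressing position indices back into the pretrained range, so a larger factor squeezes short-range positions closer together, which is the source of the short-context perplexity penalty. The snippet below is a minimal illustrative sketch, not code from this model or repo; the function name `rope_angles` and the 4096 → 16384 factor of 4 are assumptions for the example.

```python
import torch

# Illustrative sketch (hypothetical, not this repo's code): linear position
# interpolation (PI) rotates position m as if it were m / pi_scale, keeping
# all positions inside the pretrained RoPE range.
def rope_angles(positions, dim=128, base=10000.0, pi_scale=4.0):
    """Return RoPE rotation angles with positions compressed by the PI scaling factor."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    scaled_positions = positions.float() / pi_scale   # the only change PI introduces
    return torch.outer(scaled_positions, inv_freq)    # shape: (seq_len, dim // 2)

# Assuming a 4096-token pretrained base extended to 16k (pi_scale = 16384 / 4096 = 4):
# position 16383 maps to ~4095.75, inside the pretrained range, but neighboring
# short-context positions end up only 0.25 "pretrained positions" apart.
angles = rope_angles(torch.arange(16384), pi_scale=4.0)
```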