---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
language:
- en
tags:
- llama
- llama 2
- smol_llama
---
# smol_llama-220M-GQA-32k-theta

An experimental model intended to serve as the draft model for long-context speculative decoding.

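As a concrete illustration (not part of the original card), a small model like this can be plugged in as the draft ("assistant") model via transformers' assisted generation. The repo ids below are assumptions/placeholders, and speculative decoding requires the draft and target to share a tokenizer:

```python
# Hedged sketch: using this model as the draft ("assistant") model with
# transformers' assisted generation. Repo ids are assumptions/placeholders;
# speculative decoding requires the draft and target to share a tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

draft_id = "Doctor-Shotgun/smol_llama-220M-GQA-32k-theta"  # assumed repo id
target_id = "meta-llama/Llama-2-7b-hf"                     # placeholder target

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id)
draft = AutoModelForCausalLM.from_pretrained(draft_id)

prompt = tokenizer("Rotary position embeddings work by", return_tensors="pt")
# assistant_model enables assisted (speculative) decoding: the small draft
# proposes candidate tokens and the large target verifies them in one pass.
output = target.generate(**prompt, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
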
Created by further pretraining [BEE-spoke-data/smol_llama-220M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA) at a context length of 32768 on [togethercomputer/RedPajama-Data-1T-Sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample).

This variant uses the RoPE theta (rope frequency base) method for context extension.

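To make the mechanism concrete, here is a minimal sketch of how raising the frequency base changes the rotary embedding frequencies; the head dimension is an assumed value for illustration, while theta = 1000000.0 matches the raised base reported in the results below:

```python
# Minimal sketch of the RoPE frequency-base ("theta") mechanism.
# head_dim = 64 is an assumed value for illustration; theta = 1000000.0
# matches the raised base reported in the results below.
import torch

def rope_inv_freq(head_dim: int, theta: float) -> torch.Tensor:
    # Standard rotary-embedding inverse frequencies:
    # inv_freq[i] = theta ** (-2i / head_dim)
    return 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))

default_base = rope_inv_freq(64, 10000.0)    # common Llama default theta
raised_base = rope_inv_freq(64, 1000000.0)   # theta used by this variant

# Raising theta shrinks the inverse frequencies, so positional phases
# advance more slowly and remain distinguishable over longer contexts.
print(default_base[-1].item(), raised_base[-1].item())
```
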
Wikitext perplexity (64 rows), as evaluated with [exllamav2](https://github.com/turboderp/exllamav2):

| Context | Base model | 32k (linear rope scale 16.0) | 32k (rope theta 1000000.0) |
|--------:|-----------:|-----------------------------:|---------------------------:|
| 2048    | 20.2193    | 25.7148                      | 20.2158                    |
| 4096    | 102.6928   | 23.4461                      | 18.3868                    |
| 8192    | 235.5210   | 22.3326                      | 17.5976                    |
| 16384   | 390.7198   | 21.6744                      | 17.1462                    |
| 32768   | 515.8053   | 21.4317                      | 16.6989                    |
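
For readers who want a comparable number without exllamav2, here is a rough sketch of the underlying measurement (perplexity as the exponential of mean token negative log-likelihood over a fixed context window). It is not the exact exllamav2 procedure, and the repo id and wikitext config are assumptions:

```python
# Rough sketch of the measurement itself: perplexity = exp(mean token NLL)
# over a fixed context window. Not the exact exllamav2 procedure; the repo
# id and the wikitext config are assumptions.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Doctor-Shotgun/smol_llama-220M-GQA-32k-theta"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

# Concatenate wikitext test rows into one stream of token ids.
rows = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
text = "\n\n".join(r for r in rows["text"] if r.strip())
ids = tokenizer(text, return_tensors="pt").input_ids

for ctx in (2048, 4096):
    window = ids[:, :ctx]
    with torch.no_grad():
        # Passing labels returns the mean next-token cross-entropy loss.
        loss = model(window, labels=window).loss
    print(ctx, round(math.exp(loss.item()), 4))
```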