BlinkDL committed on
Commit c147460
1 Parent(s): 364e833

Update README.md

Files changed (1)
README.md +31 -0
README.md CHANGED
@@ -1,3 +1,34 @@
 ---
+language:
+- en
+tags:
+- pytorch
+- text-generation
+- causal-lm
+- rwkv
 license: bsd-2-clause
+datasets:
+- The Pile
+
 ---
+
+# RWKV-4 1.5B
+
+## Model Description
+
+RWKV-4 1.5B is an L24-D2048 (24 layers, embedding dimension 2048) causal language model trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.
+
+**Note: This is a BF16 model, and it may overflow if you run it in FP16 (this is probably fixable by rescaling the weights).**
+
+For now, you have to use my GitHub code (https://github.com/BlinkDL/RWKV-LM) to run it.
+
+ctx_len = 1024
+n_layer = 24
+n_embd = 2048
+
+Preview checkpoint: RWKV-4-Pile-1B5-20220814-4526.pth, trained on the Pile for 187B tokens.
+* Pile loss 2.0635
+* LAMBADA ppl 7.34, acc 55.64%
+* PIQA acc 71.44%
+* SC2016 acc 68.25%
+* Hellaswag acc_norm 51.60%
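As a rough sanity check on the "1.5B" in the model name, the sketch below estimates the parameter count from n_layer = 24 and n_embd = 2048. It assumes the usual RWKV-4 block layout (four d x d projections in time-mixing; key d to 4d, value 4d to d, and a d x d receptance in channel-mixing), an untied embedding and output head, and a vocabulary of 50277 tokens. The vocabulary size is an assumption that this commit does not state. Small per-channel vectors (LayerNorm, time-decay, mixing coefficients) are ignored, and ctx_len does not enter the estimate.

```python
# Rough RWKV-4 parameter estimate (sketch; layer layout and vocab size are assumptions).

ASSUMED_VOCAB_SIZE = 50277  # assumption: GPT-NeoX 20B tokenizer vocabulary, not stated in the model card


def rwkv4_param_estimate(n_layer: int, n_embd: int, vocab_size: int) -> int:
    """Count the weights of the large matrices only; small per-channel vectors are ignored."""
    d = n_embd
    # Time-mixing: key, value, receptance and output projections, each d x d.
    time_mix = 4 * d * d
    # Channel-mixing: key (d -> 4d), value (4d -> d), receptance (d -> d).
    channel_mix = (d * 4 * d) + (4 * d * d) + (d * d)
    # Input embedding and output head, assumed untied, each vocab_size x d.
    embed_and_head = 2 * vocab_size * d
    return n_layer * (time_mix + channel_mix) + embed_and_head


if __name__ == "__main__":
    total = rwkv4_param_estimate(n_layer=24, n_embd=2048, vocab_size=ASSUMED_VOCAB_SIZE)
    print(f"{total:,} parameters (~{total / 1e9:.2f}B)")  # 1,514,557,440 parameters (~1.51B)
```

Under these assumptions the estimate lands at about 1.51B, which is consistent with the model name; the per-layer blocks account for roughly 1.31B of it.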
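The BF16/FP16 warning in the model card comes down to exponent range. BF16 keeps FP32's 8-bit exponent (largest finite value about 3.4e38), while IEEE half precision (FP16) tops out at 65504. The stdlib-only sketch below emulates both formats and shows a value that BF16 holds exactly but FP16 turns into inf, plus the generic idea of dividing by a power of two so that it fits. The BF16 emulation truncates (rounds toward zero) rather than rounding to nearest, and `power_of_two_scale` is a hypothetical helper. This illustrates the numerics only; it is not the rescaling scheme used in the RWKV-LM code.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_bf16(x: float) -> float:
    """Emulate BF16 by keeping the top 16 bits of the FP32 encoding (round toward zero).

    The input must fit in FP32, otherwise struct raises OverflowError.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))
    return y


def to_fp16(x: float) -> float:
    """Round to IEEE half precision; values beyond the FP16 range become +/-inf."""
    try:
        (y,) = struct.unpack("<e", struct.pack("<e", x))
    except OverflowError:
        return math.copysign(math.inf, x)
    return y


def power_of_two_scale(max_abs: float) -> float:
    """Hypothetical helper: smallest power of two s with max_abs / s <= FP16_MAX."""
    if not math.isfinite(max_abs):
        raise ValueError("max_abs must be finite")
    s = 1.0
    while max_abs / s > FP16_MAX:
        s *= 2.0
    return s


if __name__ == "__main__":
    x = 98304.0  # 1.5 * 2**16: exact in BF16, too large for FP16
    print(to_bf16(x))  # 98304.0
    print(to_fp16(x))  # inf
    s = power_of_two_scale(x)
    print(s, to_fp16(x / s) * s)  # 2.0 98304.0
```

Dividing by a power of two changes only the exponent, so the scaled value picks up no error beyond FP16's own rounding; the scale still has to be undone somewhere downstream to recover the original magnitude.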
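The Pile result is reported as a loss, while LAMBADA is reported as a perplexity on a different benchmark, so the two numbers are not directly comparable. Assuming the Pile loss is the mean per-token cross-entropy in nats (the PyTorch default, which this commit does not confirm), the matching token-level perplexity on the Pile is simply its exponential:

```python
import math

PILE_LOSS = 2.0635  # from the model card; assumed to be mean cross-entropy in nats per token


def perplexity_from_loss(mean_nll_nats: float) -> float:
    """Perplexity = exp(mean negative log-likelihood); valid when the loss uses the natural log."""
    return math.exp(mean_nll_nats)


if __name__ == "__main__":
    print(f"{perplexity_from_loss(PILE_LOSS):.2f}")  # 7.87
```

If the loss were measured in bits instead, the conversion would be 2 ** loss.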