BlinkDL committed
Commit 6fe5107
1 Parent(s): 78d56e5

Update README.md

Files changed (1): README.md (+8 -4)
README.md CHANGED
@@ -8,7 +8,7 @@ tags:
 - rwkv
 license: apache-2.0
 datasets:
-- The Pile
+- the_pile
 
 ---
 
@@ -22,9 +22,12 @@ RWKV-4 3B is a L32-D2560 causal language model trained on the Pile. See https://
 
 At this moment you have to use my Github code (https://github.com/BlinkDL/RWKV-LM) to run it.
 
-ctx_len = 1024
-n_layer = 32
-n_embd = 2560
+New checkpoint: RWKV-4-Pile-3B-20221110-ctx4096.pth : Fine-tuned to ctx_len = 4096
+* LAMBADA ppl 5.25, acc 63.96%
+* PIQA acc 74.16%
+* SC2016 acc 70.71%
+* Hellaswag acc_norm 59.89%
+ctx_len = 4096 n_layer = 32 n_embd = 2560
 
 Final checkpoint: RWKV-4-Pile-3B-20221008-8023.pth : Trained on the Pile for 331B tokens.
 * Pile loss 1.9469
@@ -32,3 +35,4 @@ Final checkpoint: RWKV-4-Pile-3B-20221008-8023.pth : Trained on the Pile for 331
 * PIQA acc 73.72%
 * SC2016 acc 70.28%
 * Hellaswag acc_norm 59.63%
+ctx_len = 1024 n_layer = 32 n_embd = 2560
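
The diff records the checkpoint hyperparameters (n_layer = 32, n_embd = 2560) directly in the README, and the README says to run the model with the RWKV-LM repo. A minimal sketch of how one might sanity-check a downloaded .pth against those numbers, assuming (not confirmed by this commit) that the file is a plain PyTorch state dict using RWKV-LM's 'emb.weight' / 'blocks.N.*' key naming — verify against the actual checkpoint before relying on it:

```python
import torch

# Minimal sketch: confirm the stated hyperparameters from the weights.
# Assumption: the .pth is a plain state dict with RWKV-LM-style keys
# ('emb.weight', 'blocks.0....', 'blocks.1....', ...).
sd = torch.load("RWKV-4-Pile-3B-20221110-ctx4096.pth", map_location="cpu")

# Embedding width = second dim of the token-embedding matrix; expected 2560.
n_embd = sd["emb.weight"].shape[1]

# Layer count = highest block index + 1; expected 32.
n_layer = 1 + max(int(k.split(".")[1]) for k in sd if k.startswith("blocks."))

print(f"n_layer={n_layer}  n_embd={n_embd}")
```

Note that ctx_len is a training/fine-tuning setting rather than something stored in the weights, so it has to be taken from the README itself: 1024 for the final checkpoint, 4096 for the new ctx4096 fine-tune.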