BlinkDL committed
Commit 6fe5107
1 Parent(s): 78d56e5

Update README.md

Files changed (1): README.md (+8 -4)
README.md CHANGED
@@ -8,7 +8,7 @@ tags:
 - rwkv
 license: apache-2.0
 datasets:
-- The Pile
+- the_pile
 
 ---
 
@@ -22,9 +22,12 @@ RWKV-4 3B is a L32-D2560 causal language model trained on the Pile. See https://
 
 At this moment you have to use my Github code (https://github.com/BlinkDL/RWKV-LM) to run it.
 
-ctx_len = 1024
-n_layer = 32
-n_embd = 2560
+New checkpoint: RWKV-4-Pile-3B-20221110-ctx4096.pth : Fine-tuned to ctx_len = 4096
+* LAMBADA ppl 5.25, acc 63.96%
+* PIQA acc 74.16%
+* SC2016 acc 70.71%
+* Hellaswag acc_norm 59.89%
+ctx_len = 4096 n_layer = 32 n_embd = 2560
 
 Final checkpoint: RWKV-4-Pile-3B-20221008-8023.pth : Trained on the Pile for 331B tokens.
 * Pile loss 1.9469
@@ -32,3 +35,4 @@ Final checkpoint: RWKV-4-Pile-3B-20221008-8023.pth : Trained on the Pile for 331
 * PIQA acc 73.72%
 * SC2016 acc 70.28%
 * Hellaswag acc_norm 59.63%
+ctx_len = 1024 n_layer = 32 n_embd = 2560
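
The diff records the checkpoint hyperparameters (n_layer = 32, n_embd = 2560) directly in the README, and the README says to run the model with the RWKV-LM repo. A minimal sketch of how one might sanity-check a downloaded .pth against those numbers, assuming (not confirmed by this commit) that the file is a plain PyTorch state dict using RWKV-LM's 'emb.weight' / 'blocks.N.*' key naming — verify against the actual checkpoint before relying on it:

```python
import torch

# Minimal sketch: confirm the stated hyperparameters from the weights.
# Assumption: the .pth is a plain state dict with RWKV-LM-style keys
# ('emb.weight', 'blocks.0....', 'blocks.1....', ...).
sd = torch.load("RWKV-4-Pile-3B-20221110-ctx4096.pth", map_location="cpu")

# Embedding width = second dim of the token-embedding matrix; expected 2560.
n_embd = sd["emb.weight"].shape[1]

# Layer count = highest block index + 1; expected 32.
n_layer = 1 + max(int(k.split(".")[1]) for k in sd if k.startswith("blocks."))

print(f"n_layer={n_layer}  n_embd={n_embd}")
```

Note that ctx_len is a training/fine-tuning setting rather than something stored in the weights, so it has to be taken from the README itself: 1024 for the final checkpoint, 4096 for the new ctx4096 fine-tune.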