Update README.md
README.md CHANGED
@@ -14,6 +14,10 @@ datasets:
 
 # RWKV-4 430M
 
+# WARNING: 4 / 4a / 4b models ARE NOT compatible. Use RWKV-4 models unless you know what you are doing.
+# WARNING: 4 / 4a / 4b models ARE NOT compatible. Use RWKV-4 models unless you know what you are doing.
+# WARNING: 4 / 4a / 4b models ARE NOT compatible. Use RWKV-4 models unless you know what you are doing.
+
 ## Model Description
 
 RWKV-4 430M is a L24-D1024 causal language model trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.
@@ -32,8 +36,6 @@ RWKV-4-Pile-430M-20220808-8066.pth : Trained on the Pile for 333B tokens.
 * SC2016 acc 63.87%
 * Hellaswag acc_norm 40.90%
 
-## Note: 4 / 4a / 4b models ARE NOT compatible. Use RWKV-4 unless you know what you are doing.
-
 With tiny attention (--tiny_att_dim 512 --tiny_att_layer 18):
 RWKV-4a-Pile-433M-20221223-8039.pth
 * Pile loss 2.2394
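
For readers landing on this model card, a minimal inference sketch follows. It is not part of the README above: the repo id `RWKV/rwkv-4-430m-pile` (a Hugging Face-converted checkpoint) and transformers' built-in RWKV support (added in v4.29) are assumptions; the raw `.pth` checkpoints named in the diff are instead loaded with the code in https://github.com/BlinkDL/RWKV-LM.

```python
# Hedged sketch, not from the README: assumes the converted Hub checkpoint
# "RWKV/rwkv-4-430m-pile" and transformers >= 4.29 (which added RWKV support).
from transformers import AutoTokenizer, RwkvForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-4-430m-pile")
model = RwkvForCausalLM.from_pretrained("RWKV/rwkv-4-430m-pile")

inputs = tokenizer("The Pile is a large, diverse", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```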
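The second hunk keeps the note about the RWKV-4a checkpoint trained with tiny attention (--tiny_att_dim 512 --tiny_att_layer 18). As a rough illustration of what those flags describe, here is a schematic sketch of a reduced-width causal attention block of dim 512 that would be mixed into one layer (layer 18 of the 24 in this L24-D1024 model). The class name and internals are illustrative only; the actual RWKV-4a implementation lives in the RWKV-LM training code and may differ in detail.

```python
import torch
import torch.nn as nn

class TinyAttention(nn.Module):
    """Illustrative only: a single-head causal attention at reduced width,
    the rough shape of --tiny_att_dim 512 applied at --tiny_att_layer 18.
    See https://github.com/BlinkDL/RWKV-LM for the real RWKV-4a code."""

    def __init__(self, d_model: int = 1024, tiny_dim: int = 512):
        super().__init__()
        self.tiny_dim = tiny_dim
        self.qkv = nn.Linear(d_model, 3 * tiny_dim, bias=False)
        self.out = nn.Linear(tiny_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q, k, v = self.qkv(x).split(self.tiny_dim, dim=-1)
        # single-head scaled dot-product attention at the reduced width
        att = (q @ k.transpose(-2, -1)) / self.tiny_dim ** 0.5
        # causal mask: each position may only attend to itself and the past
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        att = att.masked_fill(causal, float("-inf")).softmax(dim=-1)
        return self.out(att @ v)  # residual add would happen in the host block
```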