BlinkDL committed
Commit a4cf5a5
1 Parent(s): 99dc399

Update README.md

Files changed (1): README.md (+4 -2)
README.md CHANGED
@@ -14,6 +14,10 @@ datasets:
 
 # RWKV-4 430M
 
+# WARNING: 4 / 4a / 4b models ARE NOT compatible. Use RWKV-4 models unless you know what you are doing.
+# WARNING: 4 / 4a / 4b models ARE NOT compatible. Use RWKV-4 models unless you know what you are doing.
+# WARNING: 4 / 4a / 4b models ARE NOT compatible. Use RWKV-4 models unless you know what you are doing.
+
 ## Model Description
 
 RWKV-4 430M is a L24-D1024 causal language model trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.
@@ -32,8 +36,6 @@ RWKV-4-Pile-430M-20220808-8066.pth : Trained on the Pile for 333B tokens.
 * SC2016 acc 63.87%
 * Hellaswag acc_norm 40.90%
 
-## Note: 4 / 4a / 4b models ARE NOT compatible. Use RWKV-4 unless you know what you are doing.
-
 With tiny attention (--tiny_att_dim 512 --tiny_att_layer 18):
 RWKV-4a-Pile-433M-20221223-8039.pth
 * Pile loss 2.2394
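The warning this commit promotes matters in practice: 4, 4a, and 4b checkpoints share a naming scheme but not weight layouts. A minimal sketch of a guard that parses the variant out of the checkpoint filename before loading (`rwkv_variant` and `check_compatible` are hypothetical helper names, not part of RWKV-LM; the filename pattern is assumed from the checkpoints listed above):

```python
import re

def rwkv_variant(checkpoint_name: str) -> str:
    """Extract the architecture variant ("4", "4a", or "4b") from an
    RWKV-Pile checkpoint filename such as
    'RWKV-4a-Pile-433M-20221223-8039.pth'."""
    m = re.match(r"RWKV-(4[ab]?)-Pile-", checkpoint_name)
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {checkpoint_name}")
    return m.group(1)

def check_compatible(checkpoint_name: str, expected: str = "4") -> None:
    """Raise if the checkpoint is not the expected variant, since
    4 / 4a / 4b weights are not interchangeable."""
    variant = rwkv_variant(checkpoint_name)
    if variant != expected:
        raise ValueError(
            f"{checkpoint_name} is an RWKV-{variant} checkpoint; "
            f"expected RWKV-{expected} (4 / 4a / 4b are not compatible)"
        )

# Passes silently for a plain RWKV-4 checkpoint:
check_compatible("RWKV-4-Pile-430M-20220808-8066.pth")
```

Calling `check_compatible("RWKV-4a-Pile-433M-20221223-8039.pth")` with the default `expected="4"` would raise, catching the mismatch before any weights are loaded.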