Update README.md
README.md CHANGED
@@ -14,6 +14,10 @@ datasets:
 
 # RWKV-4 430M
 
+# WARNING: 4 / 4a / 4b models ARE NOT compatible. Use RWKV-4 models unless you know what you are doing.
+# WARNING: 4 / 4a / 4b models ARE NOT compatible. Use RWKV-4 models unless you know what you are doing.
+# WARNING: 4 / 4a / 4b models ARE NOT compatible. Use RWKV-4 models unless you know what you are doing.
+
 ## Model Description
 
 RWKV-4 430M is a L24-D1024 causal language model trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.
@@ -32,8 +36,6 @@ RWKV-4-Pile-430M-20220808-8066.pth : Trained on the Pile for 333B tokens.
 * SC2016 acc 63.87%
 * Hellaswag acc_norm 40.90%
 
-## Note: 4 / 4a / 4b models ARE NOT compatible. Use RWKV-4 unless you know what you are doing.
-
 With tiny attention (--tiny_att_dim 512 --tiny_att_layer 18):
 RWKV-4a-Pile-433M-20221223-8039.pth
 * Pile loss 2.2394
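
For readers landing on this model card, a minimal inference sketch follows. It is not part of the README above: the repo id `RWKV/rwkv-4-430m-pile` (a Hugging Face-converted checkpoint) and transformers' built-in RWKV support (added in v4.29) are assumptions; the raw `.pth` checkpoints named in the diff are instead loaded with the code in https://github.com/BlinkDL/RWKV-LM.

```python
# Hedged sketch, not from the README: assumes the converted Hub checkpoint
# "RWKV/rwkv-4-430m-pile" and transformers >= 4.29 (which added RWKV support).
from transformers import AutoTokenizer, RwkvForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-4-430m-pile")
model = RwkvForCausalLM.from_pretrained("RWKV/rwkv-4-430m-pile")

inputs = tokenizer("The Pile is a large, diverse", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```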
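The second hunk keeps the note about the RWKV-4a checkpoint trained with tiny attention (--tiny_att_dim 512 --tiny_att_layer 18). As a rough illustration of what those flags describe, here is a schematic sketch of a reduced-width causal attention block of dim 512 that would be mixed into one layer (layer 18 of the 24 in this L24-D1024 model). The class name and internals are illustrative only; the actual RWKV-4a implementation lives in the RWKV-LM training code and may differ in detail.

```python
import torch
import torch.nn as nn

class TinyAttention(nn.Module):
    """Illustrative only: a single-head causal attention at reduced width,
    the rough shape of --tiny_att_dim 512 applied at --tiny_att_layer 18.
    See https://github.com/BlinkDL/RWKV-LM for the real RWKV-4a code."""

    def __init__(self, d_model: int = 1024, tiny_dim: int = 512):
        super().__init__()
        self.tiny_dim = tiny_dim
        self.qkv = nn.Linear(d_model, 3 * tiny_dim, bias=False)
        self.out = nn.Linear(tiny_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q, k, v = self.qkv(x).split(self.tiny_dim, dim=-1)
        # single-head scaled dot-product attention at the reduced width
        att = (q @ k.transpose(-2, -1)) / self.tiny_dim ** 0.5
        # causal mask: each position may only attend to itself and the past
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        att = att.masked_fill(causal, float("-inf")).softmax(dim=-1)
        return self.out(att @ v)  # residual add would happen in the host block
```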