Update README.md
README.md CHANGED
@@ -20,14 +20,14 @@ RWKV-4 3B is a L32-D2560 causal language model trained on the Pile. See https://
 
 Use https://github.com/BlinkDL/ChatRWKV to run it.
 
-
+Best checkpoint: RWKV-4-Pile-3B-20221110-ctx4096.pth : Fine-tuned to ctx_len = 4096
 * LAMBADA ppl 5.25, acc 63.96%
 * PIQA acc 74.16%
 * SC2016 acc 70.71%
 * Hellaswag acc_norm 59.89%
 * ctx_len = 4096 n_layer = 32 n_embd = 2560
 
-
+Previous checkpoint: RWKV-4-Pile-3B-20221008-8023.pth : Trained on the Pile for 331B tokens.
 * Pile loss 1.9469
 * LAMBADA ppl 5.24, acc 63.94%
 * PIQA acc 73.72%
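For readers unfamiliar with the benchmark figures above: LAMBADA perplexity is the exponentiated mean negative log-likelihood of the target last word, and accuracy is the fraction of examples where the model's top prediction matches it exactly. A minimal sketch of that bookkeeping, with made-up per-example numbers (this is illustrative only, not the evaluation harness that produced the README's figures):

```python
import math

def lambada_metrics(results):
    """Compute perplexity and accuracy for LAMBADA-style last-word prediction.

    `results` is a list of (nll, correct) pairs: the negative log-likelihood
    the model assigned to the target word, and whether its greedy prediction
    matched that word. Hypothetical helper for illustration.
    """
    mean_nll = sum(nll for nll, _ in results) / len(results)
    ppl = math.exp(mean_nll)  # perplexity = exp(mean NLL)
    acc = sum(1 for _, ok in results if ok) / len(results)
    return ppl, acc

# Toy example with fabricated numbers:
ppl, acc = lambada_metrics([(1.2, True), (2.0, False), (1.5, True)])
```

Lower perplexity and higher accuracy are both better, which is why the ctx4096 checkpoint's 5.25 ppl / 63.96% acc is reported as the best.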