Update README.md
README.md
CHANGED
@@ -20,16 +20,11 @@ RWKV-4 14B is a L40-D5120 causal language model trained on the Pile. See https:/
 
 Use https://github.com/BlinkDL/ChatRWKV to run it.
 
-n_layer = 40
-n_embd = 5120
-
 RWKV-4-Pile-14B-2023xxxx-ctx4096-testxxx.pth : Fine-tuned to ctx_len 4096.
-* ctx_len = 4096
 * Highly recommended. It's great.
 
-RWKV-4-Pile-14B-20230213-8019.pth : Trained on the Pile for 331B tokens
-* ctx_len
-* Pile loss 1.7579
+RWKV-4-Pile-14B-20230213-8019.pth : Trained on the Pile for 331B tokens
+* Pile loss 1.7579 (ctx_len 1024)
 * LAMBADA ppl 3.81, acc 71.05%
 * PIQA acc 77.42%
 * SC2016 acc 75.57%
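
The README reports a Pile loss (1.7579) next to a LAMBADA perplexity (3.81), so it can help to see how the two kinds of number relate. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which the README does not state. The helper names loss_to_perplexity and perplexity_to_loss are hypothetical and are not part of ChatRWKV. The two figures come from different datasets, so the converted values are not directly comparable.

```python
import math


def loss_to_perplexity(loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss_nats)


def perplexity_to_loss(ppl: float) -> float:
    """Convert a perplexity back to a mean per-token cross-entropy in nats."""
    return math.log(ppl)


if __name__ == "__main__":
    # Figures copied from the README; the nats assumption is ours.
    pile_loss = 1.7579
    lambada_ppl = 3.81

    print(f"Pile loss {pile_loss} -> perplexity {loss_to_perplexity(pile_loss):.2f}")
    print(f"LAMBADA ppl {lambada_ppl} -> loss {perplexity_to_loss(lambada_ppl):.4f}")
```

If the loss were instead reported in bits per token, the conversion would use 2 ** loss rather than math.exp(loss).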