Final checkpoint: RWKV-4-Pile-3B-20221008-8023.pth : Trained on the Pile for 331
* Hellaswag acc_norm 59.63%
* ctx_len = 1024 n_layer = 32 n_embd = 2560
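The hyperparameters above pin down the model size. As a back-of-the-envelope check that they really land near the "3B" in the checkpoint name (the per-layer factor of ~12·n_embd² and the vocab size of 50277 are assumptions for a rough estimate, not the exact RWKV-4 layout):

```python
# Rough parameter-count estimate from the hyperparameters above.
# ASSUMPTIONS: ~12 * n_embd^2 weights per block (order-of-magnitude only)
# and the GPT-NeoX / Pile vocab size of 50277 — not the exact RWKV-4 layout.
n_layer = 32
n_embd = 2560
vocab_size = 50277  # assumed Pile (GPT-NeoX tokenizer) vocab size

per_layer = 12 * n_embd ** 2          # rough weight count per block
embeddings = 2 * vocab_size * n_embd  # input embedding + output head
total = n_layer * per_layer + embeddings

print(f"~{total / 1e9:.2f}B parameters")
```

The estimate comes out a little under 3B, consistent with the model name.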
### Instruct-test models: only useful if you construct your prompt following dataset templates
RWKV-4-Pile-3B-Instruct-test1
instruct-tuned on https://huggingface.co/datasets/bigscience/xP3all/viewer/en/train

RWKV-4-Pile-3B-Instruct-test2
instruct-tuned on https://huggingface.co/datasets/Muennighoff/flan & NIv2
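Since these models only respond well to prompts shaped like their tuning data, prompt construction is worth sketching. The template below is a hypothetical stand-in, not an official one; copy the actual template from the dataset the checkpoint was tuned on (xP3all for test1, Flan/NIv2 for test2):

```python
# Hypothetical prompt builder for the Instruct-test models.
# ASSUMPTION: instruction-tuning corpora lay text out as
# "optional context, then instruction, then the answer follows" —
# the exact template must be taken from the dataset itself.
def build_prompt(instruction: str, context: str = "") -> str:
    parts = []
    if context:
        parts.append(context)
    parts.append(instruction)
    # Generation continues after the trailing newline, mimicking the
    # point where the tuning data places the answer.
    return "\n".join(parts) + "\n"

print(build_prompt("Translate to French: Hello, world!"))
```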

### Chinese models

RWKV-4-Pile-3B-EngChn-testNovel-xxx for writing Chinese novels (trained on 200G of Chinese novels).

RWKV-4-Pile-3B-EngChn-testxxx for Chinese Q&A (trained on 10G of Chinese text; only for testing purposes).
## Note: 4 / 4a / 4b models ARE NOT compatible. Use RWKV-4 unless you know what you are doing.