Final checkpoint: RWKV-4-Pile-3B-20221008-8023.pth : Trained on the Pile for 331
* Hellaswag acc_norm 59.63%
* ctx_len = 1024 n_layer = 32 n_embd = 2560
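The hyperparameters above pin down the model size. As a back-of-the-envelope check that they really land near the "3B" in the checkpoint name (the per-layer factor of ~12·n_embd² and the vocab size of 50277 are assumptions for a rough estimate, not the exact RWKV-4 layout):

```python
# Rough parameter-count estimate from the hyperparameters above.
# ASSUMPTIONS: ~12 * n_embd^2 weights per block (order-of-magnitude only)
# and the GPT-NeoX / Pile vocab size of 50277 — not the exact RWKV-4 layout.
n_layer = 32
n_embd = 2560
vocab_size = 50277  # assumed Pile (GPT-NeoX tokenizer) vocab size

per_layer = 12 * n_embd ** 2          # rough weight count per block
embeddings = 2 * vocab_size * n_embd  # input embedding + output head
total = n_layer * per_layer + embeddings

print(f"~{total / 1e9:.2f}B parameters")
```

The estimate comes out a little under 3B, consistent with the model name.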
### Instruct-test models: only useful if you construct your prompt following dataset templates
RWKV-4-Pile-3B-Instruct-test1
instruct-tuned on https://huggingface.co/datasets/bigscience/xP3all/viewer/en/train

RWKV-4-Pile-3B-Instruct-test2
instruct-tuned on https://huggingface.co/datasets/Muennighoff/flan & NIv2
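Since these models only respond well to prompts shaped like their tuning data, prompt construction is worth sketching. The template below is a hypothetical stand-in, not an official one; copy the actual template from the dataset the checkpoint was tuned on (xP3all for test1, Flan/NIv2 for test2):

```python
# Hypothetical prompt builder for the Instruct-test models.
# ASSUMPTION: instruction-tuning corpora lay text out as
# "optional context, then instruction, then the answer follows" —
# the exact template must be taken from the dataset itself.
def build_prompt(instruction: str, context: str = "") -> str:
    parts = []
    if context:
        parts.append(context)
    parts.append(instruction)
    # Generation continues after the trailing newline, mimicking the
    # point where the tuning data places the answer.
    return "\n".join(parts) + "\n"

print(build_prompt("Translate to French: Hello, world!"))
```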

### Chinese models

RWKV-4-Pile-3B-EngChn-testNovel-xxx for writing Chinese novels (trained on 200G of Chinese novels).

RWKV-4-Pile-3B-EngChn-testxxx for Chinese Q&A (trained on 10G of Chinese text; only for testing purposes).
## Note: 4 / 4a / 4b models ARE NOT compatible. Use RWKV-4 unless you know what you are doing.