---
language:
- en
tags:
- pytorch
- text-generation
- causal-lm
- rwkv
license: apache-2.0
datasets:
- the_pile
---
# RWKV-4 7B
[UPDATE: Try RWKV-4-World (https://huggingface.co/BlinkDL/rwkv-4-world) for generation, chat, and code in 100+ world languages, with great English zero-shot and in-context learning ability too.]
## Model Description
RWKV-4 7B is a causal language model with 32 layers and an embedding size of 4096 (L32-D4096), trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.
Use https://github.com/BlinkDL/ChatRWKV to run it.
ctx_len = 1024
n_layer = 32
n_embd = 4096
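As a rough sanity check on the "7B" in the name, the parameter count can be estimated from `n_layer` and `n_embd`. This is only a sketch under assumptions not stated in this card: that each RWKV-4 block has four `n_embd x n_embd` time-mixing projections and a channel-mixing part with a 4x hidden size, that the vocabulary size is 50277, and that the input embedding and output head are untied. Small tensors (LayerNorm, time-decay and mixing vectors) are ignored, and `estimate_rwkv4_params` is a hypothetical helper, not part of RWKV-LM or ChatRWKV.

```python
def estimate_rwkv4_params(n_layer: int, n_embd: int,
                          vocab_size: int = 50277,  # ASSUMPTION: not given in this card
                          ffn_mult: int = 4) -> int:  # ASSUMPTION: 4x channel-mix hidden size
    """Approximate the number of weights in the large matrices only."""
    # Time-mixing (assumed): key, value, receptance, output, each n_embd x n_embd.
    time_mix = 4 * n_embd * n_embd
    # Channel-mixing (assumed): key (n_embd -> ffn_mult * n_embd),
    # value (ffn_mult * n_embd -> n_embd), receptance (n_embd x n_embd).
    channel_mix = 2 * ffn_mult * n_embd * n_embd + n_embd * n_embd
    blocks = n_layer * (time_mix + channel_mix)
    # Input embedding plus an untied output head (assumed).
    embeddings = 2 * vocab_size * n_embd
    return blocks + embeddings


# Values taken from the configuration listed above.
total = estimate_rwkv4_params(n_layer=32, n_embd=4096)
print(f"approx. {total / 1e9:.2f}B parameters")
```

Under these assumptions the estimate lands a little above 7 billion, which is consistent with the model's name.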
RWKV-4-Pile-7B-20230109-ctx4096.pth : Fine-tuned to ctx_len 4096.
* Likely the best. Please test.
################################
"Raven": RWKV alpaca+vicuna-style model: https://huggingface.co/BlinkDL/rwkv-4-raven (highly recommended)
It is a strong chat model too. In the latest ChatRWKV v2, prefix a prompt with +i to use the "Alpaca Instruct" format. Examples:
```
+i Explain the following metaphor: "Life is like cats".
+i write a python function to read data from an excel file.
```
################################
RWKV-4-Pile-7B-20230xxx-ctx8192-testxxx : Fine-tuned to ctx_len 8192.
* Slightly weaker than the ctx4096 model when ctx_len < 3k.
RWKV-4-Pile-7B-20221115-8047.pth : Trained on the Pile for 332B tokens.
* Pile loss 1.8415T
* LAMBADA ppl 4.38, acc 67.18%
* PIQA acc 76.06%
* SC2016 acc 73.44%
* Hellaswag acc_norm 65.51%
### Instruct-test models (OLD): only useful if you construct your prompt following the dataset templates
Note: I use the prompt format "Q: instruct\n\nA: result" for all instruct data.
RWKV-4-Pile-7B-Instruct-test1
instruct-tuned on https://huggingface.co/datasets/bigscience/xP3all/viewer/en/train
RWKV-4-Pile-7B-Instruct-test2
instruct-tuned on https://huggingface.co/datasets/Muennighoff/flan & NIv2
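The "Q: instruct\n\nA: result" template noted above could be built with a small helper like the one below. This is a minimal sketch: `build_instruct_prompt` and `format_instruct_example` are hypothetical names, and ending the prompt at "A:" with no trailing space is an assumption, since the card does not say exactly where the prompt stops.

```python
def build_instruct_prompt(instruction: str) -> str:
    """Inference-time prompt: the model is expected to continue after 'A:'."""
    # ASSUMPTION: the prompt ends right after "A:" with no trailing space.
    return f"Q: {instruction.strip()}\n\nA:"


def format_instruct_example(instruction: str, result: str) -> str:
    """Full example in the 'Q: instruct\\n\\nA: result' layout from the note above."""
    return f"Q: {instruction.strip()}\n\nA: {result.strip()}"


prompt = build_instruct_prompt("Translate 'good morning' into French.")
print(prompt)
```

Pass the resulting string to whatever generation interface you use with these checkpoints. The older Instruct-test models are only expected to behave well when the prompt follows this layout.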
### Chinese models
RWKV-4-Pile-7B-EngChn-testNovel-xxx : For writing Chinese novels (trained on 200G of Chinese novels).