---
language:
- en
- zh
- de
- fr
- es
- pt
- ru
- it
- ja
- ko
- vi
- ar
tags:
- pytorch
- text-generation
- causal-lm
- rwkv
license: apache-2.0
datasets:
- EleutherAI/pile
- togethercomputer/RedPajama-Data-1T
---
# RWKV-4 World
## Model Description
RWKV-4 trained on 100+ world languages (70% English, 15% multilingual, 15% code).
How to use:
* use the latest `rwkv` pip package (0.7.4+)
* use the latest ChatRWKV `v2/benchmark_world.py` to test
* larger models are stronger, even though they are not yet fully trained
The differences between World & Raven:
* set `pipeline = PIPELINE(model, "rwkv_vocab_v20230424")` instead of `20B_tokenizer.json` (EXACTLY AS WRITTEN HERE; "rwkv_vocab_v20230424" is included in rwkv 0.7.4+)
* use a Question/Answer, User/AI, or Human/Bot prompt for Q&A. **DO NOT USE Bob/Alice or Q/A**
* use **fp32** (fp16 currently overflows; fixable in the future) or bf16 (slight degradation)
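Putting the points above together, a minimal loading sketch (the model path is a placeholder, and this assumes the `rwkv` pip package 0.7.4+ is installed):

```python
def load_world_pipeline(model_path):
    """Load an RWKV-4 World checkpoint with the World tokenizer.

    model_path is a placeholder; the rwkv pip package (0.7.4+) must be
    installed for the imports below to resolve, so they live inside
    the function.
    """
    from rwkv.model import RWKV
    from rwkv.utils import PIPELINE

    # fp32 strategy: fp16 currently overflows with World models
    model = RWKV(model=model_path, strategy="cpu fp32")
    # use the World vocab, NOT 20B_tokenizer.json
    pipeline = PIPELINE(model, "rwkv_vocab_v20230424")
    return pipeline
```

The returned pipeline can then be used for generation as with other RWKV checkpoints (e.g. `pipeline.generate(prompt, token_count=...)`).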
NOTE: the new greedy tokenizer (https://github.com/BlinkDL/ChatRWKV/blob/main/tokenizer/rwkv_tokenizer.py) tokenizes '\n\n' as one single token instead of ['\n', '\n'].
A good prompt example:
```
Question: hi
Answer: Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.
Question: xxxxxx
Answer:
```
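The prompt format above can be built programmatically. A minimal sketch (the helper name is my own, and it reproduces the line layout shown in the example exactly):

```python
def build_world_prompt(turns, question):
    """Build a Question/Answer prompt in the format shown above.

    turns: list of (question, answer) pairs from prior exchanges.
    The final "Answer:" line is left open for the model to complete.
    """
    lines = []
    for q, a in turns:
        lines.append(f"Question: {q}")
        lines.append(f"Answer: {a}")
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)
```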