---
language:
- en
- zh
- de
- fr
- es
- pt
- ru
- it
- ja
- ko
- vi
- ar
tags:
- pytorch
- text-generation
- causal-lm
- rwkv
license: apache-2.0
datasets:
- EleutherAI/pile
- togethercomputer/RedPajama-Data-1T
---
# RWKV-4 World
## Model Description
RWKV-4 trained on 100+ world languages (70% English, 15% multilingual, 15% code).
How to use:
* use the latest `rwkv` pip package (0.7.4+)
* use https://github.com/BlinkDL/ChatRWKV/blob/main/v2/benchmark_world.py to test it
* larger models are stronger, even though they are not fully trained yet
The differences between World & Raven:
* set `pipeline = PIPELINE(model, "rwkv_vocab_v20230424")` instead of 20B_tokenizer.json (EXACTLY AS WRITTEN HERE. "rwkv_vocab_v20230424" is included in rwkv 0.7.4+)
* use Question/Answer or User/AI or Human/Bot for chat. **DO NOT USE Bob/Alice or Q/A**
* use **fp32** (fp16 will overflow at the moment; this is fixable in the future) or **bf16** (slight degradation); see the loading sketch after this list
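Putting those points together, here is a minimal loading sketch using the `rwkv` pip package. The checkpoint path and sampling settings are placeholders, not values prescribed by this card:
```python
# Minimal sketch: load a World checkpoint with the rwkv pip package (0.7.4+).
# The checkpoint path below is a placeholder; point it at your downloaded World .pth file.
import os
os.environ['RWKV_JIT_ON'] = '1'   # standard settings from the ChatRWKV examples
os.environ['RWKV_CUDA_ON'] = '0'  # '1' compiles the optional CUDA kernel

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

model = RWKV(model='/path/to/RWKV-4-World-checkpoint', strategy='cuda fp32')  # fp32 (fp16 overflows for now)
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")  # the World vocab, NOT 20B_tokenizer.json

ctx = "Question: What is the capital of France?\n\nAnswer:"
out = pipeline.generate(ctx, token_count=100, args=PIPELINE_ARGS(temperature=1.0, top_p=0.5))
print(out)
```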
NOTE: the new greedy tokenizer (https://github.com/BlinkDL/ChatRWKV/blob/main/tokenizer/rwkv_tokenizer.py) tokenizes '\n\n' as a single token instead of ['\n','\n'].
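You can check this with the pipeline from the sketch above; only the token count is asserted, since the exact id depends on the vocab file:
```python
# Sketch: with the World vocab, '\n\n' encodes to a single token.
tokens = pipeline.encode('\n\n')
print(tokens)             # one id, not two '\n' ids
assert len(tokens) == 1
```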
QA prompt (replace any \n\n inside xxx with \n):
```
Question: xxx

Answer:
```
and
```
Instruction: xxx

Input: xxx

Response:
```
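A small helper sketch for building these prompts; the function names are illustrative and not part of the rwkv package:
```python
# Sketch: build the QA / instruction prompts, replacing '\n\n' inside user text
# with '\n' so that '\n\n' only ever appears as the separator the model expects.
def _flatten(text: str) -> str:
    while '\n\n' in text:
        text = text.replace('\n\n', '\n')
    return text.strip()

def qa_prompt(question: str) -> str:
    return f"Question: {_flatten(question)}\n\nAnswer:"

def instruct_prompt(instruction: str, input_text: str) -> str:
    return f"Instruction: {_flatten(instruction)}\n\nInput: {_flatten(input_text)}\n\nResponse:"

print(pipeline.generate(qa_prompt("What is RWKV?"), token_count=200,
                        args=PIPELINE_ARGS(temperature=1.0, top_p=0.5)))
```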
A good chat prompt (replace any \n\n inside xxx with \n):
```
Question: hi

Answer: Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.

Question: xxx

Answer:
```
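A minimal multi-turn chat loop sketch on top of the objects above (it reuses `pipeline` and `_flatten`); the history handling and sampling settings are assumptions, not an official chat implementation:
```python
# Sketch: simple multi-turn chat, stopping each reply at the '\n\n' turn separator
# (a single token in the World vocab, so it works cleanly as a stop token).
history = ("Question: hi\n\n"
           "Answer: Hi. I am your assistant and I will provide expert full response in full details. "
           "Please feel free to ask any question and I will always answer it.\n\n")

stop_id = pipeline.encode('\n\n')[0]
args = PIPELINE_ARGS(temperature=1.0, top_p=0.5, token_stop=[stop_id])

while True:
    user = _flatten(input('You: '))
    history += f"Question: {user}\n\nAnswer:"
    reply = pipeline.generate(history, token_count=256, args=args).strip()
    print('Bot:', reply)
    history += f" {reply}\n\n"   # the full history is re-fed each turn (simple, not fast)
```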