---
license: apache-2.0
datasets:
- EleutherAI/pile
- togethercomputer/RedPajama-Data-1T
language:
- en
- zh
- de
- fr
- es
- pt
- ru
- it
- ja
- ko
- vi
- ar
thumbnail:
tags:
- rwkv
- text-generation
- causal-lm
- ggml
inference: false
---
# RWKV-4 World GGML
### This repository contains quantized conversions of the current RWKV-4 World checkpoints.
*For use with frontends that support GGML quantized RWKV models, such as rwkv.cpp and KoboldCpp.*
*Last updated on 2023-09-28.*
**Description:**
- These quantizations were made because latestissue's quants were missing the 0.1B and 0.4B models. The rest of the models can be found here: [latestissue/rwkv-4-world-ggml-quantized](https://huggingface.co/latestissue/rwkv-4-world-ggml-quantized)
# RAM USAGE
Model | Starting RAM usage (KoboldCpp)
:--:|:--:
RWKV-4-World-0.1B.q4_0.bin | 289.3 MiB
RWKV-4-World-0.1B.q4_1.bin | 294.7 MiB
RWKV-4-World-0.1B.q5_0.bin | 300.2 MiB
RWKV-4-World-0.1B.q5_1.bin | 305.7 MiB
RWKV-4-World-0.1B.q8_0.bin | 333.1 MiB
RWKV-4-World-0.1B.f16.bin | 415.3 MiB
|
RWKV-4-World-0.4B.q4_0.bin | 484.1 MiB
RWKV-4-World-0.4B.q4_1.bin | 503.7 MiB
RWKV-4-World-0.4B.q5_0.bin | 523.1 MiB
RWKV-4-World-0.4B.q5_1.bin | 542.7 MiB
RWKV-4-World-0.4B.q8_0.bin | 640.2 MiB
RWKV-4-World-0.4B.f16.bin | 932.7 MiB
|
RWKV-4-World-1.5B.q4_0.bin | 1.2 GiB
RWKV-4-World-1.5B.q4_1.bin | 1.3 GiB
RWKV-4-World-1.5B.q5_0.bin | 1.4 GiB
RWKV-4-World-1.5B.q5_1.bin | 1.5 GiB
RWKV-4-World-1.5B.q8_0.bin | 1.9 GiB
RWKV-4-World-1.5B.f16.bin | 3.0 GiB
**Notes:**
- rwkv.cpp [[0df970a]](https://github.com/saharNooby/rwkv.cpp/tree/0df970a6adddd4b938795f92e660766d1e2c1c1f) was used for conversion and quantization. The models were first converted to f16 GGML files, then quantized (a sketch of the process is shown below).
- KoboldCpp [[bc841ec]](https://github.com/LostRuins/koboldcpp/tree/bc841ec30232036a1e231c0b057689abc3aa00cf) was used to test the models.
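For reference, a minimal sketch of that two-step process, assuming a local clone of rwkv.cpp at the commit above and an original `.pth` checkpoint; the script names and arguments follow the rwkv.cpp README, so verify them against the commit you actually use:
```python
import subprocess

# Hypothetical file names; adjust to your own downloads and output locations.
pth = "RWKV-4-World-0.1B-v1-20230520-ctx4096.pth"  # original PyTorch checkpoint
f16 = "RWKV-4-World-0.1B.f16.bin"                  # intermediate f16 GGML file
q5  = "RWKV-4-World-0.1B.q5_1.bin"                 # quantized output

# Step 1: convert the PyTorch checkpoint to an f16 GGML file.
subprocess.run(["python", "rwkv/convert_pytorch_to_ggml.py", pth, f16, "FP16"], check=True)

# Step 2: quantize the f16 GGML file to the desired format.
subprocess.run(["python", "rwkv/quantize.py", f16, q5, "Q5_1"], check=True)
```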
The original models can be found [here](https://huggingface.co/BlinkDL/rwkv-4-world), and the original model card can be found below.
* * *
# RWKV-4 World
## Model Description
RWKV-4 trained on 100+ world languages (70% English, 15% multilang, 15% code).
World = Some_Pile + Some_RedPajama + Some_OSCAR + All_Wikipedia + All_ChatGPT_Data_I_can_find
XXXtuned = finetune of World on MC4, OSCAR, wiki, etc.
How to use:
* use https://github.com/josStorer/RWKV-Runner for GUI
* use the latest rwkv pip package (0.8.0+)
* use https://github.com/BlinkDL/ChatRWKV/blob/main/v2/benchmark_world.py and https://github.com/BlinkDL/ChatRWKV/blob/main/API_DEMO_WORLD.py to test it
The differences between World & Raven:
* set pipeline = PIPELINE(model, "rwkv_vocab_v20230424") instead of 20B_tokenizer.json (EXACTLY AS WRITTEN HERE. "rwkv_vocab_v20230424" is included in rwkv 0.7.4+)
* use Question/Answer or User/AI or Human/Bot for chat. **DO NOT USE Bob/Alice or Q/A**
For the 0.1/0.4/1.5B models, use **fp32** for the first layer (it will overflow in fp16 at the moment; this is fixable in the future), or bf16 if you have 30xx/40xx GPUs. Example strategy: `cuda fp32 *1 -> cuda fp16`
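Putting the points above together, here is a minimal loading sketch using the rwkv pip package; the checkpoint name is a placeholder, and the environment-variable setup mirrors the ChatRWKV demos linked above:
```python
import os
os.environ["RWKV_JIT_ON"] = "1"   # as in the ChatRWKV demos
os.environ["RWKV_CUDA_ON"] = "0"  # set to "1" to build the optional CUDA kernel

from rwkv.model import RWKV
from rwkv.utils import PIPELINE

# Placeholder checkpoint name; the rwkv package appends ".pth" to this path.
MODEL_PATH = "RWKV-4-World-0.1B-v1-20230520-ctx4096"

# Keep the first layer in fp32, as advised above for the 0.1/0.4/1.5B models.
model = RWKV(model=MODEL_PATH, strategy="cuda fp32 *1 -> cuda fp16")

# World models use rwkv_vocab_v20230424, not 20B_tokenizer.json.
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")
```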
NOTE: the new greedy tokenizer (https://github.com/BlinkDL/ChatRWKV/blob/main/tokenizer/rwkv_tokenizer.py) tokenizes '\n\n' as a single token instead of ['\n','\n'].
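A quick way to check this behaviour, reusing the `pipeline` object from the loading sketch above (PIPELINE.encode wraps the tokenizer in recent rwkv versions):
```python
# With the World tokenizer, '\n\n' should come back as a single token;
# the older 20B tokenizer would split it into two '\n' tokens.
tokens = pipeline.encode("\n\n")
print(tokens, len(tokens))  # expected length: 1
```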
QA prompt (replace any \n\n in xxx with \n):
```
Question: xxx
Answer:
```
and
```
Instruction: xxx
Input: xxx
Response:
```
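As a rough illustration, the QA format above can be fed to `pipeline.generate` from the rwkv package, reusing the `pipeline` from the loading sketch further up; the question and sampling parameters here are only placeholders:
```python
from rwkv.utils import PIPELINE_ARGS

question = "What is the capital of France?"   # placeholder question
prompt = f"Question: {question}\n\nAnswer:"   # QA format from this card

args = PIPELINE_ARGS(temperature=1.0, top_p=0.3)  # illustrative sampling settings
print(pipeline.generate(prompt, token_count=100, args=args))
```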
A good chat prompt (replace any \n\n in xxx with \n):
```
User: hi
Assistant: Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.
User: xxx
Assistant:
```