license: apache-2.0
datasets:
  - Norquinal/claude_multiround_chat_30k
  - OpenLeecher/Teatime

We proudly announce the world's first 128k context model based on the RWKV architecture, released today, 2023-08-10.

This model was trained on instruction datasets, Chinese web novels, and traditional wuxia fiction. More training details will be added later.

A test that summarizes a 67k-token input can be found in the example folder. More cases are coming.

This is a full fine-tune for 128k context, trained with the script below on 4×A800 GPUs for 40 hours over 1.3B tokens: https://github.com/SynthiaDL/TrainChatGalRWKV/blob/main/train_world.sh
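For a rough sense of scale, the training figures above can be turned into a throughput estimate. This is only back-of-the-envelope arithmetic: it assumes all four GPUs were busy for the full 40 wall-clock hours, and it assumes "128k" means 131072 tokens, neither of which is stated above.

```python
# Figures quoted in this README.
TOTAL_TOKENS = 1.3e9   # 1.3B training tokens
NUM_GPUS = 4           # 4 x A800
WALL_HOURS = 40        # 40 hours of training

# Assumption: "128k" context means 128 * 1024 tokens.
CONTEXT_LEN = 128 * 1024

gpu_hours = NUM_GPUS * WALL_HOURS
tokens_per_gpu_second = TOTAL_TOKENS / (gpu_hours * 3600)
full_length_sequences = TOTAL_TOKENS / CONTEXT_LEN

print(f"GPU-hours: {gpu_hours}")
print(f"Tokens per GPU-second: {tokens_per_gpu_second:.0f}")
print(f"Equivalent full-length 128k sequences: {full_length_sequences:.0f}")
```

Under these assumptions the run comes to 160 GPU-hours, or roughly 2.3k tokens per GPU per second.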

To test the model, use RWKV Runner (https://github.com/josStorer/RWKV-Runner). Use temperature 0.1-0.2 with top-p 0.7 for more precise answers; a temperature between 1 and 2.x gives more creative output.
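To illustrate why these settings behave differently, here is a minimal sketch of temperature plus top-p (nucleus) sampling in plain Python. It is not RWKV Runner's actual sampler, which may differ in details; the function names `nucleus` and `sample_top_p` and the toy logits are hypothetical, and using top-p 0.7 at the higher temperature is an assumption made only for comparison.

```python
import math
import random


def nucleus(logits, temperature, top_p):
    """Return (kept token indices, their probabilities) after temperature and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature scaling: low values sharpen the distribution, high values flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept, [probs[i] for i in kept]


def sample_top_p(logits, temperature=0.2, top_p=0.7, rng=random):
    """Sample one token index from the truncated distribution."""
    kept, weights = nucleus(logits, temperature, top_p)
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, 0.0]  # hypothetical scores for four tokens
precise_kept, _ = nucleus(toy_logits, temperature=0.2, top_p=0.7)
creative_kept, _ = nucleus(toy_logits, temperature=2.0, top_p=0.7)
print("temp 0.2 keeps:", precise_kept)
print("temp 2.0 keeps:", creative_kept)
```

With the toy logits, the low temperature leaves only the top token in the candidate set, so output is nearly deterministic, while the higher temperature keeps several candidates and produces more varied text.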

67k input test image.png