Any plans for a bigger model such as 30B?

#10
by lpy86786 - opened

Thanks to the developers' efforts, I have now experienced the powerful performance of the RWKV model.
Will there be bigger models such as 30B, 65B, or even 130B in the near future? That would allow the relationship between performance and model size to be fully tested.
I hope an emergence phenomenon appears as the model size increases, that is, that performance improves greatly on bigger models.

Thanks for the great work, everyone.
Are there plans to make bigger models in the future, such as 30B, 65B, or even 130B?
A bigger model should bring some improvement; what I'm really hoping to see is an emergence phenomenon, i.e., once the model size grows past a certain point, the model shows some new abilities...

the plan is 24B -> 50B -> 100B this year :)

Let's make sure, as a community, that we can run all of those models on normal desktop hardware! We need some good runtimes 👀

If it really works, the 100B model with 4-bit quantization could probably run on a desktop with 128 GB of RAM :-) That would be so amazing... But it's unknown whether it would keep nearly the same accuracy. ChatGLM did that to their model, and the decrease in accuracy is absolutely minimal on the 130B model, but they were only able to quantize the weights and not the activations (https://github.com/THUDM/GLM-130B/blob/main/docs/quantization.md). It would be interesting to see whether RWKV faces the same problem on a big model.
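For a sense of scale, here is a rough back-of-the-envelope sketch of weight memory under different bit widths; the overhead factor is an assumption for illustration, not an official RWKV figure:

```python
# Rough estimate of the RAM needed to hold quantized weights.
# The 10% overhead factor (embeddings, scales/zero-points, runtime buffers)
# is an assumption, not a measured value.

def weight_memory_gb(n_params_billion: float, bits_per_weight: int, overhead: float = 1.1) -> float:
    """Approximate memory for the weights alone, in GB."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

for size in (24, 50, 100):
    print(f"{size}B @ 4-bit ≈ {weight_memory_gb(size, 4):.0f} GB, "
          f"@ 8-bit ≈ {weight_memory_gb(size, 8):.0f} GB")
```

By this estimate a 100B model at 4 bits is roughly 55 GB of weights, which leaves headroom on a 128 GB desktop for activations and the recurrent state.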

@BlinkDL I really want to say awesome work, dude. You aren't a fork or a mod of someone else's models; you're putting out the original, and I'm amazed. Out of curiosity, what are you using to train your models? Do you have the hardware just lying around to get these done so fast?

I'm asking because I'm wondering if there's anything I might be able to do to help that effort, even if it's just a couple of bucks over PayPal or whatever.

@BlinkDL This is great news; I look forward to trying 24B in the future!

@trahloc There is a Ko-fi link on the GitHub pages if you want to support the developer. The compute is sponsored by Stability AI and EleutherAI.

@Verah Ah, it's within the projects themselves. I'm used to just seeing it as part of folks' profile / about me.

Have any signs of CoT emergence been observed in recent RWKV models? I don't see any in the Q8 7B model.

@ZhangRC Maybe, because it already processes inputs token by token, executing like an RNN, instead of "all at once" like Transformers do. If that's the reason, the model should display some "CoT" improvements by default 👀
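A minimal illustrative sketch of what "token by token with a carried state" means; the update rule below is a placeholder, not RWKV's actual time-mix formula:

```python
# Toy RNN-style execution: each token only sees the carried state, in contrast
# to a Transformer attending over the whole prompt at once.
# The mixing rule is a made-up placeholder for illustration.

def toy_rnn_step(state: float, token_id: int) -> float:
    return 0.9 * state + 0.1 * token_id  # placeholder, not RWKV's learned update

def run(tokens: list[int]) -> float:
    state = 0.0
    for t in tokens:  # strictly sequential: state is updated one token at a time
        state = toy_rnn_step(state, t)
    return state

print(run([1, 2, 3, 4]))
```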

@Raspbfox @ZhangRC CoT is simple. Simply tune it with more CoT data.

lpy86786 changed discussion status to closed
