quantizations

#1
by Aryanne - opened

Can you please quantize this to GGUF (GGML) if possible? (To run it on koboldcpp/rwkv.cpp.)

https://huggingface.co/latestissue. Come to the Discord server https://discord.gg/pWH5MkvtNR to ask this cool guy @latestissue for help.
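If you want to try converting it yourself in the meantime, here is a minimal sketch of the usual rwkv.cpp convert-and-quantize flow, written as Python subprocess calls. The script names follow the rwkv.cpp README; the model file names are placeholders:

```python
import subprocess

# Convert the PyTorch checkpoint to rwkv.cpp's ggml format (FP16),
# then quantize it. Run from inside a rwkv.cpp checkout; the file
# names below are placeholders.
subprocess.run(
    ["python", "rwkv/convert_pytorch_to_ggml.py",
     "rwkv-world-1.5b.pth", "rwkv-world-1.5b-f16.bin", "FP16"],
    check=True,
)
subprocess.run(
    ["python", "rwkv/quantize.py",
     "rwkv-world-1.5b-f16.bin", "rwkv-world-1.5b-Q5_1.bin", "Q5_1"],
    check=True,
)
```

The quantized .bin should then load in rwkv.cpp and frontends built on it, such as koboldcpp.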

@Aryanne remember to use the new tokenizer txt file in this repo for inference.
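For reference, a minimal inference sketch with the `rwkv` pip package (ChatRWKV), assuming the tokenizer txt in this repo matches the standard world vocab that the pipeline resolves by the name "rwkv_vocab_v20230424"; the model path and strategy are placeholders:

```python
import os
# Standard ChatRWKV environment flags: JIT on, CUDA kernel off (CPU run).
os.environ["RWKV_JIT_ON"] = "1"
os.environ["RWKV_CUDA_ON"] = "0"

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# Placeholder checkpoint name, given without the .pth extension;
# strategy "cpu fp32" keeps everything on CPU.
model = RWKV(model="RWKV-world-1.5B", strategy="cpu fp32")

# World models need the new world vocab, not the old 20B tokenizer.
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")

args = PIPELINE_ARGS(temperature=1.0, top_p=0.85)
print(pipeline.generate("Hello", token_count=50, args=args))
```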

Hello xiaol. I am a fan of your small model. So far it is the one that has worked best for me in terms of consistency and response speed. I come from LatestIssue, who told me you were the creator of this model. I wanted to ask: would it be possible to make another version of this model at 0.4B or lower? Although I manage to run it on my 4 GB RAM cell phone, it crashes shortly after and I can no longer run it. I think I would do better with a 0.4B model like the original RWKV World. Thank you for creating this model; it is a great advance for cell phones.

Okay, I see it's "request time" :D
Would you possibly also be able to do the 3B, and optionally/eventually the 7B?
I don't know what exactly you did to the model, or maybe it's because of the newest rwkv.cpp update, but it's fast as hell, faster than the regular 1.5B World model. Or maybe it's just placebo; I'll have to test and compare more.

@Novaciano have you tried ncnn (https://github.com/daquexian/faster-rwkv)? 4 GB is enough for 1.5B.
I will also make a 0.4B RWKV World model; right now I am preparing to train Raven 14B for the OpenLLM leaderboard.

@latestissue It sounds very good to run a larger model; looking forward to the test results.
Do you mean you need a 3B model? That's on the schedule.

Sure, 3B would be perfect. I guess with the upcoming changes in RWKV-5, 3B will be fast enough to use on devices with 8+ GB RAM.

Aryanne changed discussion status to closed
