RWKV Quantisation

#1
by thefaheem - opened

Hey @TheBloke, first of all, thanks for your work for the community, man!

Why don't you create quantized models for RWKV? It would be very helpful to me and the community.

Yo! I searched but never found these before...

Thanks @dduval

thefaheem changed discussion status to closed

I can't use it with llama-cpp-python; it gives a ValueError. I think it's because RWKV is an RNN, not the LLaMA-style transformer architecture that llama.cpp expects.

So, what should I use to run these?

Can anyone help?

thefaheem changed discussion status to open

It worked for me with koboldcpp: https://github.com/LostRuins/koboldcpp
I only had time to try the 14B q5_1.

Sorry, I'm a bit lost. Can you please tell me how to run it on Linux or Colab?

Instructions for Windows:

  1. Download the latest release: https://github.com/LostRuins/koboldcpp/releases/latest/download/koboldcpp.exe
  2. double-click koboldcpp.exe
  3. click Launch and open your model .bin file

That's it! You may be able to improve performance by launching it from the command prompt, setting the number of threads, and giving the process high priority. Run koboldcpp.exe --help to see all the options. I launch it with the following command:

  koboldcpp.exe ggml-model-q5_1.bin --launch --threads 16 --highpriority --smartcontext

And for Linux?

I have yet to try koboldcpp on Linux. Check the README.md on the GitHub page for Linux instructions. I see that oobabooga's text-generation-webui should support RWKV as well: https://github.com/oobabooga/text-generation-webui/blob/main/docs/RWKV-model.md

Edit: I just realized that you specifically asked for Linux or Colab, but I gave you Windows instructions. Sorry about that. As for oobabooga, I may be wrong, but I don't think it supports quantized versions.
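That said, building koboldcpp from source on Linux should look something like this, going by its README (a sketch, untested by me; the model filename and flags are carried over from my Windows command above, and the exact invocation may differ between versions, so check the README):

  git clone https://github.com/LostRuins/koboldcpp
  cd koboldcpp
  make
  python koboldcpp.py ggml-model-q5_1.bin --threads 16 --smartcontext

Then open the URL it prints (by default http://localhost:5001) in a browser.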

No problem, mate. I found it works well with rwkv.cpp (rough steps sketched below for anyone else who needs them).

Anyway, thanks for your help!
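Roughly, the flow the rwkv.cpp README describes (a sketch; the script names and build steps are from the repo's README and may have changed, so double-check there, and you may need Python deps like numpy and tokenizers installed):

  git clone --recursive https://github.com/saharNooby/rwkv.cpp
  cd rwkv.cpp
  cmake .
  cmake --build . --config Release
  python rwkv/chat_with_bot.py /path/to/ggml-model-q5_1.bin

The same repo also has rwkv/generate_completions.py for one-shot generation, and the same commands should work in a Colab cell with ! prefixes.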

thefaheem changed discussion status to closed
