How to run this model

#1 by MarinaraSpaghetti

Hey guys, just a little note: to run this model, please use the basic ExLlamav2 loader and set alpha_value to 3! Cheers, and thanks for the quants!
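
For anyone loading the quant outside the WebUI, here's a rough sketch with the exllamav2 Python API (the model path is a placeholder, and I'm assuming the usual attribute names, with scale_alpha_value being the NTK RoPE alpha):

```python
# Rough sketch: load the quant with the exllamav2 Python API at 16k context
# and alpha 3 (model path is a placeholder).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/the-quantized-model"  # placeholder path
config.prepare()
config.max_seq_len = 16384        # 16k context
config.scale_alpha_value = 3.0    # the alpha_value = 3 from above (NTK RoPE alpha)

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)       # fills available GPU memory automatically

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("Hello there,", settings, 64))
```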

I have a 16GB GPU card - am I able to run 4_25 in 15.2GB (16k) mode? If so, what parameters do I pass to exllama? Thanks!

You can run it easily using Oobabooga's text-generation-webui. There, simply load the model in the Model tab with 16k context. You should be able to use it with 32k context if you also enable the 4-bit cache option.
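
For reference, that 32k + 4-bit cache setup would look roughly like this when driving exllamav2 from Python instead of the WebUI; the Q4 cache class name is my assumption based on recent exllamav2 releases, and the WebUI's 4-bit cache checkbox should map to the same thing:

```python
# Same idea as the 16k sketch above, but with a 32k context and a quantized
# 4-bit K/V cache (ExLlamaV2Cache_Q4 is assumed to be the class behind the
# WebUI's 4-bit cache checkbox).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer

config = ExLlamaV2Config()
config.model_dir = "/path/to/the-quantized-model"  # placeholder path
config.prepare()
config.max_seq_len = 32768        # 32k context
config.scale_alpha_value = 3.0    # alpha 3, as in the first post

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)  # 4-bit cache keeps the longer context in VRAM
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)
```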
