How to run this model

#1 by MarinaraSpaghetti

Hey guys, just a little note: to run this model, please use the basic ExLlamav2 loader and set alpha_value to 3! Cheers, and thanks for the quants!
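
For anyone loading the quant outside the WebUI, here's a rough sketch with the exllamav2 Python API (the model path is a placeholder, and I'm assuming the usual attribute names, with scale_alpha_value being the NTK RoPE alpha):

```python
# Rough sketch: load the quant with the exllamav2 Python API at 16k context
# and alpha 3 (model path is a placeholder).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/the-quantized-model"  # placeholder path
config.prepare()
config.max_seq_len = 16384        # 16k context
config.scale_alpha_value = 3.0    # the alpha_value = 3 from above (NTK RoPE alpha)

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)       # fills available GPU memory automatically

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("Hello there,", settings, 64))
```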

I have a 16GB GPU card - am I able to run 4_25 in 15.2GB (16k) mode? If so, what parameters do I pass to exllama? Thanks!

You can run it easily using Oobabooga's text-generation-webui. There, simply load the model in the Model tab with 16k context. You should be able to use it with 32k context if you also enable the 4-bit cache option.
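
For reference, that 32k + 4-bit cache setup would look roughly like this when driving exllamav2 from Python instead of the WebUI; the Q4 cache class name is my assumption based on recent exllamav2 releases, and the WebUI's 4-bit cache checkbox should map to the same thing:

```python
# Same idea as the 16k sketch above, but with a 32k context and a quantized
# 4-bit K/V cache (ExLlamaV2Cache_Q4 is assumed to be the class behind the
# WebUI's 4-bit cache checkbox).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer

config = ExLlamaV2Config()
config.model_dir = "/path/to/the-quantized-model"  # placeholder path
config.prepare()
config.max_seq_len = 32768        # 32k context
config.scale_alpha_value = 3.0    # alpha 3, as in the first post

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)  # 4-bit cache keeps the longer context in VRAM
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)
```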
