Can you be more specific about how to use 8k context with ExLlama in Oobabooga?


Originally the model card said that for ExLlama you had to manually add the patch, which I couldn't figure out how to do. You recently updated it with this note:
If you are using exllama the monkey-patch is built into the engine, please use -cpe to set the scaling factor, ie. if you are running it at 4k context, pass -cpe 2 -l 4096

This may work when running from the command line, but it doesn't explain how to pass those parameters to Oobabooga. Appending them to CMD_FLAGS after --loader exllama doesn't work, and --monkey-patch is not a recognized argument. Also, that example doesn't indicate whether cpe goes up to 4 for 8k context or down to 1, so prerequisite knowledge is required to use this model properly.
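For reference, assuming the scaling factor is simply the target context length divided by the model's native 2048-token context (which is what the 4k example implies), the value goes up with context length:

-cpe 2 -l 4096 for 4k (4096 / 2048 = 2)
-cpe 4 -l 8192 for 8k (8192 / 2048 = 4)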

Same question here. I think it is related to exllama itself; someone probably has a way to pass the parameter through to ooba.


I just opened a PR that adds these values to ooba so they can be set:

https://github.com/oobabooga/text-generation-webui/pull/2876

Edit: ignore that PR, ooba just did it now as well lol https://github.com/oobabooga/text-generation-webui/pull/2875

Edit2: it's merged
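
With that merged, a sketch of how you might set these for ExLlama (assuming the PR exposes them as --max_seq_len and --compress_pos_emb, the webui's counterparts to -l and -cpe):

python server.py --loader exllama --max_seq_len 8192 --compress_pos_emb 4

The same flags should also work in CMD_FLAGS, since they are ordinary server.py arguments.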

I'm using Hugging Face text-generation-inference and running the fp16 version. How do I apply the patching to that?
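
Not a definitive answer for text-generation-inference, which has its own Llama implementation, but as a rough sketch of what the scaled-RoPE monkey-patch amounts to for plain transformers fp16 inference (written against the transformers ~4.30 Llama internals; the class body and the SCALE value here are illustrative assumptions):

```python
import torch
import transformers.models.llama.modeling_llama as llama

SCALE = 4  # target 8192 context / native 2048 context, same factor as -cpe 4

class ScaledLlamaRotaryEmbedding(torch.nn.Module):
    """Drop-in replacement for LlamaRotaryEmbedding with position interpolation."""

    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float().to(device) / dim))
        self.register_buffer("inv_freq", inv_freq)
        self.max_seq_len_cached = max_position_embeddings * SCALE
        # Stretch the position indices by 1/SCALE so that SCALE * 2048
        # positions map back onto the 0..2048 range the model was trained on.
        t = torch.arange(self.max_seq_len_cached, device=device,
                         dtype=self.inv_freq.dtype) / SCALE
        freqs = torch.einsum("i,j->ij", t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("cos_cached", emb.cos()[None, None, :, :], persistent=False)
        self.register_buffer("sin_cached", emb.sin()[None, None, :, :], persistent=False)

    def forward(self, x, seq_len=None):
        return (
            self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
            self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
        )

# Apply before the model is instantiated so the Llama attention layers pick it up.
llama.LlamaRotaryEmbedding = ScaledLlamaRotaryEmbedding
```

For TGI you would presumably need to make the equivalent change (dividing the rotary position index by the scale) inside its own Llama modeling code, since it doesn't load the transformers module at runtime.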
