Suggested Settings for loading/using e.g. OobaBooga
Thanks for creating and releasing this model. A lot of people want to use it, but which settings matter most for making it run well on the consumer hardware most people have?
For example:
- Loader - Transformers? ExLlama? llama.cpp?
- GPU/CPU memory allocations?
- Chat parameters - e.g. max new tokens, etc.
Maybe you could provide some rough, ballpark suggestions for low-end, mid-range, and high-end systems.
https://github.com/oobabooga/text-generation-webui/tree/main/docs
I haven't personally used oobabooga, but generally you would want to use GPTQ or GGML for fast inference and lower VRAM requirements at home.
It would require ~12GB of VRAM; if you don't have that, you will need ~12GB of system RAM instead. GGML supports CPU inference, while GPTQ/ExLlama does not.
It supports a context size of up to 4096 tokens, but using less will keep your VRAM usage and performance in check.
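Not an official recommendation, but as a rough sketch of the GGML route: llama-cpp-python exposes the two knobs that matter most here, `n_gpu_layers` (how many layers to offload to VRAM) and `n_ctx` (context size). The model path and layer count below are placeholders; adjust them to your own file and GPU.

```python
# Minimal llama-cpp-python sketch for GGML inference with partial GPU offload.
# The model path and n_gpu_layers value are assumptions -- tune them to your system.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.ggmlv3.q4_K_M.bin",  # hypothetical quantized GGML file
    n_ctx=2048,        # context window; up to 4096 is supported, smaller saves memory
    n_gpu_layers=35,   # layers offloaded to the GPU; 0 = pure CPU, raise it if you have spare VRAM
)

out = llm("Write a haiku about quantization.", max_tokens=128)
print(out["choices"][0]["text"])
```

In the webui you would pick the llama.cpp loader and set the equivalent sliders (GPU layers, context length) in the Model tab; the tradeoff is the same: more offloaded layers and a larger context mean more VRAM used.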