further quants

#1
by IkariDev - opened

Could you please upload more quants, or the source model so that we can make them ourselves?

Thanks for the interest. The source model is available, it’s in the AlpacaCielo folder. I’ll upload some more official quants tomorrow.

Thank you very much for your hard work!

Could you please make some suggestions for newbies on which settings to use when running this in OobaBooga? For example, consumer hardware with 16 GB VRAM and 64 GB RAM. Others might like to know for 8 GB VRAM / 32 GB RAM.

Settings like how to load the model, max new tokens, and so on.

At the moment, in ooba I use the GGML model and offload all 41 layers to the GPU. Even with 8 GB of VRAM it might still work, though you may need to offload fewer layers. That said, GPTQ quants should be available soon, which will be a better option for anyone with a sufficient GPU. As for max tokens, anything should be fine; in my testing it doesn't have problems with repetition or infinite responses.
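If you're unsure how many layers fit on your card, a rough back-of-the-envelope estimate is to divide the quantized model file size by the layer count and see how many layers fit in your free VRAM. This is only a sketch: the per-layer cost, the ~1.5 GB overhead figure, and the 9 GB example file size are assumptions, not measured values, so treat the result as a starting point and adjust by trial and error.

```python
def estimate_gpu_layers(model_file_gb, n_layers, vram_gb, overhead_gb=1.5):
    """Rough guess at how many layers to offload with --n-gpu-layers.

    Assumes each layer costs about model_file_gb / n_layers of VRAM and
    reserves overhead_gb for context/scratch buffers (both assumptions).
    """
    per_layer_gb = model_file_gb / n_layers
    usable_gb = max(vram_gb - overhead_gb, 0)
    return min(n_layers, int(usable_gb / per_layer_gb))

# Example: a ~9 GB quantized file with 41 layers (sizes are illustrative)
print(estimate_gpu_layers(9.0, 41, vram_gb=16))  # 16 GB card: all 41 layers fit
print(estimate_gpu_layers(9.0, 41, vram_gb=8))   # 8 GB card: partial offload
```

With 16 GB you can offload everything; with 8 GB you'd start with the partial estimate and lower it if you hit out-of-memory errors.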

TheBloke has created quants for this model, which you can find on his page.

totally-not-an-llm changed discussion status to closed
