Quantizations for llama.cpp

#23
by rozek - opened

Thank you very much for this marvellous work! Being able to use long contexts (depending on the amount of available RAM, of course) is wonderful!

In order to use your model with llama.cpp, I've generated some quantizations in GGUF format.
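
For anyone who wants to reproduce such quantizations, here is a minimal sketch of the usual llama.cpp workflow (the model directory and output file names are placeholders, and the conversion script's name and flags have varied between llama.cpp versions):

```bash
# Convert the original Hugging Face checkpoint to a full-precision GGUF file.
# Depending on the llama.cpp version, the script may be called convert.py
# or convert-hf-to-gguf.py.
python convert.py ./LLaMA-2-7B-32K --outtype f16 --outfile llama-2-7b-32k-f16.gguf

# Quantize the f16 GGUF down to a smaller format, e.g. 4-bit "q4_K_M".
./quantize llama-2-7b-32k-f16.gguf llama-2-7b-32k-q4_K_M.gguf q4_K_M
```

The other quantization types (q2_K up to q8_0) trade file size against output quality in the same way.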

Together org

@rozek Thank you for your support!

You're welcome! If you want any changes to my description of your model, just tell me!

llama.cpp supports long contexts (more than 4096)? Thanks!

Hello! Sorry for the late response, but I have been quite busy over the last few days.

Yes, it supports longer contexts, provided that you change the limits in the source and recompile, as I did in my own fork (which I have just synced with the original branch again).
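
For what it's worth, recent llama.cpp builds let you request a larger context at run time via the -c flag, without recompiling. A minimal sketch (the model file name and prompt are placeholders):

```bash
# Run inference with a 32k-token context window.
# -c sets the context size; make sure you have enough RAM for the KV cache.
# Older builds capped this value in the source, which is what the fork
# mentioned above changed.
./main -m llama-2-7b-32k-q4_K_M.gguf -c 32768 -p "Summarize the following text: ..."
```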
