Bad results with GGML version of your model

#2 opened by appliedstuff

Hello,

Do you have any example prompts and responses for your model? With the GGML version from TheBloke, the model performs very badly. See here: https://huggingface.co/TheBloke/llama-2-13B-German-Assistant-v2-GGML/discussions

So I am interested in what you think the cause is. Is it the GGML conversion that degrades the output this much, or does the model itself perform this poorly?
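For reference, here is a minimal sketch of how I tested the GGML file, assuming llama-cpp-python (an older release, <= 0.1.78, since later versions only read GGUF) and the q4_0 file from TheBloke's repo. The file name and the prompt template are guesses on my part, so adjust them to whatever the model card specifies:

```python
# Minimal sanity check of the GGML file via llama-cpp-python.
# Assumptions: llama-cpp-python <= 0.1.78 (later releases only load GGUF),
# and a q4_0 file downloaded from TheBloke/llama-2-13B-German-Assistant-v2-GGML.
from llama_cpp import Llama

# File name is a guess based on TheBloke's usual naming scheme.
llm = Llama(
    model_path="llama-2-13b-german-assistant-v2.ggmlv3.q4_0.bin",
    n_ctx=2048,
)

# The prompt template here is an assumption; a wrong template alone can
# produce very bad output, so check the model card for the expected format.
prompt = "### User: Was ist die Hauptstadt von Deutschland?\n### Assistant:"

out = llm(prompt, max_tokens=128, temperature=0.2, stop=["### User:"])
print(out["choices"][0]["text"].strip())
```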

TheBloke and I both know about the poor performance.
Since I have never worked with GGML, I have no experience with how to fix it.

OK, so that means the results with the full model are better? Maybe you could provide some examples in the model card? That would be great! It would make it a lot easier to decide whether to use the model without deploying it on a GPU server. Thanks in advance!
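In case it helps, a minimal sketch of how such examples could be generated with the full model via transformers. The repo id and prompt format below are placeholders, not confirmed by the model card, so swap in the real ones:

```python
# Generate an example prompt/response pair with the full (unquantized) model.
# Assumptions: the repo id and prompt format are placeholders; use the actual
# values from the model card. Requires the accelerate package for device_map.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "flozi00/Llama-2-13B-german-assistant-v2"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # ~26 GB for a 13B model in fp16
    device_map="auto",          # offloads layers to CPU if GPU memory is short
)

prompt = "### User: Was ist die Hauptstadt von Deutschland?\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, temperature=0.2, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```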
