Should I expect lower accuracy than from the original model?

#2 opened by YairFr

But when I send the same prompt twice, once to the original model using LlamaForCausalLM.from_pretrained, and once by wrapping the GGML model with LlamaCpp and using it in LangChain's AgentExecutor, I get different (and worse) results.
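For reference, a minimal sketch of the two paths being compared (the model name and GGML file path below are placeholders, not the exact files in question). Pinning greedy decoding on both sides (do_sample=False / temperature=0) helps separate quantization loss from ordinary sampling randomness; with sampling enabled, some run-to-run variation is expected even before quantization enters the picture.

```python
# Sketch: same prompt through both inference paths, with deterministic decoding.
from transformers import LlamaForCausalLM, LlamaTokenizer
from langchain.llms import LlamaCpp

# Llama-2-chat expects the [INST] ... [/INST] prompt format.
prompt = "[INST] What is the capital of France? [/INST]"

# Path 1: original fp16 weights via transformers (placeholder model id).
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-13b-chat-hf")
model = LlamaForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf", device_map="auto"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# Path 2: quantized GGML weights via llama-cpp-python, wrapped for LangChain
# (placeholder file path; quant level affects how much accuracy is lost).
llm = LlamaCpp(
    model_path="./llama-2-13b-chat.ggmlv3.q4_K_M.bin",
    temperature=0.0,
    max_tokens=128,
    n_ctx=2048,
)
print(llm(prompt))
```

If the outputs still diverge with identical prompts and deterministic settings, the remaining gap would come from the quantized weights themselves (and from any prompt template LangChain's agent adds around the raw prompt).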

Can you provide an example of those differences?

What I wrote here https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/discussions/9 is also relevant for this model.
