Should I expect lower accuracy than the original model?
#2 · opened by YairFr
Sending the same prompt, one time to the original model using LlamaForCausalLM.from_pretrained, and one time by wrapping the GGML model with LlamaCpp and using it in LangChain's AgentExecutor, I get different (and worse) results.
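To make the comparison concrete, here is a minimal sketch of the two setups being described (model names and file paths are placeholders, not the poster's actual code, and greedy decoding is pinned on both sides so that sampling randomness does not confound the results):

```python
from transformers import LlamaForCausalLM, LlamaTokenizer
from langchain.llms import LlamaCpp

prompt = "Explain what a GGML quantized model is in one sentence."

# Path 1: the original fp16 HF checkpoint via transformers
# (model name is a placeholder for whichever base checkpoint is being used)
tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-13b-chat-hf")
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-13b-chat-hf")
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

# Path 2: the quantized GGML file via llama.cpp, wrapped for LangChain
llm = LlamaCpp(
    model_path="./llama-2-13b-chat.ggmlv3.q4_K_M.bin",  # placeholder path
    temperature=0.0,  # match the greedy decoding used above
    max_tokens=128,
)
print(llm(prompt))
```

Even with identical prompts, outputs can diverge both from quantization loss and from the two wrappers' differing default sampling parameters, so pinning the sampling settings helps isolate the quantization effect.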
Can you provide an example of those differences?
What I wrote here https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/discussions/9 is also relevant for this model.