Why does the Moondream demo give better answers than my local one?

by bermancheese - opened

How do I get my local answers to match the demo's? I don't care if I have to wait longer for the answer.

How are you running it locally? Using the transformers library or something else?

I tried llama.cpp, Kobold, LM Studio, and llamafiles. I tried your GGUF and others, always at the highest quant available where quants were the only option. I'm sure I misunderstand the basics or something, since I'm kind of a noob.
