Running it on an Apple MBP M3 - non-quantized

#14 by christianweyer

We are really loving the results we get with the online demo (https://llava.hliu.cc). Kudos!
When we try to run it locally, e.g. as an fp16 GGUF, on our M3 Max with llama.cpp or Ollama, we get very poor results (apparently due to some still-pending PRs in llama.cpp).

How are people running it unquantized on a Mac to get the same results as with the original demo?
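
For reference, this is roughly what we would expect to run, as a minimal sketch using the Hugging Face Transformers library with PyTorch's MPS backend; the model id (`llava-hf/llava-1.5-7b-hf`) and the prompt template are our assumptions here and would need to match whichever checkpoint actually backs the demo:

```python
# Minimal sketch: unquantized (fp16) LLaVA inference on Apple Silicon
# via Transformers + PyTorch MPS. Model id and prompt format are
# assumptions; swap in the checkpoint that matches the demo.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumption, adjust as needed
device = "mps" if torch.backends.mps.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 weights, no quantization
).to(device)

image = Image.open("example.jpg")
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output[0], skip_special_tokens=True))
```

This runs, but we are unsure whether it reproduces the demo's quality, which is why we are asking how others set this up.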

Thanks!
