It works.

#3 by Yuuru - opened

Just tested Q4_0. Runs fine.

Please re-download the files: the rope_theta value was wrong and is now fixed. Apparently this affects generation quality.
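For context on why a wrong rope_theta hurts quality: rope_theta is the base of the rotary position embedding (RoPE), so changing it shifts every positional rotation frequency the model was trained with. A minimal sketch of the standard RoPE frequency formula — the head dimension and theta values below are illustrative, not this model's actual config:

```python
def rope_frequencies(head_dim, rope_theta):
    # Standard RoPE assigns each dimension pair i a rotation frequency:
    #   freq_i = rope_theta ** (-2*i / head_dim)
    # A different rope_theta therefore rescales every frequency, so
    # attention sees positions differently than during training.
    return [rope_theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

# Hypothetical example: 128-dim heads, common theta 10000.0 vs 1000000.0.
f_small = rope_frequencies(128, 10000.0)
f_large = rope_frequencies(128, 1000000.0)
```

The first frequency is always 1.0, but the high-index (long-range) frequencies differ by orders of magnitude between the two theta values, which is why generation degrades when the metadata is wrong.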


It's so much smarter now. Tried Q3 with full offload; it's comfortably fast.

llama_print_timings: prompt eval time = 1357.32 ms / 63 tokens ( 21.54 ms per token, 46.42 tokens per second)

Tested Q4_K_M. Runs perfectly. Thank you!

What are the minimum system requirements? Did you run it locally?
