Quantized version temporarily unavailable

#2
by jonabur - opened
LumiOpen org

We saw some performance issues with the quantized version and have taken it down temporarily while we investigate.

Any ETA on this? :)

LumiOpen org

We ended up needing to submit a PR for llama.cpp to support our tokenizer. We submitted the PR today so hopefully it can be fixed soon:

https://github.com/ggerganov/llama.cpp/pull/7713/files

Once the PR is merged we should be able to upload a new version.

Sweet, looking forward to that!

ggerganov approved it

Progress on this?

LumiOpen org

Should be coming back ~today!

LumiOpen org

We have a little more testing to do, but it looks good for tomorrow.

LumiOpen org

It's uploaded; please let us know if you have any trouble. Make sure you're using a current version of llama.cpp, though!
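
For a quick sanity check after updating, here is a minimal sketch using the llama-cpp-python bindings. This is my own example, not something from this thread; the GGUF path and prompt are placeholders, and you could equally use the llama.cpp CLI directly.

```python
# Minimal smoke test for the re-uploaded quantized model, assuming the
# llama-cpp-python bindings and a placeholder GGUF path (both are my own
# choices, not details from this thread). The bindings must be built against
# a llama.cpp recent enough to include the tokenizer support from the PR above.
from llama_cpp import Llama

# Loading the model is where an unsupported-tokenizer error would surface.
llm = Llama(model_path="path/to/quantized-model.gguf", n_ctx=2048)

# Generate a few tokens to confirm tokenization and inference work end to end.
out = llm("Hello, world!", max_tokens=16)
print(out["choices"][0]["text"])
```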

jonabur changed discussion status to closed
