GGUF versions

#1
opened by christianweyer

Hey Ronan,

When will we see .gguf versions to use with llama.cpp?
Thx!
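
(For anyone planning ahead: once a GGUF lands, it could be loaded with llama-cpp-python roughly like this. A minimal sketch; the file name is hypothetical, since no GGUF has been published yet.)

```python
# Minimal sketch: loading a (hypothetical) GGUF of this model with
# llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="phi-3-mini-function-calling.Q4_K_M.gguf",  # hypothetical file name
    n_ctx=4096,  # context window to allocate
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What's the weather in Dublin?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```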

Trelis org

Howdy, I'll aim to get it up this week.

Perfect, thank you!

Also very, very interested in the GGUF version :) Thanks a lot, I definitely need this function-calling version.

Trelis org

Ok, the issue here is with the LongRoPE scaling that Microsoft is using. It's causing issues with TGI and with GGUFs. I'm tracking this issue: https://github.com/ggerganov/llama.cpp/issues/6849
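
For reference, the LongRoPE config that the converters choke on is visible right in the model config. A minimal sketch with transformers; field names are as in Microsoft's Phi-3 config at the time:

```python
# Inspect the RoPE scaling config that llama.cpp's GGUF converter
# (and TGI) couldn't handle yet -- see ggerganov/llama.cpp#6849.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct", trust_remote_code=True
)
print(cfg.rope_scaling["type"])              # "longrope" (earlier configs said "su")
print(cfg.original_max_position_embeddings)  # 4096: the pre-scaling window
print(cfg.max_position_embeddings)           # 131072: the window after scaling
```

That 4096-token pre-scaling window is why a 4k release sidesteps the converter problem.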

In the meantime, I plan to release a 4k model. Is that useful, or is the 128k key?

For me, 4k is already useful; 128k would be 20% of my usage.

4k is also useful, yes.

Trelis org

Noted, I'll aim to get on this late next week. I'm travelling, sorry for the delay.

The GGUF (4k or 128k) would be very helpful. ❤️

I'm currently running Gorilla OpenFunctions v2. How will Phi-3 function calling compare? Gorilla launched a leaderboard to compare function-calling models. Does anyone have insights on the relevance of that leaderboard? https://gorilla.cs.berkeley.edu/leaderboard.html

Trelis org

This is taking a long time to get resolved in llama.cpp, which is blocking the GGUF.

Would an MLX quant be useful instead (like this)?

Or is GGUF really needed because that's what's supported by libraries/apps like LM Studio?
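
(If MLX works for you, a quant would be used roughly like this with mlx-lm on Apple Silicon. A minimal sketch; the repo id is hypothetical.)

```python
# Minimal sketch: running an MLX quant with mlx-lm (pip install mlx-lm).
from mlx_lm import load, generate

# Hypothetical repo id -- no MLX quant has been confirmed yet.
model, tokenizer = load("Trelis/Phi-3-mini-instruct-function-calling-4bit-mlx")

print(generate(model, tokenizer,
               prompt="Call get_weather for Dublin.",
               max_tokens=64))
```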

I am using Ollama with LiteLLM, so the GGUF would be great.
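
(For that setup, the call would look roughly like this once the GGUF is pulled into Ollama. A minimal sketch; the Ollama model tag is hypothetical.)

```python
# Minimal sketch: calling a (hypothetical) Ollama model through LiteLLM
# (pip install litellm), assuming the GGUF has been pulled into Ollama.
import litellm

resp = litellm.completion(
    model="ollama/phi3-function-calling",   # hypothetical Ollama model tag
    messages=[{"role": "user", "content": "What's the weather in Dublin?"}],
    api_base="http://localhost:11434",      # Ollama's default local endpoint
)
print(resp.choices[0].message.content)
```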

Has there been any update on this, @RonanMcGovern? :-)

Trelis org

Howdy, I don't think the issue blocking 128k GGUFs got resolved, but I've asked about Phi-3.5, where it seems possible. If I get confirmation, I can look into training Phi-3.5 for function calling.

https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF/discussions/3

Best, Ronan

Way to go @RonanMcGovern - thx!
