GGUF versions

#1 opened by christianweyer

Hey Ronan,

When will we see .gguf versions to use with llama.cpp?
Thx!

Trelis org

Howdy, I'll aim to get it up this week.

Perfect, thank you!

Also very, very interested in the GGUF version :) Thanks a lot, I definitely need this function-calling version.

Trelis org

Ok, the issue here is with the long RoPE scaling (LongRoPE) that Microsoft is using. It's causing problems with both TGI and GGUF conversion. I'm tracking this issue: https://github.com/ggerganov/llama.cpp/issues/6849
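For context, here's a minimal sketch (not from this thread) of inspecting the rope_scaling block in the Phi-3 128k config, which is the part the llama.cpp GGUF converter couldn't represent at the time. Field names are assumptions based on the public Phi-3 config format:

```python
# Minimal sketch: peek at the rope_scaling block in Phi-3-mini-128k's config.
# The "longrope" type with separate short/long factor arrays is what the
# llama.cpp GGUF converter couldn't map onto its RoPE implementation.
# Field names are assumptions based on the public Phi-3 release.
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download("microsoft/Phi-3-mini-128k-instruct", "config.json")
with open(path) as f:
    config = json.load(f)

scaling = config.get("rope_scaling") or {}
print("type:", scaling.get("type"))  # e.g. "longrope"
print("short_factor entries:", len(scaling.get("short_factor", [])))
print("long_factor entries:", len(scaling.get("long_factor", [])))
```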

In the meantime, I plan to release a 4k model. Is that useful, or is the 128k context key?

For me, 4k is already useful; 128k would cover about 20% of my usage.

4k is also useful, yes.

Noted. I'll aim to get to this late next week; I'm travelling, sorry for the delay.

The GGUF (4k or 128k) would be very helpful. ❤️

I'm currently running Gorilla OpenFunctions v2. How will Phi-3 function calling compare? Gorilla launched a leaderboard to compare function-calling models. Does anyone have insights on how relevant that leaderboard is? https://gorilla.cs.berkeley.edu/leaderboard.html
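For anyone unfamiliar with what that leaderboard measures, here's a made-up sketch of the kind of tool schema and expected structured call that function-calling benchmarks score models on (the schema and query are illustrative, not from Gorilla's actual test set):

```python
# Illustrative only: the kind of input/output pair a function-calling
# benchmark scores. The tool schema and expected call are made-up examples.
tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Given the tool above and the user query "What's the weather in Tokyo?",
# the model is expected to emit a structured call like this, and the
# benchmark checks the function name and arguments for correctness:
expected_call = {"name": "get_weather", "arguments": {"city": "Tokyo"}}
```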

This is taking a long time to get resolved in llama.cpp, which is blocking making the GGUF.

Would an MLX quant be useful instead (like this)?

Or is GGUF really needed because that's what's supported by libraries/apps like LM Studio?
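For anyone weighing that option, a minimal sketch of what running an MLX quant looks like with the mlx-lm package on Apple Silicon (the repo id below is hypothetical; substitute whatever MLX quant gets published):

```python
# Minimal sketch of loading and running an MLX quant via mlx-lm.
# The repo id is hypothetical.
from mlx_lm import load, generate

model, tokenizer = load("Trelis/Phi-3-mini-function-calling-mlx-4bit")
reply = generate(model, tokenizer, prompt="What's the weather in Paris?", max_tokens=128)
print(reply)
```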

I am using Ollama with LiteLLM, so the GGUF would be great.
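For reference, a minimal sketch of that setup once a GGUF is imported into Ollama ("phi3-fc" is a hypothetical name for the imported model):

```python
# Minimal sketch of calling a local Ollama model through LiteLLM.
# "phi3-fc" is a hypothetical name for the imported GGUF.
import litellm

response = litellm.completion(
    model="ollama/phi3-fc",             # the "ollama/" prefix routes to a local Ollama server
    api_base="http://localhost:11434",  # Ollama's default endpoint
    messages=[{"role": "user", "content": "What's the weather in Dublin?"}],
)
print(response.choices[0].message.content)
```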
