eos token in gguf

#1
by mradermacher - opened

let's continue here, because redoing quants requires me to delete the repo. new static quants should be available shortly. if they work, i'll add imatrix ones.
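
for anyone who wants to verify the fix themselves, the eos token id can be read straight from the gguf metadata. a quick sketch, assuming the gguf-py package that ships with llama.cpp (pip install gguf) and a hypothetical local filename:

```python
# a sketch, not the actual fix: read tokenizer metadata from a gguf file
# with gguf-py. the filename below is hypothetical.
from gguf import GGUFReader

reader = GGUFReader("Tess-v2.5-Phi-3-medium-128k-14B.Q8_0.gguf")

# scalar metadata values are stored as one-element arrays in field.parts
field = reader.fields["tokenizer.ggml.eos_token_id"]
print("eos token id:", int(field.parts[-1][0]))
```

the same information is printed by the `gguf-dump` command that the package installs.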

mradermacher changed discussion title from end token to eos token in gguf

Thanks Michael! I'm downloading @bartowski's 8-bit quant with the fix now and will confirm here once I've tested it. ETA 10 mins.

Can confirm with @bartowski's new 8-bit quant that things are working perfectly!

Please go ahead with your quantization Michael. Thanks for the support and being responsive.

[Screenshot attached: Screenshot 2024-06-18 at 1.14.07 PM.png]
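
For anyone reproducing that check, here is a short sketch of the same kind of test with llama-cpp-python; the quant filename is hypothetical, and a finish_reason of "stop" means generation ended at the eos token instead of running to the token limit:

```python
# a sketch of the check above: load the fixed Q8_0 quant with
# llama-cpp-python and confirm generation stops at eos.
from llama_cpp import Llama

llm = Llama(model_path="Tess-v2.5-Phi-3-medium-128k-14B.Q8_0.gguf", n_ctx=4096)
out = llm.create_completion("Q: What is the capital of France?\nA:", max_tokens=256)
print(out["choices"][0]["text"])
print("finish_reason:", out["choices"][0]["finish_reason"])  # "stop" = ended at eos
```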

migtissera changed discussion status to closed
This comment has been hidden

@mradermacher let me know when your quants are up. Thanks again.

That's good news. Since it's still generating, I can easily add the f16 to the job, too. I think. Hopefully. It's a first...

Haha, you got the message before I hid it lol. It was 8-bit that I was testing with all along (loading in 8-bit with Python Transformers), so my bad there. Don't worry about the 16-bit.
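
For reference, this kind of 8-bit loading with Transformers + bitsandbytes looks roughly like the sketch below; the model id is an assumption inferred from the quant repo name:

```python
# a sketch of loading a model in 8-bit with Transformers + bitsandbytes.
# the model id is an assumption, not confirmed in this thread.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "migtissera/Tess-v2.5-Phi-3-medium-128k-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```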

Ah, and the quants unfortunately ended up on the slowest box, so it will take up to a day for them to be done. I'll try to notify you :) If you just want to try one out, you can already do so: https://huggingface.co/mradermacher/Tess-v2.5-Phi-3-medium-128k-14B-GGUF/tree/main
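
if you'd rather script the download, something like this works with huggingface_hub; the exact quant filename is an assumption, so check the repo's file list:

```python
# a sketch: fetch a single quant file from the repo with huggingface_hub.
# the filename is an assumption; see the repo's file list for exact names.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="mradermacher/Tess-v2.5-Phi-3-medium-128k-14B-GGUF",
    filename="Tess-v2.5-Phi-3-medium-128k-14B.Q8_0.gguf",
)
print("downloaded to:", path)
```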

Ah, how we love it when people detect decisive quality differences between a quant and itself *g*. I would expect that 8-bit Transformers and Q8_0 do have quality differences (and I would have expected the Q8_0 to win, but I can easily be wrong about this). Anyway, you'll get your f16 whether you want it or not now. I actually do generate f16's for quants ~10B by default, and I am debating whether I should do it by default for some larger sizes, too.

Awesome! 8-bit is all I need. Downloading now.

Chat later!
