Broken

by lemon07r - opened May 31, 2024

May 31, 2024

Seems to be broken, unfortunately. I tried a GGUF-my-repo quant (https://huggingface.co/lemon07r/llama-3-SNAMD-8B-Q8_0-GGUF) and that didn't work, so I downloaded the original repo and quantized it myself using the latest version of llama.cpp (lcpp). That did not work either. I tried two different versions of koboldcpp (kcpp) to load it, with the Vulkan, OpenBLAS and hipBLAS backends, and all of them just open and close before I get to see the error.

lemon07r

May 31, 2024

http://5.9.86.149/hf/llama-3-SNAMD-8B/

You can try this. I had the big blender bot on the KoboldAI Discord server do the same merge using your YAML config. Should work.

lemon07r

May 31, 2024

Nope. This is broken too, same issue. I tried to quant it myself. Any idea why it doesn't work?

nbeerbower

Owner May 31, 2024

I will try quanting it tomorrow (I broke my Linux workstation, whoops) and share the config if it works. The safetensors work for me using Transformers on a GPU.

EloyOn

May 31, 2024 (edited)

@lemon07r Could it be a problem with the tokenizer? Something similar happened when some users tried to quant Stheno-Mahou, and Lewdiculous got it working in that thread:
https://huggingface.co/nbeerbower/llama-3-Stheno-Mahou-8B/discussions/1#66577ab60c9058052fd84ffe

emnakamura

May 31, 2024

Hi, I finished quantizing this model here: https://huggingface.co/emnakamura/llama-3-SNAMD-8B-GGUF

You need to use llama.cpp's convert-hf-to-gguf.py script. We use --outtype f32 to minimize data loss when converting to GGUF.
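A minimal sketch of that two-step route (convert to an f32 GGUF, then quantize to Q8_0), driven from Python. The script name `convert-hf-to-gguf.py` and its `--outtype`/`--outfile` flags match llama.cpp around mid-2024; the `quantize` binary location, the output file names and the `build_commands`/`run` helpers are assumptions for illustration. Later llama.cpp releases renamed the tools (`convert_hf_to_gguf.py`, `llama-quantize`), so check the names in your checkout.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def build_commands(model_dir: str, llama_cpp_dir: str, quant: str = "Q8_0") -> list[list[str]]:
    """Return two commands: HF safetensors -> f32 GGUF, then f32 GGUF -> quantized GGUF."""
    name = Path(model_dir).name
    f32_path = f"{name}-f32.gguf"
    quant_path = f"{name}-{quant}.gguf"
    convert = [
        sys.executable,
        str(Path(llama_cpp_dir) / "convert-hf-to-gguf.py"),  # assumed mid-2024 name
        model_dir,
        "--outtype", "f32",  # full-precision intermediate to minimize conversion loss
        "--outfile", f32_path,
    ]
    # Assumed binary path; newer builds call this tool llama-quantize.
    quantize = [str(Path(llama_cpp_dir) / "quantize"), f32_path, quant_path, quant]
    return [convert, quantize]


def run(commands: list[list[str]]) -> None:
    """Run each step in order, stopping at the first one that exits non-zero."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage with hypothetical local paths: `run(build_commands("llama-3-SNAMD-8B", "llama.cpp"))`. This mirrors the procedure described in the thread rather than fixing it, so it is mainly useful as a reproducible baseline when comparing one person's Q8_0 against another's.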

lemon07r

May 31, 2024 (edited)

That's exactly how I did it, though: latest version of llama.cpp on Fedora 40, converted to f32 first, with both your weights and my own weights (from the same recipe). Not sure why the quant didn't work. I'll give yours a try.
EDIT: Do you have a Q8_0 for me to test with? That's the quant that didn't work for me.
