NVIDIA RAG fine-tunes

#3
opened by KnutJaegersberg

Just wanting to make you aware of two new fine-tunes:

https://huggingface.co/nvidia/ChatQA-1.5-70B
and the 8B version

Hmm, that's not exactly a request to quantize them, but that's what I do :) So let's see how llama.cpp fails today.

mradermacher changed discussion status to closed

Yeah, neither one is supported by llama.cpp yet.

I could do a hack-conversion with convert.py, forcing the pre-tokenizer type to llama3.

I've hardcoded the llama3 pre-tokenizer; let's see how that turns out. Current llama.cpp is such a disaster.
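For reference, llama.cpp's convert-hf-to-gguf.py resolves the pre-tokenizer in get_vocab_base_pre(), which fingerprints the tokenizer by hashing its output on a probe string and refuses unknown models. Roughly, the hardcode hack looks like this; a sketch modeled on that function, not the actual source, and the probe text here is made up:

```python
# Sketch of the pre-tokenizer hardcode hack, modeled on (but not copied
# from) convert-hf-to-gguf.py's get_vocab_base_pre().
from hashlib import sha256

def get_vocab_base_pre(tokenizer) -> str:
    # The converter fingerprints the tokenizer by hashing how it encodes
    # a fixed probe string, then maps known hashes to pre-tokenizer names.
    chktxt = "Hello World!\n 3.141"  # illustrative probe text
    chkhsh = sha256(str(tokenizer.encode(chktxt)).encode()).hexdigest()

    res = None
    # ... chain of `if chkhsh == "...": res = "..."` checks elided ...

    # HACK: ChatQA-1.5 is Llama-3-based, but its tokenizer hash is not in
    # the table, so force the llama3 BPE pre-tokenizer instead of raising
    # NotImplementedError the way the stock script does.
    if res is None:
        res = "llama-bpe"

    return res
```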

llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 803, got 723

The 8B really doesn't seem to be supported.

Just wish the convert script would error out rather than generate garbage.
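One way to fail fast instead: count the tensors in the converted file before quantizing anything. A minimal sketch using the gguf Python package from llama.cpp's gguf-py (GGUFReader is its real reader class; the expected count of 803 is just taken from the loader error above):

```python
# Minimal post-conversion sanity check using the gguf package that ships
# with llama.cpp (gguf-py). Counts the tensors in a GGUF file and fails
# loudly on a mismatch, instead of letting llama.cpp reject the model
# at load time.
import sys
from gguf import GGUFReader

EXPECTED_TENSORS = 803  # taken from the loader error above

reader = GGUFReader(sys.argv[1])
n_tensors = len(reader.tensors)
print(f"{sys.argv[1]}: {n_tensors} tensors")

if n_tensors != EXPECTED_TENSORS:
    sys.exit(f"tensor count mismatch: expected {EXPECTED_TENSORS}, got {n_tensors}")
```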

Yup, neither is supported.

It is what it is, thanks for trying!

bartowski has imatrix quants. No clue why his conversion produced a working model (his llama.cpp version is older, but there shouldn't be any relevant changes), but they do work. Have a look there!

https://github.com/ggerganov/llama.cpp/issues/7046 has a workaround; I'll give it another try.
