NVIDIA RAG fine-tunes

#3
opened by KnutJaegersberg

Just wanting to make you aware of two new fine-tunes:

https://huggingface.co/nvidia/ChatQA-1.5-70B
and the 8B version

Hmm, that's not exactly a request to quantize them, but that's what I do :) So let's see how llama.cpp fails today.

mradermacher changed discussion status to closed

Yeah, neither one is supported by llama.cpp yet.

I could do a hack-conversion with convert.py, forcing the pre-tokenizer type to llama3.

I've hardcoded the llama3 pre-tokenizer; let's see how that turns out. Current llama.cpp is such a disaster.
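For reference, llama.cpp's convert-hf-to-gguf.py resolves the pre-tokenizer in get_vocab_base_pre(), which fingerprints the tokenizer by hashing its output on a probe string and refuses unknown models. Roughly, the hardcode hack looks like this; a sketch modeled on that function, not the actual source, and the probe text here is made up:

```python
# Sketch of the pre-tokenizer hardcode hack, modeled on (but not copied
# from) convert-hf-to-gguf.py's get_vocab_base_pre().
from hashlib import sha256

def get_vocab_base_pre(tokenizer) -> str:
    # The converter fingerprints the tokenizer by hashing how it encodes
    # a fixed probe string, then maps known hashes to pre-tokenizer names.
    chktxt = "Hello World!\n 3.141"  # illustrative probe text
    chkhsh = sha256(str(tokenizer.encode(chktxt)).encode()).hexdigest()

    res = None
    # ... chain of `if chkhsh == "...": res = "..."` checks elided ...

    # HACK: ChatQA-1.5 is Llama-3-based, but its tokenizer hash is not in
    # the table, so force the llama3 BPE pre-tokenizer instead of raising
    # NotImplementedError the way the stock script does.
    if res is None:
        res = "llama-bpe"

    return res
```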

llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 803, got 723

The 8B really doesn't seem to be supported.

Just wish the convert script would error out rather than generate garbage.
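One way to fail fast instead: count the tensors in the converted file before quantizing anything. A minimal sketch using the gguf Python package from llama.cpp's gguf-py (GGUFReader is its real reader class; the expected count of 803 is just taken from the loader error above):

```python
# Minimal post-conversion sanity check using the gguf package that ships
# with llama.cpp (gguf-py). Counts the tensors in a GGUF file and fails
# loudly on a mismatch, instead of letting llama.cpp reject the model
# at load time.
import sys
from gguf import GGUFReader

EXPECTED_TENSORS = 803  # taken from the loader error above

reader = GGUFReader(sys.argv[1])
n_tensors = len(reader.tensors)
print(f"{sys.argv[1]}: {n_tensors} tensors")

if n_tensors != EXPECTED_TENSORS:
    sys.exit(f"tensor count mismatch: expected {EXPECTED_TENSORS}, got {n_tensors}")
```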

Yup, neither is supported.

It is what it is, thanks for trying!

bartowski has imatrix quants. No clue why his conversion produced a working model (his llama.cpp version is older, but there shouldn't be any relevant changes), but they do work. Have a look there!

https://github.com/ggerganov/llama.cpp/issues/7046 has a workaround; I'll give it another try.
