GGUF Version?

Opened by johnnnna

Hi @yunconglong , thanks for the awesome work.
This model is currently #1 on the leaderboard with an average score of 77.44, way higher than 60-70 billion parameter models... After the optimization, TruthfulQA's score has even reached 78.02.
Perhaps it may be of interest to @TheBloke, @Rybens, @Nan-Do, and others.
A quantized GGUF version of the model would be phenomenal. The fact that it is 13B is mind-blowing.

I started the quantization process, but HF doesn't want to cooperate much. The model downloads at 5 Mbps.

I'll get back to you when the quants are uploaded.
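
For anyone who wants to reproduce the quants themselves, here's a rough sketch of the usual llama.cpp conversion flow, driven from Python. The paths, the convert.py script name, and the Q4_0 target are assumptions about a typical llama.cpp checkout of that era, not necessarily what was actually run:

```python
# Sketch: convert an HF checkpoint to GGUF, then quantize it.
# Assumes llama.cpp is cloned and built in ./llama.cpp (placeholder paths).
import subprocess

MODEL_DIR = "models/13B_MoE"                  # local HF snapshot (placeholder)
F16_GGUF = "models/13B_MoE-f16.gguf"
Q4_GGUF = "models/13B_MoE-Q4_0.gguf"

# 1. Convert the HF checkpoint to a float16 GGUF file.
subprocess.run(
    ["python", "llama.cpp/convert.py", MODEL_DIR,
     "--outtype", "f16", "--outfile", F16_GGUF],
    check=True,
)

# 2. Quantize the f16 GGUF down to Q4_0.
subprocess.run(
    ["llama.cpp/quantize", F16_GGUF, Q4_GGUF, "Q4_0"],
    check=True,
)
```

The second step can be repeated with other type strings (Q4_1, Q5_K_M, Q8_0, ...) to produce the rest of the quants.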

It seems we end up doing the same thing lately (sorry about that).

I already finished the quantization; the files are here.

The model repo doesn't include all the files required for quantization. I managed to solve it, and after several tests it seems to be working OK, but I suggest you test it thoroughly.

Great work @Nan-Do !

Yeah, the tokenizer.model file wasn't present in the repo; I had to download it from the original TomGrc/FusionNet_7Bx2_MoE_14B model page.
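
For anyone hitting the same issue, this is the equivalent fix scripted with huggingface_hub; the local directory is a placeholder for wherever your local copy of this model lives:

```python
# Fetch the missing tokenizer.model from the base model repo and drop it
# next to the local copy of this model (local_dir is a placeholder path).
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="TomGrc/FusionNet_7Bx2_MoE_14B",
    filename="tokenizer.model",
    local_dir="models/13B_MoE",  # hypothetical local model directory
)
```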

In that case, as you've already created a repo with quantizations, I'll consider whether to create my own at all

@LoneStriker Can you do an EXL2 quantization of this model?

EXL2 takes forever...

@Nan-Do Thanks for the GGUF...

The more I look at this model, the more impressed I am... it's only two 7B models...

@Nan-Do, no Q4_K_M?

@akhil3417 I chose to use the Q4_0 and Q4_1 versions. The Q4_1 version should be fairly similar in size (just slightly bigger). Doesn't that one fit your needs?

@Nan-Do Could you go into more detail about how you fixed the issue with the lack of required files? I haven't had any luck trying to generate a suitable tokenizer.model file.

@christopherthompson81 @Rybens already explained how to solve the problem a few comments above. Just copy the file from the base model.

@Nan-Do I was getting nonsense back out of the quants I made when I did that.

I am getting this error: error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found
llama_load_model_from_file_gpt4all: failed to load model
Any fix? :v

I just have to say that the ability of this model is completely bananas. It is absolutely the best thing I have ever used in my RAG setup. Kudos for that, and special thanks to @Rybens and @Nan-Do for the quants!

If you like the performance of this model, try https://huggingface.co/snorkelai/Snorkel-Mistral-PairRM-DPO. Its size is only 7B, no MoE, trained with Meta's self-rewarding method.
Here are the GGUF quants: https://huggingface.co/brittlewis12/Snorkel-Mistral-PairRM-DPO-GGUF/tree/main
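
If anyone wants to smoke-test one of those quants locally, here's a minimal llama-cpp-python sketch; the .gguf file name and the settings are illustrative assumptions, not the exact names in that repo:

```python
# Quick local smoke test of a downloaded GGUF quant with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="snorkel-mistral-pairrm-dpo.Q4_K_M.gguf",  # assumed file name
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers if built with GPU support
)

out = llm("Q: Why are 7B models suddenly so strong? A:", max_tokens=128)
print(out["choices"][0]["text"])
```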

@Rybens thanks for the quants, though they appear to have identical names and can only be distinguished by their size; maybe you should consider renaming them.

Also, how does https://huggingface.co/snorkelai/Snorkel-Mistral-PairRM-DPO (which you suggested) compare with NeuralBeagle-7B? Any ideas?

After seeing the results I must say it's brilliant; 7B models are killing it.
I wonder how it's gonna do on the Open LLM Leaderboard.

They have names appropriate to the quants used; maybe what you mean is that the displayed names are shortened by HF simply because they're too long.

And the model I recommended is pretty smart for its size, so I thought you guys would be interested, that's all.

Looks like it's the browser's fault; it looks fine on desktop.

Also, I wonder if the Open LLM Leaderboard rankings are rigged or what; that model (Snorkel-Mistral) is ranked very poorly there, even though according to them it looks like the best 7B out there. Ranking next to GPT-4 Turbo is no joke.

Hi @yunconglong , could you add the missing tokenizer.model file from https://huggingface.co/TomGrc/FusionNet_7Bx2_MoE_v0.1/blob/main/tokenizer.model to this repository, so that creating GGUF quants works out of the box? Thank you!

done
