GGUF Version?

Opened by johnnnna

Hi @yunconglong , thanks for the awesome work.
This model is currently #1 on the leaderboard with an average score of 77.44, way higher than 60-70 billion parameter models... After the optimization, TruthfulQA's score has even reached 78.02.
Perhaps it may be of interest to @TheBloke, @Rybens, @Nan-Do, and others.
A quantized GGUF version of the model would be phenomenal. The fact that it is 13B is mind-blowing.

I started the quantization process, but HF doesn't want to cooperate much. The model downloads at 5 Mbps.

I'll get back to you when the quants are uploaded.
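
For anyone who wants to reproduce the quants themselves, here's a rough sketch of the usual llama.cpp conversion flow, driven from Python. The paths, the convert.py script name, and the Q4_0 target are assumptions about a typical llama.cpp checkout of that era, not necessarily what was actually run:

```python
# Sketch: convert an HF checkpoint to GGUF, then quantize it.
# Assumes llama.cpp is cloned and built in ./llama.cpp (placeholder paths).
import subprocess

MODEL_DIR = "models/13B_MoE"                  # local HF snapshot (placeholder)
F16_GGUF = "models/13B_MoE-f16.gguf"
Q4_GGUF = "models/13B_MoE-Q4_0.gguf"

# 1. Convert the HF checkpoint to a float16 GGUF file.
subprocess.run(
    ["python", "llama.cpp/convert.py", MODEL_DIR,
     "--outtype", "f16", "--outfile", F16_GGUF],
    check=True,
)

# 2. Quantize the f16 GGUF down to Q4_0.
subprocess.run(
    ["llama.cpp/quantize", F16_GGUF, Q4_GGUF, "Q4_0"],
    check=True,
)
```

The second step can be repeated with other type strings (Q4_1, Q5_K_M, Q8_0, ...) to produce the rest of the quants.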

It seems we end up doing the same thing lately (sorry about that).

I already finished the quantization; the files are here.

The model repo doesn't include all the files required for quantization. I managed to solve it, and after several tests it seems to be working OK, but I suggest you test it thoroughly.

Great work @Nan-Do !

Yeah, the tokenizer.model file wasn't present in the repo; I had to download it from the original TomGrc/FusionNet_7Bx2_MoE_14B model page.
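
For anyone hitting the same issue, this is the equivalent fix scripted with huggingface_hub; the local directory is a placeholder for wherever your local copy of this model lives:

```python
# Fetch the missing tokenizer.model from the base model repo and drop it
# next to the local copy of this model (local_dir is a placeholder path).
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="TomGrc/FusionNet_7Bx2_MoE_14B",
    filename="tokenizer.model",
    local_dir="models/13B_MoE",  # hypothetical local model directory
)
```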

In that case, as you've already created a repo with quantizations, I'll consider whether to create my own at all

@LoneStriker Can you do an EXL2 quantization of this model?

EXL2 takes forever...

@Nan-Do Thanks for the GGUF...

The more I look at this model, the more impressed I am... it's only two 7B models...

@Nan-Do, no Q4_K_M?

@akhil3417 I chose to use the Q4_0 and Q4_1 versions. The Q4_1 version should be fairly similar in size (just slightly bigger). Doesn't that one fit your needs?

@Nan-Do Could you go into more detail about how you fixed the issue with the lack of required files? I haven't had any luck trying to generate a suitable tokenizer.model file.

@christopherthompson81 @Rybens already explained how to solve the problem a few comments above. Just copy the file from the base model.

@Nan-Do I was getting nonsense back out of the quants I made when I did that.

I am getting this error: error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found
llama_load_model_from_file_gpt4all: failed to load model
Any fix? :v

I just have to say that the ability of this model is completely bananas. It is absolutely the best thing I have ever used in my RAG setup. Kudos for that, and special thanks to @Rybens and @Nan-Do for the quants!

If you like the performance of this model, try https://huggingface.co/snorkelai/Snorkel-Mistral-PairRM-DPO. Its size is only 7B, no MoE, trained with Meta's self-rewarding method.
Here are the GGUF quants: https://huggingface.co/brittlewis12/Snorkel-Mistral-PairRM-DPO-GGUF/tree/main
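
If anyone wants to smoke-test one of those quants locally, here's a minimal llama-cpp-python sketch; the .gguf file name and the settings are illustrative assumptions, not the exact names in that repo:

```python
# Quick local smoke test of a downloaded GGUF quant with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="snorkel-mistral-pairrm-dpo.Q4_K_M.gguf",  # assumed file name
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers if built with GPU support
)

out = llm("Q: Why are 7B models suddenly so strong? A:", max_tokens=128)
print(out["choices"][0]["text"])
```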

@Rybens thanks for the quants, though they appear to have identical names and can only be distinguished by their size; maybe you should consider renaming them.

Also, how does https://huggingface.co/snorkelai/Snorkel-Mistral-PairRM-DPO (which you suggested) compare with NeuralBeagle-7B? Any ideas?

After seeing the results I must say it's brilliant; 7B models are killing it.
I wonder how it's gonna do on the Open LLM Leaderboard.

They have names appropriate to the quants used; maybe what you mean is that the displayed names are shortened by HF simply because they're too long.

And the model I recommended is pretty smart for its size, so I thought you guys would be interested, that's all.

Looks like it's the browser's fault; it looks fine on desktop.

Also, I wonder if the Open LLM Leaderboard rankings are rigged or what; that model (Snorkel-Mistral) is ranked very poorly there, even though according to them it looks like the best 7B out there. Ranking next to GPT-4 Turbo is no joke.

Hi @yunconglong , could you add the missing tokenizer.model file from https://huggingface.co/TomGrc/FusionNet_7Bx2_MoE_v0.1/blob/main/tokenizer.model to this repository, so that creating GGUF quants works out of the box? Thank you!

done
