Less is more

#25
by Henk717 - opened

Hey Maziyar,

Wanted to give some feedback from the KoboldAI community. Our users prefer the files to be as large as possible, for two different reasons:

  1. They prefer portable files that are self-contained, so they don't have to merge them locally.
  2. Before gguf-split came out, uploaders would cap the file size at HF's file limit (so almost 50GB); this meant most models could be downloaded with a single link, or otherwise usually only 2 or at most 3.

Unlike regular PyTorch files, which tend to be uploaded one repo / branch per model and for which the software is tightly integrated with Huggingface, we can't do the same for GGUF as easily, because we want to download GGUF files from sources other than Huggingface (for example, in our Docker image).

So the way it's currently implemented requires users to copy more links (and to merge locally, where that shouldn't have been needed, if they want a cleaner model collection).

My community would prefer keeping the splits close to 50GB per split; I'm sure others will chime in if this is worse for them.

Hi @Henk717

Thanks for your feedback, I didn't know the number of splits would matter, since Llama.cpp loads them from the first split. I try to keep the number of splits between 2 and 3 unless the models are very big. (There was a bug in max-size by GB, so I split by number of tensors, which is not as accurate.)

Hopefully moving forward I can use 48G as the max split size.
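
(For reference, a minimal sketch of the two split modes in llama.cpp's gguf-split tool; the input/output names here are illustrative:)

```sh
# split by tensor count (the current workaround; shard sizes vary)
./gguf-split --split --split-max-tensors 256 model.gguf model-split

# split by size, capping each shard just under HF's 50GB limit
# (usable once the max-size-by-GB bug mentioned above is fixed)
./gguf-split --split --split-max-size 48G model.gguf model-split
```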

> 1. They prefer portable files that are self-contained, so they don't have to merge them locally.

you don't have to merge them
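
(llama.cpp indeed picks up the remaining shards automatically when pointed at the first split; a minimal sketch, with an illustrative file name and assuming a recent build where the CLI binary is llama-cli:)

```sh
# point llama.cpp at the first shard; the rest are detected automatically
./llama-cli -m model-00001-of-00003.gguf -p "Hello"
```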

> 2. Before gguf-split came out, uploaders would cap the file size at HF's file limit (so almost 50GB); this meant most models could be downloaded with a single link, or otherwise usually only 2 or at most

```sh
huggingface-cli download MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-GGUF --local-dir . --include '*Q2_K*gguf*'
```

or

press download a few times

> 3

you lose

> So the way it's currently implemented requires users to copy more links (and to merge locally, where that shouldn't have been needed, if they want a cleaner model collection).

no

> My community would prefer keeping the splits close to 50GB per split

quantize them for your (((community))) then

> I'm sure others will chime in if this is worse for them.

hello

> Hi @Henk717
>
> Thanks for your feedback, I didn't know the number of splits would matter, since Llama.cpp loads them from the first split. I try to keep the number of splits between 2 and 3 unless the models are very big. (There was a bug in max-size by GB, so I split by number of tensors, which is not as accurate.)
>
> Hopefully moving forward I can use 48G as the max split size.

it's ok, you can just keep on doing what you're doing, thanks for the quants sir

> Hi @Henk717
>
> Thanks for your feedback, I didn't know the number of splits would matter, since Llama.cpp loads them from the first split. I try to keep the number of splits between 2 and 3 unless the models are very big. (There was a bug in max-size by GB, so I split by number of tensors, which is not as accurate.)
>
> Hopefully moving forward I can use 48G as the max split size.

Number of tensors explains a lot about why all the gguf-split uploaders have been doing odd upload amounts; it makes perfect sense now!
The llama-anon account is known to troll what I do on Huggingface, but I'll still elaborate a bit on the points, for full disclosure, to explain each stance and why it matters.

The comment about not having to merge them isn't valid in this context: people who merely wish to load a functional model indeed don't have to merge them, thanks to gguf-split. But I had multiple people ask how to merge files because they dislike having split models in their collection. So while it's not a functional necessity, it is a user preference to have a single GGUF when possible.
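
(For those who do want a single file, merging is a one-liner with llama.cpp's gguf-split tool; a minimal sketch, file names illustrative:)

```sh
# pass the first shard; gguf-split locates the rest and writes one merged file
./gguf-split --merge model-00001-of-00003.gguf model-merged.gguf
```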

The argument that these are 3 files, and therefore my point is invalid, also does not apply, because I meant that as a maximum based on the 50GB file limit. You'd formerly only see 3 different GGUF files on very large models, and only at the higher sizes of those models; smaller sizes would still be only 1 or 2 files. So it's not a valid counter to my less-is-more request.

The last 3 things argued are not, I think, relevant points, and mostly stem from the fact that this user was banned from our Discord, as evidenced by the (((community))) remark.

> The llama-anon account is known to troll what I do on Huggingface

> what I do

What is it that you do, exactly?

> But I had multiple people ask how to merge files because they dislike having split models in their collection. So while it's not a functional necessity, it is a user preference to have a single GGUF when possible.

This preference is specific to your niche community. There are practical advantages to uploading split GGUFs to Huggingface. For instance, download speeds are less likely to be throttled when downloading multiple files, and large models can be distributed across multiple SSDs if necessary.
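
(A sketch of the multi-file advantage in practice, reusing the repo and include pattern from earlier in the thread; the optional hf_transfer backend, enabled via an environment variable, needs `pip install hf_transfer`:)

```sh
# fetch all shards of one quant; the matched files download concurrently
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download \
  MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-GGUF \
  --local-dir . --include '*Q2_K*gguf*'
```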

> The argument that these are 3 files, and therefore my point is invalid, also does not apply,

If you are referring to the "3", it was a response to your empty list item.

> The last 3 things argued are not, I think, relevant points, and mostly stem from the fact that this user was banned from our Discord, as evidenced by the (((community))) remark.

A) I did not see a reason to provide a full response as it is already addressed in the other points. Additionally, your users can organize the split GGUFs into a folder for a cleaner and more organized appearance.
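
(A minimal sketch of that folder organization, with a hypothetical quant name:)

```sh
# hypothetical: one folder per quant, shards moved inside
mkdir -p Q2_K && mv ./*Q2_K*.gguf Q2_K/
```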

B) If you wish to criticize someone's free work on the internet, it would be more constructive to do it better than them and only afterward start crying about how "bad" their work is.

C) I am adding my input to this discussion.
