eachadea/vicuna-13b-1.1 vs TheBloke/vicuna-13B-1.1-HF

#2
by Thireus - opened

I'm a bit puzzled about the size difference between eachadea/vicuna-13b-1.1 and TheBloke/vicuna-13B-1.1-HF.

Looking at the descriptions of both models, it would appear they are both transformations of lmsys/vicuna-13b-delta-v1.1, i.e. the delta files applied to Llama 13B. Is there anything else I'm missing? Why are the model files so much larger on TheBloke/vicuna-13B-1.1-HF?
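For reference, the delta application itself is done with FastChat, roughly along these lines (paths are placeholders and the exact flag names may differ between FastChat versions):

python3 -m fastchat.model.apply_delta \
    --base-model-path /path/to/llama-13b-hf \
    --target-model-path /path/to/vicuna-13b-1.1 \
    --delta-path lmsys/vicuna-13b-delta-v1.1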

I've also observed that the tokenizer_config.json files differ, with the "content" lines not mentioning </s>.
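A quick way to compare the two files directly, assuming bash and the default main branch on both repos:

diff <(curl -s https://huggingface.co/eachadea/vicuna-13b-1.1/raw/main/tokenizer_config.json) \
     <(curl -s https://huggingface.co/TheBloke/vicuna-13B-1.1-HF/raw/main/tokenizer_config.json)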

EDIT: See explanation below

@TheBloke, thank you for looking into this. Could it be the llama-13b-hf base that differs?

When I convert Llama-13B to Llama-13B-HF, it produces pytorch_model-00001-of-00003.bin and pytorch_model-00002-of-00003.bin at 9.9 GB each, and pytorch_model-00003-of-00003.bin at 6.5 GB.
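For reference, I run the conversion with the script that ships with transformers, roughly like this (the script path is relative to a transformers checkout and the output directory is a placeholder):

python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/LLaMA \
    --model_size 13B \
    --output_dir /path/to/Llama-13B-HF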

EDIT: see explanation below

Yeah OK I see the same as you. I just did a fresh Llama 13B to HF conversion and yeah it's 9.3 + 9.2 + 6.1G:

tomj@Eddie ~/src/gpt4all-chat/build (master●●)$ ll /Volumes/EVO-2TB-A/Users/tomj/Llama-13B-HF
total 51528392
drwxr-xr-x   11 tomj  staff   352B 15 Apr 16:03 .
drwxr-xr-x+ 169 tomj  staff   5.3K 15 Apr 16:00 ..
-rw-r--r--    1 tomj  staff   507B 15 Apr 16:02 config.json
-rw-r--r--    1 tomj  staff   137B 15 Apr 16:02 generation_config.json
-rw-r--r--    1 tomj  staff   9.3G 15 Apr 16:02 pytorch_model-00001-of-00003.bin
-rw-r--r--    1 tomj  staff   9.2G 15 Apr 16:03 pytorch_model-00002-of-00003.bin
-rw-r--r--    1 tomj  staff   6.1G 15 Apr 16:03 pytorch_model-00003-of-00003.bin
-rw-r--r--    1 tomj  staff    33K 15 Apr 16:03 pytorch_model.bin.index.json
-rw-r--r--    1 tomj  staff     2B 15 Apr 16:03 special_tokens_map.json
-rw-r--r--    1 tomj  staff   488K 15 Apr 16:03 tokenizer.model
-rw-r--r--    1 tomj  staff   141B 15 Apr 16:03 tokenizer_config.json

I would have to guess that there's been some change in the HF format, but I don't know what exactly.

Ahhh yeah I've got it! See this commit to the conversion script: https://github.com/huggingface/transformers/commit/786092a35e18154cacad62c30fe92bac2c27a1e1

They resolved an issue unique to Llama 13B that caused "the checkpoint becoming 37GB instead of 26GB for some reason."

So yeah my guess is that there is redundant or useless data in the older Llama 13B HF releases, and that's been replicated in my release here as well.
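If you want to check whether a given checkpoint has the bloated layout, the shard index records the total size, so something like this should report roughly 26 GB for a correct 13B conversion rather than 37 GB (assuming the index has the standard metadata.total_size field):

python3 -c "import json; print(json.load(open('pytorch_model.bin.index.json'))['metadata']['total_size'])"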

I'm going to re-do my conversion to fix it. Though you could equally just use eachadea's instead, as that's already correct.

Excellent, I'm glad you figured out the issue. Does it mean TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g is also affected?

I have replaced all the files in this repo, so they're now the correct size.

As regards the GPTQ, I don't think there would be any issue. As I understand it, the HF repo was too large because certain layers were duplicated, so GPTQ would read the first copy and, I assume, ignore the duplicate. Certainly the GPTQ file size was correct for the size of the model.

But just to be safe, I re-ran the GPTQ on my new 1.1 HF repo to produce a new safetensors file and uploaded it to TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g. The file size was exactly the same as the one I generated before, but the SHA256SUM was different. My guess is that this is because I used a slightly newer version of the GPTQ-for-LLaMa code, which changes every day with improvements and bug fixes, so it may have done some calculations differently.

(To be exact, when re-creating the file today I used the GPTQ-for-LLaMa code as of commit https://github.com/qwopqwop200/GPTQ-for-LLaMa/commit/58c8ab4c7aaccc50f507fd08cce941976affe5e0, the last commit of yesterday. I didn't use today's code because they've just done a big refactor and I wanted to avoid any potential new issues from it, especially as today's changes mean the latest GPTQ-for-LLaMa can no longer be linked into ooba's UI, due to some function signatures changing.)
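For reference, the quantisation itself is just a standard GPTQ-for-LLaMa run, something along these lines (exact flags vary between commits of that repo, and the paths and output filename here are placeholders):

python llama.py /path/to/vicuna-13B-1.1-HF c4 \
    --wbits 4 \
    --groupsize 128 \
    --save_safetensors vicuna-13B-1.1-GPTQ-4bit-128g.safetensors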

Anyway, whether or not there was an issue with the previous GPTQ, I have made a new safetensors file based on the correctly sized HF repo, so if you have any concerns you can re-fetch the safetensors and use that instead of the old one.
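If you do re-fetch it, you can confirm you have the new file by checksumming it locally, in the directory you downloaded it to, and comparing against the SHA256 shown on the file page in the repo:

sha256sum *.safetensors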

I've not yet done the no-act-order.pt but I will do that in a minute.

Thanks for bringing this to my attention!

TheBloke changed discussion status to closed
