Size mismatch...
Shouldn't this model be the same exact size as the model here https://huggingface.co/decapoda-research/llama-13b-hf-int4 ?
Maybe he used LoRA, which adds a few parameters.
That still doesn't explain the ~500 MB of extra size, since the LoRA adapter is only about 37 MB.
Also, https://huggingface.co/elinas/alpaca-13b-int4 is exactly the same size as https://huggingface.co/decapoda-research/llama-13b-hf-int4.
So I'm still wondering where the difference comes from.
Hi -- I didn't realize I had set this to public already; I was going to add a few details. I did use LoRA to finetune this myself, but I agree that doesn't clearly explain the difference in size. I'm not 100% sure what accounts for it, but the workflow I used was to start with the unquantized https://huggingface.co/decapoda-research/llama-13b-hf, finetune it using peft, merge the adapter in, and then use GPTQ to convert it to 4-bit quantization. So I didn't rely on any public alpaca LoRA adapters or base models; it's possible that my result differs due to some of the params I used along the way.
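For anyone curious, the workflow roughly corresponds to something like the sketch below. This is not the exact script or hyperparameters used; the paths, LoRA settings, and the GPTQ command are illustrative assumptions.

```python
# Rough sketch of the workflow described above -- not the exact scripts or settings used.
# Paths, LoRA hyperparameters, and the GPTQ command below are illustrative assumptions.
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, PeftModel, get_peft_model

base = "decapoda-research/llama-13b-hf"

# 1) Load the unquantized base model
model = LlamaForCausalLM.from_pretrained(base, torch_dtype="auto")
tokenizer = LlamaTokenizer.from_pretrained(base)

# 2) Attach a LoRA adapter with peft and finetune (example hyperparameters)
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
# ... run the finetuning loop / transformers Trainer on the alpaca data here ...
model.save_pretrained("alpaca-13b-lora")  # saves only the small adapter weights

# 3) Merge the adapter back into the full-precision weights so GPTQ sees a plain model
merged = PeftModel.from_pretrained(
    LlamaForCausalLM.from_pretrained(base, torch_dtype="auto"),
    "alpaca-13b-lora",
).merge_and_unload()
merged.save_pretrained("alpaca-13b-merged")
tokenizer.save_pretrained("alpaca-13b-merged")

# 4) 4-bit conversion with GPTQ, e.g. via a GPTQ-for-LLaMa style script
#    (exact flags vary between repo versions):
#    python llama.py ./alpaca-13b-merged c4 --wbits 4 --save alpaca-13b-4bit.pt
```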
I also haven't tested this model extensively, so I can't guarantee the quality of inference will be on par with the Stanford example or even alpaca-lora. In fact, I would expect it not to be, as I only finetuned for 1 epoch and my lowest learning rate was ~0.85.
My main goal was to test the workflow, and I tried to minimize GPU costs, so I suspect someone else will produce a better quantized model. If you do end up playing around with the model, feel free to share how it performs and whether you have any suggestions for tuning or generation params!
Thanks a lot for the explanation
I'll update you if I find anything after testing