Size mismatch...

#1
by yehiaserag - opened

Shouldn't this model be the exact same size as the model here: https://huggingface.co/decapoda-research/llama-13b-hf-int4?

Maybe he used LoRA, which adds a few parameters.

That still doesn't explain the roughly 500 MB of extra size, since the LoRA is only about 37 MB.
Also, https://huggingface.co/elinas/alpaca-13b-int4 is the exact same size as https://huggingface.co/decapoda-research/llama-13b-hf-int4,
so I'm still wondering.

Hi -- I didn't realize I had set this to public already; I was going to add a few details. I did use LoRA to finetune this myself, but I agree that doesn't clearly explain the difference in size. I'm not 100% sure what accounts for it, but the workflow I used was to start with the unquantized https://huggingface.co/decapoda-research/llama-13b-hf, finetune it using peft, merge the adapter in, and then use GPTQ to convert it to 4-bit quantization. So I didn't rely on any public alpaca LoRA adapters or base models; it's possible that my result differs because of some of the params I used along the way.
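
For anyone who wants to reproduce the merge step, it looks roughly like the sketch below. This is not the exact script I ran; the adapter path and output directory are placeholders, and it assumes reasonably recent transformers/peft versions.

```python
# Rough sketch of the "merge the adapter in" step -- paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer
from peft import PeftModel

base_id = "decapoda-research/llama-13b-hf"   # unquantized base model
adapter_dir = "./alpaca-lora-adapter"        # hypothetical output of the peft finetune

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
tokenizer = LlamaTokenizer.from_pretrained(base_id)

# Attach the LoRA adapter produced by the finetune, then fold its weights
# into the base model so the result is a plain fp16 checkpoint.
model = PeftModel.from_pretrained(base, adapter_dir)
merged = model.merge_and_unload()

merged.save_pretrained("./llama-13b-alpaca-merged")
tokenizer.save_pretrained("./llama-13b-alpaca-merged")

# The merged fp16 checkpoint is then handed to a GPTQ tool
# (e.g. GPTQ-for-LLaMa) to produce the 4-bit weights.
```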

I also haven't tested this model extensively, so I can't guarantee the quality of inference will be on par with the Stanford example or even alpaca-lora. In fact, I would expect it not to be, as I only finetuned for 1 epoch and my lowest learning rate was ~0.85.

My main goal was to test the workflow, and I tried to minimize GPU costs, so I suspect someone else will produce a better quantized model. If you do end up playing around with the model, feel free to share how it performs and any suggestions you have for tuning or generation params!
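
As a starting point for experimenting with generation params, something like the sketch below is what I'd try first. The values are just guesses to tune from, the Alpaca-style prompt template is an assumption, and the plain fp16 load of the merged checkpoint is a stand-in for whatever loader you use for the 4-bit weights.

```python
# Starting-point generation params -- tune temperature/top_p from here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./llama-13b-alpaca-merged"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="auto"
)

# Alpaca-style instruction prompt (assumed format).
prompt = "### Instruction:\nWrite a haiku about quantization.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,        # lower for more deterministic output
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```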

Thanks a lot for the explanation!
I'll update you if I find anything after testing.
