Is the 8-bit GPTQ of the 8B base model available?

#3
by AshTaurus - opened

I actually need the base model for my use case. It would be very helpful if you could upload that. Thanks.

Astronomer org

Hey, we are delaying the release of the base (non-instruct) model quants because of a bug in Llama 3 that is still under investigation.
See the link here: https://twitter.com/danielhanchen/status/1781395882925343058
Some tokens in the base model are undertrained, which leads to terrible training results.

I think a solution has been found, so we may release the 2 models very soon, either today or tomorrow.

Are you looking to fine-tune on the base?

Astronomer org

@AshTaurus Here it is: https://huggingface.co/astronomer-io/Llama-3-8B-GPTQ-8-Bit. If you are doing instruct fine-tuning, please read the top of the README file. I may either release a script in the folder or release a patched version of the model in which the embedding vectors of the special tokens are initialized to the mean (every vector component set to the average value), so you don't get exploding gradients or NaN gradients during training.
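
For anyone wondering what "initialized to the mean" could look like, here is a minimal, dependency-free sketch. It is not the script mentioned above. It assumes the mean is taken per dimension over the rows that are not flagged as undertrained, and the function name `mean_init_rows`, the toy matrix, and the choice of which row is undertrained are all hypothetical. With a real model the same idea would be applied to the embedding matrix tensor rather than to Python lists.

```python
def mean_init_rows(embeddings, untrained_ids):
    """Return a copy of `embeddings` (a list of equal-length float lists)
    where every row listed in `untrained_ids` is replaced by the
    per-dimension mean of the remaining rows.

    Assumption: the mean is computed over the non-flagged rows only.
    """
    untrained = set(untrained_ids)
    trained_rows = [row for i, row in enumerate(embeddings) if i not in untrained]
    if not trained_rows:
        raise ValueError("need at least one trained row to average over")

    dim = len(trained_rows[0])
    # Average each vector component across the trained rows.
    mean_row = [
        sum(row[d] for row in trained_rows) / len(trained_rows)
        for d in range(dim)
    ]

    # Build a new matrix; the input is left unmodified.
    return [
        list(mean_row) if i in untrained else list(row)
        for i, row in enumerate(embeddings)
    ]


# Toy example: 4 "tokens" with 2-dimensional embeddings.
# Row 2 stands in for an undertrained special token (hypothetical).
toy = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [5.0, 6.0]]
patched = mean_init_rows(toy, untrained_ids=[2])
print(patched[2])
```

The point of the averaging is that the flagged rows start from a value with the same scale as the trained embeddings instead of from whatever they held before.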

Thanks ❤️

AshTaurus changed discussion status to closed
