8-bit quantized model

#2 · opened by mrm8488
BERTIN Project org

As seen here: https://huggingface.co/spaces/bertin-project/bertin-gpt-j-6B/discussions/1#633aeb9acbdbadd99c070c74
With the new feature that automatically quantizes the model weights to 8 bits, IMHO, it does not make sense to keep a separate, already-quantized model. What do you think @versae?
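For reference, loading with the automatic int8 feature looks roughly like this (a minimal sketch, assuming a recent transformers with accelerate and bitsandbytes installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bertin-project/bertin-gpt-j-6B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Quantizes the published fp16/fp32 weights to int8 on the fly at load time,
# so no separate pre-quantized checkpoint is required.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # requires accelerate
    load_in_8bit=True,   # requires bitsandbytes
)
```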

BERTIN Project org

Yeah. It seems the LoRA work might not be maintained in the future, so maybe using the int8 feature in transformers is the way to go. As I see it, there should be some way to serialize the model in int8 so we can create a branch in the model repo that automatically loads in int8.
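If such a branch existed, pointing `from_pretrained` at it would look roughly like this (a sketch; the "8bit" branch name is hypothetical, and whether serialized int8 weights load back directly depends on how they were saved):

```python
from transformers import AutoModelForCausalLM

# Sketch only: "8bit" is a hypothetical branch name in the main model repo.
model = AutoModelForCausalLM.from_pretrained(
    "bertin-project/bertin-gpt-j-6B",
    revision="8bit",
    device_map="auto",
)
```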

BERTIN Project org

I have already done it with the latest ckpt (https://huggingface.co/mrm8488/bertin-gpt-j-6B-ES-v1-8bit). Should I create a branch and push it there?
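One possible way to do that with huggingface_hub (a sketch; the "8bit" branch name and the local checkpoint path are assumptions):

```python
from huggingface_hub import create_branch, upload_folder

repo_id = "bertin-project/bertin-gpt-j-6B"

# Create the branch in the main model repo, then upload the quantized
# checkpoint to that revision instead of keeping a separate repo.
create_branch(repo_id, branch="8bit")
upload_folder(
    folder_path="./bertin-gpt-j-6B-ES-v1-8bit",  # local 8-bit checkpoint (assumed path)
    repo_id=repo_id,
    revision="8bit",
)
```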

BERTIN Project org

That'd be great!
