Add fp16/int8 weights

#2
by mkshing - opened

This PR enables using this model with the Colab Free plan via int8 quantization.
Here's the link to the demo in Colab:

https://colab.research.google.com/github/mkshing/notebooks/blob/main/stabilityai_japanese_stablelm_alpha_7b.ipynb
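
For context, a rough back-of-the-envelope on why int8 is what makes Colab Free feasible (figures are my own estimate, not from the PR):

# Rough memory arithmetic: ~7B parameters at 4 bytes/param in fp32 vs ~1 byte/param in int8.
params = 7e9
print(f"fp32 ≈ {params * 4 / 1e9:.0f} GB, int8 ≈ {params / 1e9:.0f} GB")  # ≈28 GB vs ≈7 GB
# Only the int8 footprint fits comfortably on the ~15 GB T4 GPU of Colab Free.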

mkshing changed pull request status to open
Stability AI org

Generally LGTM! By the way, if we don't include variant="int8" in from_pretrained, it will just load the original fp32 version, is that correct?

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    variant="int8",
    low_cpu_mem_usage=True,
    load_in_8bit=True,  # requires bitsandbytes
)

Exactly!
So, if I'm correct, in that case it loads the fp32 weights first and then converts them to int8:

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
-   variant="int8",
    low_cpu_mem_usage=True,
    load_in_8bit=True,
)
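
Since the title also mentions fp16 weights, here is a minimal sketch of loading that variant (the variant name and kwargs are assumed to mirror the int8 example above):

import torch
from transformers import AutoModelForCausalLM

# Assumed fp16 usage: fetch the fp16 shards added by this PR and keep the
# weights in half precision; no bitsandbytes is needed in this case.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    variant="fp16",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)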
Stability AI org

Nice! Let's merge this. By the way, do you want to also include the variant as a Colab dropdown (defaulting to int8), like model_id, so people are aware of it?

leemeng changed pull request status to merged

@leemeng Sure! I will also add a note that only int8 works on Colab Free.
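
For the dropdown, a sketch of the Colab form field I have in mind (variable names are placeholders):

# "#@param" with a list renders as a dropdown in Colab; int8 is the default.
variant = "int8"  #@param ["int8", "fp16"]
load_in_8bit = variant == "int8"  # only int8 fits on the Colab Free GPU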
