Loading the model takes a long time using from_pretrained

#29
by Zhaoz1997 - opened

I downloaded the large model locally, but loading it often takes ten minutes. Is this related to the CPU or to memory? On a high-performance machine it only takes me ten seconds to load the same model. In short, how can I reduce the model loading time?

from transformers import AutoProcessor, SeamlessM4Tv2Model

processor = AutoProcessor.from_pretrained("./seamless-m4t-v2-large/", local_files_only=True)
model = SeamlessM4Tv2Model.from_pretrained("./seamless-m4t-v2-large/", local_files_only=True)

Hey @Zhaoz1997 - you can install the accelerate package as follows:

pip install --upgrade accelerate

And then set low_cpu_mem_usage=True in the call to from_pretrained:

from transformers import SeamlessM4Tv2Model

model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large", low_cpu_mem_usage=True)

This will reduce the load time considerably: instead of first initializing the model with random weights and then overwriting them with the checkpoint, the weights are loaded directly, which saves RAM and time. You can read more about how this works in the docs. Also, there's no need to download the weights manually: after you load them from the Hub once (e.g. from the facebook org, as in the code example above), they are cached locally and not re-downloaded on subsequent calls. See the docs for details.
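The mechanism behind low_cpu_mem_usage=True can be illustrated with plain PyTorch: the model skeleton is created on the "meta" device, which records shapes and dtypes without allocating any real memory, so no time is spent on random initialization before the checkpoint weights are copied in. A minimal sketch (not the transformers internals, just the meta-device idea):

```python
import torch

# Normal construction: allocates a real 4096x4096 weight tensor
# and spends time filling it with random values.
normal = torch.nn.Linear(4096, 4096)

# Meta-device construction: only shape/dtype metadata is recorded,
# no memory is allocated and no initialization happens.
meta = torch.nn.Linear(4096, 4096, device="meta")

print(normal.weight.device)  # cpu
print(meta.weight.is_meta)   # True
print(meta.weight.shape)     # torch.Size([4096, 4096])
```

With low_cpu_mem_usage=True, transformers builds the model this way and then materializes each parameter directly from the checkpoint, so peak RAM stays close to one copy of the weights instead of two.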
