Possibility to download backend-specific weights only

#7
by dennlinger - opened

Hi,
thanks first of all for contributing the model weights! One question I have is whether it would generally be possible to separate the weights for the different backends, so that loading the model becomes faster (e.g., downloading only the PyTorch-specific weights).

One way to do this would be to use the revisions provided through the hub, as is done for GPT-J 6B, for example (the float16 version has its own branch there). Of course, this is only a sensible suggestion if it can somehow be ensured that the correct branch is chosen automatically for each backend...
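A minimal sketch of how the branch choice might be automated, assuming hypothetical per-backend branch names (only GPT-J 6B's float16 branch is real; every other branch name below is invented for illustration):

```python
# Hypothetical mapping from backend to a hub branch holding only that
# backend's weights, in the spirit of GPT-J 6B's "float16" branch.
# All branch names except "main" are made up for this sketch.
BACKEND_REVISIONS = {
    "pytorch": "pytorch-weights",
    "tensorflow": "tf-weights",
    "flax": "flax-weights",
}

def revision_for(backend: str) -> str:
    """Pick the branch for a backend, falling back to the full repo."""
    return BACKEND_REVISIONS.get(backend, "main")

print(revision_for("pytorch"))
# → pytorch-weights

# The chosen revision could then be passed on when loading, e.g.:
# AutoModelForCausalLM.from_pretrained(repo_id, revision=revision_for("pytorch"))
```

The fallback to `main` keeps the scheme backwards compatible for backends without a dedicated branch.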

For reference, this is the situation for people using this model in a distributed manner through accelerate, where one has to clone the full repository before being able to use the model. It may be that only the relevant files are downloaded when loading via .from_pretrained() in transformers, but I am unsure whether that is the primary use case for large-scale models like this one.

Best,
Dennis

Edit: FWIW, the repository takes up about 330GB in total when cloned locally. The relevant files make up less than 1/6 of that (~50GB), and the .git/ folder adds another 30GB or so of overhead.
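For what it's worth, huggingface_hub's snapshot_download already accepts allow_patterns globs to fetch only a subset of a repo's files, which would avoid the full 330GB clone. The selection logic is plain fnmatch-style matching, sketched here against a hypothetical file listing (the file names are illustrative, not this repo's actual contents):

```python
from fnmatch import fnmatch

# Hypothetical file listing for a repo shipping several backends' weights.
repo_files = [
    "config.json",
    "pytorch_model.bin",
    "tf_model.h5",
    "flax_model.msgpack",
    "rust_model.ot",
]

# Keep only the PyTorch weights plus the shared config files.
allow_patterns = ["*.json", "*.bin"]

def select_files(files, patterns):
    """Return the files matching at least one allow pattern."""
    return [f for f in files if any(fnmatch(f, p) for p in patterns)]

print(select_files(repo_files, allow_patterns))
# → ['config.json', 'pytorch_model.bin']

# With the real API this would roughly correspond to:
# snapshot_download(repo_id, allow_patterns=allow_patterns)
```

This sidesteps the branch question entirely, at the cost of each user having to know which patterns match their backend.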

dennlinger changed discussion title from Possibility to download backend-specific weights to Possibility to download backend-specific weights only
