Any plans to share Sharded Models?

#2
by 1littlecoder - opened

Any plans to share sharded models? That might make it easier to fit this on Colab.

Amazon Web Services org

Hi, thank you for the feedback!

The model is already sharded into 2 parts now.

Any specific shards you are looking for? Can you provide an example? Cheers!

@yinsong1986 Usually for Google Colab, people use models sharded into pieces of roughly 2-3 GB each. This makes it easier to load the model into memory before moving it into VRAM under the very low system-memory constraints of the free Google Colab tier (usually restricted to about 12 GB of system RAM).

So you'd end up with maybe 5-10 actual model weight shards in the end rather than 2. Just wanted to elaborate further for you.
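To put rough numbers on the shard counts above (a sketch, assuming fp16 weights; the sizes are illustrative, not taken from the actual checkpoint):

```python
import math

def shard_count(model_size_gb: float, max_shard_gb: float) -> int:
    """Rough number of shard files when each shard is capped at max_shard_gb."""
    return math.ceil(model_size_gb / max_shard_gb)

# A 7B-parameter model in fp16 is roughly 7e9 params * 2 bytes ~= 14 GB.
print(shard_count(14, 2))  # -> 7 shards at a 2 GB cap
print(shard_count(14, 3))  # -> 5 shards at a 3 GB cap
```

That lands right in the 5-10 shard range mentioned above.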

@1littlecoder Love your videos btw! big fan β™₯


Thanks very much. Very kind of you!

Amazon Web Services org
β€’
edited Oct 20, 2023

Thanks for your explanation! @rombodawg @1littlecoder

If we plan to further shard the model to small pieces, which is easier for you?

  • Option 1: replace the shards in this model repo with smaller shards.
  • Option 2: create a new model repo and upload the same model with more shards there.
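For reference, either option can be produced with `save_pretrained`'s `max_shard_size` argument, which splits the weights into files no larger than the given cap and writes a matching index file. A minimal sketch (the output directory name is hypothetical):

```python
from transformers import AutoModelForCausalLM

# Load the existing 2-shard checkpoint, then re-save it with a smaller
# per-file cap so transformers writes ~2 GB shards plus an index file.
model = AutoModelForCausalLM.from_pretrained("amazon/MistralLite")
model.save_pretrained("MistralLite-small-shards", max_shard_size="2GB")
```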

Thank you!

I don't know what 1littlecoder thinks, but I highly recommend uploading a new version of the model, calling it a (sharded) version, and keeping the original model as well. Some users prefer a sharded model, while others prefer having fewer model files to download.

That's just my two cents.

You can also upload different revisions of the model within this one repo. TheBloke does this extensively with his different GPTQ fine-tune combinations.

Amazon Web Services org

Thanks @ssmi153 for your suggestion!

To the best of my understanding, TheBloke uploaded the different GPTQ models to one repo, so they are easier to differentiate.

But the request here, AFAIK, is a bit different, since the current model is already sharded into 2 shards. If we upload another sharded version of the same model, say 10 shards, to the same repo, it may confuse libraries like HF transformers about which model files to read. Please correct me if I am wrong, or if you are referring to some other solution. Thank you!

@yinsong1986 , the revision option effectively makes a branch of the repo, so the files are kept separate. By default, users receive the files from the "main" branch, but they can also request to pull from one of the other branches (e.g. you could create one called "smallshards") instead. In reality, if it's easier to just create another repo, then you may as well do that :) I was just letting you know that this option exists.

Amazon Web Services org

Thanks for all your feedback!

@1littlecoder @rombodawg @ssmi153

Now I have uploaded the model with smaller shards to a new branch: https://huggingface.co/amazon/MistralLite/tree/small-shards

You should be able to load the model with the smaller shards from that branch. Please have a try and let me know how it goes. Thank you!
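For anyone trying the branch above, a minimal loading sketch: `from_pretrained` accepts a `revision` argument that can be a branch name, tag, or commit hash (the `device_map` setting here is an illustrative assumption, not required):

```python
from transformers import AutoModelForCausalLM

# revision="small-shards" pulls the weights from the branch that holds
# the smaller shard files instead of "main".
model = AutoModelForCausalLM.from_pretrained(
    "amazon/MistralLite",
    revision="small-shards",
    device_map="auto",
)
```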

yinsong1986 changed discussion status to closed
