Any plans to share Sharded Models?

#2
by 1littlecoder - opened

Any plans to share sharded models? That might make it easier to fit this on Colab.

Amazon Web Services org

Hi, thank you for the feedback!

The model is already sharded into 2 parts now.

Any specific shards you are looking for? Can you provide an example? Cheers!

@yinsong1986 Usually for Google Colab, people use models sharded into pieces of roughly 2-3 GB each. This makes it easier to load the model into memory before moving it into VRAM under the very low system-memory constraints of the free Google Colab tier (usually restricted to about 12 GB of system RAM).

So you'd end up with maybe 5-10 actual model weight shards in the end rather than 2. Just wanted to elaborate further for you.
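To put rough numbers on the shard counts above (a sketch, assuming fp16 weights; the sizes are illustrative, not taken from the actual checkpoint):

```python
import math

def shard_count(model_size_gb: float, max_shard_gb: float) -> int:
    """Rough number of shard files when each shard is capped at max_shard_gb."""
    return math.ceil(model_size_gb / max_shard_gb)

# A 7B-parameter model in fp16 is roughly 7e9 params * 2 bytes ~= 14 GB.
print(shard_count(14, 2))  # -> 7 shards at a 2 GB cap
print(shard_count(14, 3))  # -> 5 shards at a 3 GB cap
```

That lands right in the 5-10 shard range mentioned above.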

@1littlecoder Love your videos btw! big fan β™₯


Thanks very much. Very kind of you!

Amazon Web Services org
β€’
edited Oct 20, 2023

Thanks for your explanation! @rombodawg @1littlecoder

If we plan to further shard the model to small pieces, which is easier for you?

  • Option 1: replace the shards in this model repo with smaller shards.
  • Option 2: create a new model repo and upload the same model with more shards there.
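For reference, either option can be produced with `save_pretrained`'s `max_shard_size` argument, which splits the weights into files no larger than the given cap and writes a matching index file. A minimal sketch (the output directory name is hypothetical):

```python
from transformers import AutoModelForCausalLM

# Load the existing 2-shard checkpoint, then re-save it with a smaller
# per-file cap so transformers writes ~2 GB shards plus an index file.
model = AutoModelForCausalLM.from_pretrained("amazon/MistralLite")
model.save_pretrained("MistralLite-small-shards", max_shard_size="2GB")
```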

Thank you!

I don't know what 1littlecoder thinks, but I highly recommend uploading a new version of the model, calling it a (sharded) version, and keeping the original model as well. Some users prefer a sharded model, while others prefer having fewer model files to download.

That's just my two cents.

You can also upload different revisions of the model within this one repo. TheBloke does this extensively with his different GPTQ fine-tune combinations.

Amazon Web Services org

Thanks @ssmi153 for your suggestion!

To the best of my understanding, TheBloke uploaded the different GPTQ models to one repo, so they are easier to differentiate.

But the request here, AFAIK, is a bit different, since the current model is already sharded into 2 shards. If we upload another sharded version of the same model, say 10 shards, to the same repo, it may confuse libraries like HF transformers about which model files to read. Please correct me if I am wrong, or if you are referring to some other solution. Thank you!

@yinsong1986 , the revision option effectively makes a branch of the repo, so the files are kept separate. By default, users receive the files from the "main" branch, but they can also request to pull from one of the other branches (e.g. you could create one called "smallshards") instead. In reality, if it's easier to just create another repo, then you may as well do that :) I was just letting you know that this option exists.

Amazon Web Services org

Thanks for all your feedback!

@1littlecoder @rombodawg @ssmi153

Now I have uploaded the model with smaller shards to a new branch: https://huggingface.co/amazon/MistralLite/tree/small-shards

You should be able to load the model with the smaller shards from that branch. Please have a try and let me know how it goes. Thank you!
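For anyone trying the branch above, a minimal loading sketch: `from_pretrained` accepts a `revision` argument that can be a branch name, tag, or commit hash (the `device_map` setting here is an illustrative assumption, not required):

```python
from transformers import AutoModelForCausalLM

# revision="small-shards" pulls the weights from the branch that holds
# the smaller shard files instead of "main".
model = AutoModelForCausalLM.from_pretrained(
    "amazon/MistralLite",
    revision="small-shards",
    device_map="auto",
)
```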

yinsong1986 changed discussion status to closed
