mpt-7b-instruct-sharded

#2
by jprafael - opened

What are the steps required to replicate this for mpt-7b-instruct?

Analytics Club at ETH Zürich org
edited May 22, 2023

Hey - if it's useful, I can take a look at replicating this for mpt-7b-instruct, but it might take me some time to get around to it.

The short version of how to DIY this (rough code sketch after the list):

  1. load the model as described on the original MosaicML model card
  2. if you want to host it on the Hub, make a new model repo & clone it locally
  3. follow the transformers docs for saving a sharded model checkpoint & save the model and tokenizer to my_model_dir
    • For this, I used model.save_pretrained(my_model_dir, max_shard_size="2GB"), but you can change the shard size as needed.
  4. to add basic support for device_map="auto", gradient checkpointing, etc., update the relevant .py files as on this model - see the commit history
  5. now you can use it like this one, push it to the Hub, etc.
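Putting steps 1-3 together, a minimal sketch (assuming the bf16 + trust_remote_code loading snippet from the MosaicML model card, and treating my_model_dir as a hypothetical local path / cloned repo directory):

```python
# Sketch of steps 1-3: load MPT-7B-Instruct, then save a sharded checkpoint.
# Assumes transformers + einops are installed and you have enough memory for the full weights.
import torch
import transformers
from transformers import AutoTokenizer

# 1. load the model as on the original MosaicML model card
model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # MPT ships custom modeling code
)
# the MPT card uses the gpt-neox-20b tokenizer
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# 3. save a sharded checkpoint + tokenizer locally (tune max_shard_size as needed)
my_model_dir = "mpt-7b-instruct-sharded"  # hypothetical local path
model.save_pretrained(my_model_dir, max_shard_size="2GB")
tokenizer.save_pretrained(my_model_dir)
```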

@pszemraj Thanks! I'll try to run this tomorrow

@pszemraj I was able to replicate this easily with the instructions you provided. For anyone interested, the resulting model weights are available at jprafael/mpt-7b-instruct-sharded.
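If it helps anyone else, here's a minimal sketch of loading that sharded repo with device_map="auto" (assumes accelerate is installed and that the tokenizer was saved alongside the weights, per step 3 above):

```python
# Sketch: load the sharded checkpoint and run a quick generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "jprafael/mpt-7b-instruct-sharded"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,   # still needed - MPT uses custom modeling code
    device_map="auto",        # let accelerate place shards on the available GPU(s)/CPU
)

prompt = "Explain what sharding a model checkpoint means."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```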

jprafael changed discussion status to closed
Analytics Club at ETH Zürich org

Awesome, great stuff! BTW, I'm discussing with a user on this discussion post - there may be some additional updates needed to make sure everything works with device_map="auto" specifically in a multi-GPU setup. I have tested inference and fine-tuning on a single GPU and everything works fine, so don't worry about this if multi-GPU is irrelevant for you 👍

I'll reply here/ping you if/when that happens, but just FYI.

Currently I'm just using a single GPU, but I'm happy to incorporate the changes on my side when they're done.

Analytics Club at ETH Zürich org

will keep you posted!

pszemraj pinned discussion
