smaller shards, pls

#2
by lskywalker

Thanks for making this!
I was wondering if you could save the model into smaller shards before pushing it to the Hub. This would help those of us who (like me, yes ^^) don't have access to large compute resources and want to try things out on Colab first.

I recently came across abhishek's setup here:
https://huggingface.co/abhishek/llama-2-7b-hf-small-shards/tree/main
where a Llama-2-7B model is sharded across 10 files. That means I can load it in Colab (in 4-bit) without running out of system memory:
https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/colabs/AutoTrain_LLM.ipynb
...which is nice :)
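
For context, this is roughly how I load that small-shards repo in 4-bit on Colab. Just a sketch of what works for me: it assumes bitsandbytes and accelerate are installed, and the compute dtype and device_map are my own choices.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization so the weights fit on the Colab GPU
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

model_id = "abhishek/llama-2-7b-hf-small-shards"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers on the GPU
)

Since the checkpoint is split into ~10 small files, the shards get loaded one at a time, so peak system RAM stays low (as far as I understand).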

I don't think it's a lot of work to make smaller shards, provided you can load the model in the first place. Something like this should do it:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(...)  # load the original checkpoint
tokenizer = AutoTokenizer.from_pretrained(...)
model.save_pretrained(path, max_shard_size="3GB")  # re-save the weights in ~3GB shards
tokenizer.save_pretrained(path)
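
If you'd rather skip saving locally first, I believe push_to_hub also accepts max_shard_size, so something like this should work too (untested, and the repo name is just a placeholder):

model.push_to_hub("your-username/model-small-shards", max_shard_size="3GB")
tokenizer.push_to_hub("your-username/model-small-shards")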
