Adding training to mt0-xxl

#6 by Kurenai0625

So I’m looking at using a bloomz or mt0 model, preferably one that fits on a single A10G GPU (AWS G5 instance), for a sort of “intelligent search engine”: we give it a question about code or infrastructure, and it replies with what it knows based on documents fed to it. To give useful answers, it needs knowledge from our documents and code, i.e. Confluence/SharePoint and internal GitHub respectively.
So:

  1. How would I fine-tune an mt0 or bloomz model? Would something like LoRA work? Do I need to tokenize the fine-tuning inputs, and if so, how?
  2. What is the most powerful model that fits in 24 GB of VRAM for training? I’d prefer mt0-xxl for its impressive performance even against full Bloomz, but I understand that might not be feasible; alternatives welcome!
  3. Any recommendations for inference deployment? Should I switch AWS instance types, change the serving software, etc.?

Sorry if this is too much, I’m new to this field, backend and SRE by trade.

BigScience Workshop org

Sorry for the late reply!

  1. Yes, LoRA works. I would try the models on your tasks as-is first, though; maybe no further fine-tuning is needed. (A LoRA sketch follows this list.)
  2. You can fit mt0-xxl into that much VRAM with quantization / low precision. Otherwise mt0-xl is also good. (See the loading sketch at the end of this reply.)
  3. No recommendations here - many things are possible.
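
To make item 1 concrete, here is a rough sketch of LoRA fine-tuning for an mt0 model with the `peft` library. The dataset, hyperparameters, and prompt/answer pair are placeholders you would replace with material derived from your Confluence/SharePoint and GitHub sources, and I haven’t run this exact snippet on your data. It also shows the tokenization step you asked about: both prompts and target answers have to be converted to token IDs before training.

```python
# Minimal LoRA fine-tuning sketch for mt0 (seq2seq), assuming `transformers`,
# `peft`, and `datasets` are installed. Data and hyperparameters are placeholders.
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments,
                          DataCollatorForSeq2Seq)
from peft import LoraConfig, get_peft_model, TaskType
from datasets import Dataset

model_name = "bigscience/mt0-xl"  # swap for mt0-xxl if memory allows
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# LoRA trains only small adapter matrices; the base weights stay frozen.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q", "v"],  # attention projections in the mT5/mt0 architecture
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Yes, inputs must be tokenized: prompts and target answers both become token IDs.
def preprocess(example):
    model_inputs = tokenizer(example["prompt"], max_length=512, truncation=True)
    labels = tokenizer(text_target=example["answer"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# Hypothetical toy data; replace with your own document-derived QA pairs.
raw = Dataset.from_dict({
    "prompt": ["How do we rotate the API keys for service X?"],
    "answer": ["Keys are rotated via the internal vault job described in the runbook."],
})
tokenized = raw.map(preprocess, remove_columns=raw.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="mt0-lora-out",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=1e-4,
        num_train_epochs=3,
        bf16=True,  # A10G supports bf16; T5-family models are happier in bf16 than fp16
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```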

Let me know if you have questions!
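
For item 2, this is roughly what 8-bit loading of mt0-xxl looks like with `bitsandbytes`; whether it actually stays under 24 GB depends on your sequence lengths and batch size, so treat it as a starting point rather than a guarantee.

```python
# Sketch of loading mt0-xxl in 8-bit so its ~13B parameters fit in 24 GB of VRAM.
# Requires `bitsandbytes` and `accelerate` alongside `transformers`.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, BitsAndBytesConfig

model_name = "bigscience/mt0-xxl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",          # let accelerate place the layers on the GPU
    torch_dtype=torch.float16,  # non-quantized parts (e.g. layer norms) in fp16
)

prompt = "Translate to English: Je t'aime."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If you do want to train in that quantized form, the usual route is to combine it with the LoRA config above via `peft`'s `prepare_model_for_kbit_training`.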
