Training Dolly with DeepSpeed

#56
by niktarkon - opened

Hi all!
Has anyone trained one of the Dolly models with DeepSpeed? If so, please share the following information:

  1. What kind of dolly did you train?
  2. In what environment and on what type of GPU?
  3. How much GPU memory does DeepSpeed training consume in general? (Obviously it varies a lot, but I'd like at least a lower-bound estimate of how much GPU capacity is needed.)
Databricks org

Yes - that is what the provided training code does! https://github.com/databrickslabs/dolly
You can tune the 3B, 7B, and 12B models.
Use an A100 if you can; the repo has notes about training on other instances.
How many GPU hours? It depends on the GPU type, number of GPUs, which model you tune, input size, etc. As a data point, you can expect roughly 100+ hours of A100 time to fine-tune the 12B model for several epochs.
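The repo drives training through the DeepSpeed launcher with a ZeRO Stage 3 bf16 configuration. As an illustration only (this is a minimal sketch of that kind of config, not the repo's actual file, and the `"auto"` values assume the Hugging Face Transformers integration fills them in), such a config looks like:

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "train_batch_size": "auto"
}
```

ZeRO Stage 3 shards parameters, gradients, and optimizer state across GPUs; adding `"offload_optimizer": {"device": "cpu"}` trades GPU memory for system RAM, which is one reason the host VM itself still needs a healthy amount of memory.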

Thank you!
Could you tell me if there are requirements specifically for system RAM?

As an experiment, I ran it on a basic Colab instance (I know I need more powerful hardware, but I wanted to confirm that everything runs for me) and found that training could not continue because the system RAM filled up.

P.S. I tried to train Dolly 2.8B on the dataset used for training by default in https://github.com/databrickslabs/dolly, and took the config and other settings from there.

Databricks org

GPU memory is really the limiting factor, not system RAM, but to be safe you'll probably need 64 GB of VM RAM to work with the 12B model.
Colab instances will not be powerful enough.
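To see why GPU memory dominates, here is a back-of-envelope estimate of the training state for a 12B-parameter model. This is a rough sketch, not a measurement: it assumes bf16 weights and gradients (2 bytes per parameter each) plus fp32 Adam optimizer state (master copy of the weights and two moments, about 12 bytes per parameter), and ignores activations and framework overhead.

```python
# Back-of-envelope memory math for bf16 + Adam training.
# Assumptions (not measurements): 2 bytes each for bf16 weights and
# gradients, 12 bytes per parameter for fp32 Adam state.

def training_state_gib(n_params: float) -> dict:
    """Approximate training-state memory in GiB, broken down by component."""
    gib = 1024 ** 3
    return {
        "weights": n_params * 2 / gib,    # bf16 parameters
        "gradients": n_params * 2 / gib,  # bf16 gradients
        "optimizer": n_params * 12 / gib, # fp32 master weights + 2 moments
    }

state = training_state_gib(12e9)
total = sum(state.values())
print(f"12B model: ~{total:.0f} GiB of training state")
# ZeRO Stage 3 shards this state across GPUs, e.g. over 8 GPUs:
print(f"per GPU over 8 GPUs: ~{total / 8:.0f} GiB (+ activations)")
```

Under these assumptions the total is on the order of 180 GiB, which is why a single 40 GB or 80 GB A100 cannot hold it and sharding (ZeRO) across several GPUs, or offloading pieces to system RAM, is needed.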

srowen changed discussion status to closed

Could you tell me how many A100s are best for training the 12B model?

Databricks org

Eight A100s are probably sufficient to train relatively rapidly, but it depends on your data, settings, and requirements.

Hi,
dolly-v2-3b is not working well on my dataset for closed-domain, open-book QA, even after tuning the temperature, prompt, instruction, context length, data quality, etc.
I am not affiliated with any organisation, so before purchasing 8x A100 40GB GPUs, I'd like to confirm that my understanding of the required training-set size is correct.

Question: To fine-tune dolly-v2-3b for a new domain (closed-domain, open-book QA), how many samples are enough?
