Training Dolly with DeepSpeed

#56
by niktarkon - opened

Hi all!
Has anyone trained one of the Dolly models with DeepSpeed? If so, please share the following information:

  1. What kind of dolly did you train?
  2. In what environment and on what type of GPU?
  3. How much GPU memory does DeepSpeed training consume in general? (Obviously it varies a lot, but I'd like at least a lower-bound estimate of how much GPU capacity is needed.)
Databricks org

Yes - that is what the provided training code does! https://github.com/databrickslabs/dolly
You can tune the 3B, 7B, and 12B models.
Use an A100 if you can; the repo has notes about training on other instances.
How many GPU hours? It depends on the GPU type, number of GPUs, which model you tune, input size, etc. As a data point, you can expect roughly 100+ hours of A100 time to fine-tune the 12B model for several epochs.
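The repo drives training through the DeepSpeed launcher with a ZeRO Stage 3 bf16 configuration. As an illustration only (this is a minimal sketch of that kind of config, not the repo's actual file, and the `"auto"` values assume the Hugging Face Transformers integration fills them in), such a config looks like:

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "train_batch_size": "auto"
}
```

ZeRO Stage 3 shards parameters, gradients, and optimizer state across GPUs; adding `"offload_optimizer": {"device": "cpu"}` trades GPU memory for system RAM, which is one reason the host VM itself still needs a healthy amount of memory.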

Thank you!
Could you tell me if there are requirements specifically for system RAM?

As an experiment, I ran it on a basic Colab instance (I know I need more powerful hardware, but I wanted to confirm that everything runs for me) and found that training could not continue because the system RAM filled up.

P.S. I tried to train Dolly 2.8B on the dataset used for training by default in https://github.com/databrickslabs/dolly, and took the config and other settings from there.

Databricks org

GPU memory is really the limiting factor, not system RAM, but to be safe you'll probably need 64 GB of VM RAM to work with the 12B model.
Colab instances will not be powerful enough.
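To see why GPU memory dominates, here is a back-of-envelope estimate of the training state for a 12B-parameter model. This is a rough sketch, not a measurement: it assumes bf16 weights and gradients (2 bytes per parameter each) plus fp32 Adam optimizer state (master copy of the weights and two moments, about 12 bytes per parameter), and ignores activations and framework overhead.

```python
# Back-of-envelope memory math for bf16 + Adam training.
# Assumptions (not measurements): 2 bytes each for bf16 weights and
# gradients, 12 bytes per parameter for fp32 Adam state.

def training_state_gib(n_params: float) -> dict:
    """Approximate training-state memory in GiB, broken down by component."""
    gib = 1024 ** 3
    return {
        "weights": n_params * 2 / gib,    # bf16 parameters
        "gradients": n_params * 2 / gib,  # bf16 gradients
        "optimizer": n_params * 12 / gib, # fp32 master weights + 2 moments
    }

state = training_state_gib(12e9)
total = sum(state.values())
print(f"12B model: ~{total:.0f} GiB of training state")
# ZeRO Stage 3 shards this state across GPUs, e.g. over 8 GPUs:
print(f"per GPU over 8 GPUs: ~{total / 8:.0f} GiB (+ activations)")
```

Under these assumptions the total is on the order of 180 GiB, which is why a single 40 GB or 80 GB A100 cannot hold it and sharding (ZeRO) across several GPUs, or offloading pieces to system RAM, is needed.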

srowen changed discussion status to closed

Could you tell me how many A100s are best for training the 12B model?

Databricks org

Eight A100s are probably sufficient to train relatively rapidly, but it depends on your data, settings, and requirements.

Hi,
dolly-v2-3b is not working well on my dataset for closed-domain, open-book QA, even after tuning the temperature, prompt, instruction, context length, data quality, etc.
I am not affiliated with any organisation, so before purchasing 8x A100 40GB GPUs, I'd like to confirm that my understanding of the required training-set size is correct.

Question: To fine-tune dolly-v2-3b for a new domain (closed-domain, open-book QA), how many samples are enough?
