Not sure about hyperparam `test-size` during fine-tuning

#13
by opyate - opened

Hello,

I'm fine-tuning Dolly with my own data.

However, there's a `test-size` hyperparameter which I'm not sure about. I can't find any mention of it in the repo or on PyPI.

```
# https://github.com/databrickslabs/dolly#a10-gpus-1
!deepspeed {num_gpus_flag} \
    --module training.trainer \
    --input-model {input_model} \
    --deepspeed {deepspeed_config} \
    --epochs 2 \
    --local-output-dir {local_output_dir} \
    --dbfs-output-dir {dbfs_output_dir} \
    --per-device-train-batch-size 3 \
    --per-device-eval-batch-size 3 \
    --logging-steps 10 \
    --save-steps 200 \
    --save-total-limit 20 \
    --eval-steps 50 \
    --warmup-steps 50 \
    --test-size 200 \
    --lr 5e-6
```

My training set is 1,000 datapoints. What should the hyperparameters, especially `test-size`, be to suit a training set of that size?

Databricks org

Most of the actual training configuration is in the HF Trainer: https://github.com/databrickslabs/dolly/blob/master/training/trainer.py#L236
These arguments to deepspeed also tell it about the training, so sometimes it's a little repetitive. Here I guess deepspeed also wants to know how big the test set is. It's possible it's actually redundant; I haven't looked closely.

Thanks for the pointers, Sean. It might very well be redundant.

opyate changed discussion status to closed
Databricks org

Oh wait, I'm misreading this. `--test-size` is how you pass the argument through deepspeed down to the Trainer. It's not redundant; it's just saying how much of the dataset to hold out for eval.

Databricks org

With only 1000 data points the test size may not be very useful here. I would consider just setting it very low (like 1 or 10) and perhaps ignoring the eval loss. You could try running generation at different checkpoints to see how the quality actually looks.
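For anyone wondering what that hold-out looks like mechanically, here is a minimal sketch using the Hugging Face `datasets` API; this mirrors what a `--test-size` style flag typically controls, not necessarily the exact Dolly trainer code:

```python
# Minimal sketch of a test-size hold-out with Hugging Face `datasets`.
# test_size can be an absolute count (int) or a fraction (float).
from datasets import Dataset

dataset = Dataset.from_dict({"text": [f"example {i}" for i in range(1000)]})

# Hold out 10 records for eval, keep the rest for training.
split = dataset.train_test_split(test_size=10, seed=42)
train_ds, eval_ds = split["train"], split["test"]

print(len(train_ds), len(eval_ds))  # 990 10
```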

Thanks. I see the dolly-15k dataset only has "train". Mine has "train" and "test". I suppose I have to put it all in "train", for a few more data points.

Databricks org

You can also just modify the code to load your train and test set, instead of randomly splitting test out of train
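A hedged sketch of that modification (the file names here are hypothetical):

```python
# Load separate train/test JSONL files directly instead of splitting
# a test set out of train. File names are placeholders.
from datasets import load_dataset

dataset = load_dataset(
    "json",
    data_files={"train": "my_train.jsonl", "test": "my_test.jsonl"},
)
train_ds, eval_ds = dataset["train"], dataset["test"]
```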

Deepspeed exited with code -9 on a fine-tuning run against the 7b model; I guess that means OOM. (Using 4x A10 GPUs.)

I'll look into trying Parameter-Efficient Tuning.
Or perhaps try with 8x A100 (p4d instance).

Databricks org

What model size are you using, and on what instance? That should not be needed. https://github.com/databrickslabs/dolly#a10-gpus

I'm trying to fine-tune the 7b model with a learning rate of 5e-8 (to not clobber the weights too much) and 2 epochs.

Here's the log.

I use 4 x A10G GPUs:

Python 3.9.5
torch:  1.13 ; cuda:  cu117
Mon May  1 12:23:15 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         Off  | 00000000:00:1B.0 Off |                    0 |
|  0%   19C    P8     9W / 300W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A10G         Off  | 00000000:00:1C.0 Off |                    0 |
|  0%   18C    P8     9W / 300W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A10G         Off  | 00000000:00:1D.0 Off |                    0 |
|  0%   19C    P8     9W / 300W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A10G         Off  | 00000000:00:1E.0 Off |                    0 |
|  0%   19C    P8     8W / 300W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

The instance:

(screenshot attached; a g5.12xlarge, per the reply below)

Deepspeed exit code -9 again with learning rate at 5e-7 and 5e-6 (the original rate).

Log for 5e-7
Log for 5e-6

Databricks org

Learning rate is not related to memory usage. Are you following the instructions in the repo for changing the code when using A10s? It will not work out of the box unless you modify some settings as described there.

Databricks org

Oh, you have a g5.12xlarge. Try a g5.24xlarge. I think that's not enough RAM to load the model 4x into memory.
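A rough back-of-envelope (my own estimate, not from the Dolly docs): with deepspeed, each rank typically loads its own copy of the checkpoint into host RAM before sharding, so the weights alone can need several times the model size:

```python
# Hand-wavy host-RAM estimate for loading a ~7B-parameter checkpoint once
# per rank on a 4-GPU node. Assumes an fp32 checkpoint and ignores
# optimizer state, offload buffers, and framework overhead.
params = 6.9e9          # dolly-v2-7b is ~6.9B parameters (Pythia-6.9b base)
bytes_per_param = 4     # fp32
ranks = 4               # one process per A10G

gib = params * bytes_per_param * ranks / 1024**3
print(f"~{gib:.0f} GiB of host RAM just for the weights")  # ~103 GiB
```

That already crowds the 192 GiB of a g5.12xlarge once everything else is added; the g5.24xlarge doubles host RAM to 384 GiB, which is presumably why it helps here.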

Ah, sorry - yes, that worked, and it took 5 hours to fine-tune my model. Thanks for your help!

Hello opyate!
I'm also looking to fine-tune Dolly on an EC2 g5.24xlarge instance.

```
!deepspeed {num_gpus_flag} \
    --module training.trainer \
    --input-model {input_model} \
    --deepspeed {deepspeed_config} \
    --epochs 2 \
    --local-output-dir {local_output_dir} \
    --dbfs-output-dir {dbfs_output_dir} \
    --per-device-train-batch-size 3 \
    --per-device-eval-batch-size 3 \
    --logging-steps 10 \
    --save-steps 200 \
    --save-total-limit 20 \
    --eval-steps 50 \
    --warmup-steps 50 \
    --test-size 200 \
    --lr 5e-6
```

Did you run this command directly in the terminal, or was it part of another file? How are the values for the variables in {} passed?

I'm looking into tutorials for using deepspeed but haven't been able to crack it. It would be great if you could share how you used deepspeed to fine-tune Dolly!
One tutorial said to run 'accelerate config' first and answer a bunch of questions. Is that the way to proceed?

Thanks,
Abhilash

Did you run this command directly in the terminal, or was it part of another file? How are the values for the variables in {} passed?

Hi, you can clone the dolly repo into Databricks, then open this notebook, and it's all there. Then just follow the extra guidance for A10 GPUs.
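To answer the {} question specifically (my own note, not from the repo): the command lives in a notebook cell, and IPython's `!` shell magic substitutes Python variables written in braces into the shell command before running it. A tiny sketch with hypothetical values:

```python
# In a Jupyter/Databricks notebook cell, `!command {var}` interpolates the
# Python variable `var` into the shell command. Values here are hypothetical.
num_gpus_flag = "--num_gpus=4"
input_model = "databricks/dolly-v2-7b"

# A notebook cell would then run, e.g.:
#   !deepspeed {num_gpus_flag} --module training.trainer --input-model {input_model} ...
# which expands to:
#   deepspeed --num_gpus=4 --module training.trainer --input-model databricks/dolly-v2-7b ...
```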

Databricks org

deepspeed docs: https://deepspeed.readthedocs.io/en/latest/
accelerate is a different library.
You already have a working example linked from this model card: https://github.com/databrickslabs/dolly

Many thanks, both of you! I have been able to train the dolly-v2-3b model on the 15k dataset. It has reached epoch 0.41, and I hope it doesn't run into any errors.

My original aim, though, was to fine-tune dolly-v2-3b on my custom data (summarisation/extraction). I have the data ready in CSV format; I just have to adapt it to the JSONL format.

  1. The dolly-15k data has 4 fields: {"instruction": "", "context": "", "response": "", "category": ""}. Is it okay to leave some of them blank?
  2. My data has many "\n" in it. Should I get rid of them?

Any other thing to look out for?

Thanks

Databricks org

Category is actually unused. Context can be blank, yes; you can see that in some entries. You can see how it turns the fields into a prompt string here: https://github.com/databrickslabs/dolly/blob/master/training/trainer.py#L109 (You could even change the code to do whatever you want; in the end, all you are feeding the model is strings.)
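For the CSV-to-JSONL step mentioned above, a minimal sketch (the CSV column names and file paths here are hypothetical; map your own columns accordingly):

```python
# Hypothetical sketch: convert a CSV into dolly-15k-style JSONL records.
# The "prompt"/"answer" column names and file paths are assumptions.
import csv
import json

with open("my_data.csv", newline="", encoding="utf-8") as src, \
        open("my_data.jsonl", "w", encoding="utf-8") as dst:
    for row in csv.DictReader(src):
        record = {
            "instruction": row["prompt"],  # hypothetical column name
            "context": "",                 # blank is fine
            "response": row["answer"],     # hypothetical column name
            "category": "",                # unused by the trainer
        }
        dst.write(json.dumps(record) + "\n")
```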
