How many 3090-hours did you need for the training epoch?

#2 opened by UMCU

Hi Bram,

thanks for this great work (and @yhavinga of course)! I have a question: I am in the process of arranging compute to finetune a Dutch language model of about the same size on a dataset of about the same size. You used 4x 3090s; do you still remember the wall-clock time and the average GPU utilization?

As for the finetuning, I assume you trained all layers?

Regards,

Bram

Hi @UMCU. This flew under the radar, sorry for not catching it sooner!

To be honest I don't remember all the details, but if you look at the train_results.json file you can find some metrics there, such as the total runtime in seconds and the number of samples processed per second at a context length of 4096 (https://huggingface.co/BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny/blob/main/train_results.json#L6).
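In case it helps your planning, here is a rough sketch (my own quick example, not something from the repo) of turning that runtime into 3090-hours. It assumes the train_runtime key that the Hugging Face Trainer normally writes to train_results.json, and 4 GPUs as in my setup.

```python
import json

# Load the Trainer's summary metrics (assumed key: "train_runtime", in seconds).
with open("train_results.json") as f:
    results = json.load(f)

train_runtime_s = results["train_runtime"]  # wall-clock seconds for the whole run
num_gpus = 4                                # 4x RTX 3090 in this setup

wall_clock_hours = train_runtime_s / 3600
gpu_hours = wall_clock_hours * num_gpus     # GPU-hours = wall-clock hours * number of GPUs

print(f"Wall-clock time: {wall_clock_hours:.1f} h")
print(f"3090-hours: {gpu_hours:.1f}")
```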

For finetuning I did not train all layers, due to limited compute: I used QLoRA. If you have the compute, I would indeed recommend a full finetune, or at least LoRA on all linear layers.
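For reference, a QLoRA setup that targets all linear layers usually looks roughly like the sketch below, using peft and bitsandbytes. Treat it as illustrative: the base model name, rank, alpha and dropout are placeholders, not the exact settings I used.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",  # placeholder base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on all linear projections of the Llama architecture.
lora_config = LoraConfig(
    r=16,                  # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows how small the trainable fraction is
```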

Hope that helps!

Bram

BramVanroy changed discussion status to closed

Thanks for the reply, Bram!
