How many 3090-hours did you need for the training epoch?
Hi Bram,
thanks for this great work (and @yhavinga of course)! I have a question. I am in the process of arranging compute to finetune a Dutch language model of about the same size on a dataset of about the same size. You used 4x RTX 3090s; do you still remember the wall clock time and the average GPU utilization?
As for the finetuning, I assume you trained all layers?
Regards,
Bram
Hi @UMCU. This flew under the radar, sorry for not catching it earlier!
To be honest, I do not remember all the details, but if you look at the train_results.json file you can see some metrics, such as the runtime in seconds and the number of samples processed per second at a context length of 4096 (https://huggingface.co/BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny/blob/main/train_results.json#L6).
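In case it helps, here is a minimal sketch for turning those numbers into GPU-hours, assuming the standard Hugging Face Trainer field names (e.g. `train_runtime` in seconds) and the 4-GPU setup mentioned above:

```python
import json

# Rough GPU-hour estimate from train_results.json, assuming the standard
# Hugging Face Trainer output fields ("train_runtime" is in seconds).
with open("train_results.json") as f:
    results = json.load(f)

num_gpus = 4  # 4x RTX 3090 in this setup
wall_clock_hours = results["train_runtime"] / 3600
gpu_hours = wall_clock_hours * num_gpus

print(f"Wall clock: {wall_clock_hours:.1f} h -> ~{gpu_hours:.0f} GPU-hours")
```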
For the finetuning I did not train all layers due to limited compute; I used QLoRA. If you have the compute, I would indeed recommend doing a full finetune, or at least LoRA on all linear layers.
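For reference, a sketch of what QLoRA targeting all linear layers looks like with `peft` and `bitsandbytes`. The hyperparameters and the base model id are illustrative, not the exact values I used:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization of the frozen base model (QLoRA); settings are illustrative
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",  # illustrative base model id
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on all linear projections of the Llama architecture
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

A full finetune skips the quantization and adapter steps entirely and simply trains all weights, at the cost of much more GPU memory.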
Hope that helps!
Bram
Thanks for the reply, Bram!