How many 3090-hours did you need for the training epoch?
Hi Bram,
thanks for this great work (and @yhavinga of course)! I have a question. I am in the process of arranging compute to finetune a Dutch language model of about the same size on a dataset of about the same size. You used 4x RTX 3090s; do you still remember the wall clock time and the average GPU utilization?
As for the finetuning, I assume you trained all layers?
Regards,
Bram
Hi @UMCU. This flew under the radar, sorry for not catching it earlier!
To be honest, I do not remember all the details, but if you look at the train_results.json file you can see some metrics, such as the runtime in seconds and the number of samples processed per second at a context length of 4096 (https://huggingface.co/BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny/blob/main/train_results.json#L6).
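In case it helps, here is a minimal sketch for turning those numbers into GPU-hours, assuming the standard Hugging Face Trainer field names (e.g. `train_runtime` in seconds) and the 4-GPU setup mentioned above:

```python
import json

# Rough GPU-hour estimate from train_results.json, assuming the standard
# Hugging Face Trainer output fields ("train_runtime" is in seconds).
with open("train_results.json") as f:
    results = json.load(f)

num_gpus = 4  # 4x RTX 3090 in this setup
wall_clock_hours = results["train_runtime"] / 3600
gpu_hours = wall_clock_hours * num_gpus

print(f"Wall clock: {wall_clock_hours:.1f} h -> ~{gpu_hours:.0f} GPU-hours")
```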
For the finetuning I did not train all layers due to limited compute; I used QLoRA. If you have the compute, I would indeed recommend doing a full finetune, or at least LoRA on all linear layers.
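For reference, a sketch of what QLoRA targeting all linear layers looks like with `peft` and `bitsandbytes`. The hyperparameters and the base model id are illustrative, not the exact values I used:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization of the frozen base model (QLoRA); settings are illustrative
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",  # illustrative base model id
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on all linear projections of the Llama architecture
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

A full finetune skips the quantization and adapter steps entirely and simply trains all weights, at the cost of much more GPU memory.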
Hope that helps!
Bram
Thanks for the reply, Bram!