What was the cost of the training?

#12
by ssoltani - opened

you mentioned for the training you used 8XA100 80G. I was wondering what was the total time of training and what was the cost?

DeepSpeed-ZERO-2 + QLoRA + Flash Attention2, with a batch size of 16, 8xA100(80G) takes 5-7 hours for training, each card takes up 70G VRAM. The training time depends on your machine set-up and may vary a lot.

Sign up or log in to comment