How to set the hardware and software config for training?

#2
by lIlBrother - opened

First of all, thanks for uploading the model.

By the way, I want to fine-tune this model further, and I have 8× A100 80GB GPUs.
I'm using DeepSpeed ZeRO-3 but I can't train πŸ˜₯ (GPU or CPU resources are insufficient).

Could you give me a little tip? πŸ€—

I used QLoRA, so I only needed one A100 80GB. Here's exactly what I did: https://gist.github.com/jondurbin/87fc040b92a3073125ed516b04bc6e19
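For rough intuition on why QLoRA fits on a single 80 GB card: the base weights are quantized to 4 bits, and optimizer states are only kept for the small LoRA adapters. The numbers below are back-of-the-envelope assumptions (including the adapter fraction), not measurements from the gist above:

```python
# Rough memory estimate for QLoRA on a 70B model.
# All figures are loose assumptions, not measured values.

params = 70e9                          # base model parameters
base_weights_gb = params * 0.5 / 1e9   # 4-bit quantized weights: 0.5 bytes/param

# Assume ~0.5% of base size in trainable LoRA parameters (hypothetical figure).
lora_params = params * 0.005
# LoRA weights (bf16, 2 B) + AdamW states (2x fp32, 8 B) + gradients (bf16, 2 B):
lora_gb = lora_params * (2 + 8 + 2) / 1e9

total_gb = base_weights_gb + lora_gb
print(f"base weights ~ {base_weights_gb:.0f} GB, LoRA training state ~ {lora_gb:.1f} GB")
print(f"total ~ {total_gb:.0f} GB (plus activations), within one 80 GB A100")
```

In contrast, a full fine-tune keeps fp32 optimizer states for all 70B parameters, which is why it needs the 8-GPU ZeRO-3 setup discussed further down.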

When I do full fine-tunes (13b/7b), I typically do something like this:
https://gist.github.com/jondurbin/7183e6edcc5cb57d5f544614d0ce0503

I have not tried a full fine-tune of 70b yet, but I will be taking a stab at it soon. If I have success, I will post a gist.

Thanks Jon, I am a fan of your work. I truly appreciate all the hard work you put into this and then sharing it with the community. I know it's a lot to do, especially as a small or one-person team :) but it's very rewarding too, so please keep up the good work, and congrats on reaching #2 on the Open LLM Leaderboard.

@jondurbin Hi, I succeeded in a full fine-tune of llama-2-70b.
Getting the hardware configuration right was the hard part.

I used 8× A100 80GB on GCP,
with DeepSpeed ZeRO-3 and optimizer offload, but no offload for parameters.
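The setup described above can be sketched as a DeepSpeed config: ZeRO stage 3, optimizer states offloaded to CPU, and parameter offload left disabled. This is a minimal sketch assuming the Hugging Face Trainer integration (where `"auto"` values are filled in by the Trainer); the exact values are placeholders, not the poster's actual config:

```python
# Minimal DeepSpeed ZeRO-3 config sketch: optimizer offload to CPU,
# parameters kept on GPU (note there is no "offload_param" section).
# This mirrors the setup described above, not an exact reproduction.
ds_config = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",   # optimizer states live in (pinned) CPU RAM
            "pin_memory": True,
        },
        # Parameters stay in GPU memory, sharded across the 8 GPUs by ZeRO-3.
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}
```

The key trade-off: offloading only the optimizer keeps the forward/backward passes fast (weights never leave the GPUs), while moving the largest memory consumer, the fp32 optimizer states, off the cards.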

It completes while running extremely close to the memory limit.
Hard work!

Ah, and the most important thing: I used the Adafactor optimizer!
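Adafactor matters here because it factors the second-moment estimate into row and column statistics instead of storing a full per-parameter state, and in its memory-lean configuration it keeps no first-moment momentum at all. A rough comparison against Adam for a 70B model, using back-of-the-envelope numbers:

```python
# Rough optimizer-state memory comparison for a 70B-parameter model.
# Assumes fp32 optimizer states (4 bytes each); figures are estimates.
params = 70e9

# Adam keeps two full fp32 states per parameter (first and second moment).
adam_gb = params * 2 * 4 / 1e9

# Adafactor (without momentum) factors the second moment into row/column
# vectors, so its state is tiny relative to the parameter count --
# approximated here as ~1% of one full fp32 state (a loose assumption).
adafactor_gb = params * 4 * 0.01 / 1e9

print(f"Adam states ~ {adam_gb:.0f} GB")
print(f"Adafactor   ~ {adafactor_gb:.0f} GB")
```

With Adam-style states, the optimizer alone would dwarf the combined 640 GB of GPU memory even before weights and gradients, which is consistent with Adafactor being the thing that made this run fit.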
