Hardware

#5
by Tom-Neverwinter - opened

What hardware was this model run on?

how much ram is expected to be used either with vram or ram?

Shinoji Research org

Q8 needs about 80 GB of VRAM; there are some other quants available for lower-end systems. I usually use vast.ai or runpod for making & testing these models.

What hardware is needed for fine-tuning it?

Shinoji Research org

1x H100 for QLoRA; more than 4 (5 or more, I'd use 8) 80 GB GPUs for LoRA.
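A rough sketch of why QLoRA fits on a single H100 while 16-bit LoRA needs several 80 GB cards. The parameter count (~70B), adapter fraction, and optimizer-state sizes below are assumptions for illustration, not measured numbers, and activation memory (which can dominate at long contexts) is ignored:

```python
# Hedged back-of-envelope: assumes a ~70B dense model and that LoRA
# adapters are ~1% of base parameters (both are illustrative guesses).
def qlora_footprint_gb(n_params=70e9, base_bits=4, adapter_frac=0.01):
    # Base weights frozen in 4-bit; only the small adapters carry
    # fp16 weights (2 B) plus two fp32 Adam moments (8 B) per param.
    base = n_params * base_bits / 8 / 1e9
    adapters = n_params * adapter_frac * (2 + 8) / 1e9
    return base + adapters

def lora_fp16_footprint_gb(n_params=70e9, adapter_frac=0.01):
    # Same adapters, but the frozen base is held in 16-bit.
    base = n_params * 16 / 8 / 1e9
    adapters = n_params * adapter_frac * (2 + 8) / 1e9
    return base + adapters

print(f"QLoRA:     ~{qlora_footprint_gb():.0f} GB")   # under one 80 GB H100
print(f"fp16 LoRA: ~{lora_fp16_footprint_gb():.0f} GB")  # multiple 80 GB GPUs
```

Once activations and longer sequence lengths are added, the fp16 LoRA case spills well past two cards, which is consistent with wanting 5-8 GPUs in practice.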

I tried running the model as-is with only ~76 GB of VRAM. I ordered another card, so in theory I'll be able to run it. If the model ever moves to Mamba it will wipe the floor with everyone.

I can't seem to run it off my system RAM for some reason? I have 128 GB.

I will have to make sure I set up the QLoRA properly.

Shinoji Research org

What are you using for inferencing?

No reason you wouldn't be able to run it from system RAM. Are you using FP16 or Q8? Q8 needs about 80 GB; FP16 would need 160.
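The Q8 vs FP16 numbers follow directly from bytes-per-weight. A minimal sketch, assuming a ~70B-parameter model (the parameter count and the ~15% overhead factor for KV cache and buffers are assumptions, not specs from this thread):

```python
# Rough memory estimate for a dense transformer: weights dominate,
# so footprint ~= params * bits/8, plus some overhead for the KV
# cache, activations, and runtime buffers (15% is a guess).
def est_memory_gb(n_params: float, bits_per_weight: float,
                  overhead: float = 1.15) -> float:
    return n_params * bits_per_weight / 8 * overhead / 1e9

print(f"Q8  : ~{est_memory_gb(70e9, 8):.0f} GB")   # in the ballpark of the ~80 GB above
print(f"FP16: ~{est_memory_gb(70e9, 16):.0f} GB")  # in the ballpark of the ~160 GB above
```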

The aim was FP16, but with this information it's very clear it's going to be Q8.
https://pastebin.com/wRVCpcep [system specs]

I can load the model when I use system RAM; however, it never gives a response in oobabooga.

Shinoji Research org

I wonder if that's just a performance issue. On an EPYC 9654 (very, very high memory bandwidth) I get 2 tokens/second in Q8.
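CPU decoding is typically memory-bandwidth-bound: each generated token has to stream the full weight set from RAM once, so tokens/second is roughly effective bandwidth divided by model size. A hedged sketch (the 160 GB/s figure is inferred from the numbers above, not measured):

```python
# Bandwidth-bound estimate: one full pass over the weights per token.
def est_tokens_per_sec(model_gb: float, eff_bandwidth_gbs: float) -> float:
    return eff_bandwidth_gbs / model_gb

# 2 tok/s on an ~80 GB Q8 model implies ~160 GB/s effective bandwidth,
# a plausible fraction of a 12-channel DDR5 EPYC's theoretical peak.
print(est_tokens_per_sec(80, 160))  # → 2.0
```

On a system with much lower memory bandwidth (e.g. dual-channel desktop DDR4/DDR5), the same arithmetic predicts well under 1 token/second, which can look like the model "never responding".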

Might be worth trying some of the lower quants other people have made. Are you able to run any of the Miqudev releases?

I have not tried yet. I'll download one and try it
