Help! How to Run an AI with an AMD GPU (RX 580 8GB)

by A2Hero

Lads, does anyone know how to do that?
Even ChatGPT-4 couldn't help me much.
I don't know if I'm doing something wrong, but I feel like there's a way to get around the lack of ROCm on Windows with the help of WSL 2 or something like that. Does anyone have a solution?

13B GPTQ models are about 14GB in size, which won't fit on 8GB cards. There is no offloading technique for GPTQ so far; you would probably need to look at FlexGen with unquantized weights.
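As a back-of-the-envelope rule of thumb (an estimate, not a measurement), the weights alone for a model with $N$ parameters quantized to $b$ bits take roughly

$$\text{weight bytes} \approx N \times \frac{b}{8}$$

so a 13B model at 4-bit is already around $13 \times 10^9 \times 0.5 \approx 6.5$ GB before the quantization scales/zero-points, the KV cache, and activations are added on top, which is what pushes it past what an 8GB card can hold.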

There is offloading in GPTQ-for-LLaMa, but it's really, really slow, and I don't know if it works in the ROCm implementations of GPTQ-for-LLaMa. ExLlama has ROCm support but no offloading, which I imagine is what you're referring to.

But it sounds like the OP is using Windows, and there's no ROCm for Windows, not even in WSL, so that's a dead end I'm afraid.

@A2Hero I would suggest you use GGML, which can work on your AMD card via OpenCL acceleration.
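To make that concrete, here's a minimal sketch of what running a GGML model with OpenCL acceleration can look like through the llama-cpp-python bindings; the model filename, prompt format, and layer count below are assumptions to adjust for whatever file you actually download:

```python
# Build the bindings with OpenCL (CLBlast) support before installing:
#   CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="orca-mini-3b.ggmlv3.q4_0.bin",  # hypothetical local GGML file
    n_gpu_layers=26,  # layers offloaded to the GPU via OpenCL; tune to fit 8GB VRAM
    n_ctx=2048,       # context window
)

output = llm("### User:\nWhy is the sky blue?\n### Response:\n", max_tokens=128)
print(output["choices"][0]["text"])
```

The nice part of this route is that OpenCL only needs generic GPU drivers, so it sidesteps the ROCm-on-Windows problem entirely; if the card runs out of memory, lower n_gpu_layers and the remaining layers run on the CPU.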

@Yhyu13 I meant any kind of AI model, even 6B or lower.

@TheBloke
Sure, I will try GGML with something like TheBloke/orca_mini_3B-GGML (my CPU is an i5 4690 and I have 16GB of RAM).
But I really hope that someday (and I hope it's soon) AMD supports ROCm on Windows, or something else comes along that can run TheBloke/wizard-mega-13B-GPTQ.
Thanks for the advice!
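For anyone following along, here's a quick sketch of fetching one of the quantized files from that repo with huggingface_hub; the exact filename is an assumption, so check the repo's file list:

```python
from huggingface_hub import hf_hub_download

# Filename assumed; browse the repo's file list for the actual quantized variants.
path = hf_hub_download(
    repo_id="TheBloke/orca_mini_3B-GGML",
    filename="orca-mini-3b.ggmlv3.q4_0.bin",
)
print(path)  # local cache path to pass as model_path above
```

A 3B model at q4_0 is only around 2GB on disk, so it fits comfortably in 16GB of system RAM even with no GPU offload at all.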
