Can't get it to work on Runpod
I've been unable to get models over roughly 70B parameters to run on Runpod using webchat-UI — neither GGUF nor GPTQ — no matter what I try. I'm renting multiple GPUs, so it shouldn't be a hardware problem. My guess is that I'm missing something basic, like how to fetch models that have been split into multiple files because of their size, or how to configure a multi-GPU setup.
I just want to run inference; I'm not trying to do anything fancy here.
Suggestions?
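For what it's worth, here is a rough sketch of the pure command-line route. The repo and file names below are placeholders, and the flags are from `huggingface-cli` and llama.cpp, so adapt them to whatever backend webchat-UI wraps:

```shell
# 1) Fetch every shard of a split GGUF (shards are named like
#    *-00001-of-00003.gguf); repo/file names here are made up.
huggingface-cli download someuser/SomeModel-GGUF \
    --include "*Q5_K_M*" --local-dir ./models

# 2) Recent llama.cpp builds find the remaining shards automatically:
#    just point the loader at the FIRST shard.
./llama-server -m ./models/somemodel.Q5_K_M-00001-of-00003.gguf \
    --n-gpu-layers 999 --tensor-split 1,1   # spread layers over 2 GPUs

# 3) Or merge the shards into a single .gguf first:
./llama-gguf-split --merge \
    ./models/somemodel.Q5_K_M-00001-of-00003.gguf \
    ./models/somemodel.Q5_K_M.gguf
```

If the model still won't load with everything on one GPU, the `--tensor-split` ratios are usually the knob to play with.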
Total noob here, but the only way I could find to join split files was to use LM Studio to download the models. There are other ways, but none of them worked for me, or they looked like they'd suck me into Linux debugging hell.
I'm currently able to run this model at Q5_K_M on my local machine, so I know the download process works.
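If LM Studio isn't an option, it may help to know there are two kinds of "split" model files, and only one of them is joined by hand. Byte-split parts (suffixes like `.part1`/`.part2` or `.a`/`.b`) are just one big file cut into pieces, and plain `cat` reassembles them; modern GGUF shards named `-00001-of-0000N-` are each valid GGUF files and should NOT be cat'ed (use llama.cpp's gguf-split `--merge`, or just load the first shard). A minimal demo of the `cat` case with dummy files:

```shell
# Dummy stand-ins for byte-split parts of one large file
printf 'GGUFAAAA' > model.gguf.part1   # 8 bytes
printf 'BBBB'     > model.gguf.part2   # 4 bytes

# Plain concatenation reassembles byte-split parts, in order
cat model.gguf.part1 model.gguf.part2 > model.gguf

wc -c < model.gguf   # 12 — the sum of the parts
```

The real files are tens of gigabytes, but the command is identical; just list the parts in order.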