Inference on non-NVIDIA GPUs?

#9
by pmartin2000 - opened

Hi, I was hoping to use AWS Inferentia2 (https://aws.amazon.com/ec2/instance-types/inf2/) instances, which use AWS's Inferentia2 accelerators (programmed via the Neuron SDK) instead of NVIDIA GPUs.
Pardon my noob question, but since CUDA doesn't work with Neuron, is there any way for me to get this working with some of the latest SQLCoder models you and TheBloke have provided?

Defog.ai org

Hi @pmartin2000, thanks for raising this question. The models we release are essentially just the weights (following the same architecture as their underlying base models), so you can port the weights + architecture to whichever format you want, independently of our release on Hugging Face. I'm not familiar with Neuron, but from the PyTorch Neuron docs (https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/index.html), it looks like you could follow the instructions for serving the BERT model (https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/torch-neuronx/bert-base-cased-finetuned-mrpc-inference-on-trn1-tutorial.html), swapping out the model name/path for your local copy of sqlcoder. All the best with it; do let us know how it performs on their custom hardware! 😄
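For concreteness, here is a rough, untested sketch of what that swap might look like, following the tutorial's `torch_neuronx.trace` pattern but with a causal LM instead of a classifier. The model path and prompt below are placeholders, and a large generative model like sqlcoder may well need a different serving path (e.g. AWS's transformers-neuronx library) rather than this single-forward-pass example:

```python
import torch
import torch_neuronx
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: point this at your local sqlcoder checkpoint.
model_path = "path/to/sqlcoder"

tokenizer = AutoTokenizer.from_pretrained(model_path)
# torchscript=True makes the model return plain tuples, which tracing needs.
model = AutoModelForCausalLM.from_pretrained(model_path, torchscript=True)
model.eval()

# Example inputs for tracing, mirroring the BERT tutorial.
prompt = "-- How many users signed up last week?\nSELECT"
encoded = tokenizer(prompt, return_tensors="pt")
example = (encoded["input_ids"], encoded["attention_mask"])

# Compile the forward pass for NeuronCores (this step can take a while).
model_neuron = torch_neuronx.trace(model, example)

# One forward pass on the compiled model; greedily decode a single token
# just to confirm the compiled graph produces sensible logits.
with torch.no_grad():
    logits = model_neuron(*example)[0]
next_token_id = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token_id]))
```

Note that a traced graph has fixed input shapes, so real serving would also need padding/bucketing of inputs and a proper generation loop; treat the above purely as a starting point.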

Thanks @wongjingping, I'll look into this and try out the Neuron library.

pmartin2000 changed discussion status to closed
