Inference on non-NVIDIA GPUs?

#9
by pmartin2000 - opened

Hi, I was hoping to use AWS Inferentia2 (https://aws.amazon.com/ec2/instance-types/inf2/) instances, which use AWS's Inferentia2 accelerators (programmed via the Neuron SDK) instead of NVIDIA GPUs.
Pardon my noob question, but since CUDA doesn't work with Neuron, is there any way for me to get this working with some of the latest SQLCoder models you and TheBloke have provided?

Defog.ai org

Hi @pmartin2000, thanks for raising this question. The models we release are essentially just the weights (following the same architecture as their underlying base models), so you can port the weights + architecture to whichever format you want, independently of our release on Hugging Face. I'm not familiar with Neuron, but from the PyTorch Neuron docs (https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/index.html), it looks like you could follow the instructions for serving the BERT model (https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/torch-neuronx/bert-base-cased-finetuned-mrpc-inference-on-trn1-tutorial.html), swapping out the model name/path for your local copy of sqlcoder. All the best with it; do let us know how it performs on their custom hardware! 😄
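For concreteness, here is a rough, untested sketch of what that swap might look like, following the tutorial's `torch_neuronx.trace` pattern but with a causal LM instead of a classifier. The model path and prompt below are placeholders, and a large generative model like sqlcoder may well need a different serving path (e.g. AWS's transformers-neuronx library) rather than this single-forward-pass example:

```python
import torch
import torch_neuronx
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: point this at your local sqlcoder checkpoint.
model_path = "path/to/sqlcoder"

tokenizer = AutoTokenizer.from_pretrained(model_path)
# torchscript=True makes the model return plain tuples, which tracing needs.
model = AutoModelForCausalLM.from_pretrained(model_path, torchscript=True)
model.eval()

# Example inputs for tracing, mirroring the BERT tutorial.
prompt = "-- How many users signed up last week?\nSELECT"
encoded = tokenizer(prompt, return_tensors="pt")
example = (encoded["input_ids"], encoded["attention_mask"])

# Compile the forward pass for NeuronCores (this step can take a while).
model_neuron = torch_neuronx.trace(model, example)

# One forward pass on the compiled model; greedily decode a single token
# just to confirm the compiled graph produces sensible logits.
with torch.no_grad():
    logits = model_neuron(*example)[0]
next_token_id = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token_id]))
```

Note that a traced graph has fixed input shapes, so real serving would also need padding/bucketing of inputs and a proper generation loop; treat the above purely as a starting point.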

Thanks @wongjingping, I'll look into this and try out the Neuron library.

pmartin2000 changed discussion status to closed
