Is it the FP32 or FP16 version?

#1
by pankajdev007 - opened

Is the ORT model FP32 or FP16? If it is FP32, can you share a way to export it to FP16 so it can fit on a 16GB GPU?

ONNXConfig for all org
edited Jan 25, 2023

Hi @pankajdev007, I used this repo as a base to export the model to ONNX, so I believe it's FP32.

Check the weights by running this after loading the model:

print(model.dtype)

You can force the weights of your model to be FP16 by doing this in PyTorch:

net = Model()
net.half()

So you can probably do it with Transformers too!

(I found this on this PyTorch thread. Be careful about predictions after the conversion, because the cast can introduce NaNs.)
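
As a rough sketch of that idea applied to a Transformers model (the AutoModelForCausalLM usage and the EleutherAI/gpt-j-6B checkpoint name here are my assumptions, not something verified against this exact repo):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")  # loads in fp32 by default
model = model.half()   # cast all weights to fp16 in place
print(model.dtype)     # should now print torch.float16

model.half() returns the module itself, so the reassignment is optional, but it makes the cast explicit.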

Yes, I tried model.half(), but it does not apply to the ONNX model; it only works on the regular Transformers model. I need a way to convert GPT-J to ONNX in FP16. I used Optimum's ONNX exporter to convert it:

python -m optimum.exporters.onnx --task causal-lm-with-past --for-ort --model gpt-j-6B gptj16_onnx/

but did not find a way to convert it to FP16.

ONNXConfig for all org

Could you try to load the PyTorch model, apply model.half(), save the PyTorch model, and then export this saved model to ONNX?

(To store the saved model, you can create a new HF repo.)
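
A minimal sketch of that flow, assuming AutoModelForCausalLM and a local output directory (the checkpoint name and directory are placeholders I made up):

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
model.half()                                    # cast weights to fp16
model.save_pretrained("gpt-j-6B-fp16")          # save the fp16 checkpoint locally
AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B").save_pretrained("gpt-j-6B-fp16")

# then point the same Optimum export command at the saved fp16 checkpoint:
# python -m optimum.exporters.onnx --task causal-lm-with-past --for-ort --model gpt-j-6B-fp16 gptj16_onnx/

Whether the exporter preserves FP16 through the ONNX trace is exactly what would need testing here; I have not verified it.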
