`model.onnx` to `model_fp16.onnx`
Hello, I'm new to ONNX and trying to follow your approach. I've successfully exported a model to ONNX, and now I want to reduce its precision to FP16. Could you please show an example of such a conversion, from `model.onnx` to `model_fp16.onnx`? Thank you very much!
Sure, here's my float16 conversion script: https://github.com/huggingface/transformers.js/blob/main/scripts/float16.py
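If it helps, here's a minimal sketch of the same conversion (assuming the upstream `onnxconverter-common` package, which the linked script appears to be adapted from; the file names are placeholders):

```python
# Minimal FP16 conversion sketch; assumes `pip install onnxconverter-common`.
import onnx
from onnxconverter_common import float16

model = onnx.load('model.onnx')
# keep_io_types=True keeps the graph inputs/outputs in float32,
# so callers don't have to feed float16 tensors.
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
onnx.save(model_fp16, 'model_fp16.onnx')
```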
Thank you so much for your answer! I spent half a day today experimenting with this code, and it's very interesting.
I had to extend `DEFAULT_OP_BLOCK_LIST` to get the conversion to run.
But after conversion, inference breaks for some reason: the output is an array of NaNs, i.e. `[np.float16(nan), np.float16(nan), np.float16(nan), ...]`. Here's what I'm running:
```python
import onnx
from onnx.shape_inference import infer_shapes_path
from float16 import convert_float_to_float16, DEFAULT_OP_BLOCK_LIST  # the linked script

# Run shape inference on disk first (also works for models over 2 GB)
infer_shapes_path('model_fp32.onnx', 'model_fp32_infer_shape.onnx')
model = onnx.load('model_fp32_infer_shape.onnx')

# Copy the default list so the module-level constant isn't mutated
op_block_list = list(DEFAULT_OP_BLOCK_LIST)
op_block_list.append('RandomNormal')
op_block_list.append('RandomNormalLike')

tts_model_fp16 = convert_float_to_float16(model, min_positive_val=1e-7,
                                          max_finite_val=1e4, keep_io_types=True,
                                          disable_shape_infer=True,
                                          op_block_list=op_block_list,
                                          node_block_list=[])
onnx.save(tts_model_fp16, 'model_fp16.onnx')
```
Right now I'm still trying to find the right parameters for `convert_float_to_float16`; I'd appreciate any help. Thank you again.
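For reference, a minimal sketch of a NaN check on both models (assuming `onnxruntime` is installed; the input shape below is a placeholder for the real model signature):

```python
# Compare FP32 and FP16 outputs; with keep_io_types=True both models
# accept the same float32 feeds.
import numpy as np
import onnxruntime as ort

x = np.random.randn(1, 128).astype(np.float32)  # hypothetical input shape
for path in ('model_fp32.onnx', 'model_fp16.onnx'):
    sess = ort.InferenceSession(path, providers=['CPUExecutionProvider'])
    feed = {sess.get_inputs()[0].name: x}
    out = sess.run(None, feed)[0]
    print(path, '-> any NaN:', np.isnan(out).any(), '| max abs:', np.abs(out).max())
```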
Oh right, there are a bunch of nodes I needed to add to the block list. Most of them are in the decoder, which you can see in the following image (note where Cast-to-FP32 nodes are inserted before certain operations). Hope that helps!
You can view the graph visualization here.
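In case it's useful, a minimal sketch of how a `node_block_list` can be collected by node name, instead of blocking an op type everywhere (the op types below are placeholders; pick the ones that sit in front of the Cast-to-FP32 nodes in the graph):

```python
# Collect the names of suspect nodes to pass as node_block_list.
import onnx

model = onnx.load('model_fp32_infer_shape.onnx')
suspect_ops = {'Pow', 'ReduceMean', 'Div'}  # placeholder op types
node_block_list = [n.name for n in model.graph.node if n.op_type in suspect_ops]
print(node_block_list)
```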