Exporting Aquilachat-7b to ONNX via Optimum Fails

#2
by sammysun0711 - opened

Hi @qhduan, first of all, thanks for your great work; it is really helpful for the community around Chinese LLMs.
As I can see in modeling_aquila.py, this model reuses most of the structure of LLaMA, and Optimum supports LLaMA ONNX export.
So I saved the Aquilachat-7b model locally and tried to export the ONNX model as follows:

optimum-cli export onnx --model aquilachat-7b \
    --task text-generation --trust-remote-code \
    --framework pt --opset 17 onnx

Here I hit the following issue during shape inference:

~/anaconda3/envs/aigc/lib/python3.8/site-packages/torch/onnx/_internal/jit_utils.py:309 in _create_node
    _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
RuntimeError: ScalarType ComplexFloat is an unexpected tensor scalar type

I checked that node %443 is indeed a Tensor with ComplexFloat type in self_attn.
Since complex types are a known limitation of the ONNX exporter (https://github.com/pytorch/pytorch/issues/59246), could you please share any workaround to export the ONNX model?

node:  %442 : Tensor = onnx::Transpose[perm=[0, 2, 1, 3]](%441), scope: transformers_modules.aquilachat-7b.modeling_aquila.LlamaForCausalLM::/transformers_modules.aquilachat-7b.modeling_aquila.LlamaModel::model/transformers_modules.aquilachat-7b.modeling_aquila.LlamaDecoderLayer::layers.0/transformers_modules.aquilachat-7b.modeling_aquila.LlamaAttention::self_attn

value_t:  tensor([[ 1.0000+0.0000e+00j,  1.0000+0.0000e+00j,  1.0000+0.0000e+00j,
          ...,  1.0000+0.0000e+00j,  1.0000+0.0000e+00j,
          1.0000+0.0000e+00j],
        [ 0.5403+8.4147e-01j,  0.6479+7.6172e-01j,  0.7318+6.8156e-01j,
          ...,  1.0000+1.5399e-04j,  1.0000+1.3335e-04j,
          1.0000+1.1548e-04j],
        [-0.4161+9.0930e-01j, -0.1604+9.8705e-01j,  0.0709+9.9748e-01j,
          ...,  1.0000+3.0799e-04j,  1.0000+2.6670e-04j,
          1.0000+2.3096e-04j],
        ...,
        [-0.8799+4.7523e-01j,  0.7803+6.2535e-01j, -0.9998+1.9127e-02j,
          ...,  0.8079+5.8938e-01j,  0.8547+5.1911e-01j,
          0.8904+4.5525e-01j],
        [-0.8753-4.8361e-01j,  0.0292+9.9957e-01j, -0.7446-6.6752e-01j,
          ...,  0.8078+5.8951e-01j,  0.8546+5.1922e-01j,
          0.8903+4.5535e-01j],
        [-0.0660-9.9782e-01j, -0.7424+6.6991e-01j, -0.0900-9.9594e-01j,
          ...,  0.8077+5.8963e-01j,  0.8546+5.1934e-01j,
          0.8903+4.5545e-01j]])
node:  %443 : Tensor = onnx::Constant[value=<Tensor>]()
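
For reference, the workaround I am considering is to rewrite the rotary embedding with real tensors only, so the exporter never sees a ComplexFloat constant. Below is a minimal sketch under my own assumptions about the Meta-style interleaved layout; the function names are illustrative, not Aquila's actual API:

```python
import torch

def precompute_cos_sin(head_dim: int, seq_len: int, base: float = 10000.0):
    # Same angles as Meta's precompute_freqs_cis, but kept as two real
    # tensors (cos, sin) instead of one complex freqs_cis tensor.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq, head_dim/2)
    return angles.cos(), angles.sin()

def apply_rope_real(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor):
    # x: (batch, seq, n_heads, head_dim), with channels paired as
    # (even, odd), matching view_as_complex(x.reshape(..., -1, 2)).
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    cos = cos[None, :, None, :]  # broadcast over batch and heads
    sin = sin[None, :, None, :]
    # Complex multiply (a + ib) * (cos + i*sin), written with real ops only:
    out_even = x_even * cos - x_odd * sin
    out_odd = x_even * sin + x_odd * cos
    # Re-interleave the pairs, mirroring view_as_real(...).flatten(3)
    return torch.stack((out_even, out_odd), dim=-1).flatten(-2)
```

Of course this would need a numerical parity check against the complex version (especially in float16) before trusting the exported model.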

Aquila seems to use Meta's official RoPE implementation (with a small change for float16). HuggingFace transformers re-implemented RoPE for Llama, but their version differs somewhat from Meta's.

That's why I replaced transformers' Llama RoPE code with Meta's. I really don't have time to check what the difference is and fix it, maybe later; it would be great if you could help.
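
If it helps, my understanding of the difference (a sketch, not verified against the checkpoints) is that both versions rotate channel pairs by the same angles but pair the channels differently, so the two only agree if the q/k projection weights are permuted to match:

```python
import torch

head_dim, seq_len, base = 8, 4, 10000.0
inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
angles = torch.outer(torch.arange(seq_len).float(), inv_freq)

# Meta: one complex tensor; the rotation acts on interleaved pairs
# (x0, x1), (x2, x3), ... -- this is the ComplexFloat constant that
# the ONNX exporter rejects.
freqs_cis = torch.polar(torch.ones_like(angles), angles)

# transformers' Llama: real cos/sin, duplicated so that rotate_half
# acts on half-split pairs (x0, x_{d/2}), (x1, x_{d/2+1}), ...
emb = torch.cat((angles, angles), dim=-1)
cos, sin = emb.cos(), emb.sin()
```

As far as I know, the HF conversion script for the original LLaMA checkpoints permutes the wq/wk weights precisely to compensate for this reordering, so swapping one RoPE formulation for the other without also permuting the weights would change the model's outputs.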
