Is it possible to export unixcoder to ONNX format?

#1
by Lyriccoder - opened

If I follow the guide (https://huggingface.co/docs/transformers/main/serialization) and try to export the unixcoder model, I get the following error:

    raise RepositoryNotFoundError(
transformers.utils.hub.RepositoryNotFoundError: 401 Client Error: Repository not found for url: https://huggingface.co/unixcoder-base/resolve/main/config.json. If the repo is private, make sure you are authenticated.
OSError: unixcoder-base is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

I tried to export the model using the local files (downloaded from https://huggingface.co/microsoft/unixcoder-base/tree/main),
but I get the following error:

    raise ValueError(
ValueError: Unrecognized processor in unixcoder-base. Should have a `processor_type` key in its preprocessor_config.json, or one of the following `model_type` keys in its config.json: clip, flava, layoutlmv2, layoutlmv3, layoutxlm, sew, sew-d, speech_to_text, speech_to_text_2, trocr, unispeech, unispeech-sat, vilt, vision-text-dual-encoder, wav2vec2, wav2vec2-conformer, wav2vec2_with_lm, wavlm

Could you please tell me whether it is possible to export the UniXcoder model to ONNX?
If not, could you please add support for it?
I am trying to reduce the GPU inference time of the UniXcoder model (I have my own checkpoint).

Microsoft org

Hi, do you have any updates?
@nielsr

Microsoft org

Hi @Lyriccoder ,

You get that error because you need to provide the model name rather than the full URL. The following works for me in Google Colab:

!pip install -q transformers[onnx]
!python -m transformers.onnx --model=microsoft/unixcoder-base onnx/ --atol 1e-4 

First, I am talking about the model fine-tuned for code summarization (the code-summarization folder).
Secondly, I tried to export an already fine-tuned UniXcoder model (Seq2Seq; I have a saved checkpoint) for code summarization, not the original model.
Does the error happen because a Seq2Seq model with a UniXcoder encoder-decoder is not supported?

Microsoft org

Do you mean that your model is an instance of the EncoderDecoderModel class?

I trained your model, located here:

https://github.com/microsoft/CodeBERT/blob/master/UniXcoder/downstream-tasks/code-summarization/model.py. I didn't change that model.

I trained it successfully and have a checkpoint. ONNX supports loading models from checkpoints, not only models officially published on the Hugging Face Hub.
But when I try to export that checkpoint to ONNX (https://github.com/microsoft/CodeBERT/blob/master/UniXcoder/downstream-tasks/code-summarization/model.py), I get the error:

 raise ValueError(
ValueError: Unrecognized processor in unixcoder-base. Should have a `processor_type` key in its preprocessor_config.json, or one of the following `model_type` keys in its config.json: clip, flava, layoutlmv2, layoutlmv3, layoutxlm, sew, sew-d, speech_to_text, speech_to_text_2, trocr, unispeech, unispeech-sat, vilt, vision-text-dual-encoder, wav2vec2, wav2vec2-conformer, wav2vec2_with_lm, wavlm
Microsoft org

Ok, I see. That custom Seq2Seq class isn't supported by the ONNX tools that Hugging Face provides (they only cover models available in the Transformers library).

So this would require a custom implementation. Alternatively (and this is what I'd recommend), you could fine-tune an EncoderDecoderModel, warm-started with the weights of microsoft/unixcoder-base for both the encoder and the decoder, on a code summarization dataset. This is also what the UniXcoder authors did, as seen here.
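Warm-starting an EncoderDecoderModel from the UniXcoder checkpoint might look like the following sketch. The decoder's cross-attention weights are randomly initialized by `from_encoder_decoder_pretrained` and are learned during fine-tuning; the special-token configuration shown is an assumption based on the RoBERTa-style tokenizer UniXcoder uses:

```python
from transformers import AutoTokenizer, EncoderDecoderModel

def build_seq2seq_model():
    """Warm-start an encoder-decoder model from microsoft/unixcoder-base."""
    # Both encoder and decoder start from the same pretrained weights;
    # the decoder additionally gets randomly initialized cross-attention layers.
    model = EncoderDecoderModel.from_encoder_decoder_pretrained(
        "microsoft/unixcoder-base", "microsoft/unixcoder-base"
    )
    tokenizer = AutoTokenizer.from_pretrained("microsoft/unixcoder-base")
    # Generation requires these ids on the config (assumed token choices).
    model.config.decoder_start_token_id = tokenizer.cls_token_id
    model.config.pad_token_id = tokenizer.pad_token_id
    model.config.eos_token_id = tokenizer.sep_token_id
    return model, tokenizer
```

The returned model can then be fine-tuned on a code summarization dataset like any other Transformers Seq2Seq model.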

We're planning to add ONNX support for that class soon.

Oh, got it. It's not an issue with UniXcoder then. I am looking forward to support for that class.
When ONNX export supports it, could you please add documentation with an example for that case?

Thank you for your answer.

Lyriccoder changed discussion status to closed
