Fairseq -> Transformers conversion

#1
by mys - opened
No description provided.
mys changed pull request status to open

Thanks for your contribution. Actually, I intended to upload the fairseq version of the caption checkpoint, since users reported that it is hard to download from the Aliyun OSS. I'll directly upload a new one for Transformers, and this one will be marked as the Fairseq version.

Thanks for your OFA checkpoints. With my own inference code, the original checkpoints in ofa-large-caption (https://huggingface.co/OFA-Sys/ofa-large-caption) only reach a CIDEr of about 130, but with your checkpoints converted from fairseq the performance is as expected, with a CIDEr of 146. This suggests that both the Transformers OFA model code and my own inference code are correct, and that the original checkpoints just have some minor issues.
Therefore, I would like to ask whether it is possible to provide fairseq-converted checkpoints for the other Transformers OFA models as well? For example, the pretrained OFA (https://huggingface.co/OFA-Sys/ofa-large); this would be of great benefit for fine-tuning our own models. Or perhaps the code for converting from fairseq to Transformers?
Thanks a lot!
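
For context, here is a minimal sketch of the kind of CIDEr check I mean, assuming the pycocoevalcap package is installed; the file names are placeholders, not files from this repo.

```python
# Minimal CIDEr scoring sketch (pip install pycocoevalcap).
# File names below are hypothetical; the PTB tokenizer needs a Java runtime.
import json
from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer
from pycocoevalcap.cider.cider import Cider

# Expected JSON format: {image_id: [caption, ...], ...}
with open("references.json") as f:
    gts = json.load(f)   # ground-truth captions per image
with open("predictions.json") as f:
    res = json.load(f)   # generated caption(s) per image

# PTBTokenizer expects {id: [{"caption": str}, ...]}
tokenizer = PTBTokenizer()
gts = tokenizer.tokenize({k: [{"caption": c} for c in v] for k, v in gts.items()})
res = tokenizer.tokenize({k: [{"caption": c} for c in v] for k, v in res.items()})

score, _ = Cider().compute_score(gts, res)
print(f"CIDEr: {score * 100:.1f}")  # ~130 vs ~146 in the comparison above
```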

Hi @cckevinn , here's the code I used for this conversion: https://colab.research.google.com/drive/1LLJewY92LXdeug5m_ceMUHdlqrRQwSQJ?usp=sharing

I can also share a GitHub repo with links to converted pretrained models later this week. I'm also working on sample code for fine-tuning pretrained models directly in Transformers.
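
In case the notebook link goes stale, the conversion itself boils down to loading the fairseq checkpoint, remapping the parameter names, and saving a Transformers-style state dict. Below is a rough sketch of that idea; the checkpoint file name and the `remap_key` stand-in are placeholders, and the exact renaming rules live in the notebook.

```python
# Skeleton of a fairseq -> Transformers checkpoint conversion.
# The file name is a placeholder and remap_key() is a stand-in for the
# actual renaming rules used in the notebook linked above.
import torch

ckpt = torch.load("caption_large_best_clean.pt", map_location="cpu")
fs_state = ckpt["model"]  # fairseq stores the weights under the "model" key


def remap_key(name: str) -> str:
    # Placeholder: many OFA parameter names already line up between the two
    # implementations, so the identity mapping is used here as a stand-in.
    return name


hf_state = {}
for name, tensor in fs_state.items():
    if name.endswith("version"):  # fairseq bookkeeping buffers, not weights
        continue
    hf_state[remap_key(name)] = tensor

# The result can then be loaded into the Transformers OFA model, e.g. with
# model.load_state_dict(hf_state, strict=False), and saved for the Hub.
torch.save(hf_state, "pytorch_model.bin")
```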

Hi @mys, thanks a lot! The code looks fine; I will try it soon.

@mys Thank you for sharing your awesome work.

Will the fine-tuning samples include visual grounding?

I would be interested in benchmarking OFA vs. Donut on the UI RefExp task. Here is my work in progress with Donut:
https://huggingface.co/spaces/ivelin/ui-refexp

I know visual grounding is pre-trained on the RefCOCO family, which is mostly physical objects, while UI RefExp is primarily RICO Android mobile app screenshots. Nevertheless, I am curious how quickly OFA can transfer-learn on RICO RefExp and what performance it ultimately reaches. Happy to share my results, as I have with Donut.

Cannot merge
This branch has merge conflicts in the following files:
  • README.md