JustinLin610 committed on
Commit
69ab424
1 Parent(s): 54f523d

update readme

Files changed (2)
  1. README.md +25 -102
  2. app.py +2 -2
README.md CHANGED
@@ -1,102 +1,25 @@
- # OFA
-
- [[Paper]](http://arxiv.org/abs/2202.03052) [Blog] [[Colab](colab.md)]
-
- ![Overview](examples/overview.png)
-
- OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks
- (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.)
- into a simple sequence-to-sequence learning framework. For more information, please refer to our paper: [Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework](http://arxiv.org/abs/2202.03052).
-
-
- ## News
- * 2022.2.11: Released the Colab notebook for image captioning [![][colab]](https://colab.research.google.com/drive/1Q4eNhhhLcgOP4hHqwZwU1ijOlabgve1W?usp=sharing). Enjoy!
- * 2022.2.11: Released the pretrained checkpoint of OFA-Large and the complete (2-staged) finetuning code for image captioning.
- * 2022.2.10: Released the inference code & finetuned checkpoint for image captioning, which can reproduce **the results on the COCO Karpathy test split (149.6 CIDEr)**.
-
- [colab]: <https://colab.research.google.com/assets/colab-badge.svg>
-
- ## TODO
- * To release finetuning and inference code for multimodal downstream tasks soon, including image captioning, VQA, text-to-image generation, SNLI-VE, referring expression comprehension, etc.
- * To release pretraining code soon.
-
-
- ## Approach
- ![approach](examples/approach.jpg)
-
-
- ## Requirements
- * python 3.7.4
- * pytorch 1.8.1
- * JAVA 1.8 (for COCO evaluation)
-
-
- ## Installation
- ```bash
- git clone https://github.com/OFA-Sys/OFA
- pip install -r requirements.txt
- ```
-
-
- ## Datasets and Checkpoints
- See [datasets.md](datasets.md) and [checkpoints.md](checkpoints.md).
-
-
- ## Pretraining
- To release soon:)
-
-
- # Finetuning & Inference
- Below we provide methods for finetuning and inference on different downstream tasks.
- ## Caption
- 1. Download the data and files and put them in the correct directory
- 2. Train
- ```bash
- cd run_scripts/caption
- nohup sh train_caption_stage1.sh & # stage1, train with cross-entropy loss
- nohup sh train_caption_stage2.sh & # stage2, load the best ckpt of stage1 and train with CIDEr optimization
- ```
- 3. Inference
- ```bash
- cd run_scripts/caption ; sh evaluate_caption.sh # inference & evaluate
- ```
-
- # Gallery
- Below we provide examples of OFA in text-to-image generation and open-ended VQA. We also demonstrate its performance on an unseen task (Grounded QA) as well as an unseen domain (Visual Grounding on images from unseen domains).
-
- ## Text-to-Image Generation (normal query)
- ![t2i_normal](examples/normal_images.png)
-
- ## Text-to-Image Generation (counterfactual query)
- ![t2i_counterfactual](examples/counterfactual_images.png)
-
- ## Open-Ended VQA
- ![open_vqa](examples/open_vqa.png)
-
- ## Grounded QA (unseen task)
- ![grounded_qa](examples/grounded_qa.png)
-
- ## Visual Grounding (unseen domain)
- ![vg](examples/viusal_grounding.png)
-
-
- ## Citation
- Please cite our paper if you find it helpful :)
-
- ```
- @article{wang2022OFA,
- title={Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework},
- author={Wang, Peng and Yang, An and Men, Rui and Lin, Junyang and Bai, Shuai and Li, Zhikang and Ma, Jianxin and Zhou, Chang and Zhou, Jingren and Yang, Hongxia},
- journal={arXiv preprint arXiv:2202.03052},
- year={2022}
- }
- ```
-
-
- ## Related Codebase
- * [fairseq](https://github.com/pytorch/fairseq)
-
-
- ## License
- Apache-2.0
 
+ ---
+ title: OFA-Image_Caption
+ emoji: 🖼
+ colorFrom: red
+ colorTo: indigo
+ sdk: gradio
+ app_file: app.py
+ pinned: true
+ ---
+ # Configuration
+ `title`: _string_
+ OFA Image Caption
+ `emoji`: _string_
+ 🖼
+ `colorFrom`: _string_
+ red
+ `colorTo`: _string_
+ indigo
+ `sdk`: _string_
+ gradio
+ `app_file`: _string_
+ app.py
+
+ `pinned`: _boolean_
+ true
 
app.py CHANGED
@@ -2,7 +2,7 @@ import gradio as gr
  import os
  import torch
  import numpy as np
- from fairseq import utils,tasks
+ from fairseq import utils, tasks
  from utils import checkpoint_utils
  from utils.eval_utils import eval_step
  from tasks.mm_tasks.caption import CaptionTask
@@ -109,4 +109,4 @@ def image_caption(inp):


  io = gr.Interface(fn=image_caption, inputs=gr.inputs.Image(type='pil'), outputs='text')
- io.launch(debug=True)
+ io.launch(enable_queue=True)
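
For context on the launch change: in Gradio versions of this era, `enable_queue=True` routes requests through a queue so slow model inference is handled sequentially rather than hitting request timeouts, while `debug=True` mainly keeps the process attached to stream logs. Below is a minimal sketch of the same Interface/launch wiring, assuming the older `gr.inputs` API that app.py uses; `fake_caption` is a hypothetical stand-in for the real OFA captioning function.

```python
import gradio as gr

def fake_caption(image):
    # Hypothetical placeholder for the OFA model call in app.py; it only
    # echoes the input size so the Interface wiring can be tested end to end.
    width, height = image.size  # a PIL image, since the input uses type='pil'
    return f"placeholder caption for a {width}x{height} image"

# Same Interface signature as app.py: a PIL image in, a text caption out.
io = gr.Interface(fn=fake_caption,
                  inputs=gr.inputs.Image(type='pil'),
                  outputs='text')

# enable_queue=True queues incoming requests so long-running inference
# calls are processed one after another instead of timing out.
io.launch(enable_queue=True)
```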