anahita-b committed on
Commit
ab358db
1 Parent(s): 27d6951

Update README.md

Files changed (1)
  1. README.md +4 -7
README.md CHANGED
@@ -10,13 +10,15 @@ datasets:
10
  - mscoco_captions
11
  ---
12
 
13
- # BridgeTower base-itm model
14
 
15
  The BridgeTower model was proposed in [BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning] by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.
16
  The model was pretrained on English-language data using masked language modeling (MLM) and image-text matching (ITM) objectives. It was introduced in
17
  [this paper](https://arxiv.org/pdf/2206.08657.pdf) and first released in
18
  [this repository](https://github.com/microsoft/BridgeTower).
19
 
 
 
20
  ## Model description
21
 
22
  The abstract from the paper is the following:
@@ -24,7 +26,6 @@ Vision-Language (VL) models with the Two-Tower architecture have dominated visua
24
 
25
  ## Intended uses & limitations (TODO)
26
 
27
- You can use the raw model for image and text retrieval.
28
 
29
  ### How to use
30
 
@@ -103,11 +104,7 @@ The model was pre-trained for 100k steps on 8 NVIDIA A100 GPUs with a batch size
103
  The optimizer used was AdamW with a learning rate of 1e-5. No data augmentation was used except for center-crop. The image resolution in pre-training is set to 288 x 288.
104
 
105
  ## Evaluation results
106
- When fine-tuned on downstream tasks, this model achieves the following results:
107
-
108
- | Task | | | | | | | | |
109
- |:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|
110
- | | | | | | | | | |
111
 
112
  ### BibTeX entry and citation info
113
  ```bibtex
 
10
  - mscoco_captions
11
  ---
12
 
13
+ # BridgeTower base-itm-mlm model
14
 
15
  The BridgeTower model was proposed in [BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning] by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.
16
  The model was pretrained on English-language data using masked language modeling (MLM) and image-text matching (ITM) objectives. It was introduced in
17
  [this paper](https://arxiv.org/pdf/2206.08657.pdf) and first released in
18
  [this repository](https://github.com/microsoft/BridgeTower).
19
 
20
+ BridgeTower was accepted to [AAAI'23](https://aaai.org/Conferences/AAAI-23/).
21
+
22
  ## Model description
23
 
24
  The abstract from the paper is the following:
 
26
 
27
  ## Intended uses & limitations (TODO)
28
 
 
29
 
30
  ### How to use
31
 
 
104
  The optimizer used was AdamW with a learning rate of 1e-5. No data augmentation was used except for center-crop. The image resolution in pre-training is set to 288 x 288.
105
 
106
  ## Evaluation results
107
+ Please refer to [Table 5](https://arxiv.org/pdf/2206.08657.pdf) for BridgeTower's performance on image retrieval and other downstream tasks.
 
 
 
 
108
 
109
  ### BibTeX entry and citation info
110
  ```bibtex