AhmedSSabir committed 22d77c4 (1 parent: cf0ecfa)

Update README.md

README.md CHANGED
@@ -24,6 +24,37 @@ The model is trained with a strict filter of 0.4 similarity distance thresholds
For the dataset, see [Textual-Image-Caption-Dataset](https://huggingface.co/datasets/AhmedSSabir/Textual-Image-Caption-Dataset).

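Not part of the original README, but as a quick sketch: the dataset id above can likely be loaded with the Hugging Face `datasets` library (split names and loader behavior are assumptions, not verified here).

```python
# Illustrative only: load the dataset by its hub id (assumes the standard loader works for it).
from datasets import load_dataset

dataset = load_dataset("AhmedSSabir/Textual-Image-Caption-Dataset")
print(dataset)                      # show available splits and columns
first_split = next(iter(dataset))   # e.g. "train" (split names are an assumption)
print(dataset[first_split][0])      # peek at the first example
```
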
## Results with the SoTA pre-trained image captioning model BLIP

Comparison with BLIP (125M pre-trained images) on the COCO Caption Karpathy test split; see [Table 7 of the BLIP paper](https://arxiv.org/pdf/2201.12086.pdf).
For the VilBERT model (3.5M pre-trained images), please refer to the paper.
In the tables below, $th$ is the similarity threshold used to filter the model's training data (see above).

## Accuracy

Metrics: BLEU-1..4 (B-1..B-4), METEOR (M), ROUGE-L (R), CIDEr (C), SPICE (S), and BERTscore.

| Model | B-1 | B-2 | B-3 | B-4 | M | R | C | S | BERTscore |
|------------------------|------|------|------|------|------|------|------|------|-----------|
| BLIP Beam Search b=3 | .797 | .649 | **.514** | **.403** | **.311** | **.606** | **1.365** | **.243** | **.9484** |
| + BERT-CNN $th=0$ | .798 | .646 | .506 | .392 | .305 | .598 | 1.339 | .238 | .9473 |
| + BERT-CNN $th\geq0.2$ | .798 | .647 | .507 | .393 | .306 | .600 | 1.342 | .238 | .9473 |
| + BERT-CNN $th\geq0.3$ | .802 | .651 | .511 | .397 | .307 | .601 | 1.349 | .238 | .9479 |
| + BERT-CNN $th\geq0.4$ | **.806** | **.654** | .513 | .397 | .303 | .599 | 1.343 | .235 | .9476 |

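The $th$ rows are the BERT-CNN variants trained with the similarity-threshold filter described above. Purely as an illustration, and not this repository's actual code, a cosine-similarity filter over (caption, visual context) pairs could look like the sketch below; the `all-MiniLM-L6-v2` encoder and the `filter_pairs` helper are stand-ins, and the real filter may use a distance rather than a similarity.

```python
# Hypothetical sketch of a cosine-similarity threshold filter (not this repo's actual code).
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in encoder, not the paper's model

def filter_pairs(pairs, th=0.4):
    """Keep (caption, visual_context) pairs whose cosine similarity is >= th."""
    kept = []
    for caption, visual_context in pairs:
        emb = encoder.encode([caption, visual_context], convert_to_tensor=True)
        sim = util.cos_sim(emb[0], emb[1]).item()
        if sim >= th:
            kept.append((caption, visual_context, sim))
    return kept

pairs = [("a man riding a horse on the beach", "horse"),
         ("a man riding a horse on the beach", "parking meter")]
print(filter_pairs(pairs, th=0.4))
```
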
## Diversity

| Model | Uniq | V | mBLEU-1 ↓ | Div-1 | Div-2 | SBERT-sts |
|------------------------|------|------|-----------|-------|-------|-----------|
| BLIP Beam Search b=3 | **8.60** | 1406 | .461 | .68 | .80 | .8058 |
| + BERT-CNN $th=0$ | 8.49 | **1532** | .457 | .68 | .80 | .8046 |
| + BERT-CNN $th\geq0.2$ | 8.48 | 1486 | .458 | .68 | .80 | .8052 |
| + BERT-CNN $th\geq0.3$ | 8.41 | 1448 | .458 | .68 | .80 | **.8060** |
| + BERT-CNN $th\geq0.4$ | 8.30 | 1448 | **.455** | .68 | .80 | .8053 |
| Human | 9.14 | 3425 | .375 | .74 | .84 | NA |

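In caption-diversity evaluations these columns typically mean: Uniq = average number of unique words per caption, V = vocabulary size over all generated captions, Div-n = ratio of distinct n-grams to total n-grams, and mBLEU-1 = BLEU-1 overlap of each caption against the others (lower is more diverse). The helper below sketches Div-n under that assumed convention; it is not the evaluation script used for the table.

```python
# Illustrative sketch (assumed convention): distinct n-gram ratio over a set of captions.
def distinct_n(captions, n):
    """Ratio of unique n-grams to total n-grams across all captions."""
    total, unique = 0, set()
    for caption in captions:
        tokens = caption.lower().split()
        ngrams = list(zip(*[tokens[i:] for i in range(n)]))
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0

captions = ["a man riding a horse on the beach",
            "a horse standing on a sandy beach"]
print(round(distinct_n(captions, 1), 2), round(distinct_n(captions, 2), 2))  # Div-1, Div-2
```
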
```
conda create -n BERT_visual python=3.6 anaconda
conda activate BERT_visual