Update README.md
README.md
CHANGED
@@ -4,7 +4,7 @@ In Vietnamese, there is currently limited availability of datasets and methodolo
 To address this challenging task, we propose an approach using three primary models: a Scene Text Recognition model, a Vision model, and a Language model. Particularly, the Scene Text Recognition model is responsible for extracting scene text from image, the Vision model is tasked with extracting visual features from image, and the Language model takes the output from the two aforementioned models as input and generates the corresponding answer for the question. Our approach has achieved a CIDEr score of 3.6384 in the private test set, ranking first among competing teams.
 
 <p align="center">
-<img width="800" alt="overview" src="https://
+<img width="800" alt="overview" src="https://raw.githubusercontent.com/tuanlt175/mblip_stqa/refs/heads/master/figures/overview.png"><br>
 Diagram of our proposed model
 </p>
 
@@ -71,6 +71,6 @@ chmod +x evaluate.sh
 ## 4. Examples
 
 <p align="center">
-<img width="400" alt="overview" src="https://
+<img width="400" alt="overview" src="https://raw.githubusercontent.com/tuanlt175/mblip_stqa/refs/heads/master/figures/examples.png"><br>
 Generated VQA answers of the proposed model in comparison with that of the baselines.
 </p>