att
README.md CHANGED
@@ -10,4 +10,18 @@ pinned: false
 license: apache-2.0
 ---
 
-
+# A simple VQA system using Hugging Face, PyTorch, and the Vision-and-Language Transformer (ViLT)
+-------------
+
+This repository contains a simple visual question answering (VQA) system that recognizes spatial and contextual information in fashion images (e.g., clothing color and details).
+
+The project is based on the paper **FashionVQA: A Domain-Specific Visual Question Answering System** [[1]](#1).
+
+We used the <https://github.com/dandelin/ViLT> repository for the VQA models.
+
+
+
+## References
+<a id="1">[1]</a>
+Min Wang, Ata Mahjoubfar, and Anupama Joshi (2022).
+FashionVQA: A Domain-Specific Visual Question Answering System.