wiusdy committed on
Commit 5dde576
1 Parent(s): 8c94c5c

updating the number of models and using a new pretrained BLIP model for fine-tuning

Files changed (3)
  1. README.md +8 -4
  2. app.py +1 -1
  3. inference.py +1 -1
README.md CHANGED
@@ -8,17 +8,21 @@ widget:
   src: "617.jpg"
 ---
 
-# This is a simple VQA system using Hugging Face, PyTorch and Vision-and-Language Transformer (ViLT)
+# This is a simple VQA system using Hugging Face, PyTorch and VQA models
 -------------
 
 In this repository we created a simple VQA system capable of recognizing spatial and contextual information in fashion images (e.g. clothing color and details).
 
-The project is based on the paper **FashionVQA: A Domain-Specific Visual Question Answering System** [[1]](#1).
-
+The project is based on the paper **FashionVQA: A Domain-Specific Visual Question Answering System** [[1]](#1). We also used the VQA pre-trained model from **BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation** [[2]](#2) as the starting point for fine-tuning the two new models.
 
+We used the **Deep Fashion with Masks** dataset, available at <https://huggingface.co/datasets/SaffalPoosh/deepFashion-with-masks>, and the **DeepFashion ControlNet** dataset, available at <https://huggingface.co/datasets/ldhnam/deepfashion_controlnet>.
 
 
 ## References
 <a id="1">[1]</a>
 Min Wang and Ata Mahjoubfar and Anupama Joshi, 2022
-FashionVQA: A Domain-Specific Visual Question Answering System
+FashionVQA: A Domain-Specific Visual Question Answering System
+
+<a id="2">[2]</a>
+Junnan Li and Dongxu Li and Caiming Xiong and Steven Hoi, 2022
+BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
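
To make the fine-tuning described in the README concrete, here is a minimal sketch of loading the BLIP VQA checkpoint and one of the fashion datasets for a single training step. It is not the repository's training script: the `Salesforce/blip-vqa-base` checkpoint, the `train` split, the dataset column name, and the example question/answer pair are all assumptions.

```python
# Minimal fine-tuning sketch, NOT the repository's training code.
# Assumptions: the "Salesforce/blip-vqa-base" checkpoint, the "train" split,
# the dataset column name "images", and the example question/answer pair.
from datasets import load_dataset
from transformers import BlipForQuestionAnswering, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

# One of the two datasets mentioned in the README.
dataset = load_dataset("SaffalPoosh/deepFashion-with-masks", split="train")
sample = dataset[0]

# Encode an (image, question) pair and a target answer.
inputs = processor(
    images=sample["images"],          # column name is an assumption
    text="What color is the dress?",  # example question
    return_tensors="pt",
)
inputs["labels"] = processor(text="blue", return_tensors="pt").input_ids

# Single forward/backward pass; a real run wraps this in an optimizer loop.
loss = model(**inputs).loss
loss.backward()
```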
app.py CHANGED
@@ -7,7 +7,7 @@ inference = Inference()
 
 
 with gr.Blocks() as block:
-    options = gr.Dropdown(choices=["Model 1", "Model 2", "Model 3"], label="Models", info="Select the model to use..", )
+    options = gr.Dropdown(choices=["Model 1", "Model 2"], label="Models", info="Select the model to use..", )
     # need to improve this one...
 
     txt = gr.Textbox(label="Insert a question..", lines=2)
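
For context, a hedged sketch of how the dropdown and textbox in this hunk might be wired into a complete Gradio app. The `gr.Image`/`gr.Button` components, the output textbox, and the `inference.inference(...)` call signature are assumptions about the surrounding code, not lines taken from app.py.

```python
# Hedged sketch of a complete Blocks UI around the fragment shown in the diff.
# The gr.Image/gr.Button components and the inference.inference(...) signature
# are assumptions, not the repository's actual code.
import gradio as gr

from inference import Inference

inference = Inference()

with gr.Blocks() as block:
    options = gr.Dropdown(
        choices=["Model 1", "Model 2"],
        label="Models",
        info="Select the model to use..",
    )
    txt = gr.Textbox(label="Insert a question..", lines=2)
    image = gr.Image(type="pil", label="Fashion image")  # assumed input component
    answer = gr.Textbox(label="Answer")                  # assumed output component
    ask = gr.Button("Ask")

    # Assumed handler: route the selected model, question and image to Inference.
    ask.click(fn=inference.inference, inputs=[options, txt, image], outputs=answer)

block.launch()
```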
inference.py CHANGED
@@ -1,4 +1,4 @@
-from transformers import ViltProcessor, ViltForQuestionAnswering, Pix2StructProcessor, Pix2StructForConditionalGeneration, Blip2Processor, Blip2ForConditionalGeneration
+from transformers import ViltProcessor, ViltForQuestionAnswering, Pix2StructProcessor, Pix2StructForConditionalGeneration
 from transformers.utils import logging
 
 class Inference:
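
The imports that remain after this change cover ViLT and Pix2Struct. As a reference point, here is a minimal ViLT VQA inference sketch using the public `dandelin/vilt-b32-finetuned-vqa` checkpoint; the checkpoint name and the example image/question are assumptions, not necessarily what this Inference class loads.

```python
# Minimal ViLT VQA sketch matching the imports kept above.
# The checkpoint name and example inputs are assumptions.
from PIL import Image
from transformers import ViltForQuestionAnswering, ViltProcessor

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

image = Image.open("617.jpg")           # example image from the model card widget
question = "What color is the jacket?"

# ViLT treats VQA as classification over a fixed answer vocabulary.
inputs = processor(image, question, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```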