KB-VQA-E

Sleeping

App Files Files Community

m7mdal7aj commited on May 8, 2024

Commit

dd674b3

•

1 Parent(s): 9246d41

Update my_model/utilities/ui_manager.py

Browse files

Files changed (1) hide show

my_model/utilities/ui_manager.py +1 -1

my_model/utilities/ui_manager.py CHANGED Viewed

@@ -62,7 +62,7 @@ class UIManager():
                         \nAn examination of existing Knowledge-Based Visual Question Answering (KB-VQA) methodologies led to a refined approach that converts visual content into the linguistic domain, creating detailed captions and object enumerations. This process leverages the implicit knowledge and inferential capabilities of PT-LLMs. The research refines the fine-tuning of PT-LLMs by integrating specialized tokens, enhancing the models’ ability to interpret visual contexts. The research also reviews current image representation techniques and knowledge sources, advocating for the utilization of implicit knowledge in PT-LLMs, especially for tasks that do not require specialized expertise.
                         \nRigorous ablation experiments conducted to assess the impact of various visual context elements on model performance, with a particular focus on the importance of image descriptions generated during the captioning phase. The study includes a comprehensive analysis of major KB-VQA datasets, specifically the OK-VQA corpus, and critically evaluates the metrics used, incorporating semantic evaluation with GPT-4 to align the assessment with practical application needs.
                         \nThe evaluation results underscore the developed model’s competent and competitive performance. It achieves a VQA score of 63.57% under syntactic evaluation and excels with an Exact Match (EM) score of 68.36%. Further, semantic evaluations yield even more impressive outcomes, with VQA and EM scores of 71.09% and 72.55%, respectively. These results demonstrate that the model effectively applies reasoning over the visual context and successfully retrieves the necessary knowledge to answer visual questions.""")
-            st.write("""\n\n\nThis is an interactive application developed to demonstrate the project developed as part of the dissertation for Masters degree in Artificial Intelligence at the [University of Bath](https://www.bath.ac.uk/).
                         \n\n\nDeveloped by: [Mohammed H AlHaj](https://www.linkedin.com/in/m7mdal7aj)""")
         with col2:
             st.image("Files/mm.jpeg")

                         \nAn examination of existing Knowledge-Based Visual Question Answering (KB-VQA) methodologies led to a refined approach that converts visual content into the linguistic domain, creating detailed captions and object enumerations. This process leverages the implicit knowledge and inferential capabilities of PT-LLMs. The research refines the fine-tuning of PT-LLMs by integrating specialized tokens, enhancing the models’ ability to interpret visual contexts. The research also reviews current image representation techniques and knowledge sources, advocating for the utilization of implicit knowledge in PT-LLMs, especially for tasks that do not require specialized expertise.
                         \nRigorous ablation experiments conducted to assess the impact of various visual context elements on model performance, with a particular focus on the importance of image descriptions generated during the captioning phase. The study includes a comprehensive analysis of major KB-VQA datasets, specifically the OK-VQA corpus, and critically evaluates the metrics used, incorporating semantic evaluation with GPT-4 to align the assessment with practical application needs.
                         \nThe evaluation results underscore the developed model’s competent and competitive performance. It achieves a VQA score of 63.57% under syntactic evaluation and excels with an Exact Match (EM) score of 68.36%. Further, semantic evaluations yield even more impressive outcomes, with VQA and EM scores of 71.09% and 72.55%, respectively. These results demonstrate that the model effectively applies reasoning over the visual context and successfully retrieves the necessary knowledge to answer visual questions.""")
+            st.write("""\n\n\nThis is an interactive application built to demonstrate the project developed and allow for interaction with the KB-VQA model as part of the dissertation for Masters degree in Artificial Intelligence at the [University of Bath](https://www.bath.ac.uk/).
                         \n\n\nDeveloped by: [Mohammed H AlHaj](https://www.linkedin.com/in/m7mdal7aj)""")
         with col2:
             st.image("Files/mm.jpeg")