import streamlit as st
import streamlit.components.v1 as components


def run_home() -> None:
    """
    Displays the home page for the Knowledge-Based Visual Question Answering (KB-VQA) project using Streamlit.

    This function sets up the main home page for demonstrating the project.

    Returns:
        None
    """

    st.markdown("<br>" * 2, unsafe_allow_html=True)
st.markdown(""" |
|
<div style="text-align: justify;"> |
|
|
|
Welcome to the interactive application for the **Knowledge-Based Visual Question Answering (KB-VQA)** |
|
project. This application is an integral part of a |
|
[Master’s dissertation in Artificial Intelligence](https://info.online.bath.ac.uk/msai/) at the |
|
[University of Bath](https://www.bath.ac.uk/). As we delve into the fascinating world of VQA, I invite you |
|
to explore the intersection of visual perception, language understanding, and cutting-edge AI research. |
|
</div>""", |
|
unsafe_allow_html=True) |
|
|
|
st.markdown("<br>" * 1, unsafe_allow_html=True) |
|
|
|
st.markdown("### Background") |
|
with st.expander("Read Background"): |
|
st.write(""" |
|
<div style="text-align: justify;"> |
|
|
|
Since its inception by **Alan Turing** in 1950, the **Turing Test** has been a fundamental benchmark for |
|
evaluating machine intelligence against human standards. As technology evolves, so too must the criteria |
|
for assessing AI. The **Visual Turing Test** represents a modern extension that includes visual cognition |
|
within the scope of AI evaluation. At the forefront of this advancement is **Visual Question Answering |
|
(VQA)**, a field that challenges AI systems to perceive, comprehend, and articulate insights about |
|
visual inputs in natural language. This progression reflects the complex interplay between perception |
|
and cognition that characterizes human intelligence, positioning VQA as a crucial metric for gauging |
|
AI’s ability to emulate human-like understanding. |
|
|
|
Mature VQA systems hold transformative potential across various domains. In robotics, VQA systems can |
|
enhance autonomous decision-making by enabling robots to interpret and respond to visual cues. In |
|
medical imaging and diagnosis, VQA systems can assist healthcare professionals by accurately |
|
interpreting complex medical images and providing insightful answers to diagnostic questions, thereby |
|
enhancing both the speed and accuracy of medical assessments. In manufacturing, VQA systems can optimize |
|
quality control processes by enabling automated systems to identify defects and ensure product |
|
consistency with minimal human intervention. These advancements underscore the importance of developing |
|
robust VQA capabilities, as they push the boundaries of the Visual Turing Test and bring us closer to |
|
achieving true human-like AI cognition. |
|
|
|
Unlike other vision-language tasks, VQA requires many Computer Vision sub-tasks to be solved in the process,
including **Object recognition**, **Object detection**, **Attribute classification**, **Scene
classification**, **Counting**, **Activity recognition**, **Spatial relationships among objects**,
and **Common-sense reasoning**. Standard VQA questions seldom require external factual knowledge and only
in rare cases call for common-sense reasoning. Moreover, VQA models cannot derive additional knowledge
from existing VQA datasets should a question require it, which is why **Knowledge-Based Visual Question
Answering (KB-VQA)** has been introduced. KB-VQA is a relatively new extension of VQA whose datasets pose
visual questions that cannot be answered without external knowledge, so the essence of the task is centred
on knowledge acquisition and its integration with the visual contents of the image.
</div>""", |
|
unsafe_allow_html=True) |
|
|
|
st.markdown("<br>" * 1, unsafe_allow_html=True) |
|
|
|
st.write(""" |
|
<div style="text-align: justify;"> |
|
|
|
This application showcases the advanced capabilities of the KB-VQA model, empowering users to seamlessly |
|
upload images, pose questions, and obtain answers derived from both visual and textual data. |
|
By leveraging sophisticated Multimodal Learning techniques, this project bridges the gap between visual |
|
perception and linguistic interpretation, effectively merging these modalities to provide coherent and |
|
contextually relevant responses. This research not only showcases the cutting-edge progress in artificial |
|
intelligence but also pushes the boundaries of AI systems towards passing the **Visual Turing Test**, where |
|
machines exhibit **human-like** understanding and reasoning in processing and responding to visual |
|
information. |
|
<br> |
|
<br> |
|
### Tools: |
|
|
|
- **Dataset Analysis**: Provides an overview of the KB-VQA datasets and presents various analyses of the
OK-VQA dataset.
- **Model Architecture**: Displays the model architecture along with the accompanying abstract and design
details for the Knowledge-Based Visual Question Answering (KB-VQA) model.
- **Results**: Provides an interactive demo for visualizing model evaluation results and analysis, with an
interface for exploring different aspects of model performance and browsing evaluation samples.
- **Run Inference**: Lets users run inference to test and use the fine-tuned KB-VQA model under various
configurations.
</div>""",
             unsafe_allow_html=True)
st.markdown("<br>" * 3, unsafe_allow_html=True) |
|
st.write(" ###### Developed by: [Mohammed Bin Ali AlHaj](https://www.linkedin.com/in/m7mdal7aj)") |
|
st.write(""" |
|
**Credit:** |
|
* The project uses [LLaMA-2](https://ai.meta.com/llama/) for its reasoning capabilities and implicit knowledge |
|
to derive answers from the supplied visual context. It is made available under |
|
[Meta LlaMA license](https://ai.meta.com/llama/license/). |
|
* This application is built on [Streamlit](https://streamlit.io), providing an interactive and user-friendly |
|
interface. |
|
""") |