import streamlit as st
import streamlit.components.v1 as components


def run_home() -> None:
    """
    Displays the home page for the Knowledge-Based Visual Question Answering (KB-VQA) project using Streamlit,
    laying out the welcome message, project background, tool descriptions, and credits.

    Returns:
        None
    """

    st.markdown("<br>" * 2, unsafe_allow_html=True)
    st.markdown("""
            <div style="text-align: justify;">
                        
            Welcome to the interactive application for the **Knowledge-Based Visual Question Answering (KB-VQA)** 
            project. This application is an integral part of a 
            [Master’s dissertation in Artificial Intelligence](https://info.online.bath.ac.uk/msai/) at the 
            [University of Bath](https://www.bath.ac.uk/). I invite you to explore the fascinating world of VQA at 
            the intersection of visual perception, language understanding, and cutting-edge AI research.
            </div>""",
                unsafe_allow_html=True)
    
    st.markdown("<br>" * 1, unsafe_allow_html=True)
    
    st.markdown("### Background")
    with st.expander("Read Background"):
        st.write("""
                <div style="text-align: justify;">
                
                Since its inception by **Alan Turing** in 1950, the **Turing Test** has been a fundamental benchmark for 
                evaluating machine intelligence against human standards. As technology evolves, so too must the criteria 
                for assessing AI. The **Visual Turing Test** represents a modern extension that includes visual cognition 
                within the scope of AI evaluation. At the forefront of this advancement is **Visual Question Answering 
                (VQA)**, a field that challenges AI systems to perceive, comprehend, and articulate insights about 
                visual inputs in natural language. This progression reflects the complex interplay between perception 
                and cognition that characterizes human intelligence, positioning VQA as a crucial metric for gauging 
                AI’s ability to emulate human-like understanding.
                
                Mature VQA systems hold transformative potential across various domains. In robotics, VQA systems can 
                enhance autonomous decision-making by enabling robots to interpret and respond to visual cues. In 
                medical imaging and diagnosis, VQA systems can assist healthcare professionals by accurately 
                interpreting complex medical images and providing insightful answers to diagnostic questions, thereby 
                enhancing both the speed and accuracy of medical assessments. In manufacturing, VQA systems can optimize 
                quality control processes by enabling automated systems to identify defects and ensure product 
                consistency with minimal human intervention. These advancements underscore the importance of developing 
                robust VQA capabilities, as they push the boundaries of the Visual Turing Test and bring us closer to 
                achieving true human-like AI cognition.
                
                Unlike other vision-language tasks, VQA requires many Computer Vision sub-tasks to be solved along the 
                way, including **Object recognition**, **Object detection**, **Attribute classification**, **Scene 
                classification**, **Counting**, **Activity recognition**, **Spatial relationships among objects**, 
                and **Common-sense reasoning**. Standard VQA tasks rarely require external factual knowledge and only 
                occasionally call for common-sense reasoning. Moreover, VQA models cannot derive additional knowledge 
                from existing VQA datasets when a question demands it, which is why **Knowledge-Based Visual Question 
                Answering (KB-VQA)** was introduced. KB-VQA is a relatively recent extension of VQA whose datasets pose 
                visual questions that cannot be answered without external knowledge; the essence of the task is 
                acquiring that knowledge and integrating it with the visual contents of the image.
                </div>""",
                 unsafe_allow_html=True)

    st.markdown("<br>" * 1, unsafe_allow_html=True)

    st.write("""
            <div style="text-align: justify;">
            
            This application showcases the advanced capabilities of the KB-VQA model, allowing users to upload images, 
            pose questions, and obtain answers derived from both visual and textual data. 
            By leveraging sophisticated Multimodal Learning techniques, this project bridges the gap between visual 
            perception and linguistic interpretation, merging these modalities to provide coherent and 
            contextually relevant responses. This research not only demonstrates cutting-edge progress in artificial 
            intelligence but also pushes AI systems towards passing the **Visual Turing Test**, where 
            machines exhibit **human-like** understanding and reasoning in processing and responding to visual 
            information.
            <br>
            <br>
            ### Tools:
            
            - **Dataset Analysis**: Provides an overview of the KB-VQA datasets and displays various analyses of the 
            OK-VQA dataset.
            - **Model Architecture**: Displays the model architecture along with the accompanying abstract and design 
            details for the Knowledge-Based Visual Question Answering (KB-VQA) model.
            - **Results**: Presents the interactive demo for visualizing model evaluation results and analysis, with an 
            interface for exploring different aspects of the model's performance and evaluation samples.
            - **Run Inference**: Allows users to run inference with the fine-tuned KB-VQA model under various 
            configurations.
            </div>""",
             unsafe_allow_html=True)
    st.markdown("<br>" * 3, unsafe_allow_html=True)
    st.write(" ###### Developed by: [Mohammed Bin Ali AlHaj](https://www.linkedin.com/in/m7mdal7aj)")
    st.write("""
            **Credit:** 
            * The project uses [LLaMA-2](https://ai.meta.com/llama/) for its reasoning capabilities and implicit knowledge 
            to derive answers from the supplied visual context. It is made available under 
            [Meta LLaMA license](https://ai.meta.com/llama/license/).
            * This application is built on [Streamlit](https://streamlit.io), providing an interactive and user-friendly 
            interface.
            """)