m7mdal7aj committed on
Commit c87bfdc
1 Parent(s): 30bdce8

Update my_model/tabs/home.py

Files changed (1)
  1. my_model/tabs/home.py +85 -17
my_model/tabs/home.py CHANGED
@@ -2,26 +2,94 @@ import streamlit as st
  import streamlit.components.v1 as components
 
 
- def run_home():
+ def run_home() -> None:
      """
      Displays the home page for the Knowledge-Based Visual Question Answering (KB-VQA) project using Streamlit.
      This function sets up the main home page for demonstrating the project.
+
+     Returns:
+         None
      """
 
-     st.markdown(
-         """
-         <div style="text-align: center;">
-             <h1>Multimodal Learning for Visual Question Answering using World Knowledge</h1>
-             <h2>Knowledge-Based Visual Question Answering (KB-VQA)</h2>
-         </div>
-         """,
-         unsafe_allow_html=True
-     )
-
-     st.write("""\n\n\nThis is an interactive application built to demonstrate the project developed and allow for interaction with the KB-VQA model as part of the dissertation for Masters degree in Artificial Intelligence at the [University of Bath](https://www.bath.ac.uk/).
-     \n\n\nDeveloped by: [Mohammed H AlHaj](https://www.linkedin.com/in/m7mdal7aj)""")
-
-     st.write("""## Credits
-     -This project predominantly uses [LLaMA-2](https://ai.meta.com/llama/) and derivative models for language inference. Models are made available under the [Meta LlaMA license](https://ai.meta.com/llama/license/).
-     -This application is built on [streamlit](https://streamlit.io).
-     """)
+     st.markdown("""
+     <div style="text-align: justify;">
+
+     \n\n\n**Welcome to the interactive application for the Knowledge-Based Visual Question Answering (KB-VQA)
+     project. This application is an integral part of a
+     [Master’s dissertation in Artificial Intelligence](https://info.online.bath.ac.uk/msai/) at the
+     [University of Bath](https://www.bath.ac.uk/). As we delve into the fascinating world of VQA, I invite you
+     to explore the intersection of visual perception, language understanding, and cutting-edge AI research.**
+     </div>""",
+     unsafe_allow_html=True)
+     st.markdown("### Background")
+     with st.expander("Read Background"):
+         st.write("""
+         <div style="text-align: justify;">
+
+         Since its introduction by **Alan Turing** in 1950, the **Turing Test** has been a fundamental benchmark for
+         evaluating machine intelligence against human standards. As technology evolves, so too must the criteria
+         for assessing AI. The **Visual Turing Test** represents a modern extension that includes visual cognition
+         within the scope of AI evaluation. At the forefront of this advancement is **Visual Question Answering
+         (VQA)**, a field that challenges AI systems to perceive, comprehend, and articulate insights about
+         visual inputs in natural language. This progression reflects the complex interplay between perception
+         and cognition that characterizes human intelligence, positioning VQA as a crucial metric for gauging
+         AI’s ability to emulate human-like understanding.
+
+         Mature VQA systems hold transformative potential across various domains. In robotics, VQA systems can
+         enhance autonomous decision-making by enabling robots to interpret and respond to visual cues. In
+         medical imaging and diagnosis, VQA systems can assist healthcare professionals by accurately
+         interpreting complex medical images and providing insightful answers to diagnostic questions, thereby
+         enhancing both the speed and accuracy of medical assessments. In manufacturing, VQA systems can optimize
+         quality control processes by enabling automated systems to identify defects and ensure product
+         consistency with minimal human intervention. These advancements underscore the importance of developing
+         robust VQA capabilities, as they push the boundaries of the Visual Turing Test and bring us closer to
+         achieving true human-like AI cognition.
+
+         Unlike other vision-language tasks, VQA requires many computer vision (CV) sub-tasks to be solved in the
+         process, including **Object recognition**, **Object detection**, **Attribute classification**, **Scene
+         classification**, **Counting**, **Activity recognition**, **Spatial relationships among objects**,
+         and **Commonsense reasoning**. Standard VQA tasks, however, often do not require external factual
+         knowledge and only rarely call for common-sense reasoning. Furthermore, VQA models cannot derive
+         additional knowledge from existing VQA datasets should a question require it; **Knowledge-Based Visual
+         Question Answering (KB-VQA)** was therefore introduced. KB-VQA is a relatively new extension of VQA whose
+         datasets pose visual questions that cannot be answered without external knowledge; the essence of the
+         task is centred on knowledge acquisition and its integration with the visual content of the image.
+         </div>""",
+         unsafe_allow_html=True)
+
+     st.write("""
+     <div style="text-align: justify;">
+
+     This application showcases the advanced capabilities of the KB-VQA model, empowering users to seamlessly
+     upload images, pose questions, and obtain answers derived from both visual and textual data.
+     By leveraging sophisticated Multimodal Learning techniques, this project bridges the gap between visual
+     perception and linguistic interpretation, effectively merging these modalities to provide coherent and
+     contextually relevant responses. This research not only highlights the cutting-edge progress in artificial
+     intelligence but also pushes the boundaries of AI systems towards passing the **Visual Turing Test**, where
+     machines exhibit **human-like** understanding and reasoning in processing and responding to visual
+     information.
+
+     ## Tools:
+
+     - **Dataset Analysis**: Provides an overview of the KB-VQA datasets and displays various analyses of the
+     OK-VQA dataset.
+     - **Model Architecture**: Displays the model architecture and accompanying abstract and design details for
+     the Knowledge-Based Visual Question Answering (KB-VQA) model.
+     - **Results**: Manages the interactive Streamlit demo for visualizing model evaluation results and analysis.
+     It provides an interface for users to explore different aspects of the model’s performance and evaluation
+     samples.
+     - **Run Inference**: Allows users to run inference to test and use the fine-tuned KB-VQA model under
+     various configurations.
+     </div>""",
+     unsafe_allow_html=True)
+     st.markdown("<br>" * 1, unsafe_allow_html=True)
+     st.write(" ##### Developed by: [Mohammed H AlHaj](https://www.linkedin.com/in/m7mdal7aj)")
+     st.markdown("<br>" * 1, unsafe_allow_html=True)
+     st.write("""
+     **Credit:**
+     * The project predominantly uses [LLaMA-2](https://ai.meta.com/llama/) for language inference. It is
+     made available under the [Meta LLaMA license](https://ai.meta.com/llama/license/).
+     * This application is built on [Streamlit](https://streamlit.io), providing an interactive and user-friendly
+     interface.
+     """)
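For orientation, below is a minimal sketch of how `run_home()` could be mounted in a Streamlit entry point alongside the other tools described on the home page. The `app.py` filename, the use of `st.tabs` for navigation, the tab labels, and the placeholder second tab are assumptions for illustration only; the repository's actual entry point and navigation pattern may differ.

```python
# app.py -- hypothetical entry point; only run_home() comes from this commit.
# The tab layout and placeholder content below are illustrative assumptions.
import streamlit as st

from my_model.tabs.home import run_home  # function updated in this commit


def main() -> None:
    st.set_page_config(page_title="KB-VQA Demo", layout="wide")

    # One tab per tool mentioned on the home page; labels are assumed.
    home_tab, inference_tab = st.tabs(["Home", "Run Inference"])

    with home_tab:
        run_home()  # renders the welcome text, background expander, tools list, and credits

    with inference_tab:
        st.info("The Run Inference tool would be wired up here.")  # placeholder, not part of this commit


if __name__ == "__main__":
    main()
```

Launching such an entry point with `streamlit run app.py` would then serve the updated home page as the first tab.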