Update my_model/tabs/model_arch.py
my_model/tabs/model_arch.py
CHANGED
@@ -33,6 +33,7 @@ def run_model_arch() -> None:
 of Pre-Trained Large Language Models (PT-LLMs) and Pre-Trained Multimodal Models (PT-LMMs), which have
 transformed the machine learning landscape by utilizing expansive, pre-trained knowledge repositories to tackle
 complex tasks, thereby enhancing KB-VQA systems.
+
 An examination of existing Knowledge-Based Visual Question Answering (KB-VQA) methodologies led to a refined
 approach that converts visual content into the linguistic domain, creating detailed captions and object
 enumerations. This process leverages the implicit knowledge and inferential capabilities of PT-LLMs. The
@@ -40,11 +41,13 @@ def run_model_arch() -> None:
 to interpret visual contexts. The research also reviews current image representation techniques and knowledge
 sources, advocating for the utilization of implicit knowledge in PT-LLMs, especially for tasks that do not
 require specialized expertise.
+
 Rigorous ablation experiments were conducted to assess the impact of various visual context elements on model
 performance, with a particular focus on the importance of image descriptions generated during the captioning
 phase. The study includes a comprehensive analysis of major KB-VQA datasets, specifically the OK-VQA corpus,
 and critically evaluates the metrics used, incorporating semantic evaluation with GPT-4 to align the assessment
 with practical application needs.
+
 The evaluation results underscore the developed model’s competent and competitive performance. It achieves a
 VQA score of 63.57% under syntactic evaluation and excels with an Exact Match (EM) score of 68.36%. Further,
 semantic evaluations yield even more impressive outcomes, with VQA and EM scores of 71.09% and 72.55%,
@@ -63,6 +66,7 @@ def run_model_arch() -> None:
 selected for their initial effectiveness, are designed to be pluggable, allowing for easy replacement with more
 advanced models as new technologies develop, thus ensuring the module remains at the forefront of technological
 advancement.
+
 Following this, the Prompt Engineering Module processes the generated captions and the list of detected objects,
 along with their bounding boxes and confidence levels, merging these elements with the question at hand utilizing
 a meticulously crafted prompting template. The pipeline ends with a Fine-tuned Pre-Trained Large Language Model