m7mdal7aj committed
Commit 5234b83
1 Parent(s): fac1f70

Update my_model/tabs/model_arch.py

Files changed (1): my_model/tabs/model_arch.py (+4 -0)
my_model/tabs/model_arch.py CHANGED
@@ -33,6 +33,7 @@ def run_model_arch() -> None:
 of Pre-Trained Large Language Models (PT-LLMs) and Pre-Trained Multimodal Models (PT-LMMs), which have
 transformed the machine learning landscape by utilizing expansive, pre-trained knowledge repositories to tackle
 complex tasks, thereby enhancing KB-VQA systems.
+
 An examination of existing Knowledge-Based Visual Question Answering (KB-VQA) methodologies led to a refined
 approach that converts visual content into the linguistic domain, creating detailed captions and object
 enumerations. This process leverages the implicit knowledge and inferential capabilities of PT-LLMs. The
@@ -40,11 +41,13 @@ def run_model_arch() -> None:
 to interpret visual contexts. The research also reviews current image representation techniques and knowledge
 sources, advocating for the utilization of implicit knowledge in PT-LLMs, especially for tasks that do not
 require specialized expertise.
+
 Rigorous ablation experiments were conducted to assess the impact of various visual context elements on model
 performance, with a particular focus on the importance of image descriptions generated during the captioning
 phase. The study includes a comprehensive analysis of major KB-VQA datasets, specifically the OK-VQA corpus,
 and critically evaluates the metrics used, incorporating semantic evaluation with GPT-4 to align the assessment
 with practical application needs.
+
 The evaluation results underscore the developed model’s competent and competitive performance. It achieves a
 VQA score of 63.57% under syntactic evaluation and excels with an Exact Match (EM) score of 68.36%. Further,
 semantic evaluations yield even more impressive outcomes, with VQA and EM scores of 71.09% and 72.55%,
@@ -63,6 +66,7 @@ def run_model_arch() -> None:
 selected for their initial effectiveness, are designed to be pluggable, allowing for easy replacement with more
 advanced models as new technologies develop, thus ensuring the module remains at the forefront of technological
 advancement.
+
 Following this, the Prompt Engineering Module processes the generated captions and the list of detected objects,
 along with their bounding boxes and confidence levels, merging these elements with the question at hand utilizing
 a meticulously crafted prompting template. The pipeline ends with a Fine-tuned Pre-Trained Large Language Model
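The Prompt Engineering Module described in the last hunk lends itself to a short illustration. The sketch below is hypothetical: the `build_prompt` helper, the `DetectedObject` structure, and the template wording are illustrative, not the repository's actual code. It only shows one plausible way to merge the caption, the detected objects with their bounding boxes and confidence levels, and the question into a single prompt for the fine-tuned PT-LLM.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    """One detection from the object-enumeration stage (hypothetical shape)."""
    label: str
    bbox: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    confidence: float


def build_prompt(caption: str, objects: list[DetectedObject], question: str) -> str:
    """Merge caption, detections, and question into one prompt string.

    The template wording is a placeholder; the project's real template is
    not shown in this diff.
    """
    object_lines = "\n".join(
        f"- {o.label} at {o.bbox} (confidence {o.confidence:.2f})" for o in objects
    )
    return (
        "You are answering a knowledge-based question about an image.\n"
        f"Image caption: {caption}\n"
        f"Detected objects:\n{object_lines}\n"
        f"Question: {question}\n"
        "Answer concisely:"
    )


# Example usage with made-up detections.
prompt = build_prompt(
    caption="A man in a kitchen slicing a loaf of bread.",
    objects=[
        DetectedObject("person", (12.0, 30.5, 180.0, 420.0), 0.97),
        DetectedObject("knife", (140.2, 200.0, 210.8, 260.3), 0.88),
    ],
    question="What meal is he most likely preparing?",
)
print(prompt)
```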
 
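For context on the scores quoted in the second hunk: the standard VQA metric grants full credit when at least three of the ten annotator answers match the prediction, and partial credit otherwise, while Exact Match checks for a verbatim hit against any annotator answer. The helpers below are a minimal sketch of those two syntactic metrics, assuming simple lowercased string comparison (the official evaluation scripts apply more elaborate answer normalization). The semantic variant mentioned in the diff would replace the string comparison with a GPT-4-judged equivalence check.

```python
def vqa_accuracy(prediction: str, gt_answers: list[str]) -> float:
    """VQA-style accuracy: full credit at >= 3 matching annotator answers,
    partial credit (matches / 3) below that."""
    pred = prediction.strip().lower()
    matches = sum(1 for a in gt_answers if a.strip().lower() == pred)
    return min(matches / 3.0, 1.0)


def exact_match(prediction: str, gt_answers: list[str]) -> float:
    """Exact Match (EM): 1.0 if the prediction equals any annotator answer."""
    pred = prediction.strip().lower()
    return 1.0 if any(a.strip().lower() == pred for a in gt_answers) else 0.0


# Toy example with 10 annotator answers, as in OK-VQA.
answers = ["sandwich"] * 5 + ["toast"] * 2 + ["panini"] * 3
print(vqa_accuracy("sandwich", answers))  # 1.0  (5 matches, capped at full credit)
print(vqa_accuracy("toast", answers))     # ~0.67 (2 of 3 needed for full credit)
print(exact_match("toast", answers))      # 1.0  (appears verbatim in the answers)
```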