
title: KB-VQA
emoji: 🔥
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.29.0
app_file: app.py
pinned: false
license: apache-2.0

Demonstration Environment

The project demo app is available in the KB-VQA Hugging Face Space, and the full source code is available here. To run the demo app locally, run "streamlit run app.py" from the root of the local code repository. This launches the whole app; however, the Run Inference tool requires a GPU.
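The same local launch can be sketched in Python. This assumes Streamlit is installed in the active environment. It also assumes the Run Inference tool reaches the GPU through PyTorch/CUDA, which you should confirm against requirements.txt. The helper names gpu_available, launch_command and main are hypothetical and are not part of the repository.

```python
import subprocess
import sys


def gpu_available() -> bool:
    """Return True if a CUDA GPU is visible to PyTorch.

    Assumption: the Run Inference tool relies on PyTorch/CUDA.
    Returns False when PyTorch is not installed at all.
    """
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def launch_command(app_file: str = "app.py") -> list:
    """Build the equivalent of `streamlit run app.py` with the current interpreter."""
    return [sys.executable, "-m", "streamlit", "run", app_file]


def main() -> None:
    """Warn if no GPU is found, then start the app (run from the repository root)."""
    if not gpu_available():
        print("No GPU detected: the app will start, but the Run Inference tool needs a GPU.")
    subprocess.run(launch_command(), check=True)
```

Calling main() from a small launcher script behaves like typing the command by hand. Using "python -m streamlit" ties the launch to the interpreter that has the project requirements installed.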

Project File Structure

Each main Python module of the project is extensively documented to explain the module's role and how to use it, along with its corresponding classes and functions.

Below is the overall file structure of the project:

KB-VQA
├── Files: Various files required for the demo, such as sample images and the dissertation report.
├── models
│ ├── deformable-detr-detic: DETIC Object Detection Model.
│ ├── yolov5: YOLOv5 Object Detection Model (baseline).
├── my_model
│ ├── KBVQA.py: The central module implementing the designed model architecture for the Knowledge-Based Visual Question Answering (KB-VQA) project.
│ ├── state_manager.py: Manages the user interface and session state to facilitate the Run Inference tool of the Streamlit demo app.
│ ├── LLAMA2
│ │ ├── LLAMA2_model.py: Loads the LLaMA-2 model to be fine-tuned.
│ ├── captioner
│ │ ├── image_captioning.py: Provides functionality for generating captions for images.
│ ├── detector
│ │ ├── object_detection.py: Detects objects in images using the object detection models.
│ ├── fine_tuner
│ │ ├── fine_tuner.py: Main fine-tuning script for LLaMA-2 Chat models.
│ │ ├── fine_tuning_data_handler.py: Handles and prepares the data for fine-tuning LLaMA-2 Chat models.
│ │ ├── fine_tuning_data
│ │ │ ├── fine_tuning_data_detic.csv: Fine-tuning data prepared by the prompt engineering module using the DETIC detector.
│ │ │ ├── fine_tuning_data_yolov5.csv: Fine-tuning data prepared by the prompt engineering module using the YOLOv5 detector.
│ ├── results
│ │ ├── Demo_Images: Contains a pool of images used for the demo app.
│ │ ├── evaluation.py: Provides a comprehensive framework for evaluating the KB-VQA model.
│ │ ├── demo.py: Provides a comprehensive framework for visualizing and demonstrating the results of the KB-VQA evaluation.
│ │ ├── evaluation_results.xlsx: Contains all the evaluation results obtained on the evaluation data.
│ ├── tabs
│ │ ├── home.py: Displays an introduction to the application with brief background along with the demo tools description.
│ │ ├── results.py: Manages the interactive Streamlit demo for visualizing model evaluation results and analysis.
│ │ ├── run_inference.py: Responsible for the 'run inference' tool to test and use the fine-tuned models.
│ │ ├── model_arch.py: Displays the model architecture along with the accompanying abstract and design details.
│ │ ├── dataset_analysis.py: Provides tools for visualizing dataset analyses.
│ ├── utilities
│ │ ├── ui_manager.py: Manages the user interface for the Streamlit application, handling the creation and navigation of various tabs.
│ │ ├── gen_utilities.py: Provides a collection of utility functions and classes commonly used across various parts of the project.
│ ├── config: All configuration files are kept separate and stored as ".py" files for easy reading (this will change after the project submission).
│ │ ├── kbvqa_config.py: Configuration parameters for the main KB-VQA model.
│ │ ├── LLAMA2_config.py: Configuration parameters for LLaMA-2 model.
│ │ ├── captioning_config.py: Configuration parameters for the captioning model (InstructBLIP).
│ │ ├── dataset_config.py: Configuration parameters for the dataset processing.
│ │ ├── evaluation_config.py: Configuration parameters for the KB-VQA model evaluation.
│ │ ├── fine_tuning_config.py: Configurable parameters for the fine-tuning module.
│ │ ├── inference_config.py: Configurable parameters for the Run Inference tool in the demo app.
├── app.py: Main entry point for Streamlit (the first page of the Streamlit app).
├── README.md: This file.
├── requirements.txt: Requirements for the whole project, including everything needed to run the demo app in the Hugging Face Space environment.
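The configuration files under my_model/config are plain Python modules, so any of them can be loaded directly from its path. The sketch below does this with the standard library's importlib. It assumes the settings are defined as module-level variables. The helper names load_config and config_to_dict are hypothetical, and the filtering rule in config_to_dict is an illustrative choice, not something the repository is known to use.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path
from types import ModuleType


def load_config(path: str | Path) -> ModuleType:
    """Execute a plain-Python config file and return it as a module object."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load config from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file, defining its top-level names
    return module


def config_to_dict(module: ModuleType) -> dict:
    """Collect top-level settings, skipping private names and imported modules.

    Assumption (illustrative): every other top-level name is a setting.
    """
    return {
        name: value
        for name, value in vars(module).items()
        if not name.startswith("_") and not isinstance(value, ModuleType)
    }
```

For example, load_config("my_model/config/kbvqa_config.py") would expose that file's variables as attributes, and config_to_dict turns them into a dictionary that is easy to print or log.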

Author: Mohammed Bin Ali Alhaj