metadata
title: KB-VQA
emoji: π₯
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.29.0
app_file: app.py
pinned: false
license: apache-2.0
Demonstration Environment
The project demo app is available on the KB-VQA HF Space, and the full source code can be accessed from here.
To run the demo app locally, run the following command from the root of the local code repository:
streamlit run app.py
This launches the whole app. However, the Run Inference tool requires a GPU.
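Since the Run Inference tool needs a GPU, it can be useful to check for one before launching the app. The following is a minimal sketch, assuming an NVIDIA GPU whose driver installs the nvidia-smi utility; the helper name gpu_probably_available is hypothetical and not part of the project code:

```python
import shutil
import subprocess


def gpu_probably_available() -> bool:
    """Heuristic check: True if `nvidia-smi` exists and lists at least one GPU.

    Assumption: an NVIDIA GPU with the driver utilities installed.
    This helper is illustrative and not part of the KB-VQA code base.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per detected GPU.
        result = subprocess.run([exe, "-L"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and bool(result.stdout.strip())


if __name__ == "__main__":
    if gpu_probably_available():
        print("GPU detected: the Run Inference tool should be usable.")
    else:
        print("No GPU detected: the app will start, but Run Inference needs a GPU.")
```

This is only a heuristic: a GPU can be present while the deep learning framework used by the app still fails to see it, for example because of a driver mismatch.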
Project File Structure
Each main Python module of the project is extensively documented to explain the module's role and how to use it, along with its corresponding classes and functions.
Below is the overall file structure of the project:
KB-VQA
├── Files: Various files required for the demo, such as sample images, the dissertation report, etc.
├── models
│   ├── deformable-detr-detic: DETIC Object Detection Model.
│   └── yolov5: YOLOv5 Object Detection Model (baseline).
├── my_model
│   ├── KBVQA.py: The central component implementing the designed model architecture for the Knowledge-Based Visual Question Answering (KB-VQA) project.
│   ├── state_manager.py: Manages the user interface and session state to facilitate the Run Inference tool of the Streamlit demo app.
│   ├── LLAMA2
│   │   └── LLAMA2_model.py: Used for loading the LLaMA-2 model to be fine-tuned.
│   ├── captioner
│   │   └── image_captioning.py: Provides functionality for generating captions for images.
│   ├── detector
│   │   └── object_detection.py: Used to detect objects in images using object detection models.
│   ├── fine_tuner
│   │   ├── fine_tuner.py: Main fine-tuning script for LLaMA-2 Chat models.
│   │   ├── fine_tuning_data_handler.py: Handles and prepares the data for fine-tuning LLaMA-2 Chat models.
│   │   └── fine_tuning_data
│   │       ├── fine_tuning_data_detic.csv: Fine-tuning data prepared by the prompt engineering module using the DETIC detector.
│   │       └── fine_tuning_data_yolov5.csv: Fine-tuning data prepared by the prompt engineering module using the YOLOv5 detector.
│   ├── results
│   │   ├── Demo_Images: Contains a pool of images used for the demo app.
│   │   ├── evaluation.py: Provides a comprehensive framework for evaluating the KB-VQA model.
│   │   ├── demo.py: Provides a comprehensive framework for visualizing and demonstrating the results of the KB-VQA evaluation.
│   │   └── evaluation_results.xlsx: Contains all the evaluation results based on the evaluation data.
│   ├── tabs
│   │   ├── home.py: Displays an introduction to the application with brief background, along with a description of the demo tools.
│   │   ├── results.py: Manages the interactive Streamlit demo for visualizing model evaluation results and analysis.
│   │   ├── run_inference.py: Responsible for the Run Inference tool used to test and use the fine-tuned models.
│   │   ├── model_arch.py: Displays the model architecture and the accompanying abstract and design details.
│   │   └── dataset_analysis.py: Provides tools for visualizing dataset analyses.
│   ├── utilities
│   │   ├── ui_manager.py: Manages the user interface for the Streamlit application, handling the creation and navigation of various tabs.
│   │   └── gen_utilities.py: Provides a collection of utility functions and classes commonly used across various parts of the project.
│   └── config (All configuration files are kept separate and stored as ".py" for easy reading - this will change after the project submission.)
│       ├── kbvqa_config.py: Configuration parameters for the main KB-VQA model.
│       ├── LLAMA2_config.py: Configuration parameters for the LLaMA-2 model.
│       ├── captioning_config.py: Configuration parameters for the captioning model (InstructBLIP).
│       ├── dataset_config.py: Configuration parameters for the dataset processing.
│       ├── evaluation_config.py: Configuration parameters for the KB-VQA model evaluation.
│       ├── fine_tuning_config.py: Configuration parameters for the fine-tuning module.
│       └── inference_config.py: Configuration parameters for the Run Inference tool in the demo app.
├── app.py: Main entry point for Streamlit (first page in the Streamlit app).
├── README.md: Readme (this file).
└── requirements.txt: Requirements file for the whole project, including all the requirements for running the demo app on the HuggingFace Space environment.
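Before launching the app from a local clone, it may help to confirm that the checkout matches the structure listed above. The following is a minimal sketch, assuming the nesting is read as shown (for example, config sitting under my_model); the EXPECTED_PATHS list and the missing_paths helper are illustrative and not part of the project:

```python
from pathlib import Path
from typing import List

# A few paths taken from the file structure above, relative to the repo root.
# Assumption: config and the other packages are nested under my_model.
EXPECTED_PATHS = [
    "app.py",
    "requirements.txt",
    "my_model/KBVQA.py",
    "my_model/state_manager.py",
    "my_model/config/kbvqa_config.py",
    "my_model/config/inference_config.py",
]


def missing_paths(repo_root: str = ".") -> List[str]:
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All checked files are present.")
```

Run it from the repository root; an empty result means every checked file is in place.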
Author: Mohammed Bin Ali Alhaj