
Overview

This project aims to support visually impaired individuals in their daily navigation.

It combines a YOLO object detector with LLaMA 2 7B to provide navigation guidance.

YOLO is trained on bounding-box data from AI Hub. Its output (the detected boxes) is converted into a list of the form [[class_of_obj_1, xmin, xmax, ymin, ymax, size], [class_of_obj_2, ...], ...] and appended to the user's question, as sketched below. The LLM is fine-tuned for navigation on the LearnItAnyway/Visual-Navigation-21k multi-turn dataset.
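The following is a minimal sketch of how detections might be serialized into the prompt. The helper name `detections_to_prompt` and the assumption that `size` is the bounding-box area are illustrative; the exact formatting used in training may differ.

```python
# Hypothetical sketch: turn YOLO detections into the bracketed list appended to
# the question. Field order follows the card: [class, xmin, xmax, ymin, ymax, size].

def detections_to_prompt(question, detections):
    """detections: iterable of (class_name, xmin, xmax, ymin, ymax) tuples."""
    objects = []
    for cls, xmin, xmax, ymin, ymax in detections:
        size = (xmax - xmin) * (ymax - ymin)  # assumption: size = box area in pixels
        objects.append([cls, xmin, xmax, ymin, ymax, size])
    return f"{question} {objects}"

# Example
prompt = detections_to_prompt(
    "What is in front of me?",
    [("person", 120, 260, 80, 400), ("car", 300, 620, 150, 380)],
)
print(prompt)
```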

Usage

We show how to use the model in yolo_llama_visnav_test.ipynb; a rough outline of the pipeline is sketched below.
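A minimal pipeline sketch, assuming Ultralytics YOLO and Hugging Face Transformers; the weight path and repo id are placeholders, and the real prompt template lives in yolo_llama_visnav_test.ipynb. It reuses the hypothetical `detections_to_prompt` helper from the sketch above.

```python
from ultralytics import YOLO
from transformers import AutoModelForCausalLM, AutoTokenizer

yolo = YOLO("yolo_weights.pt")                       # placeholder: fine-tuned YOLO weights
tok = AutoTokenizer.from_pretrained("llama_visnav")  # placeholder: fine-tuned LLaMA 2 7B repo id
llm = AutoModelForCausalLM.from_pretrained("llama_visnav", device_map="auto")

# Run detection on an input image
results = yolo("street.jpg")[0]
detections = []
for box in results.boxes:
    cls_name = results.names[int(box.cls)]
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
    # Reorder from (xmin, ymin, xmax, ymax) to the card's (xmin, xmax, ymin, ymax)
    detections.append((cls_name, x1, x2, y1, y2))

# Build the prompt and generate a navigation instruction
prompt = detections_to_prompt("Guide me to the crosswalk.", detections)
inputs = tok(prompt, return_tensors="pt").to(llm.device)
out = llm.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```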
