arXiv:2402.01735

VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large Models

Published on Jan 29
Abstract

Visually Impaired Assistance (VIA) aims to automatically help the visually impaired (VI) handle daily activities. The advancement of VIA primarily depends on developments in Computer Vision (CV) and Natural Language Processing (NLP), both of which exhibit cutting-edge paradigms with large models (LMs). Furthermore, LMs have shown exceptional multimodal abilities in tackling challenging physically-grounded tasks such as embodied robotics. To investigate the potential and limitations of state-of-the-art (SOTA) LMs' capabilities in VIA applications, we present an extensive study of the task of VIA with LMs (VIALM). In this task, given an image depicting the physical environment and a linguistic request from a VI user, VIALM aims to output step-by-step guidance that assists the VI user in fulfilling the request, grounded in the environment. The study consists of a survey reviewing recent LM research and benchmark experiments examining selected LMs' capabilities in VIA. The results indicate that while LMs can potentially benefit VIA, their output is often not well grounded in the environment (i.e., 25.7% of GPT-4's responses) and lacks fine-grained guidance (i.e., 32.1% of GPT-4's responses).
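As an illustration of the task setup only (this is not the paper's benchmark code), the sketch below shows how a VIALM-style query might be posed to a multimodal LM: an environment image plus a VI user's request go in, step-by-step guidance comes out. The `vialm_guidance` helper, the prompt wording, and the `gpt-4o` model identifier are assumptions for the example; the paper evaluates GPT-4 among other LMs, but via its own protocol.

```python
# Minimal sketch of a VIALM-style query (illustrative; not the authors' code):
# send an environment image and a VI user's request to a multimodal LM and
# ask for fine-grained, environment-grounded, step-by-step guidance.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def vialm_guidance(image_path: str, user_request: str, model: str = "gpt-4o") -> str:
    """Return step-by-step guidance grounded in the pictured environment."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    prompt = (
        "You are assisting a visually impaired user. Based on the image of their "
        "surroundings, give concise, fine-grained, step-by-step guidance "
        "(directions, distances, tactile cues) to fulfill this request:\n"
        f"{user_request}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Example (hypothetical inputs):
# print(vialm_guidance("kitchen.jpg", "Help me find the salt on the counter."))
```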
