---
title: Vqa Audiobot
emoji: π
colorFrom: indigo
colorTo: purple
sdk: streamlit
python_version: 3.9.0
sdk_version: 1.10.0
app_file: app.py
models:
  - Madhuri/t5_small_vqa_fs
  - dandelin/vilt-b32-finetuned-vqa
pinned: false
license: mit
---
Visual Question Answering - Bot
VQA Bot tackles visual question answering with chat and voice assistance. It combines a vision transformer (ViLT) and a language generator (T5-small) with an audio transformer. We pretrained and fine-tuned the language and audio transformers to obtain the desired results. Please use the radio buttons below to navigate.
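The flow described above (a spoken or typed question, a visual answer, a generated reply, and optionally speech output) can be sketched as a chain of stages. This is a minimal illustration under stated assumptions, not the Space's actual code: the class `VQABotPipeline`, its field names, and the `rephrase` step (assuming the language generator turns a short VQA answer into a reply) are all hypothetical. Each stage is injected as a callable so that real models, or the stubs used here, can be plugged in.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class VQABotPipeline:
    """Hypothetical chain: audio -> question text -> short answer -> reply -> audio."""

    transcribe: Callable[[bytes], str]     # assumed speech-to-text stage
    answer: Callable[[Any, str], str]      # assumed VQA stage (image, question) -> short answer
    rephrase: Callable[[str, str], str]    # assumed language-generator stage
    synthesize: Callable[[str], bytes]     # assumed text-to-speech stage

    def ask_text(self, image: Any, question: str) -> str:
        """Chat path: answer a typed question about an image."""
        short_answer = self.answer(image, question)
        return self.rephrase(question, short_answer)

    def ask_voice(self, image: Any, audio: bytes) -> bytes:
        """Voice path: transcribe the question, answer it, and speak the reply."""
        question = self.transcribe(audio)
        return self.synthesize(self.ask_text(image, question))


# Usage with stub stages (placeholders, not real models).
bot = VQABotPipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    answer=lambda image, question: "two",
    rephrase=lambda question, short: f"The answer is {short}.",
    synthesize=lambda text: text.encode("utf-8"),
)
print(bot.ask_voice(None, b"How many cats are there?"))  # b'The answer is two.'
```

Injecting the stages keeps the chat and voice paths sharing one answering step. As an assumed wiring, `answer` could wrap a Hugging Face `transformers` `pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")` and return the top-scoring `"answer"` field.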
References
Wonjae Kim, Bokyung Son, and Ildoo Kim. 2021. ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision. arXiv:2102.03334 [stat.ML].