title: Vqa Audiobot
emoji: π
colorFrom: indigo
colorTo: purple
sdk: streamlit
python_version: 3.9.0
sdk_version: 1.10.0
app_file: app.py
models: ['Madhuri/t5_small_vqa_fs', 'dandelin/vilt-b32-finetuned-vqa']
pinned: false
license: mit
## Visual Question Answering - Bot
VQA Bot addresses visual question answering with chat and voice assistance.
It combines a vision transformer and a language generator with an audio transformer.
We pretrained and fine-tuned the language and audio transformer components to get the desired results.
Please use the radio buttons below to navigate.
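
The way the components fit together can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the `VqaBot` class, its field names, and `load_vilt_answerer` are hypothetical names introduced here. Only the ViLT model ID comes from the metadata of this Space. The language generator and the audio transformer are left as pluggable callables, because their input formats are not described here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Model ID taken from the `models` entry in the Space metadata.
VQA_MODEL_ID = "dandelin/vilt-b32-finetuned-vqa"


def load_vilt_answerer(model_id: str = VQA_MODEL_ID) -> Callable[[Any, str], str]:
    """Return a function (image, question) -> short answer backed by ViLT."""
    # Imported lazily so the rest of this sketch runs without transformers.
    from transformers import ViltForQuestionAnswering, ViltProcessor

    processor = ViltProcessor.from_pretrained(model_id)
    model = ViltForQuestionAnswering.from_pretrained(model_id)

    def answer(image: Any, question: str) -> str:
        encoding = processor(image, question, return_tensors="pt")
        logits = model(**encoding).logits
        # ViLT treats VQA as classification over a fixed answer vocabulary.
        return model.config.id2label[logits.argmax(-1).item()]

    return answer


@dataclass
class VqaBot:
    """Hypothetical wrapper that chains the three stages described above."""

    vqa: Callable[[Any, str], str]                          # vision: image + question -> short answer
    to_sentence: Optional[Callable[[str, str], str]] = None  # language: question + answer -> sentence
    transcribe: Optional[Callable[[Any], str]] = None        # audio: spoken question -> text

    def ask(self, image: Any, question: Optional[str] = None, audio: Any = None) -> str:
        if question is None:
            if audio is None or self.transcribe is None:
                raise ValueError("Provide a text question, or audio plus a transcribe function.")
            question = self.transcribe(audio)
        short_answer = self.vqa(image, question)
        if self.to_sentence is None:
            return short_answer
        return self.to_sentence(question, short_answer)
```

Assuming `transformers`, `torch`, and Pillow are installed, `VqaBot(vqa=load_vilt_answerer()).ask(image, "How many cats are there?")` would return the short ViLT answer for a PIL image. Passing a `to_sentence` callable (for example, one built around the T5 model listed in the metadata) would turn that short answer into a full sentence.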
## References
> ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | |
> | |
> Authors: Wonjae Kim, Bokyung Son, and Ildoo Kim
> | |
> Year: 2021 | |
> | |
> eprint: 2102.03334 | |
> | |
> archivePrefix: arXiv | |
> | |
> primaryClass: stat.ML | |