---
title: Vqa Audiobot
emoji: 📈
colorFrom: indigo
colorTo: purple
sdk: streamlit
python_version: 3.9.0
sdk_version: 1.10.0
app_file: app.py
models: ['Madhuri/t5_small_vqa_fs', 'dandelin/vilt-b32-finetuned-vqa']
pinned: false
license: mit
---

## Visual Question Answering - Bot

VQA Bot addresses the challenge of visual question answering with chat and voice assistance. It combines a vision transformer and a language generator with an audio transformer. We pretrained and fine-tuned our model on the language and audio transformers to get the desired result.

Please use the radio buttons below to navigate.

## References

> ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
>
> Authors: Wonjae Kim, Bokyung Son, and Ildoo Kim
>
> Year: 2021
>
> eprint: 2102.03334
>
> archivePrefix: arXiv
>
> primaryClass: stat.ML