---
title: Vqa Audiobot
emoji: 📈
colorFrom: indigo
colorTo: purple
sdk: streamlit
python_version: 3.9.0
sdk_version: 1.10.0
app_file: app.py
models: ['Madhuri/t5_small_vqa_fs', 'dandelin/vilt-b32-finetuned-vqa']
pinned: false
license: mit
---

## Visual Question Answering - Bot

VQA Bot tackles visual question answering (answering natural-language questions about an image) with chat and voice assistance.
It combines a vision-and-language transformer (ViLT, `dandelin/vilt-b32-finetuned-vqa`) and a language generator (T5-small, `Madhuri/t5_small_vqa_fs`) with an audio transformer.
We pretrained and fine-tuned the language and audio transformers for this task.
In the app, use the radio buttons to navigate.
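As a rough illustration of how the three components could be chained, here is a minimal Python sketch. It assumes that the vision-and-language model returns a short answer, that the language generator expands the question and short answer into a full sentence, and that the audio transformer converts that sentence to speech. The class `VqaBotPipeline` and its stage names are hypothetical and are not taken from `app.py`; the lambdas in the usage lines are stubs standing in for the real models.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical stage signatures (assumptions, not the app's actual interfaces).
AnswerFn = Callable[[bytes, str], str]    # (image bytes, question) -> short answer
GenerateFn = Callable[[str, str], str]    # (question, short answer) -> full sentence
SpeakFn = Callable[[str], bytes]          # sentence -> audio bytes


@dataclass
class VqaBotPipeline:
    """Chains the assumed VQA, sentence-generation, and speech stages."""

    answer_question: AnswerFn
    generate_sentence: GenerateFn
    speak: SpeakFn

    def run(self, image: bytes, question: str) -> Tuple[str, bytes]:
        # 1. Vision-and-language model: short answer to the question about the image.
        short_answer = self.answer_question(image, question)
        # 2. Language generator: expand the short answer into a full sentence.
        sentence = self.generate_sentence(question, short_answer)
        # 3. Audio stage: synthesize speech for the sentence.
        return sentence, self.speak(sentence)


# Usage with stub stages standing in for the real models.
bot = VqaBotPipeline(
    answer_question=lambda image, question: "two",
    generate_sentence=lambda question, answer: f"The answer is {answer}.",
    speak=lambda sentence: sentence.encode("utf-8"),
)
sentence, audio = bot.run(b"", "How many cats are in the picture?")
print(sentence)  # The answer is two.
```

Keeping each stage as a plain callable makes it easy to swap a stub for a wrapper around a real model without changing the pipeline itself.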


## References

> ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
>
> Authors: Wonjae Kim, Bokyung Son, and Ildoo Kim
>
> Year: 2021
>
> arXiv: 2102.03334 (primary class: stat.ML)