---
title: Vqa Audiobot
emoji: 📈
colorFrom: indigo
colorTo: purple
sdk: streamlit
python_version: 3.9.0
sdk_version: 1.10.0
app_file: app.py
models: ['Madhuri/t5_small_vqa_fs', 'dandelin/vilt-b32-finetuned-vqa']
pinned: false
license: mit
---

## Visual Question Answering - Bot

VQA Bot addresses the challenge of visual question answering with chat and voice assistance.
It combines a vision transformer and a language generator with an audio transformer.
We pretrained and fine-tuned the language and audio transformer components to get the desired result.
Use the radio buttons below to navigate.
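
The vision step described above can be sketched as follows. This is a minimal illustration, not the code in `app.py`: it assumes the `transformers`, `torch`, and `Pillow` packages are installed and uses the `dandelin/vilt-b32-finetuned-vqa` checkpoint listed in the Space metadata. The names `pick_answer` and `answer_question` are hypothetical helpers introduced here, and the language-generation and audio stages are not shown.

```python
from typing import Dict, Sequence

# Checkpoint name taken from the `models` entry in the Space metadata.
VILT_CHECKPOINT = "dandelin/vilt-b32-finetuned-vqa"


def pick_answer(scores: Sequence[float], id2label: Dict[int, str]) -> str:
    """Return the label with the highest score (hypothetical helper)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return id2label[best]


def answer_question(image_path: str, question: str) -> str:
    """Answer a question about an image with ViLT (illustrative sketch)."""
    # Heavy imports stay local so pick_answer works without them installed.
    import torch
    from PIL import Image
    from transformers import ViltForQuestionAnswering, ViltProcessor

    processor = ViltProcessor.from_pretrained(VILT_CHECKPOINT)
    model = ViltForQuestionAnswering.from_pretrained(VILT_CHECKPOINT)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(image, question, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # one score per candidate answer
    return pick_answer(logits[0].tolist(), model.config.id2label)
```

ViLT treats VQA as classification over a fixed answer vocabulary, so the short answer it returns would still need to be passed to the language generator to form a conversational reply.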


## References

> ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
>
> Authors: Wonjae Kim, Bokyung Son, and Ildoo Kim
>
> Year: 2021
>
> eprint: 2102.03334
>
> archivePrefix: arXiv
>
> primaryClass: stat.ML