metadata

title: Vqa Audiobot
emoji: 📈
colorFrom: indigo
colorTo: purple
sdk: streamlit
python_version: 3.9.0
sdk_version: 1.10.0
app_file: app.py
models:
  - Madhuri/t5_small_vqa_fs
  - dandelin/vilt-b32-finetuned-vqa
pinned: false
license: mit

Visual Question Answering - Bot

VQA Bot tackles visual question answering through chat and voice assistance. It combines a vision transformer and a language generator with an audio transformer. We pretrained and fine-tuned the language and audio transformer models to get the desired results. Please use the radio buttons below to navigate.
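As a rough illustration of how the vision and language parts might be chained, here is a minimal sketch. It assumes the two models from the `models:` list are loaded through the Hugging Face `transformers` pipelines. The helper names (`make_prompt`, `answer_question`, `load_models`) and the prompt format are hypothetical, not taken from `app.py`, and the audio step is left out because its model is not listed here.

```python
# Model ids taken from the `models:` list in the metadata above.
VQA_MODEL_ID = "dandelin/vilt-b32-finetuned-vqa"
LM_MODEL_ID = "Madhuri/t5_small_vqa_fs"


def make_prompt(question: str, short_answer: str) -> str:
    """Build the text handed to the language generator.

    Assumption: this prompt format is illustrative only; the format the
    T5 model was actually fine-tuned on is not stated in this README.
    """
    return f"question: {question.strip()} answer: {short_answer.strip()}"


def answer_question(image, question: str, vqa, generate) -> str:
    """Run the vision model, then let the language model phrase the reply.

    `vqa(image, question)` returns a short answer string and
    `generate(prompt)` returns generated text; both are passed in so the
    chain can be exercised without downloading any weights.
    """
    short_answer = vqa(image, question)
    return generate(make_prompt(question, short_answer))


def load_models():
    """Load both models with transformers pipelines (downloads weights)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    vqa_pipe = pipeline("visual-question-answering", model=VQA_MODEL_ID)
    lm_pipe = pipeline("text2text-generation", model=LM_MODEL_ID)

    def vqa(image, question):
        # The pipeline returns candidates sorted by score; keep the top one.
        return vqa_pipe(image=image, question=question, top_k=1)[0]["answer"]

    def generate(prompt):
        return lm_pipe(prompt)[0]["generated_text"]

    return vqa, generate
```

With the models loaded, a call could look like `vqa, generate = load_models()` followed by `answer_question(image, "What is in the picture?", vqa, generate)`, where `image` is a PIL image or a file path. Passing the two model functions in as arguments is only a design choice for this sketch; it keeps the chaining logic separate from model loading.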

References

Wonjae Kim, Bokyung Son, and Ildoo Kim. ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision. 2021. arXiv:2102.03334 [stat.ML].