---
title: Multilanguage Voice Assistant App
emoji: 🗣️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "3.22.1"
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Multilanguage Voice Assistant App

This application lets users upload an image, ask a question about it by voice, and receive the answer as both text and audio. It uses Whisper for speech-to-text, LLaVA for image-to-text, and gTTS for text-to-speech.

## Usage

1. Upload an image.
2. Use the microphone to ask a question or give a prompt related to the image.
3. Receive a detailed text description or answer about the image, along with a spoken version of it.
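
The three steps above map onto a speech-to-text, image-to-text, text-to-speech chain. The following is a minimal sketch of that flow, not the actual contents of `app.py`. The function names (`transcribe`, `speak`, `run_pipeline`), the `"base"` Whisper model size, the `response.mp3` output path, and the `"en"` default language are all assumptions made for illustration. The LLaVA step is passed in as a hypothetical `describe_image` callable, because how the model is loaded depends on the installed `transformers` version and available hardware.

```python
from typing import Callable, Tuple


def transcribe(audio_path: str, model_size: str = "base") -> str:
    """Speech-to-text with Whisper (model size is an assumed default)."""
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path)
    return result["text"].strip()


def speak(text: str, out_path: str = "response.mp3", lang: str = "en") -> str:
    """Text-to-speech with gTTS; returns the path of the saved MP3."""
    from gtts import gTTS

    gTTS(text=text, lang=lang).save(out_path)
    return out_path


def run_pipeline(
    image_path: str,
    audio_path: str,
    describe_image: Callable[[str, str], str],
    transcribe_fn: Callable[[str], str] = transcribe,
    speak_fn: Callable[[str], str] = speak,
) -> Tuple[str, str, str]:
    """Chain the three stages: voice question -> image answer -> audio reply.

    describe_image(image_path, question) is a hypothetical hook for the
    LLaVA image-to-text step; it must return the answer as plain text.
    """
    question = transcribe_fn(audio_path)
    answer = describe_image(image_path, question)
    audio_out = speak_fn(answer)
    return question, answer, audio_out
```

Keeping each stage behind a plain function makes it possible to swap in a smaller Whisper model or a different vision model, and to exercise the chain with stub functions before downloading any weights.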

## Dependencies

The application uses the following libraries:
- transformers
- bitsandbytes
- accelerate
- whisper (published on PyPI as `openai-whisper`)
- gradio
- gTTS
- Pillow
- nltk
- torch
- numpy
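
Some of these libraries are imported under a different name than the one they are installed with (for example, Pillow is imported as `PIL` and gTTS as `gtts`). As a rough aid, the sketch below checks whether each dependency can be imported in the current environment. The helper name `missing_modules` and the `DEPENDENCIES` mapping are illustrative assumptions, not part of the application.

```python
from importlib.util import find_spec
from typing import Dict, List

# Library name as listed above -> module name used in `import` statements.
DEPENDENCIES: Dict[str, str] = {
    "transformers": "transformers",
    "bitsandbytes": "bitsandbytes",
    "accelerate": "accelerate",
    "whisper": "whisper",
    "gradio": "gradio",
    "gTTS": "gtts",
    "Pillow": "PIL",
    "nltk": "nltk",
    "torch": "torch",
    "numpy": "numpy",
}


def missing_modules(deps: Dict[str, str]) -> List[str]:
    """Return the library names whose import module cannot be found."""
    return [name for name, module in deps.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_modules(DEPENDENCIES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All listed libraries are importable.")
```

The check only looks up the modules without importing them, so it stays fast even for heavy packages such as `torch`.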