metadata

title: Multilanguage Voice Assistant App
emoji: 🗣️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.22.1
app_file: app.py
pinned: false

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Multilanguage Voice Assistant App

This application lets users upload an image, ask a question about it by voice, and receive a spoken answer. It uses Whisper for speech-to-text, LLaVA for image-to-text, and gTTS for text-to-speech.

Usage

1. Upload an image.
2. Use the microphone to ask a question or give a prompt related to the image.
3. Receive a detailed description of the image or an answer to your question, along with an audio version of the response.
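
The steps above can be sketched as a small pipeline: transcribe the recording, ask the vision model about the image, then synthesize the answer. This is a minimal sketch and not necessarily how app.py is written. The function names, the "base" Whisper model size, the "en" default language, and the response.mp3 output path are all assumptions made for illustration, and the LLaVA step is left as a caller-supplied function because the README does not say how the model is loaded.

```python
from typing import Callable, Tuple


def transcribe_with_whisper(audio_path: str, model_size: str = "base") -> str:
    """Speech-to-text with Whisper. The model size is an assumption."""
    import whisper  # imported lazily; assumes the openai-whisper package

    model = whisper.load_model(model_size)
    return model.transcribe(audio_path)["text"]


def speak_with_gtts(text: str, out_path: str, lang: str = "en") -> str:
    """Text-to-speech with gTTS. Writes an MP3 file and returns its path."""
    from gtts import gTTS  # imported lazily so the module loads without gTTS

    gTTS(text=text, lang=lang).save(out_path)
    return out_path


def answer_about_image(
    image,
    audio_path: str,
    describe: Callable[[object, str], str],
    transcribe: Callable[[str], str] = transcribe_with_whisper,
    speak: Callable[[str, str], str] = speak_with_gtts,
    out_path: str = "response.mp3",
) -> Tuple[str, str, str]:
    """Run the three usage steps and return (question, answer, audio path).

    `describe` is a hypothetical hook for the image-to-text model (LLaVA in
    this app): it takes the image and the transcribed question and returns
    the answer text.
    """
    question = transcribe(audio_path).strip()
    if not question:
        raise ValueError("no speech detected in the recording")
    answer = describe(image, question)
    audio_out = speak(answer, out_path)
    return question, answer, audio_out
```

Passing the three stages in as callables keeps each model swappable and lets the flow be exercised with stand-in functions before any model weights are downloaded.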

Dependencies

The application uses the following libraries:

transformers
bitsandbytes
accelerate
whisper (OpenAI Whisper; published on PyPI as openai-whisper)
gradio
gTTS
Pillow
nltk
torch
numpy
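
Some of these libraries are imported under a different name than the one used to install them (gTTS is imported as gtts, Pillow as PIL). As a convenience, the following sketch checks which of the listed libraries can be imported in the current environment. The helper name and the mapping are illustrative and not part of the app.

```python
import importlib.util

# Library name as listed above -> module name used in an import statement.
IMPORT_NAMES = {
    "transformers": "transformers",
    "bitsandbytes": "bitsandbytes",
    "accelerate": "accelerate",
    "whisper": "whisper",
    "gradio": "gradio",
    "gTTS": "gtts",
    "Pillow": "PIL",
    "nltk": "nltk",
    "torch": "torch",
    "numpy": "numpy",
}


def missing_dependencies(import_names=IMPORT_NAMES):
    """Return the listed libraries whose module cannot be found."""
    return [
        name
        for name, module in import_names.items()
        if importlib.util.find_spec(module) is None
    ]
```

Calling missing_dependencies() with no arguments returns the names from the list above that still need to be installed; an empty list means every module was found.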