---
title: Multilanguage Voice Assistant App
emoji: 🗣️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "3.22.1"
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Multilanguage Voice Assistant App

This application allows users to upload an image and interact via voice input and audio response. It uses Whisper for speech-to-text, Llava for image-to-text, and gTTS for text-to-speech.

## Usage

1. Upload an image.
2. Use the microphone to ask a question or give a prompt related to the image.
3. Receive a detailed description or response about the image, along with an audio output.

## Dependencies

The application uses the following libraries:

- transformers
- bitsandbytes
- accelerate
- whisper
- gradio
- gTTS
- Pillow
- nltk
- torch
- numpy
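
The three stages described above (speech-to-text, image-to-text, text-to-speech) can be wired together roughly as in the sketch below. This is not the code in `app.py`; it is a minimal illustration that assumes each stage is passed in as a plain callable. The names `run_assistant`, `AssistantReply`, and the `stub_*` functions are hypothetical, and the stubs stand in for the real Whisper, Llava, and gTTS calls so the sketch runs without downloading any model. Passing the detected language on to the speech stage is also an assumption about how the multilanguage behavior could work.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class AssistantReply:
    """Result of one voice interaction (hypothetical container)."""
    question: str    # text transcribed from the user's recording
    language: str    # language code reported by the speech-to-text stage
    answer: str      # text produced by the image-to-text stage
    audio_path: str  # path of the synthesized audio reply


def run_assistant(
    image_path: str,
    audio_path: str,
    transcribe: Callable[[str], Tuple[str, str]],
    describe: Callable[[str, str], str],
    synthesize: Callable[[str, str], str],
) -> AssistantReply:
    """Chain the three stages: audio -> question -> answer -> audio."""
    # Stage 1: speech-to-text. In the real app this would be backed by Whisper.
    question, language = transcribe(audio_path)
    # Stage 2: image-to-text. In the real app this would be backed by Llava.
    answer = describe(image_path, question)
    # Stage 3: text-to-speech. In the real app this would be backed by gTTS.
    reply_audio = synthesize(answer, language)
    return AssistantReply(question, language, answer, reply_audio)


# --- Stand-in stages (assumptions, for illustration only) ---

def stub_transcribe(audio_path: str) -> Tuple[str, str]:
    # Pretends every recording contains the same English question.
    return "What is in the picture?", "en"


def stub_describe(image_path: str, question: str) -> str:
    # Echoes its inputs instead of running a vision-language model.
    return f"Answer to '{question}' for image {image_path}"


def stub_synthesize(text: str, language: str) -> str:
    # Returns the file name a real TTS stage might write; writes nothing.
    return f"reply_{language}.mp3"
```

To try it, call `run_assistant("photo.png", "question.wav", stub_transcribe, stub_describe, stub_synthesize)` and inspect the returned fields. Keeping the stages as injected callables means each model can be swapped or mocked independently, which is convenient when the real models are too large to load in a quick test.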