metadata
title: Multilanguage Voice Assistant App
emoji: 🗣️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.22.1
app_file: app.py
pinned: false
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Multilanguage Voice Assistant App
This application lets users upload an image, ask a question about it by voice, and receive a spoken answer. It uses Whisper for speech-to-text, LLaVA for image-to-text, and gTTS for text-to-speech.
Usage
- Upload an image.
- Use the microphone to ask a question or give a prompt related to the image.
- Receive a detailed description or response about the image, along with an audio output.
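The three steps above can be sketched as a small pipeline. This is a minimal illustration, not the code in app.py: the function names, the Whisper "base" model size, and the output file name are assumptions, and the LLaVA step is represented by a hypothetical `describe` callable rather than a real model call.

```python
from typing import Any, Callable, Tuple


def transcribe_with_whisper(audio_path: str) -> str:
    """Speech-to-text. Assumes the openai-whisper package and the "base" model."""
    import whisper  # imported lazily so this sketch loads without the package

    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"].strip()


def speak_with_gtts(text: str, lang: str = "en", out_path: str = "response.mp3") -> str:
    """Text-to-speech. Writes an MP3 file and returns its path."""
    from gtts import gTTS  # imported lazily for the same reason

    gTTS(text=text, lang=lang).save(out_path)
    return out_path


def run_assistant(
    image: Any,
    audio_path: str,
    transcribe: Callable[[str], str],
    describe: Callable[[Any, str], str],  # hypothetical stand-in for the LLaVA step
    speak: Callable[[str], str],
) -> Tuple[str, str, str]:
    """Chain the stages: spoken question -> text -> image answer -> audio file."""
    question = transcribe(audio_path)
    answer = describe(image, question)
    audio_out = speak(answer)
    return question, answer, audio_out
```

Passing the stages in as callables keeps the wiring easy to test with stubs; in a real run, `transcribe_with_whisper` and `speak_with_gtts` would be supplied alongside whatever function wraps the LLaVA model.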
Dependencies
The application uses the following libraries:
- transformers
- bitsandbytes
- accelerate
- whisper
- gradio
- gTTS
- Pillow
- nltk
- torch
- numpy
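Some of these libraries are imported under a different name than the one they are installed with (gTTS is imported as `gtts`, Pillow as `PIL`). The following sketch checks which of the listed libraries are importable in the current environment; the helper name `missing_modules` is an assumption made for illustration, and the check says nothing about installed versions.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Import names for the libraries listed above.
REQUIRED_MODULES = [
    "transformers",
    "bitsandbytes",
    "accelerate",
    "whisper",
    "gradio",
    "gtts",  # installed as gTTS
    "PIL",   # installed as Pillow
    "nltk",
    "torch",
    "numpy",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```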