---
title: Multilanguage Voice Assistant App
emoji: 🗣️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.22.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Multilanguage Voice Assistant App

This application lets users upload an image, ask about it by voice, and receive the answer as both text and audio. It uses Whisper for speech-to-text, LLaVA for image-to-text (answering the spoken prompt about the image), and gTTS for text-to-speech.
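The request flow can be pictured as three stages chained together: transcribe the spoken prompt, answer it with the image as context, then synthesize the answer as speech. The sketch below is a hypothetical illustration, not the code in `app.py`: the class `VoiceAssistant`, its `handle` method, and the stage signatures are assumptions, and each stage is passed in as a plain function so the wiring can be run without downloading any models. It also assumes the language detected from the speech is forwarded to the text-to-speech stage so the reply is spoken in the same language. In a real app, `transcribe` could wrap Whisper's `model.transcribe(path)`, whose result dict includes `"text"` and `"language"` keys, and `synthesize` could call `gTTS(text=..., lang=...).save(path)`; language codes may need mapping between the two libraries.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical stage signatures (assumptions, not taken from app.py):
#   transcribe: audio file path -> (transcribed prompt, language code), e.g. backed by Whisper
#   describe:   (image, prompt) -> response text, e.g. backed by LLaVA
#   synthesize: (response text, language code) -> path of an audio file, e.g. backed by gTTS
Transcriber = Callable[[str], Tuple[str, str]]
Describer = Callable[[object, str], str]
Synthesizer = Callable[[str, str], str]


@dataclass
class VoiceAssistant:
    transcribe: Transcriber
    describe: Describer
    synthesize: Synthesizer

    def handle(self, image: object, audio_path: str) -> Tuple[str, str]:
        """Run one request: speech -> prompt text -> image-grounded answer -> speech."""
        prompt, language = self.transcribe(audio_path)
        answer = self.describe(image, prompt)
        # Assumption: reply in the language detected from the user's speech.
        answer_audio = self.synthesize(answer, language)
        return answer, answer_audio


# Usage with stub stages standing in for the real models.
assistant = VoiceAssistant(
    transcribe=lambda path: ("What is in this picture?", "en"),
    describe=lambda image, prompt: f"Answer to: {prompt}",
    synthesize=lambda text, lang: f"reply_{lang}.mp3",
)
text, audio = assistant.handle(image=None, audio_path="question.wav")
print(text)   # Answer to: What is in this picture?
print(audio)  # reply_en.mp3
```

Keeping the stages injectable makes it easy to test the flow with stubs and to swap one model for another without touching the orchestration.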

Usage

  1. Upload an image.
  2. Use the microphone to ask a question or give a prompt related to the image.
  3. Receive a detailed description of the image, or a response to your prompt, as text along with a spoken audio version.

Dependencies

The application uses the following libraries:

  • transformers
  • bitsandbytes
  • accelerate
  • whisper
  • gradio
  • gTTS
  • Pillow
  • nltk
  • torch
  • numpy
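Several of these libraries are imported under a different name than the one listed above (for example, Pillow is imported as `PIL` and gTTS as `gtts`). A small check like the following sketch can confirm that everything is importable before launching the app. The dictionary `REQUIRED_MODULES` and the function `missing_dependencies` are hypothetical helpers, not part of this Space, and the `whisper` entry assumes OpenAI's Whisper (published on PyPI as `openai-whisper`). The check uses `importlib.util.find_spec`, which locates a module without importing it, so heavy packages such as `torch` are not loaded.

```python
import importlib.util
from typing import Dict, List

# Maps each dependency as listed in this README to the module name it is imported as.
# Assumption: "whisper" refers to OpenAI's Whisper (PyPI package: openai-whisper).
REQUIRED_MODULES: Dict[str, str] = {
    "transformers": "transformers",
    "bitsandbytes": "bitsandbytes",
    "accelerate": "accelerate",
    "whisper": "whisper",
    "gradio": "gradio",
    "gTTS": "gtts",
    "Pillow": "PIL",
    "nltk": "nltk",
    "torch": "torch",
    "numpy": "numpy",
}


def missing_dependencies(modules: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the names of dependencies whose top-level module cannot be found."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```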