adi-123's picture
Update README.md
83e3bd6
|
raw
history blame
2.51 kB
metadata
title: Image-to-Audio Story Generator
emoji: 🐒
colorFrom: red
colorTo: yellow
sdk: streamlit
sdk_version: 1.29.0
app_file: app.py
pinned: false
license: unknown

πŸ–ΌοΈ Image to 🎧 Audio Story Generator

This project showcases an end-to-end pipeline that transforms an image into an audio story using various AI models and tools.

🌟 Overview

The goal of this project is to leverage AI capabilities to convert an uploaded image into an audio story. It uses a combination of image captioning, text generation, and text-to-speech models.

πŸš€ Features

πŸ“· Image Captioning

  • Utilizes Salesforce's blip-image-captioning-base model to generate textual descriptions of uploaded images.

✍️ Text Generation (Story Creation)

  • Employs Meta's llama-2-70b-chat model to create a short story influenced by the provided image caption within a positive conclusion of 100 words or less.

πŸ”Š Text-to-Speech Conversion

  • Utilizes Hugging Face's espnet/kan-bayashi_ljspeech_vits model to convert the generated story into an audio file.

🌐 Streamlit Web App

  • Built using Streamlit, allowing users to upload images and visualize the generated image caption, story, and audio.

πŸ“ Usage

To use this application:

  1. Clone this repository.
  2. Install the required dependencies using pip install -r requirements.txt.
  3. Set up the necessary environment variables:
    • TOGETHER_API_KEY: OpenAI API key.
    • HUGGINGFACEHUB_API_TOKEN: Hugging Face API token.
  4. Run the Streamlit app with streamlit run app.py.
  5. Upload an image file (supported formats: jpg, jpeg, png).
  6. Wait for the AI processing to generate the story and audio.
  7. Access the image caption, story, and audio outputs.

πŸ“ Code Structure

  • app.py: Contains the Streamlit web application code, integrating all functionalities.
  • README.md: Documentation explaining the project, usage instructions, and dependencies.
  • requirements.txt: Lists all necessary libraries.

πŸ™Œ Credits

This project was created with love by @Aditya-Neural-Net-Ninja. It makes use of cutting-edge AI models for image analysis, natural language processing, and text-to-speech conversion. Special thanks to Streamlit and Hugging Face for their incredible platforms.

Note: Please ensure you have the required API keys and tokens for OpenAI and Hugging Face to run this application successfully.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference