title: Image-to-Audio Story Generator
emoji: π’
colorFrom: red
colorTo: yellow
sdk: streamlit
sdk_version: 1.29.0
app_file: app.py
pinned: false
license: unknown
Image to Audio Story Generator
This project showcases an end-to-end pipeline that transforms an image into an audio story using various AI models and tools.
Overview
The app converts an uploaded image into an audio story in three stages: an image captioning model describes the image, a text generation model turns that caption into a short story, and a text-to-speech model reads the story aloud.
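The three stages can be pictured as a simple chain of functions. The sketch below is illustrative only and is not taken from app.py: the names `image_to_audio_story` and `StoryResult`, and the idea of passing each stage in as a callable, are assumptions made to keep the example self-contained.

```python
from typing import Callable, NamedTuple


class StoryResult(NamedTuple):
    """Outputs of the three stages (hypothetical container)."""
    caption: str
    story: str
    audio: bytes


def image_to_audio_story(
    image_path: str,
    caption_fn: Callable[[str], str],
    story_fn: Callable[[str], str],
    speech_fn: Callable[[str], bytes],
) -> StoryResult:
    """Run captioning, story generation, and text-to-speech in order."""
    caption = caption_fn(image_path)  # image -> text description
    story = story_fn(caption)         # description -> short story
    audio = speech_fn(story)          # story -> audio bytes
    return StoryResult(caption, story, audio)
```

Passing the stages in as callables makes it possible to swap a model or stub one out for testing without touching the orchestration.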
Features
Image Captioning
- Uses Salesforce's blip-image-captioning-base model to generate textual descriptions of uploaded images.
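As a rough illustration, captioning with this model could look like the sketch below. It assumes the model is loaded locally through the `transformers` image-to-text pipeline, which may not be how app.py does it; the helper names `extract_caption` and `caption_image` are hypothetical.

```python
def extract_caption(outputs):
    """Pull the caption text out of image-to-text pipeline output.

    The pipeline returns a list of dicts with a "generated_text" key.
    """
    if not outputs:
        return ""
    return outputs[0].get("generated_text", "").strip()


def caption_image(image_path):
    """Caption an image file (assumes `transformers` and a backend are installed)."""
    # Imported lazily so the helper above can be used without transformers.
    from transformers import pipeline

    captioner = pipeline(
        "image-to-text", model="Salesforce/blip-image-captioning-base"
    )
    return extract_caption(captioner(image_path))
```

The first call downloads the model weights, so it can take a while.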
Text Generation (Story Creation)
- Uses Meta's llama-2-70b-chat model to write a short story based on the image caption. The story is limited to 100 words or fewer and ends on a positive note.
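The constraints above (based on the caption, at most 100 words, positive ending) are typically expressed in the prompt. The sketch below shows one way to do that. The prompt wording and the helper names `build_story_prompt` and `enforce_word_limit` are assumptions for illustration, not the text used in app.py.

```python
def build_story_prompt(caption: str, max_words: int = 100) -> str:
    """Build a prompt asking for a short, positive story about the caption."""
    return (
        "You are a storyteller. Write a short story inspired by the scene below. "
        f"Keep it to {max_words} words or fewer and end on a positive note.\n\n"
        f"Scene: {caption.strip()}\n"
        "Story:"
    )


def enforce_word_limit(story: str, max_words: int = 100) -> str:
    """Trim the story to max_words, since a model may overshoot the limit."""
    words = story.split()
    if len(words) <= max_words:
        return story.strip()
    return " ".join(words[:max_words])
```

Trimming after generation is a blunt safeguard: it can cut a sentence short, so a stricter prompt or a retry may be preferable.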
Text-to-Speech Conversion
- Uses the espnet/kan-bayashi_ljspeech_vits model, hosted on Hugging Face, to convert the generated story into an audio file.
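One possible way to call this model is over HTTP using the Hugging Face API token. The sketch below makes several assumptions: the endpoint URL is the hosted Inference API address historically used for Hub models and may have changed or be unavailable, the output file name is arbitrary, and the returned audio format is not verified here. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; check the current Hugging Face documentation before relying on it.
API_URL = "https://api-inference.huggingface.co/models/espnet/kan-bayashi_ljspeech_vits"


def build_tts_request(text, token):
    """Build a POST request that sends the story text as JSON."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, token, out_path="story_audio.flac"):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(text, token)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Here `token` would be the value of HUGGINGFACEHUB_API_TOKEN.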
Streamlit Web App
- Built with Streamlit: users upload an image, then view the generated caption and story and play the generated audio.
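A minimal sketch of such a page is shown below. It is an assumption about the layout rather than a copy of app.py: `render_app` and its `run_pipeline` argument (a callable taking image bytes and returning caption, story, and audio bytes) are hypothetical names.

```python
# File types accepted by the uploader, matching the formats listed in this README.
SUPPORTED_TYPES = ["jpg", "jpeg", "png"]


def render_app(run_pipeline):
    """Render the upload widget and show caption, story, and audio."""
    # Imported lazily so this module can be inspected without Streamlit installed.
    import streamlit as st

    st.title("Image-to-Audio Story Generator")
    uploaded = st.file_uploader("Choose an image", type=SUPPORTED_TYPES)
    if uploaded is None:
        return  # nothing to do until the user uploads a file

    st.image(uploaded, caption="Uploaded image")
    with st.spinner("Generating story and audio..."):
        caption, story, audio = run_pipeline(uploaded.getvalue())

    with st.expander("Image caption"):
        st.write(caption)
    with st.expander("Story"):
        st.write(story)
    st.audio(audio)
```

Streamlit reruns the script on every interaction, so in practice the model-loading step is usually cached to avoid reloading on each upload.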
Usage
To use this application:
- Clone this repository.
- Install the required dependencies using pip install -r requirements.txt.
- Set up the necessary environment variables:
  - TOGETHER_API_KEY: Together AI API key.
  - HUGGINGFACEHUB_API_TOKEN: Hugging Face API token.
- Run the Streamlit app with streamlit run app.py.
- Upload an image file (supported formats: jpg, jpeg, png).
- Wait for the AI processing to generate the story and audio.
- Access the image caption, story, and audio outputs.
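Before launching the app, it can help to confirm that both environment variables are set. The check below is a small sketch and is not part of app.py. The function name `missing_env_vars` is hypothetical, while the variable names come from the steps above.

```python
import os

REQUIRED_ENV_VARS = ("TOGETHER_API_KEY", "HUGGINGFACEHUB_API_TOKEN")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```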
Code Structure
- app.py: Contains the Streamlit web application code, integrating all functionality.
- README.md: Documents the project, usage instructions, and dependencies.
- requirements.txt: Lists the required libraries.
Credits
This project was created with love by @Aditya-Neural-Net-Ninja. It makes use of cutting-edge AI models for image analysis, natural language processing, and text-to-speech conversion. Special thanks to Streamlit and Hugging Face for their incredible platforms.
Note: To run this application, you need valid credentials for both services: a Together AI API key (TOGETHER_API_KEY) and a Hugging Face API token (HUGGINGFACEHUB_API_TOKEN).
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference