title: Mustalhim AI
emoji: ๐
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
license: apache-2.0
Mustalhim: Image to Audio Story Generator
Mustalhim (مستلهم), meaning "inspired" in Arabic, is an AI-powered application that transforms images into audio stories. It chains three models, one each for image captioning, story generation, and text-to-speech synthesis, to turn a single uploaded picture into a narrated story.
Features
- Image Captioning: Generates a descriptive caption for an uploaded image using the Salesforce/blip-image-captioning-large model.
- Story Generation: Creates a long, engaging story inspired by the image caption using the ALLaM-7B-Instruct-preview model.
- Text-to-Speech: Converts the generated story into an audio file using the kokoro library.
- Gradio Interface: Provides an easy-to-use web interface for uploading images and listening to the generated audio.
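The captioning and story-generation features could be sketched with the Transformers `pipeline` API as follows. This is a minimal sketch, not the app's actual code: the prompt wording, the `max_new_tokens` value, the helper names, and the `ALLaM-AI/` organization prefix on the story model id are assumptions.

```python
CAPTION_MODEL = "Salesforce/blip-image-captioning-large"
# Assumption: the README names only "ALLaM-7B-Instruct-preview"; the
# "ALLaM-AI/" prefix is inferred from the Acknowledgments section.
STORY_MODEL = "ALLaM-AI/ALLaM-7B-Instruct-preview"


def extract_caption(outputs):
    """Pull the caption text out of an image-to-text pipeline result."""
    # The pipeline returns a list of dicts keyed by "generated_text".
    return outputs[0]["generated_text"].strip()


def build_story_prompt(caption):
    """Hypothetical prompt wording; the app's real prompt is not shown."""
    return (
        "Write a long, engaging story inspired by this image description: "
        f"{caption}"
    )


def caption_and_story(image, max_new_tokens=512):
    """Run both models on a PIL image or an image path.

    Calling this downloads several gigabytes of model weights.
    """
    # Imported lazily so the helpers above work without transformers.
    from transformers import pipeline

    captioner = pipeline("image-to-text", model=CAPTION_MODEL)
    caption = extract_caption(captioner(image))

    generator = pipeline("text-generation", model=STORY_MODEL)
    result = generator(
        build_story_prompt(caption),
        max_new_tokens=max_new_tokens,
        return_full_text=False,  # return only the continuation, not the prompt
    )
    return caption, result[0]["generated_text"].strip()
```

The two small helpers are kept separate from the model calls so the prompt and the output parsing can be checked without loading any weights.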
How It Works
- Image Captioning: The app uses a pre-trained image captioning model to generate a textual description of the uploaded image.
- Story Generation: The caption is passed to a text-generation model, which creates a long, creative story inspired by the caption.
- Text-to-Speech: The generated story is converted into an audio file using a text-to-speech library.
- Output: The app returns the audio file, which can be played directly in the interface.
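The data flow above can be sketched as a plain function that chains three steps. The names `StoryResult` and `run_pipeline` are hypothetical, and each step is passed in as a callable so the flow can be exercised with stand-ins before the real models are wired in.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StoryResult:
    """Intermediate and final outputs of one run (hypothetical container)."""
    caption: str
    story: str
    audio_path: str


def run_pipeline(
    image: Any,
    caption_image: Callable[[Any], str],
    write_story: Callable[[str], str],
    synthesize: Callable[[str], str],
) -> StoryResult:
    """Image -> caption -> story -> audio file path."""
    caption = caption_image(image)   # step 1: describe the image
    story = write_story(caption)     # step 2: expand the caption into a story
    audio_path = synthesize(story)   # step 3: speak the story, return the file
    return StoryResult(caption, story, audio_path)


if __name__ == "__main__":
    # Stand-in steps; a real app would call the three models here.
    result = run_pipeline(
        image=None,
        caption_image=lambda img: "a lighthouse at dusk",
        write_story=lambda cap: f"Once upon a time there was {cap}.",
        synthesize=lambda text: "story.wav",
    )
    print(result.audio_path)
```

Keeping the caption and story in the result alongside the audio path makes it easy to display or log them, even though the interface only plays the audio.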
Requirements
- Python 3.9+
- Libraries:
  - gradio
  - transformers
  - torch
  - soundfile
  - kokoro
  - sentencepiece
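A quick way to confirm an environment meets these requirements is to check the interpreter version and look up each library without importing it. This is a small sketch using only the standard library; the function names are hypothetical, and it assumes each package's import name matches the name listed above.

```python
import importlib.util
import sys

# Import names assumed to match the package names in the list above.
REQUIRED_MODULES = (
    "gradio",
    "transformers",
    "torch",
    "soundfile",
    "kokoro",
    "sentencepiece",
)


def python_is_supported(version_info=sys.version_info):
    """True when the interpreter is Python 3.9 or newer."""
    return tuple(version_info[:2]) >= (3, 9)


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.9+ is required")
    missing = missing_modules()
    if missing:
        print("Missing libraries:", ", ".join(missing))
```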
Example Usage
- Upload an image using the Gradio interface.
- The app will generate a caption for the image.
- A story will be created based on the caption.
- The story will be converted into an audio file, which you can listen to directly in the app.
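The upload-and-listen flow could be wired into Gradio roughly as follows. This is a sketch under assumptions, not the contents of app.py: `make_handler` and `build_demo` are hypothetical names, and `image_to_audio` stands for any callable that takes a PIL image and returns the path of the generated audio file.

```python
def make_handler(image_to_audio):
    """Wrap the image-to-audio callable with a check for an empty upload."""
    def handler(image):
        if image is None:
            raise ValueError("Please upload an image first.")
        return image_to_audio(image)
    return handler


def build_demo(image_to_audio):
    """Build a Gradio interface: one image input, one audio output."""
    import gradio as gr  # imported lazily so make_handler works without gradio

    return gr.Interface(
        fn=make_handler(image_to_audio),
        inputs=gr.Image(type="pil", label="Upload an image"),
        outputs=gr.Audio(type="filepath", label="Generated story"),
        title="Mustalhim: Image to Audio Story Generator",
    )


if __name__ == "__main__":
    # Placeholder step that ignores the image; replace it with the real
    # captioning, story, and speech chain.
    build_demo(lambda image: "story.wav").launch()
```

Using `type="filepath"` on the audio output lets the handler return the path of the file written by the text-to-speech step, which Gradio then serves in its audio player.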
Acknowledgments
- Hugging Face for providing the models and deployment platform.
- Gradio for the easy-to-use interface.
- Salesforce for the blip-image-captioning-large model.
- ALLaM-AI for the ALLaM-7B-Instruct-preview model.
About the Name
Mustalhim (مستلهم) is an Arabic word meaning "inspired." This project is inspired by the power of AI to transform images into creative and engaging stories, bridging the gap between visual and auditory storytelling.
Contact
For questions or feedback, feel free to reach out:
- Name: Mohammad Alkhatim
- GitHub: MoJaff
- LinkedIn: Mohammad Alkhatim
- Hugging Face: MoJaff
Experience the magic of Mustalhim and let your images inspire stories!
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference