---
title: Mustalhim AI
emoji: ๐
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
license: apache-2.0
---
# Mustalhim: Image to Audio Story Generator
**Mustalhim** (مستلهم), meaning "inspired" in Arabic, is an AI-powered application that turns an image into a narrated audio story. It chains three pretrained components: an image-captioning model, a text-generation model that writes the story, and a text-to-speech library that reads it aloud.
## Features
- **Image Captioning**: Generates a descriptive caption for an uploaded image using the `Salesforce/blip-image-captioning-large` model.
- **Story Generation**: Creates a long, engaging story inspired by the image caption using the `ALLaM-7B-Instruct-preview` model.
- **Text-to-Speech**: Converts the generated story into an audio file using the `kokoro` library.
- **Gradio Interface**: Provides an easy-to-use web interface for uploading images and listening to the generated audio.
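
As a rough illustration of the captioning feature, the sketch below loads the BLIP checkpoint through the `transformers` `pipeline` helper and pulls the caption out of its result. The helper names (`load_captioner`, `extract_caption`, `build_story_prompt`) and the prompt wording are hypothetical and not taken from `app.py`. The sketch also assumes a `transformers` release that ships the `image-to-text` task and returns a list of dictionaries with a `generated_text` key.

```python
from typing import Any, Dict, List

CAPTION_MODEL = "Salesforce/blip-image-captioning-large"


def load_captioner() -> Any:
    """Build an image-to-text pipeline (downloads the model on first use)."""
    # Imported lazily so the pure helpers below work without transformers installed.
    from transformers import pipeline

    return pipeline("image-to-text", model=CAPTION_MODEL)


def extract_caption(outputs: List[Dict[str, str]]) -> str:
    """Return the first caption from a pipeline result, stripped of whitespace."""
    if not outputs:
        raise ValueError("captioning pipeline returned no results")
    return outputs[0]["generated_text"].strip()


def build_story_prompt(caption: str) -> str:
    """Turn a caption into an instruction for the story model (illustrative wording)."""
    return f"Write a long, engaging story inspired by this scene: {caption}"
```

With a loaded captioner, `extract_caption(captioner(image))` would yield the caption string that `build_story_prompt` then wraps for the story model.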
## How It Works
1. **Image Captioning**: The app uses a pre-trained image captioning model to generate a textual description of the uploaded image.
2. **Story Generation**: The caption is passed to a text-generation model, which creates a long, creative story inspired by the caption.
3. **Text-to-Speech**: The generated story is converted into an audio file using a text-to-speech library.
4. **Output**: The app returns the audio file, which can be played directly in the interface.
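
The four steps above can be sketched as a single function that runs the stages in order. This is a minimal outline under stated assumptions, not the code in `app.py`: the function name `image_to_audio_story` and its three stage parameters are hypothetical, and each stage is passed in as a plain callable so the flow can be followed (and tried with stand-ins) without loading any model.

```python
from typing import Any, Callable


def image_to_audio_story(
    image: Any,
    caption_image: Callable[[Any], str],
    write_story: Callable[[str], str],
    synthesize: Callable[[str], str],
) -> str:
    """Run captioning, story generation, and speech synthesis in sequence.

    Returns whatever `synthesize` returns, assumed here to be an audio file path.
    """
    caption = caption_image(image)  # step 1: image -> caption
    story = write_story(caption)    # step 2: caption -> story text
    return synthesize(story)        # steps 3-4: story -> audio file path


if __name__ == "__main__":
    # Stand-in stages, used only to show the data flow.
    path = image_to_audio_story(
        "photo.jpg",
        caption_image=lambda img: "a dog running on a beach",
        write_story=lambda cap: f"Once upon a time there was {cap}.",
        synthesize=lambda text: "story.wav",
    )
    print(path)
```

Passing the stages as arguments keeps the orchestration independent of any particular model, so a different captioner or voice could be swapped in without touching this function.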
## Requirements
- Python 3.9+
- Libraries:
  - `gradio`
  - `transformers`
  - `torch`
  - `soundfile`
  - `kokoro`
  - `sentencepiece`
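
One way to confirm an environment meets this list before launching the app is a small standard-library check. The sketch below is illustrative only: `missing_packages` and `python_ok` are hypothetical helper names, and it assumes each library's import name matches the name listed above.

```python
import sys
from importlib.util import find_spec
from typing import Iterable, List

REQUIRED = ("gradio", "transformers", "torch", "soundfile", "kokoro", "sentencepiece")
MIN_PYTHON = (3, 9)


def python_ok() -> bool:
    """True when the running interpreter is at least Python 3.9."""
    return sys.version_info >= MIN_PYTHON


def missing_packages(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the names that cannot be located for import, in input order."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.9 or newer is required")
    for name in missing_packages():
        print(f"missing: {name}")
```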
---
## Example Usage
1. Upload an image using the Gradio interface.
2. The app will generate a caption for the image.
3. A story will be created based on the caption.
4. The story will be converted into an audio file, which you can listen to directly in the app.
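
The usage flow above maps naturally onto a Gradio `Interface` with an image input and an audio output. The following is a hedged sketch rather than the contents of `app.py`: `build_demo` is a hypothetical helper, the labels are made up, and `story_fn` stands for any function that accepts a PIL image and returns the path of the generated audio file. Calling `build_demo(story_fn).launch()` would start the web interface.

```python
from typing import Any, Callable


def build_demo(story_fn: Callable[[Any], str]) -> Any:
    """Wrap an image -> audio-path function in a Gradio interface."""
    # Imported lazily so this module can be imported without gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=story_fn,
        # The uploaded image is handed to story_fn as a PIL image.
        inputs=gr.Image(type="pil", label="Upload an image"),
        # story_fn is assumed to return a path to an audio file on disk.
        outputs=gr.Audio(type="filepath", label="Generated story"),
        title="Mustalhim AI",
    )
```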
---
## Acknowledgments
- [Hugging Face](https://huggingface.co) for providing the models and deployment platform.
- [Gradio](https://gradio.app) for the easy-to-use interface.
- [Salesforce](https://salesforce.com) for the `blip-image-captioning-large` model.
- [ALLaM-AI](https://huggingface.co/ALLaM-AI) for the `ALLaM-7B-Instruct-preview` model.
---
## About the Name
**Mustalhim** (مستلهم) is an Arabic word meaning "inspired." The name reflects the project's aim: using AI to turn images into creative, engaging stories, bridging visual and auditory storytelling.

---
## Contact
For questions or feedback, feel free to reach out:
- **Name**: Mohammad Alkhatim
- **GitHub**: [MoJaff](https://github.com/MoJaff)
- **LinkedIn**: [Mohammad Alkhatim](https://www.linkedin.com/in/mohammad-alkhatim-9b1770266/)
- **Hugging Face**: [MoJaff](https://huggingface.co/MoJaff)
---
Experience the magic of **Mustalhim** and let your images inspire stories!
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference | |