Spaces:

MoJaff
/

Mustalhim_AI

Sleeping

App Files Files Community

Mustalhim_AI / README.md

MoJaff

Update README.md

a11aafe verified 2 months ago

preview code

raw

history blame contribute delete

2.98 kB

	---
	title: Mustalhim AI
	emoji: 👁
	colorFrom: indigo
	colorTo: blue
	sdk: gradio
	sdk_version: 5.18.0
	app_file: app.py
	pinned: false
	license: apache-2.0
	---
	# Mustalhim: Image to Audio Story Generator



	Mustalhim (مستلهم), meaning "inspired" in Arabic, is an AI-powered application that transforms images into captivating audio stories. It uses state-of-the-art models for image captioning, story generation, and text-to-speech synthesis to create an immersive experience.

	## Features

	- Image Captioning: Generates a descriptive caption for an uploaded image using the `Salesforce/blip-image-captioning-large` model.
	- Story Generation: Creates a long, engaging story inspired by the image caption using the `ALLaM-7B-Instruct-preview` model.
	- Text-to-Speech: Converts the generated story into an audio file using the `kokoro` library.
	- Gradio Interface: Provides an easy-to-use web interface for uploading images and listening to the generated audio.

	## How It Works

	1. Image Captioning: The app uses a pre-trained image captioning model to generate a textual description of the uploaded image.
	2. Story Generation: The caption is passed to a text-generation model, which creates a long, creative story inspired by the caption.
	3. Text-to-Speech: The generated story is converted into an audio file using a text-to-speech library.
	4. Output: The app returns the audio file, which can be played directly in the interface.





	## Requirements

	- Python 3.9+
	- Libraries:
	- `gradio`
	- `transformers`
	- `torch`
	- `soundfile`
	- `kokoro`
	- `sentencepiece`

	---

	## Example Usage

	1. Upload an image using the Gradio interface.
	2. The app will generate a caption for the image.
	3. A story will be created based on the caption.
	4. The story will be converted into an audio file, which you can listen to directly in the app.

	---



	## Acknowledgments

	- [Hugging Face](https://huggingface.co) for providing the models and deployment platform.
	- [Gradio](https://gradio.app) for the easy-to-use interface.
	- [Salesforce](https://salesforce.com) for the `blip-image-captioning-large` model.
	- [ALLaM-AI](https://huggingface.co/ALLaM-AI) for the `ALLaM-7B-Instruct-preview` model.


	---

	## About the Name

	Mustalhim (مستلهم) is an Arabic word meaning "inspired." This project is inspired by the power of AI to transform images into creative and engaging stories, bridging the gap between visual and auditory storytelling.

	---

	## Contact

	For questions or feedback, feel free to reach out:
	- Name: Mohammad Alkhatim
	- GitHub: [MoJaff](https://github.com/MoJaff)
	- LinkedIn: [Mohammad Alkhatim](https://www.linkedin.com/in/mohammad-alkhatim-9b1770266/)
	- Hugging Face: [MoJaff](https://huggingface.co/MoJaff)

	---

	Experience the magic of Mustalhim and let your images inspire stories! 🚀


	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference