---
title: Mustalhim AI
emoji: ๐Ÿ‘
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
license: apache-2.0
---
# Mustalhim: Image to Audio Story Generator



**Mustalhim** (مستلهم), meaning "inspired" in Arabic, is an AI-powered application that transforms images into captivating audio stories. It uses state-of-the-art models for image captioning, story generation, and text-to-speech synthesis to create an immersive experience.

## Features

- **Image Captioning**: Generates a descriptive caption for an uploaded image using the `Salesforce/blip-image-captioning-large` model.
- **Story Generation**: Creates a long, engaging story inspired by the image caption using the `ALLaM-7B-Instruct-preview` model.
- **Text-to-Speech**: Converts the generated story into an audio file using the `kokoro` library.
- **Gradio Interface**: Provides an easy-to-use web interface for uploading images and listening to the generated audio.
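
As a rough illustration of the captioning feature, the sketch below loads the named BLIP model through the `transformers` `pipeline` helper and extracts the caption text. The `ModelConfig`, `load_captioner`, and `extract_caption` names are hypothetical and not taken from `app.py`; the `ALLaM-AI/` repository prefix is an assumption inferred from the Acknowledgments link.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Captioning model named in the feature list.
    caption_model: str = "Salesforce/blip-image-captioning-large"
    # Assumed full repo id: the README gives only the model name, and the
    # "ALLaM-AI/" prefix is inferred from the Acknowledgments link.
    story_model: str = "ALLaM-AI/ALLaM-7B-Instruct-preview"


def load_captioner(config: ModelConfig = ModelConfig()):
    """Build an image-to-text pipeline (downloads weights on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    return pipeline("image-to-text", model=config.caption_model)


def extract_caption(outputs) -> str:
    """Pull the caption string out of an image-to-text pipeline result.

    The pipeline returns a list of dicts with a "generated_text" key.
    """
    if not outputs:
        raise ValueError("captioning model returned no output")
    return outputs[0]["generated_text"].strip()
```

With the model available, `extract_caption(load_captioner()(image))` would yield the caption for a PIL image or an image path.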

## How It Works

1. **Image Captioning**: The app uses a pre-trained image captioning model to generate a textual description of the uploaded image.
2. **Story Generation**: The caption is passed to a text-generation model, which creates a long, creative story inspired by the caption.
3. **Text-to-Speech**: The generated story is converted into an audio file using a text-to-speech library.
4. **Output**: The app returns the audio file, which can be played directly in the interface.
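
The four steps above can be sketched as a small function that chains three stages. This is a minimal sketch, not the code in `app.py`: the function names, the prompt wording, and the idea of passing each stage in as a callable are all assumptions made for illustration.

```python
from typing import Callable


def build_story_prompt(caption: str) -> str:
    """Turn a caption into an instruction for the text-generation model.

    The wording is a placeholder, not the prompt used by the app.
    """
    return f"Write a long, creative story inspired by this scene: {caption.strip()}"


def image_to_audio_story(
    image,
    caption_fn: Callable[[object], str],
    story_fn: Callable[[str], str],
    speech_fn: Callable[[str], str],
) -> str:
    """Run caption -> story -> speech and return the audio file path."""
    caption = caption_fn(image)                     # step 1: describe the image
    story = story_fn(build_story_prompt(caption))   # step 2: expand into a story
    return speech_fn(story)                         # steps 3-4: synthesize, return path
```

Passing the stages in as callables keeps the flow easy to try with lightweight stand-ins before the real models are loaded.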





## Requirements

- Python 3.9+
- Libraries:
  - `gradio`
  - `transformers`
  - `torch`
  - `soundfile`
  - `kokoro`
  - `sentencepiece`
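
A quick way to confirm the libraries above are installed is to look each one up without importing it. The sketch below assumes that every package's import name matches the name listed here; `missing_modules` is a hypothetical helper, not part of the project.

```python
from importlib.util import find_spec

# Import names assumed to match the package names in the list above.
REQUIRED_MODULES = [
    "gradio",
    "transformers",
    "torch",
    "soundfile",
    "kokoro",
    "sentencepiece",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```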

---

## Example Usage

1. Upload an image using the Gradio interface.
2. The app will generate a caption for the image.
3. A story will be created based on the caption.
4. The story will be converted into an audio file, which you can listen to directly in the app.
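
The upload-and-listen flow above maps naturally onto a Gradio `Interface` with one image input and one audio output. The following is a hedged sketch rather than the app's actual UI code: `make_handler` and `build_demo` are hypothetical names, and `pipeline_fn` stands for any function that takes an image and returns the path of an audio file.

```python
def make_handler(pipeline_fn):
    """Wrap the image-to-audio function with a basic input check."""

    def handler(image):
        if image is None:
            raise ValueError("Please upload an image first.")
        return pipeline_fn(image)

    return handler


def build_demo(pipeline_fn):
    """Create the web interface: image in, audio file out."""
    import gradio as gr  # imported lazily so the handler can be used without it

    return gr.Interface(
        fn=make_handler(pipeline_fn),
        inputs=gr.Image(type="pil", label="Image"),
        outputs=gr.Audio(type="filepath", label="Story"),
        title="Mustalhim",
    )
```

Calling `build_demo(pipeline_fn).launch()` would then serve the interface locally.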

---



## Acknowledgments

- [Hugging Face](https://huggingface.co) for providing the models and deployment platform.
- [Gradio](https://gradio.app) for the easy-to-use interface.
- [Salesforce](https://salesforce.com) for the `blip-image-captioning-large` model.
- [ALLaM-AI](https://huggingface.co/ALLaM-AI) for the `ALLaM-7B-Instruct-preview` model.


---

## About the Name

**Mustalhim** (مستلهم) is an Arabic word meaning "inspired." This project is inspired by the power of AI to transform images into creative and engaging stories, bridging the gap between visual and auditory storytelling.

---

## Contact

For questions or feedback, feel free to reach out:  
- **Name**: Mohammad Alkhatim  
- **GitHub**: [MoJaff](https://github.com/MoJaff)  
- **LinkedIn**: [Mohammad Alkhatim](https://www.linkedin.com/in/mohammad-alkhatim-9b1770266/)  
- **Hugging Face**: [MoJaff](https://huggingface.co/MoJaff)

---

Experience the magic of **Mustalhim** and let your images inspire stories! 🚀


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference