metadata
title: Mustalhim AI
emoji: ๐Ÿ‘
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
license: apache-2.0

Mustalhim: Image to Audio Story Generator

Mustalhim (مستلهم), meaning "inspired" in Arabic, is an AI-powered application that turns an image into a narrated audio story. It chains three stages built on pre-trained models: image captioning, story generation, and text-to-speech synthesis.

Features

  • Image Captioning: Generates a descriptive caption for an uploaded image using the Salesforce/blip-image-captioning-large model.
  • Story Generation: Creates a long, engaging story inspired by the image caption using the ALLaM-7B-Instruct-preview model.
  • Text-to-Speech: Converts the generated story into an audio file using the kokoro library.
  • Gradio Interface: Provides an easy-to-use web interface for uploading images and listening to the generated audio.

How It Works

  1. Image Captioning: The app uses a pre-trained image captioning model to generate a textual description of the uploaded image.
  2. Story Generation: The caption is passed to a text-generation model, which creates a long, creative story inspired by the caption.
  3. Text-to-Speech: The generated story is converted into an audio file using a text-to-speech library.
  4. Output: The app returns the audio file, which can be played directly in the interface.
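The four steps above can be sketched as a single function that chains three callables. This is a minimal illustration, not the code in app.py: the names `build_story_prompt` and `image_to_audio_story`, the three callable parameters, and the prompt wording are all assumptions made for the example.

```python
from typing import Any, Callable


def build_story_prompt(caption: str) -> str:
    """Turn an image caption into a story request (hypothetical prompt wording)."""
    return (
        "Write a long, creative story inspired by this image description: "
        + caption.strip()
    )


def image_to_audio_story(
    image: Any,
    caption_image: Callable[[Any], str],
    generate_story: Callable[[str], str],
    synthesize_speech: Callable[[str], str],
) -> str:
    """Run the three stages in order and return the path of the audio file.

    Each stage is passed in as a plain callable, so any captioning,
    text-generation, or text-to-speech backend can be plugged in.
    """
    caption = caption_image(image)                       # step 1: image -> caption
    story = generate_story(build_story_prompt(caption))  # step 2: caption -> story
    return synthesize_speech(story)                      # steps 3-4: story -> audio file
```

In a real deployment, the captioner could wrap `transformers.pipeline("image-to-text", model="Salesforce/blip-image-captioning-large")`, which returns a list of dicts with a `generated_text` key, so a thin wrapper would be needed to return a plain string. Passing the stages in as arguments also makes the flow easy to test with stand-in functions.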

Requirements

  • Python 3.9+
  • Libraries:
    • gradio
    • transformers
    • torch
    • soundfile
    • kokoro
    • sentencepiece
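
Before launching the app, it can help to confirm that the libraries listed above are installed. The following is a small sketch using only the standard library; the helper name `missing_packages` is an assumption for this example, and it relies on each package's import name matching the name in the list.

```python
from importlib.util import find_spec

# Package names taken from the Requirements list above.
REQUIRED = ["gradio", "transformers", "torch", "soundfile", "kokoro", "sentencepiece"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```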

Example Usage

  1. Upload an image using the Gradio interface.
  2. The app will generate a caption for the image.
  3. A story will be created based on the caption.
  4. The story will be converted into an audio file, which you can listen to directly in the app.
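
The upload-and-listen flow above could be wired into Gradio roughly as follows. This is a sketch under assumptions, not the contents of app.py: `make_handler` and `build_demo` are hypothetical helper names, and `image_to_audio` stands for whatever function turns an uploaded image into the path of an audio file.

```python
from typing import Any, Callable, Optional


def make_handler(image_to_audio: Callable[[Any], str]) -> Callable[[Any], Optional[str]]:
    """Wrap the image-to-audio function so an empty upload returns no audio."""
    def handler(image: Any) -> Optional[str]:
        if image is None:  # nothing was uploaded
            return None
        return image_to_audio(image)
    return handler


def build_demo(image_to_audio: Callable[[Any], str]):
    """Build a Gradio interface with one image input and one audio output."""
    import gradio as gr  # imported here so make_handler works without Gradio

    return gr.Interface(
        fn=make_handler(image_to_audio),
        inputs=gr.Image(type="pil", label="Upload an image"),
        outputs=gr.Audio(type="filepath", label="Generated story"),
        title="Mustalhim: Image to Audio Story Generator",
    )


# Usage, given some function my_pipeline(image) -> audio file path:
# build_demo(my_pipeline).launch()
```

Returning a file path lets the audio component play the result directly in the browser, which matches step 4 above.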

Acknowledgments

  • Hugging Face for providing the models and deployment platform.
  • Gradio for the easy-to-use interface.
  • Salesforce for the blip-image-captioning-large model.
  • ALLaM-AI for the ALLaM-7B-Instruct-preview model.

About the Name

Mustalhim (مستلهم) is an Arabic word meaning "inspired." This project is inspired by the power of AI to transform images into creative and engaging stories, bridging the gap between visual and auditory storytelling.


Contact

For questions or feedback, feel free to reach out:


Experience the magic of Mustalhim and let your images inspire stories! 🚀

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference