Persian-to-Image Text-to-Image Pipeline

Model Overview

This pipeline generates images from Persian text descriptions. It first translates the Persian text into English, then passes the translation to a fine-tuned Stable Diffusion model that generates the corresponding image. The pipeline combines two models: a translation model (mohammad-shirkhani/finetune_persian_to_english_mt5_base_summarize_on_celeba_hq) and an image generation model (ebrahim-k/Stable-Diffusion-1_5-FT-celeba_HQ_en).

Model Details

Translation Model

Model Name: mohammad-shirkhani/finetune_persian_to_english_mt5_base_summarize_on_celeba_hq
Architecture: mT5
Purpose: This model translates Persian text into English. It has been fine-tuned on the CelebA-HQ dataset for summarization tasks, so it is best suited to translating descriptions of facial features.

Image Generation Model

Model Name: ebrahim-k/Stable-Diffusion-1_5-FT-celeba_HQ_en
Architecture: Stable Diffusion 1.5
Purpose: This model generates high-quality images from English text produced by the translation model. It has been fine-tuned on the CelebA-HQ dataset, which makes it particularly effective for generating realistic human faces based on text descriptions.

Pipeline Description

The pipeline operates through the following steps:

Text Translation: The Persian input text is translated into English using the mT5-based translation model.
Image Generation: The translated English text is then used to generate the corresponding image with the Stable Diffusion model.
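
The two steps compose as plain functions, with the English text as the only interface between them. The sketch below illustrates that flow. `run_pipeline` and the two stand-in callables are hypothetical names used for illustration only; they are not part of either model's API.

```python
from typing import Any, Callable, Tuple


def run_pipeline(
    persian_text: str,
    translate: Callable[[str], str],
    generate: Callable[[str], Any],
) -> Tuple[str, Any]:
    """Translate Persian text to English, then generate an image from it."""
    english_text = translate(persian_text)  # step 1: text translation
    image = generate(english_text)          # step 2: image generation
    return english_text, image


# Stand-ins for illustration only; the real stages are the mT5 and
# Stable Diffusion models described above.
def fake_translate(text: str) -> str:
    return "a woman with wavy brown hair"


def fake_generate(prompt: str) -> str:
    return f"<image for: {prompt}>"


english_text, image = run_pipeline("این زن موهای موج دار دارد", fake_translate, fake_generate)
print(english_text)
print(image)
```

Returning the intermediate English text alongside the image makes it easier to tell whether a poor result comes from the translation or from the image model.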

Code Implementation

1. Install Required Libraries

# sentencepiece is required by T5Tokenizer
!pip install transformers diffusers accelerate torch sentencepiece

2. Import Necessary Libraries

import torch
from transformers import MT5ForConditionalGeneration, T5Tokenizer
from diffusers import StableDiffusionPipeline

3. Set Device (GPU or CPU)

This code selects a GPU if one is available and otherwise falls back to the CPU.

# Determine the device: GPU if available, otherwise CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

4. Define and Load the Persian-to-Image Model Class

The following class handles both translation and image generation tasks.

# Define the model class
class PersianToImageModel:
    def __init__(self, translation_model_name, image_model_name, device):
        self.device = device

        # Load translation model
        self.translation_model = MT5ForConditionalGeneration.from_pretrained(translation_model_name).to(device)
        self.translation_tokenizer = T5Tokenizer.from_pretrained(translation_model_name)

        # Load image generation model
        self.image_model = StableDiffusionPipeline.from_pretrained(image_model_name).to(device)

    def translate_text(self, persian_text):
        """Translate a Persian description into English with the mT5 model."""
        input_ids = self.translation_tokenizer.encode(persian_text, return_tensors="pt").to(self.device)
        # Beam search with 4 beams; stop as soon as 4 finished candidates exist
        translated_ids = self.translation_model.generate(input_ids, max_length=512, num_beams=4, early_stopping=True)
        translated_text = self.translation_tokenizer.decode(translated_ids[0], skip_special_tokens=True)
        return translated_text

    def generate_image(self, english_text):
        """Generate a single image from an English prompt."""
        image = self.image_model(english_text).images[0]
        return image

    def __call__(self, persian_text):
        # Translate Persian text to English
        english_text = self.translate_text(persian_text)
        print(f"Translated Text: {english_text}")

        # Generate and return image
        return self.generate_image(english_text)
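
Stable Diffusion 1.5 encodes prompts with a CLIP text encoder that accepts at most 77 tokens, and diffusers truncates anything longer, so a long translation may lose its final details before it reaches the image model. The sketch below checks a prompt against that limit. It assumes a Hugging Face tokenizer such as the one the pipeline exposes as `image_model.tokenizer`; `prompt_token_count` and `fits_in_prompt` are hypothetical helper names.

```python
def prompt_token_count(tokenizer, prompt):
    """Number of tokens the tokenizer produces, special tokens included."""
    return len(tokenizer(prompt)["input_ids"])


def fits_in_prompt(tokenizer, prompt):
    """True if the prompt fits the tokenizer's limit without truncation."""
    return prompt_token_count(tokenizer, prompt) <= tokenizer.model_max_length


# Usage, assuming `model` is a PersianToImageModel instance:
#   tokenizer = model.image_model.tokenizer
#   english_text = model.translate_text(persian_text)
#   if not fits_in_prompt(tokenizer, english_text):
#       print("Prompt exceeds the token limit and will be truncated")
```

If the check fails, shortening the Persian description (for example, by removing repeated sentences) is one way to keep the whole prompt within the limit.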

5. Instantiate the Model

The following snippet instantiates the combined model. On the first run, both checkpoints are downloaded from the Hugging Face Hub.

# Instantiate the combined model
translation_model_name = 'mohammad-shirkhani/finetune_persian_to_english_mt5_base_summarize_on_celeba_hq'
image_model_name = 'ebrahim-k/Stable-Diffusion-1_5-FT-celeba_HQ_en'

persian_to_image_model = PersianToImageModel(translation_model_name, image_model_name, device)

6. Example Usage of the Model

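Below are examples of how to use the model to generate images from Persian text. They use IPython's display function, so they assume a notebook environment such as Jupyter or Colab.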

from IPython.display import display

# Persian text describing a person
# (roughly: a woman with wavy brown hair, big lips, narrow eyes, and lipstick)
persian_text = "این زن دارای موهای موج دار ، لب های بزرگ و موهای قهوه ای است و رژ لب دارد.این زن موهای موج دار و لب های بزرگ دارد و رژ لب دارد.فرد جذاب است و موهای موج دار ، چشم های باریک و موهای قهوه ای دارد."

# Generate and display the image
image = persian_to_image_model(persian_text)
display(image)

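# Another example
# (roughly: an attractive man with brown hair, sideburns, a slightly open mouth, and bags under his eyes)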
persian_text2 = "این مرد جذاب دارای موهای قهوه ای ، سوزش های جانبی ، دهان کمی باز و کیسه های زیر چشم است.این فرد جذاب دارای کیسه های زیر چشم ، سوزش های جانبی و دهان کمی باز است."
image2 = persian_to_image_model(persian_text2)
display(image2)
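
`display` only works inside a notebook. When the pipeline runs as a script, the results can be written to disk instead. The sketch below assumes each result is a PIL image, which is what `StableDiffusionPipeline` returns by default; `save_images`, the `outputs` directory, and the file-name pattern are hypothetical choices.

```python
from pathlib import Path


def save_images(images, out_dir="outputs", prefix="face"):
    """Save each image as <out_dir>/<prefix>_<index>.png and return the paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, image in enumerate(images):
        target = out_path / f"{prefix}_{index:02d}.png"
        image.save(target)  # PIL infers the PNG format from the extension
        saved.append(target)
    return saved


# Usage with the two examples above:
#   save_images([image, image2])
```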