DALL·E 3 image prompt reverse-engineering

The pre-trained image-captioning model BLIP, fine-tuned on a mixture of laion/dalle-3-dataset and semi-automatically gathered (image, prompt) pairs from DALL·E 3. It takes a generated image as input and outputs a potential prompt that could have produced it, which can then be used as a base to generate similar images.

⚠️ Disclaimer: This model is not intended for commercial use, as the data it was trained on includes images generated by DALL·E 3. It is for educational purposes only.

Usage:

Loading the model and preprocessor:

import torch
from transformers import BlipForConditionalGeneration, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"  # run on GPU when available

model = BlipForConditionalGeneration.from_pretrained("dblasko/blip-dalle3-img2prompt").to(device)
processor = AutoProcessor.from_pretrained("dblasko/blip-dalle3-img2prompt")
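If you already have a generated image on disk, captioning it is a one-liner chain. A minimal sketch, assuming a local file (the path my_image.png is a placeholder, not part of this model):

from PIL import Image

image = Image.open("my_image.png").convert("RGB")  # placeholder path: any generated image works
inputs = processor(images=image, return_tensors="pt").to(device)
generated_ids = model.generate(pixel_values=inputs.pixel_values, max_length=50)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])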

Inference example on an image from laion/dalle-3-dataset:

from datasets import load_dataset

dataset = load_dataset("laion/dalle-3-dataset", split="train[0%:1%]")  # a 1% slice keeps the download fast for this toy example

img_index = 0  # index of the example to caption
example = dataset[img_index]
image = example["image"]
caption = example["caption"]

inputs = processor(images=image, return_tensors="pt").to(device)
pixel_values = inputs.pixel_values

generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"Generated caption: {generated_caption}\nReal caption: {caption}")