
DALL·E 3 image prompt reverse-engineering

A pre-trained BLIP image-captioning model, fine-tuned on a mixture of laion/dalle-3-dataset and semi-automatically gathered (image, prompt) pairs from DALL·E 3. It takes a generated image as input and outputs a plausible prompt for generating such an image. That prompt can then serve as a starting point for generating similar images.

⚠️ Disclaimer: This model is not intended for commercial use, as its training data includes images generated by DALL·E 3. It is intended for educational purposes only.

Usage:

Loading the model and processor:

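import torch
from transformers import BlipForConditionalGeneration, AutoProcessor

# Use a GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

model = BlipForConditionalGeneration.from_pretrained("dblasko/blip-dalle3-img2prompt").to(device)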
processor = AutoProcessor.from_pretrained("dblasko/blip-dalle3-img2prompt")

Inference example on an image from laion/dalle-3-dataset:

from datasets import load_dataset

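# Download only the first 1% of the train split to keep this toy example fast
dataset = load_dataset("laion/dalle-3-dataset", split="train[0%:1%]")
img_index = 0  # index of the example to caption
example = dataset[img_index]  # a dict mapping column names to values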
image = example["image"]
caption = example["caption"]

inputs = processor(images=image, return_tensors="pt").to(device)
pixel_values = inputs.pixel_values

generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"Generated caption: {generated_caption}\nReal caption: {caption}")
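To caption several images at once, the same processor → generate → batch_decode steps can be wrapped in a batched helper. This is a minimal sketch, not part of the model card: the name `generate_prompts`, the `batch_size` parameter and the default `device="cpu"` are hypothetical choices. The function relies only on the `model` and `processor` objects loaded above, using the same calls as the inference example.

```python
from typing import Any, List, Sequence


def generate_prompts(
    images: Sequence[Any],
    model: Any,
    processor: Any,
    device: str = "cpu",
    batch_size: int = 8,  # hypothetical default; tune to available memory
    max_length: int = 50,
) -> List[str]:
    """Return one reverse-engineered prompt per input image, in input order."""
    prompts: List[str] = []
    for start in range(0, len(images), batch_size):
        batch = list(images[start:start + batch_size])
        # Preprocess the whole batch at once and move the tensors to the model's device
        inputs = processor(images=batch, return_tensors="pt").to(device)
        generated_ids = model.generate(pixel_values=inputs.pixel_values, max_length=max_length)
        # batch_decode returns one decoded string per generated sequence
        prompts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return prompts
```

For example, `generate_prompts(dataset[:4]["image"], model, processor, device=device)` should return a list of four prompt strings, one per image. Batching trades memory for throughput, so a smaller `batch_size` may be needed on limited hardware.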

Model size: 247M params (Safetensors, tensor type F32)
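As a rough sizing aid, the parameter count translates into weight memory as parameters × bytes per parameter (4 bytes for F32). The sketch below assumes the rounded 247M figure and counts only the weights, not activations or generation buffers; the float16 row is a hypothetical half-precision load, not something this card states.

```python
# Rough weight-memory estimate from the parameter count on the model card.
# 247M is a rounded figure, so the results are approximate.
NUM_PARAMS = 247_000_000

# Bytes per parameter for each storage type (float16 is a hypothetical alternative)
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_mb(num_params: int, dtype: str) -> float:
    """Return the memory needed to hold the weights, in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_mb(NUM_PARAMS, dtype):.0f} MB")
```

This prints roughly 988 MB for float32 and 494 MB for float16.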
