---
datasets:
- laion/dalle-3-dataset
language:
- en
tags:
- art
- image-to-text
- image-captioning
---

# DALL·E 3 Image prompt reverse-engineering

Pre-trained image-captioning model BLIP fine-tuned on a mixture of `laion/dalle-3-dataset` and semi-automatically gathered `(image, prompt)` data from DALL·E 3. It takes a generated image as input and outputs a potential prompt that could have produced it, which can then be used as a base to generate similar images.

⚠️ Disclaimer: This model is **not intended for commercial use**, as the data it was trained on includes images generated by DALL·E 3. It is for educational purposes only.

### Usage:

Loading the model and preprocessor:

```python
import torch
from transformers import BlipForConditionalGeneration, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

model = BlipForConditionalGeneration.from_pretrained("dblasko/blip-dalle3-img2prompt").to(device)
processor = AutoProcessor.from_pretrained("dblasko/blip-dalle3-img2prompt")
```

Inference example on an image from `laion/dalle-3-dataset`:

```python
from datasets import load_dataset

# Load only 1% of the training split for fast download time in this toy example
dataset = load_dataset("laion/dalle-3-dataset", split="train[0%:1%]")

img_index = 0  # index of the example to caption
example = dataset[img_index]
image = example["image"]
caption = example["caption"]

inputs = processor(images=image, return_tensors="pt").to(device)
pixel_values = inputs.pixel_values

generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"Generated caption: {generated_caption}\nReal caption: {caption}")
```
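
The same pipeline can be applied to an image you generated yourself rather than one from the dataset. Below is a minimal sketch, assuming `model`, `processor`, and `device` are already defined as above; `my_dalle3_image.png` is a placeholder path to replace with your own file:

```python
from PIL import Image

# Placeholder path — replace with your own DALL·E 3 generated image
image = Image.open("my_dalle3_image.png").convert("RGB")

inputs = processor(images=image, return_tensors="pt").to(device)
generated_ids = model.generate(pixel_values=inputs.pixel_values, max_length=50)
predicted_prompt = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"Predicted prompt: {predicted_prompt}")
```

The predicted prompt can then be edited and fed back into an image generator to produce variations of the original image.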