Original text prompts for val_encs.npy

#1
by BigBrane

Hello Alexandru, I am loving your diffusion transformer GitHub code and learning a ton from it!

I was wondering what the original text prompts for the CLIP encodings in val_encs.npy were. I've been generating images from them as I run your code, and it would be super helpful to know the original text strings for reference.

Thank you!

Owner

Hey @BigBrane - good question - unfortunately I can't locate the original strings, so I'd recommend regenerating some that are relevant to your use case. You can use the encode_text function here: https://github.com/apapiu/transformer_latent_diffusion/blob/5448c8afabdd3384612c43085740d1079439fa7e/tld/data.py#L28.
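For reference, here's a minimal sketch of what regenerating encodings could look like, using the OpenAI clip package directly rather than the repo's helper. The ViT-L/14 model choice and the prompt strings are assumptions for illustration, not the originals:

```python
import clip
import numpy as np
import torch

# Sketch only: regenerate CLIP text encodings for your own prompts and
# save them in a val_encs.npy-style file. ViT-L/14 and the prompts below
# are assumptions, not the repo's original setup.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-L/14", device=device)

prompts = [
    "a photorealistic portrait of an astronaut, studio lighting",
    "a watercolor painting of a mountain village at dawn",
]

with torch.no_grad():
    tokens = clip.tokenize(prompts, truncate=True).to(device)
    encs = model.encode_text(tokens).float().cpu().numpy()

np.save("my_val_encs.npy", encs)  # shape: (num_prompts, 768) for ViT-L/14
```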

Awesome, thanks for the tip! On a related note, I was wondering where you downloaded the MJ dataset/prompts you've uploaded here, or whether you scraped them yourself. I thought I'd ask before attempting to use your scripts to convert a dataset I found, in case it's the same one: https://bridges.monash.edu/articles/dataset/Midjourney_2023_Dataset/25038404

In addition, you mentioned in a Reddit comment of yours, as well as in your GitHub readme, that you trained on an additional 500k photos on top of the Midjourney images. I would be extremely grateful if you could share that dataset/latents here as well. I have been modifying your code while trying to reproduce results similar to the checkpoint model you uploaded, using only the Midjourney dataset you've provided here. While my models seem to pick up the MJ style, they lack a bit of photorealism, and I suspect the missing piece may lie in the data.

Owner

Hey @BigBrane I am fairly sure I used this one for MJ - https://huggingface.co/datasets/wanng/midjourney-v5-202304-clean - only the upscaled ones. The non-upscaled ones would be interesting to use too, but you'd need to split each grid into 4 separate images.
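If you do try the non-upscaled images, a rough Pillow sketch for splitting a 2x2 preview grid into four tiles (assuming the grid is evenly divided, as Midjourney previews are) could look like this:

```python
from PIL import Image

def split_mj_grid(path: str) -> list[Image.Image]:
    """Split a non-upscaled Midjourney 2x2 preview grid into four tiles.

    Assumes an even 2x2 layout, so each tile is exactly half the
    grid's width and height.
    """
    grid = Image.open(path)
    w, h = grid.size
    tw, th = w // 2, h // 2
    return [
        grid.crop((c * tw, r * th, (c + 1) * tw, (r + 1) * th))
        for r in range(2)
        for c in range(2)
    ]

tiles = split_mj_grid("grid.png")
for i, tile in enumerate(tiles):
    tile.save(f"grid_tile_{i}.png")
```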

And good question - I don't remember the exact process for the 500k real images - it was mostly various datasets I found on Hugging Face, filtered by aesthetic score.

Here are two datasets:

https://huggingface.co/datasets/zzliang/GRIT

https://huggingface.co/datasets/kakaobrain/coyo-700m

For the COYO one you can filter on the aesthetic_score_laion_v2 column - this makes a difference, since a lot of the images are pretty poor quality and would otherwise drag down the model's generations.
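As an illustration, a streaming filter with the datasets library might look like the sketch below. The 6.0 cutoff is arbitrary, not the threshold actually used for training, and the column names follow the COYO-700M dataset card:

```python
from datasets import load_dataset

# Sketch only: stream COYO-700M metadata and keep rows above an
# aesthetic-score threshold, without downloading the full table first.
coyo = load_dataset("kakaobrain/coyo-700m", split="train", streaming=True)

filtered = coyo.filter(
    lambda ex: ex["aesthetic_score_laion_v2"] is not None
    and ex["aesthetic_score_laion_v2"] >= 6.0  # illustrative cutoff
)

for row in filtered.take(5):
    print(row["url"], row["aesthetic_score_laion_v2"])
```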
