Intermediate_features.npy

#1
by aravind011 - opened

Hi, apologies for raising a discussion in the wrong Space. Could you share your intermediate_features.npy? I'm not sure how to recreate it. Many thanks.

Hi, @aravind011
Oh, I see. So you were talking about one of my Spaces. Sorry, I should have noticed.

I've just made the dataset repo public that contains the generated images and extracted features.

But I guess it would be better if you could recreate them yourself.
So, first, the images were generated with this model using seeds in the range 0-99999. (You can refer to the code of this Space for how to use the model.)
Then, the features were extracted with this model using this inference function:
https://huggingface.co/spaces/hysts/TADNE-image-search-with-DeepDanbooru/blob/main/app.py#L90-L113
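
In case it's useful, a rough sketch of the extraction step could look like the following. It assumes DeepDanbooru loads as a Keras model and that one image per seed has already been generated; the model filename, layer index, input size, preprocessing, and file naming are assumptions, so please check the linked app.py for the exact details.

```python
# Rough sketch (not the exact Space code): extract an intermediate DeepDanbooru
# feature for each generated image and stack them into intermediate_features.npy.
# The layer index, input size, and filenames below are assumptions.
import numpy as np
import PIL.Image
import tensorflow as tf

model = tf.keras.models.load_model('model-resnet_custom_v3.h5')  # assumed filename
# Sub-model that stops at an intermediate layer instead of the final tag logits.
feature_model = tf.keras.Model(inputs=model.input, outputs=model.layers[-4].output)

def extract_feature(path: str) -> np.ndarray:
    image = PIL.Image.open(path).convert('RGB').resize((512, 512))
    x = np.asarray(image, dtype=np.float32) / 255.0
    return feature_model.predict(x[np.newaxis], verbose=0)[0]

features = np.stack([extract_feature(f'images/{seed:05d}.png')  # assumed naming
                     for seed in range(100000)])                # seeds 0-99999
np.save('intermediate_features.npy', features)
```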

Hope this helps.

Thanks for the guidance and the prompt reply. I managed to extract features using DeepDanbooru as you suggested and applied it to the psi 1.0 dataset provided by TADNE. However, when I feed images generated with different truncation values to the DeepDanbooru model, the L2 results are unable to determine the original seed value used.
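
For reference, my nearest-seed check is roughly the following (a minimal sketch with illustrative names, assuming the .npy file stores one feature row per seed in seed order and that the query feature was extracted with the same DeepDanbooru pipeline):

```python
# Minimal sketch of the L2 nearest-seed lookup (illustrative, not my exact code).
import numpy as np

features = np.load('intermediate_features.npy')   # shape: (num_seeds, feature_dim)
query_feature = np.load('query_feature.npy')      # feature of the query image, same pipeline
distances = np.linalg.norm(features - query_feature, axis=1)  # L2 distance to every seed
best_seed = int(np.argmin(distances))
print('closest seed:', best_seed, 'distance:', float(distances[best_seed]))
```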

Originally, I wanted to invert this StyleGAN model (using a ReStyle encoder) to determine the closest seed values and w-latents for editing. However, I am unable to do so because TADNE's latent dimension (style_dim) is 1024 rather than 512, which makes the encoder model too big to fit on a GPU. Could I ask if there is any way other than CLIP to determine the closest seed value from a given image, or any other possible way of inverting the TADNE generator? Your input on this matter would be much appreciated.

Thank you for providing the information on DeepDanbooru and the TADNE feature extraction process.

@aravind011

However, when I feed images generated with different truncation values to the DeepDanbooru model, the L2 results are unable to determine the original seed value used.

Thanks for the info.
But that's weird. Images generated with psi 0.7 and psi 1.0 are visually very similar, so that's hard to believe. I think I checked that it worked well even when the images were slightly modified (cropped, rotated, noise added, etc.).
To check again, I generated some images with the TADNE Space at psi=1.0 and fed the results to this Space, and it seems to work fine. I've only tested with several images this time, though.

I suspect something is wrong with your experiment, but I'm not sure, and maybe you are right. I'd like to know a little more detail about your results, if you don't mind.

Originally, I wanted to invert this StyleGAN model (using a ReStyle encoder) to determine the closest seed values and w-latents for editing. However, I am unable to do so because TADNE's latent dimension (style_dim) is 1024 rather than 512, which makes the encoder model too big to fit on a GPU. Could I ask if there is any way other than CLIP to determine the closest seed value from a given image, or any other possible way of inverting the TADNE generator? Your input on this matter would be much appreciated.

Sorry, I can't come up with any ideas about this. But it wouldn't matter if you could determine the closest seed with DeepDanbooru, right? So maybe we should think about why that doesn't work in your case first.

BTW, I haven't tried training the ReStyle encoder myself and don't have much knowledge about it, so this question is simply out of curiosity, but isn't it possible to train the ReStyle encoder even with an A100 GPU? I think an A100 instance on GCP has 40 GB of VRAM. Also, have you tried training with mixed precision?
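
In case it helps, mixed precision in PyTorch is roughly the following pattern (a minimal sketch with a placeholder encoder, loss, and data, not the actual ReStyle training code):

```python
# Minimal mixed-precision training loop with PyTorch AMP (placeholders only).
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 256 * 256, 1024)).cuda()  # stand-in encoder
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 underflow

for step in range(10):  # dummy batches instead of a real data loader
    images = torch.randn(4, 3, 256, 256, device='cuda')
    targets = torch.randn(4, 1024, device='cuda')
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # forward pass runs in mixed precision
        loss = loss_fn(encoder(images), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```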
