From scratch, or not?

by ppbrown - opened Jul 7, 2024

Jul 7, 2024

I can't seem to find a clear answer in the Hugging Face model cards, etc.:

Are these models created from scratch, just using the SDXL architecture?
Or are they trained on top of SDXL base?
I'm thinking from scratch, but I'd like an explicit statement of that, please.

Ferarn

Aug 24, 2024

I know it's been a while, but here's the paper it's based on. Yes in terms of any actual visual information used; no in terms of derived technologies like machine vision for the purposes of captioning. https://arxiv.org/pdf/2310.16825

ppbrown

Aug 24, 2024

Thanks for the reply, but I'm not understanding how those words match up to my question.

Wading through the paper, I see they say that they use "the sdxl unet".

It is unclear whether that means "they used just the ARCHITECTURE, but trained the model from scratch", or that they used the pretrained weights at
https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/unet/diffusion_pytorch_model.safetensors

Skylion007

CommonCanvas org Aug 26, 2024

The UNet model weights are trained from scratch.

Skylion007 changed discussion status to closed Aug 26, 2024
