From scratch, or not?

#5
by ppbrown - opened

I can't seem to find a clear answer in the Hugging Face model cards, etc.:

Are these models created from scratch, just using the SDXL architecture?
Or are they trained on top of SDXL base?
I'm thinking from scratch, but I'd like an explicit statement of that, please.

I know it's been a while, but here's the paper it's based on. Yes in terms of any actual visual information used; no in terms of derived technologies, such as machine vision used for captioning. https://arxiv.org/pdf/2310.16825

Thanks for the reply, but I don't understand how those words match up to my question.

Wading through the paper, they say that they use "the SDXL UNet".

It is unclear whether that means they used just the architecture but trained the model from scratch, or that they used the pretrained weights at
https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/unet/diffusion_pytorch_model.safetensors

CommonCanvas org

The UNet model weights are trained from scratch.
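
For anyone who wants to sanity-check a claim like this themselves, here is a rough sketch of one possible heuristic: compare corresponding weight tensors from two checkpoints by cosine similarity. This is not something the CommonCanvas authors describe; the function names and the toy data are hypothetical, and the tensors are assumed to have been flattened into plain lists of floats already (in practice you would load them from the two `diffusion_pytorch_model.safetensors` files first). The idea is that a fine-tune usually stays close to its starting weights, while an independently trained model of the same architecture should look roughly uncorrelated.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero tensor as uninformative
    return dot / (norm_u * norm_v)


def mean_weight_similarity(weights_a, weights_b):
    """Average cosine similarity over tensors found in both checkpoints.

    Both arguments are dicts mapping tensor names to flattened weights.
    Tensors missing from either side, or with different sizes, are skipped.
    """
    shared = [
        name for name in weights_a
        if name in weights_b and len(weights_a[name]) == len(weights_b[name])
    ]
    if not shared:
        raise ValueError("no comparable tensors between the two checkpoints")
    total = sum(cosine_similarity(weights_a[n], weights_b[n]) for n in shared)
    return total / len(shared)


# Toy stand-ins for real checkpoints (hypothetical data, not actual weights).
rng = random.Random(0)
base = {f"layer{i}.weight": [rng.gauss(0, 1) for _ in range(2000)] for i in range(4)}
# "Fine-tuned": the base weights plus a small perturbation.
finetuned = {k: [w + rng.gauss(0, 0.05) for w in v] for k, v in base.items()}
# "From scratch": same shapes, independently drawn weights.
scratch = {k: [rng.gauss(0, 1) for _ in v] for k, v in base.items()}

sim_finetuned = mean_weight_similarity(base, finetuned)  # expected to be close to 1
sim_scratch = mean_weight_similarity(base, scratch)      # expected to be close to 0
```

Treat the result as a hint rather than proof: normalization scales and biases are often initialized to the same constants in every training run, so those tensors can look similar even between unrelated models, and it is safer to look mainly at the large convolution and attention weights.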

Skylion007 changed discussion status to closed
