metadata
license: apache-2.0
pipeline_tag: image-to-image

S3Diff Model Card

This model card describes the models associated with S3Diff, available here.

Model Details

  • Developed by: Aiping Zhang

  • Model type: Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors

  • Model Description: This is the model used in the paper.

  • Resources for more information: GitHub Repository.

  • Cite as:

    @article{2024s3diff,
      author    = {Aiping Zhang and Zongsheng Yue and Renjing Pei and Wenqi Ren and Xiaochun Cao},
      title     = {Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors},
      journal   = {arxiv},
      year      = {2024},
    }
    

Limitations and Bias

Limitations

  • S3Diff requires tiled processing to generate high-resolution images, which substantially increases inference time.
  • Because of its generative nature, S3Diff cannot always preserve full fidelity to the input.
  • S3Diff cannot always generate perfect details in complex real-world scenarios.
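
To give a rough sense of why tiling increases inference time, the sketch below enumerates overlapping tiles over an image and counts them. It is not the S3Diff tiling code: the function names, the 512-pixel tile size, and the 64-pixel overlap are all assumptions chosen for illustration.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of overlapping tiles that together cover [0, length)."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]  # a single tile already covers the whole axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the border
    return starts


def tile_boxes(height, width, tile=512, overlap=64):
    """(top, left, bottom, right) boxes; tile and overlap are placeholder values."""
    return [
        (y, x, min(y + tile, height), min(x + tile, width))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


# Each box needs its own forward pass, so cost grows with the tile count.
print(len(tile_boxes(512, 512)))    # one tile
print(len(tile_boxes(2048, 2048)))  # many tiles under the assumed settings
```

Under these assumed settings the overlap also means neighbouring tiles recompute shared pixels, which adds to the overhead on top of the tile count itself.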

Bias

Although our model is built on a pre-trained SD-Turbo model, we have not observed obvious bias in the generated results. We conjecture that this is because the model is conditioned on low-resolution images rather than text prompts; such a strong condition makes the outputs less likely to be affected by biases in the base model.

Training

Training Data: The model developers used the following datasets for training:

  • The model is fine-tuned on LSDIR plus 10K samples from the FFHQ dataset.

Training Procedure: S3Diff is an image super-resolution model fine-tuned from SD-Turbo and further equipped with a degradation-guided LoRA and online negative prompting.

  • Following SD-Turbo, images are encoded by the fixed autoencoder, which turns them into latent representations. The autoencoder uses a relative downsampling factor of f = 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
  • The LR images are fed to the degradation estimation network, trained by MM-RealSR, to predict degradation scores.
  • LoRA layers are injected only into the VAE encoder and the UNet.
  • The total loss includes an L2 loss, an LPIPS loss, and a GAN loss.
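
The shape arithmetic and the loss composition described above can be sketched as follows. This is an illustration only, not the training code: the helper names are hypothetical, and the loss weights are placeholders because this card does not state them.

```python
def latent_shape(height, width, factor=8, latent_channels=4):
    """Map an H x W x 3 image shape to the H/f x W/f x 4 latent shape."""
    if height % factor or width % factor:
        raise ValueError("height and width must be divisible by the factor")
    return (height // factor, width // factor, latent_channels)


def total_loss(l2, lpips, gan, w_lpips=1.0, w_gan=1.0):
    """Weighted sum of the three loss terms.

    w_lpips and w_gan are placeholder weights (assumptions), not values
    taken from the model card or the paper.
    """
    return l2 + w_lpips * lpips + w_gan * gan


print(latent_shape(512, 512))  # (64, 64, 4)
```

With f = 8, a 512 x 512 input therefore yields a 64 x 64 x 4 latent, which is the tensor the UNet operates on.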

We currently provide the following checkpoints:

Evaluation Results

See the paper for details.