camenduru committed fadbfd6
1 parent: 2d6c1a2

thanks to Iceclear ❤

.gitattributes CHANGED
@@ -25,7 +25,6 @@
  *.safetensors filter=lfs diff=lfs merge=lfs -text
  saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
  *.tflite filter=lfs diff=lfs merge=lfs -text
  *.tgz filter=lfs diff=lfs merge=lfs -text
  *.wasm filter=lfs diff=lfs merge=lfs -text
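The hunk above drops the standalone `*.tar` LFS pattern while keeping `*.tar.*`. The Python sketch below shows what that changes for a few file names. It approximates gitattributes matching with `fnmatch.fnmatchcase` on the file's basename, which is an assumption: Git's own matcher has extra rules (notably for patterns containing `/`, such as `saved_model/**/*`, which is left out), and only the patterns visible in this hunk are included, not the whole file. `tracked_by_lfs` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# LFS patterns visible in the hunk above, before and after this commit.
# saved_model/**/* is omitted: patterns containing "/" follow extra gitattributes
# rules that this basename-only approximation does not model.
OLD_PATTERNS = ["*.safetensors", "*.tar.*", "*.tar", "*.tflite", "*.tgz", "*.wasm"]
NEW_PATTERNS = ["*.safetensors", "*.tar.*", "*.tflite", "*.tgz", "*.wasm"]


def tracked_by_lfs(path: str, patterns: list[str]) -> bool:
    """Approximate check: does the file's basename match any slash-free LFS pattern?"""
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in patterns)


for name in ("weights.tar", "data.tar.gz"):
    print(name, tracked_by_lfs(name, OLD_PATTERNS), tracked_by_lfs(name, NEW_PATTERNS))
# weights.tar True False
# data.tar.gz True True
```

Under this approximation, a plain `.tar` archive is no longer routed through LFS after the commit unless a pattern outside this hunk matches it, while compound names like `data.tar.gz` are still caught by `*.tar.*`.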
README.md ADDED
@@ -0,0 +1,65 @@
+ ---
+ license: other
+ pipeline_tag: image-to-image
+ ---
+ # StableSR Model Card
+ This model card focuses on the models associated with StableSR, available [here](https://github.com/IceClear/StableSR).
+
+ ## Model Details
+ - **Developed by:** Jianyi Wang
+ - **Model type:** Diffusion-based image super-resolution model
+ - **License:** [S-Lab License 1.0](https://github.com/IceClear/StableSR/blob/main/LICENSE.txt)
+ - **Model Description:** This is the model used in the [paper](https://arxiv.org/abs/2305.07015).
+ - **Resources for more information:** [GitHub Repository](https://github.com/IceClear/StableSR).
+ - **Cite as:**
+
+ @InProceedings{wang2023exploiting,
+ author = {Wang, Jianyi and Yue, Zongsheng and Zhou, Shangchen and Chan, Kelvin CK and Loy, Chen Change},
+ title = {Exploiting Diffusion Prior for Real-World Image Super-Resolution},
+ booktitle = {arXiv preprint arXiv:2305.07015},
+ year = {2023},
+ }
+
+ # Uses
+ Please refer to the [S-Lab License 1.0](https://github.com/IceClear/StableSR/blob/main/LICENSE.txt) for permitted uses.
+
+ ## Limitations and Bias
+
+ ### Limitations
+
+ - StableSR still requires multiple sampling steps to generate an image, which makes it much slower than GAN-based approaches, especially for images larger than 512 or 768 pixels.
+ - Due to its generative nature, StableSR cannot always guarantee 100% fidelity to the input.
+ - StableSR sometimes cannot generate perfect details in complex real-world scenes.
+
+ ### Bias
+ Although our model is based on a pre-trained Stable Diffusion model, we currently do not observe obvious bias in the generated results.
+ We conjecture that the main reason is that our model is conditioned on low-resolution images rather than on text prompts.
+ Such strong conditions make our model less likely to be affected by biases inherited from the pre-trained model.
+
+
+ ## Training
+
+ **Training Data**
+ The model developer used the following datasets for training the model:
+
+ - Our diffusion model is finetuned on the DF2K (DIV2K and Flickr2K) + OST datasets, available [here](https://github.com/xinntao/Real-ESRGAN/blob/master/docs/Training.md).
+ - We further generate 100k synthetic LR-HR pairs on DF2K_OST using the finetuned diffusion model to train the CFW module.
+
+ **Training Procedure**
+ StableSR is an image super-resolution model finetuned on [Stable Diffusion](https://github.com/Stability-AI/stablediffusion), further equipped with a time-aware encoder and a controllable feature wrapping (CFW) module.
+
+ - Following Stable Diffusion, images are encoded through the fixed autoencoder, which turns them into latent representations. The autoencoder uses a relative downsampling factor of f = 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
+ - The latent representations are fed to the time-aware encoder as guidance.
+ - The loss is the same as in Stable Diffusion.
+ - After finetuning the diffusion model, we further train the CFW module using the data generated by the finetuned diffusion model.
+ - The autoencoder model is fixed and only CFW is trainable.
+ - The loss is similar to that used for training an autoencoder, except that we use a fixed adversarial loss weight of 0.025 rather than an adaptive one.
+
+ We currently provide the following checkpoints:
+
+ - [stablesr_000117.ckpt](https://huggingface.co/Iceclear/StableSR/resolve/main/stablesr_000117.ckpt): Diffusion model finetuned on [SD2.1-512base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) with the DF2K_OST dataset for 117 epochs.
+ - [vqgan_cfw_00011.ckpt](https://huggingface.co/Iceclear/StableSR/resolve/main/vqgan_cfw_00011.ckpt): CFW module with the fixed autoencoder, trained on synthetic paired data for 11 epochs.
+ - [stablesr_768v_000139.ckpt](https://huggingface.co/Iceclear/StableSR/resolve/main/stablesr_768v_000139.ckpt): Diffusion model finetuned on [SD2.1-768v](https://huggingface.co/stabilityai/stable-diffusion-2-1) with the DF2K_OST dataset for 139 epochs.
+
+ ## Evaluation Results
+ See the [paper](https://arxiv.org/abs/2305.07015) for details.
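The latent-shape arithmetic in the README's training procedure (f = 8, H x W x 3 images to H/f x W/f x 4 latents) can be checked with a few lines of Python. This is a minimal sketch: `latent_shape` and `compression_ratio` are hypothetical helpers, not part of the StableSR code, and requiring H and W to be multiples of f is an assumption the model card does not state.

```python
def latent_shape(height: int, width: int, f: int = 8, latent_channels: int = 4) -> tuple[int, int, int]:
    """Map an H x W x 3 image shape to the H/f x W/f x 4 latent shape."""
    if height % f or width % f:
        # Assumption: sizes are multiples of f; the model card does not say
        # how other sizes are handled (e.g. padding or cropping).
        raise ValueError(f"height and width must be multiples of {f}")
    return height // f, width // f, latent_channels


def compression_ratio(f: int = 8, image_channels: int = 3, latent_channels: int = 4) -> float:
    """Image values per latent value: (f * f * image_channels) / latent_channels."""
    return f * f * image_channels / latent_channels


for side in (512, 768):
    print(side, latent_shape(side, side))
# 512 (64, 64, 4)
# 768 (96, 96, 4)
print(compression_ratio())
# 48.0
```

The diffusion steps therefore run on a tensor with 48x fewer values than the image, but the sampling cost still grows with image area, which is consistent with the README's note that large images are slow.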
ldmsr4x_finetune_119.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a4977665bc20d5976c6cfafbb914aca162578f4012252ef2df4839a718be12da
+ size 2039892291
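Each checkpoint and archive in this commit is stored as a Git LFS pointer rather than as the file itself: a `version` line, an `oid` holding the SHA-256 of the real file, and its `size` in bytes. Below is a minimal parser sketch, assuming the v1 pointer layout shown in these diffs; `parse_lfs_pointer` is a hypothetical helper, not an API of Git LFS or `huggingface_hub`.

```python
LFS_SPEC_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS v1 pointer file into its oid algorithm, digest and size."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if not line:
            continue
        # Each pointer line is "key value", separated by a single space.
        key, sep, value = line.partition(" ")
        if not sep:
            raise ValueError(f"malformed pointer line: {line!r}")
        fields[key] = value
    if fields.get("version") != LFS_SPEC_V1:
        raise ValueError("missing or unsupported LFS pointer version")
    try:
        oid, size = fields["oid"], int(fields["size"])
    except (KeyError, ValueError) as exc:
        raise ValueError("pointer lacks a valid oid or size") from exc
    algo, _, digest = oid.partition(":")
    if algo != "sha256" or len(digest) != 64:
        raise ValueError(f"unexpected oid: {oid!r}")
    return {"oid_algo": algo, "oid": digest, "size": size}


# Pointer for ldmsr4x_finetune_119.ckpt, copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a4977665bc20d5976c6cfafbb914aca162578f4012252ef2df4839a718be12da
size 2039892291
"""

info = parse_lfs_pointer(POINTER)
print(f"{info['size'] / 2**30:.2f} GiB")
# 1.90 GiB
```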
stablesr_000117.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8862bf3fd11c5b8fe82fb8a4618a1c74a29e0301a190bd6c2e84d68986ef9cb
+ size 6481647231
stablesr_768v_000139.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d5f83b544035b4bf24ab1d7aa86e0f83328e9ce121efb2c850f833178be3d10b
+ size 6481647231
vqgan_cfw_00011.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5b20ed9a80e9bdbf9e76ff1642a6a5428ca3427b08b779248a88bfbba2e74e8e
+ size 959719471
webui_512v_models.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9484ce3614c7964e8dd0ab9b053e7e728f7ea458d264c3aebb2175ffbf9c4f0
+ size 1273589575
webui_768v_139.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4c6b969948fe692998b33433c0f554506aaf8a39cbd2b36a0db5a72c5ecaa4df
+ size 422185645
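Because an LFS `oid` is the SHA-256 of the file contents and `size` is its byte length, a downloaded checkpoint can be checked against the pointers above. This is a sketch assuming the file is already on local disk; `verify_lfs_object` and the path in the usage comment are hypothetical.

```python
import hashlib
import os


def verify_lfs_object(path: str, expected_oid: str, expected_size: int,
                      chunk_size: int = 1 << 20) -> bool:
    """Check a downloaded file against the oid and size from its LFS pointer."""
    # Cheap check first: the pointer's size field is the object's byte length.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Stream in 1 MiB chunks so multi-GB checkpoints are never fully loaded.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Usage with a hypothetical local path; oid and size come from the
# vqgan_cfw_00011.ckpt pointer above:
# verify_lfs_object("vqgan_cfw_00011.ckpt",
#                   "5b20ed9a80e9bdbf9e76ff1642a6a5428ca3427b08b779248a88bfbba2e74e8e",
#                   959719471)
```

Checking the size before hashing lets a truncated download fail immediately instead of after reading several gigabytes.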