---
license: other
pipeline_tag: image-to-image
---
# StableSR Model Card
This model card focuses on the models associated with StableSR, available [here](https://github.com/IceClear/StableSR).

## Model Details
- **Developed by:** Jianyi Wang
- **Model type:** Diffusion-based image super-resolution model
- **Language(s):** English
- **License:** [S-Lab License 1.0](https://github.com/IceClear/StableSR/blob/main/LICENSE.txt)
- **Model Description:** This is the model described in the [paper](https://arxiv.org/abs/2305.07015).
- **Resources for more information:** [GitHub Repository](https://github.com/IceClear/StableSR).
- **Cite as:**

      @InProceedings{wang2023exploiting,
          author = {Wang, Jianyi and Yue, Zongsheng and Zhou, Shangchen and Chan, Kelvin CK and Loy, Chen Change},
          title = {Exploiting Diffusion Prior for Real-World Image Super-Resolution},
          booktitle = {arXiv preprint arXiv:2305.07015},
          year = {2023},
      }

## Uses
Please refer to the [S-Lab License 1.0](https://github.com/IceClear/StableSR/blob/main/LICENSE.txt) for permitted uses.

## Limitations and Bias

### Limitations

- TBD

### Bias
Although our model builds on a pre-trained Stable Diffusion model, we have not observed obvious bias in the generated results so far.
We conjecture that this is mainly because our model is conditioned on low-resolution images rather than on text prompts.
Such a strong condition makes the outputs less likely to be affected by biases inherited from the text-to-image prior.


## Training

**Training Data**
The model developers used the following datasets to train the model:

- The diffusion model is finetuned on the DF2K (DIV2K and Flickr2K) + OST datasets, available [here](https://github.com/xinntao/Real-ESRGAN/blob/master/docs/Training.md).
- We further generate 100k synthetic LR-HR pairs on DF2K_OST using the finetuned diffusion model to train the CFW module.

**Training Procedure**
StableSR is an image super-resolution model finetuned from [Stable Diffusion](https://github.com/Stability-AI/stablediffusion) and further equipped with a time-aware encoder and a controllable feature wrapping (CFW) module.

- Following Stable Diffusion, images are encoded through the fixed VQGAN encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of f = 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
- The latent representations are fed to the time-aware encoder as guidance.
- The loss is the same as in Stable Diffusion.
- After finetuning the diffusion model, we further train the CFW module on the data generated by the finetuned diffusion model.
- The VQGAN model is fixed and only the CFW module is trainable.
- The loss is similar to that of VQGAN training, except that we use a fixed adversarial loss weight of 0.025 rather than a self-adjusting one.
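The encoder's shape arithmetic can be sketched in a few lines (a toy check, not StableSR code; only the factor f = 8 and the 4 latent channels come from the description above):

```python
def latent_shape(h, w, f=8, latent_channels=4):
    """Shape of the VQGAN latent for an H x W x 3 input image."""
    assert h % f == 0 and w % f == 0, "spatial dims should be divisible by f"
    return (h // f, w // f, latent_channels)

# A 512 x 512 x 3 image maps to a 64 x 64 x 4 latent.
print(latent_shape(512, 512))  # (64, 64, 4)
```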

We currently provide the following checkpoints:

- `stablesr_000117.ckpt`: Diffusion model finetuned on the DF2K_OST dataset for 117 epochs.
- `vqgan_cfw_00011.ckpt`: CFW module with fixed VQGAN, trained on the synthetic paired data for 11 epochs.
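As a toy illustration of the fixed adversarial weight used when training the CFW module (a sketch only; `rec_loss` and `g_loss` are hypothetical stand-ins for the reconstruction and generator losses, and only the constant 0.025 comes from the description above):

```python
ADV_WEIGHT = 0.025  # fixed, instead of VQGAN's usual self-adjusting weight

def cfw_total_loss(rec_loss: float, g_loss: float) -> float:
    # Total loss = reconstruction term + fixed-weight adversarial term.
    return rec_loss + ADV_WEIGHT * g_loss

print(cfw_total_loss(1.0, 0.5))  # 1.0 + 0.025 * 0.5 = 1.0125
```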

## Evaluation Results
See the [paper](https://arxiv.org/abs/2305.07015) for details.