Update README.md
## Model Details

- **Developed by:** Jianyi Wang
- **Model type:** Diffusion-based image super-resolution model
- **License:** [S-Lab License 1.0](https://github.com/IceClear/StableSR/blob/main/LICENSE.txt)
- **Model Description:** This is the model used in [Paper](https://arxiv.org/abs/2305.07015).
- **Resources for more information:** [GitHub Repository](https://github.com/IceClear/StableSR).
## Training

**Training Data**

The model developer used the following dataset for training the model:

- Our diffusion model is finetuned on DF2K (DIV2K and Flickr2K) + OST datasets, available [here](https://github.com/xinntao/Real-ESRGAN/blob/master/docs/Training.md).
- We further generate 100k synthetic LR-HR pairs on DF2K_OST using the finetuned diffusion model for training the CFW module.

**Training Procedure**

StableSR is an image super-resolution model finetuned on [Stable Diffusion](https://github.com/Stability-AI/stablediffusion), further equipped with a time-aware encoder and a controllable feature wrapping (CFW) module.

- Following Stable Diffusion, images are encoded through the fixed VQGAN encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
- The latent representations are fed to the time-aware encoder as guidance.
- The loss is the same as Stable Diffusion.
- After finetuning the diffusion model, we further train the CFW module using the data generated by the finetuned diffusion model.
- The VQGAN model is fixed and only CFW is trainable.
- The loss is similar to training a VQGAN, except that we use a fixed adversarial loss weight of 0.025 rather than a self-adjustable one.
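
The two concrete numbers in the list above can be sketched in plain Python. This is a minimal illustration only: `latent_shape` and `cfw_total_loss` are hypothetical helper names, not functions from the StableSR codebase, and the loss here is a scalar stand-in for the actual tensor losses.

```python
def latent_shape(h, w, f=8, z_channels=4):
    # Shape bookkeeping for the fixed VQGAN encoder:
    # an H x W x 3 image maps to an H/f x W/f x 4 latent, with f = 8.
    assert h % f == 0 and w % f == 0, "spatial dims must be divisible by f"
    return (h // f, w // f, z_channels)

def cfw_total_loss(rec_loss, gan_loss, d_weight=0.025):
    # CFW training combines the reconstruction loss with the adversarial
    # (generator) loss using a FIXED weight of 0.025, instead of VQGAN's
    # gradient-based self-adjusting weight.
    return rec_loss + d_weight * gan_loss

print(latent_shape(512, 512))    # (64, 64, 4)
print(cfw_total_loss(1.0, 0.5))  # 1.0125
```

For a 512 x 512 input this gives a 64 x 64 x 4 latent, which is the resolution at which the time-aware encoder provides guidance.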

We currently provide the following checkpoints:

- `stablesr_000117.ckpt`: Diffusion model finetuned on DF2K_OST dataset for 117 epochs.
- `vqgan_cfw_00011.ckpt`: CFW module with fixed VQGAN trained on synthetic paired data for 11 epochs.

## Evaluation Results

See [Paper](https://arxiv.org/abs/2305.07015) for details.