Update README.md
README.md
CHANGED
@@ -1,3 +1,60 @@
---
license: mit
thumbnail: https://algolzw.github.io/daclip-uir/static/images/teaser.jpg
---

# Model Card: daclip-uir ViT-B/32 - irsde

## Model Details

### Model Description

This model extends CLIP to a degradation-aware version (DA-CLIP) that predicts both a degradation embedding and a clean content embedding from a corrupted image. These embeddings can then be used to improve image restoration performance and to support unified image restoration. The base CLIP model is a pretrained ViT-B/32, and the base diffusion model for image restoration is [IR-SDE](https://arxiv.org/abs/2301.11699).

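
As an illustration of how a degradation embedding could be used, the sketch below matches it against text embeddings of candidate degradation names by cosine similarity, which is the usual CLIP-style matching. It does not call DA-CLIP itself: the 3-dimensional vectors, the label names, and the helpers `cosine_similarity` and `closest_label` are all assumptions made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_label(embedding, label_embeddings):
    """Return the label whose embedding is most similar to `embedding`."""
    return max(
        label_embeddings,
        key=lambda label: cosine_similarity(embedding, label_embeddings[label]),
    )


# Hypothetical text embeddings for three degradation names (made-up values).
text_embeddings = {
    "hazy": [0.9, 0.1, 0.0],
    "rainy": [0.1, 0.9, 0.1],
    "noisy": [0.0, 0.2, 0.9],
}

# Hypothetical degradation embedding predicted for one corrupted image.
degradation_embedding = [0.2, 0.8, 0.1]

predicted = closest_label(degradation_embedding, text_embeddings)
print(predicted)  # rainy
```

With real embeddings the vectors would come from the image and text encoders; only the matching step is shown here.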
### Documents

Controlling Vision-Language Models for Universal Image Restoration - [paper](https://arxiv.org/abs/2310.01018).

### Intended Use

The model is intended as a research output for research communities. We hope that it will enable researchers to better understand and explore image degradation with language models. Researchers in computer vision can use it to further improve their models' performance. We also encourage users who are interested in our work to train their own models with larger datasets and more degradation types.

### Performance

We have evaluated DA-CLIP and the downstream diffusion model on 10 image restoration datasets, one per degradation type:

- GoPro: Motion blur
- RESIDE-6k: Haze
- LIVE1: JPEG compression
- LOL: Low light
- CBSD68: Noise
- RainDrop: Raindrop
- Rain100H: Rain
- SRD: Shadow
- Snow100K-L: Snow
- CelebaHQ-256: Inpainting

### Limitations

The current pretrained model still struggles with some real-world images whose distribution differs from that of our training dataset (for example, images captured with different devices, at different resolutions, or with different degradations). We regard this as future work and will try to make our model more practical.

We also found that directly resizing input images leads to poor performance on most tasks. We could add a resize step to training, but resizing always degrades image quality because of interpolation.

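
One possible way to avoid resizing is to process an image at its native resolution in fixed-size crops. This is an assumption made for illustration, not a procedure this model card prescribes: the helper `tile_boxes` and the tile size of 256 are hypothetical, and blending the restored tiles back together is left out.

```python
def tile_boxes(width, height, tile):
    """Return (left, top, right, bottom) boxes of size `tile` covering the image.

    Edge tiles are shifted inward so every box stays inside the image,
    so neighbouring tiles may overlap near the right and bottom edges.
    """
    if width < tile or height < tile:
        raise ValueError("image is smaller than the tile size")

    def starts(length):
        # Regular grid positions, plus one tile aligned to the far edge if needed.
        positions = list(range(0, length - tile + 1, tile))
        if positions[-1] + tile < length:
            positions.append(length - tile)
        return positions

    return [(x, y, x + tile, y + tile) for y in starts(height) for x in starts(width)]


# Hypothetical 600x256 image split into 256x256 crops without any resizing.
boxes = tile_boxes(600, 256, 256)
print(boxes)  # [(0, 0, 256, 256), (256, 0, 512, 256), (344, 0, 600, 256)]
```

Each box can be passed to an image library's crop function; the last box overlaps the previous one so that no pixel is interpolated or dropped.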
### Contact

If you have any questions, please contact: ziwei.luo@it.uu.se

### Citations

If our code helps your research or work, please consider citing our paper:

```
@article{luo2023controlling,
  title={Controlling Vision-Language Models for Universal Image Restoration},
  author={Luo, Ziwei and Gustafsson, Fredrik K and Zhao, Zheng and Sj{\"o}lund, Jens and Sch{\"o}n, Thomas B},
  journal={arXiv preprint arXiv:2310.01018},
  year={2023}
}
```