Model Card: daclip-uir ViT-B/32 - irsde

Model Details

Model Description

This model extends CLIP to a degradation-aware version (DA-CLIP) that predicts both a degradation embedding and a clean content embedding from a corrupted image. These embeddings can then be used to improve image restoration performance and to support unified image restoration across degradation types. The base CLIP model is a pretrained ViT-B/32, and the base diffusion model for image restoration is IR-SDE.
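To make the idea of a degradation embedding concrete, here is a minimal sketch of how such an embedding could be compared against text embeddings of degradation names, assuming CLIP-style cosine-similarity matching. The three-dimensional vectors and the names `cosine_similarity` and `closest_label` are made up for illustration; they are not part of the DA-CLIP code, and real embeddings would come from the model's encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_label(query, labelled_embeddings):
    """Return the label whose embedding is most similar to the query."""
    return max(
        labelled_embeddings,
        key=lambda label: cosine_similarity(query, labelled_embeddings[label]),
    )


# Toy stand-ins for text embeddings of degradation names (hypothetical values).
text_embeddings = {
    "hazy": [0.9, 0.1, 0.0],
    "noisy": [0.0, 1.0, 0.1],
    "rainy": [0.1, 0.0, 0.95],
}

# Toy stand-in for the degradation embedding predicted from a corrupted image.
degradation_embedding = [0.2, 0.9, 0.1]

predicted = closest_label(degradation_embedding, text_embeddings)
print(predicted)
```

With these toy vectors the query lies closest to the "noisy" entry. The same comparison applies to embeddings of any dimension.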

Documents

Paper: Controlling Vision-Language Models for Universal Image Restoration (arXiv:2310.01018).

Intended Use

The model is intended as a research output for research communities. We hope it will enable researchers to better understand and explore image degradation with vision-language models. Researchers in computer vision can use it to further improve the performance of their own models. We also encourage users who are interested in our work to train their own models with larger datasets and more degradation types.

Performance

We have evaluated the performance of DA-CLIP and the downstream diffusion model on 10 different image restoration datasets:

  • GoPro: Motion-blur
  • RESIDE-6k: Haze
  • LIVE1: JPEG-compress
  • LOL: Low-light
  • CBSD68: Noisy
  • RainDrop: Raindrop
  • Rain100H: Rainy
  • SRD: Shadowed
  • Snow100K-L: Snowy
  • CelebaHQ-256: Inpainting
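For scripting evaluations, the dataset-to-degradation pairing above can be kept as a small lookup table. This is only a convenience sketch: the names `DEGRADATION_BY_DATASET` and `datasets_for` are hypothetical, and the lowercase labels simply mirror the list in this card rather than any identifiers used by the released code.

```python
# Dataset name -> degradation type, copied from the list in this model card.
DEGRADATION_BY_DATASET = {
    "GoPro": "motion-blur",
    "RESIDE-6k": "haze",
    "LIVE1": "jpeg-compress",
    "LOL": "low-light",
    "CBSD68": "noisy",
    "RainDrop": "raindrop",
    "Rain100H": "rainy",
    "SRD": "shadowed",
    "Snow100K-L": "snowy",
    "CelebaHQ-256": "inpainting",
}


def datasets_for(degradation):
    """Return the dataset names listed for a given degradation type."""
    return [
        name
        for name, kind in DEGRADATION_BY_DATASET.items()
        if kind == degradation
    ]


print(len(DEGRADATION_BY_DATASET), "datasets")
print(datasets_for("low-light"))
```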

Limitations

The current pretrained model still struggles with some real-world images whose distribution differs from that of our training data (for example, images captured with different devices, at different resolutions, or with different degradations). We regard this as future work and will try to make the model more practical. We also found that directly resizing input images leads to poor performance on most tasks. Adding a resize step to training is an option, but interpolation always degrades the image quality.

Contact

If you have any questions, please contact: ziwei.luo@it.uu.se

Citations

If our code helps your research or work, please consider citing our paper:

@article{luo2023controlling,
  title={Controlling Vision-Language Models for Universal Image Restoration},
  author={Luo, Ziwei and Gustafsson, Fredrik K and Zhao, Zheng and Sj{\"o}lund, Jens and Sch{\"o}n, Thomas B},
  journal={arXiv preprint arXiv:2310.01018},
  year={2023}
}