RCAN model trained on DIV2K

RCAN is a very deep residual channel attention network for super resolution, trained on DIV2K. It was introduced in the 2018 paper "Image Super-Resolution Using Very Deep Residual Channel Attention Networks" by Yulun Zhang et al. and first released in this repository.

We developed a modified version that can run on AMD Ryzen AI.

Model description

RCAN is an advanced algorithm for single image super resolution. Our modified version is smaller than the original. It is based on deep learning techniques and performs X2 super resolution.

Intended uses & limitations

You can use the raw model for super resolution. See the model hub to look for all available RCAN models.

How to use

Installation

Follow Ryzen AI Installation to prepare the environment for Ryzen AI. Then run the following command to install the prerequisites for this model.

pip install -r requirements.txt

Data Preparation (optional: for accuracy evaluation)

Download the benchmark dataset (https://cv.snu.ac.kr/research/EDSR/benchmark.tar).
Organize the dataset directory as follows:

└── dataset
     └── benchmark
          ├── Set5
          │    ├── HR
          │    │   ├── baby.png
          │    │   ├── ...
          │    └── LR_bicubic
          │        └── X2
          │            ├── babyx2.png
          │            ├── ...
          ├── Set14
          ├── ...
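
The layout above implies a naming rule: each HR image `<name>.png` has an X2 counterpart `<name>x2.png` under `LR_bicubic/X2`. As a minimal sketch (not taken from the repository's scripts), the hypothetical helper `pair_hr_lr` below lists the matching file pairs for one benchmark set, which can help confirm that the directory was organized correctly. The function name and its parameters are assumptions made for illustration.

```python
from pathlib import Path


def pair_hr_lr(benchmark_dir, dataset="Set5", scale=2):
    """Return (hr_path, lr_path) pairs for one benchmark set.

    Assumes the layout shown above:
    <benchmark_dir>/<dataset>/HR/<name>.png
    <benchmark_dir>/<dataset>/LR_bicubic/X<scale>/<name>x<scale>.png
    """
    root = Path(benchmark_dir) / dataset
    hr_dir = root / "HR"
    lr_dir = root / "LR_bicubic" / f"X{scale}"

    pairs = []
    for hr_path in sorted(hr_dir.glob("*.png")):
        lr_path = lr_dir / f"{hr_path.stem}x{scale}.png"
        # Keep only HR images that have a low-resolution counterpart.
        if lr_path.is_file():
            pairs.append((hr_path, lr_path))
    return pairs
```

For example, `pair_hr_lr("dataset/benchmark", "Set5")` should return one pair per image in Set5 if the download was unpacked as described.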

Test & Evaluation

The following snippet from infer_onnx.py shows how to run the model:

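    import argparse

    import cv2
    import numpy as np
    import onnxruntime

    # tiling_inference (used below) is not shown in this excerpt.

    parser = argparse.ArgumentParser(description='RCAN SISR')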
    parser.add_argument('--onnx_path', type=str, default='RCAN_int8_NHWC.onnx',
                    help='onnx path')
    parser.add_argument('--image_path', default='test_data/test.png',
                    help='path of your image')
    parser.add_argument('--output_path', default='test_data/sr.png',
                    help='path to save the super-resolved image')
    parser.add_argument('--ipu', action='store_true',
                    help='use ipu')
    parser.add_argument('--provider_config', type=str, default=None,
                    help='provider config path')
    args = parser.parse_args()

    if args.ipu:
        providers = ["VitisAIExecutionProvider"]
        provider_options = [{"config_file": args.provider_config}]
    else:
        providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
        provider_options = None
    onnx_file_name = args.onnx_path
    image_path = args.image_path
    output_path = args.output_path

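    # Create the ONNX Runtime session with the selected execution providers.
    ort_session = onnxruntime.InferenceSession(onnx_file_name, providers=providers, provider_options=provider_options)
    # Read the image (HWC) and convert it to a float32 NCHW batch of one.
    lr = cv2.imread(image_path)[np.newaxis,:,:,:].transpose((0,3,1,2)).astype(np.float32)
    # Run the model tile by tile via the tiling_inference helper.
    sr = tiling_inference(ort_session, lr, 8, (56, 56))
    # Clamp to the valid 8-bit range and convert back to HWC uint8.
    sr = np.clip(sr, 0, 255)
    sr = sr.squeeze().transpose((1,2,0)).astype(np.uint8)
    # cv2.imwrite returns a success flag, so do not assign it back to sr.
    cv2.imwrite(output_path, sr)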
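
The snippet calls `tiling_inference` with the arguments `8` and `(56, 56)`, but the helper itself is not shown. The sketch below illustrates the general idea of tiled super resolution under stated assumptions: the image is split into overlapping tiles, each tile is upscaled separately, and overlapping outputs are averaged. It is not the repository's implementation. The name `tiled_upscale`, the reading of `8` as an overlap and `56` as a tile size, and the averaging strategy are all assumptions, and `upscale_fn` is a stand-in for a call to the ONNX session.

```python
import numpy as np


def tiled_upscale(lr, upscale_fn, scale=2, tile=56, overlap=8):
    """Upscale an NCHW array tile by tile and stitch the results.

    upscale_fn maps a (N, C, h, w) array to (N, C, h*scale, w*scale).
    Overlapping regions are averaged (an assumed blending choice).
    """
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    n, c, h, w = lr.shape
    out = np.zeros((n, c, h * scale, w * scale), dtype=np.float32)
    weight = np.zeros_like(out)
    stride = tile - overlap

    # Tile origins; the last tile is shifted back so it ends at the border.
    ys = sorted({min(y, max(h - tile, 0)) for y in range(0, h, stride)})
    xs = sorted({min(x, max(w - tile, 0)) for x in range(0, w, stride)})

    for y in ys:
        for x in xs:
            patch = lr[:, :, y:y + tile, x:x + tile]
            ph, pw = patch.shape[2], patch.shape[3]
            sr_patch = upscale_fn(patch)
            # Accumulate the upscaled tile and count how often each pixel is hit.
            region = (slice(None), slice(None),
                      slice(y * scale, (y + ph) * scale),
                      slice(x * scale, (x + pw) * scale))
            out[region] += sr_patch
            weight[region] += 1.0

    return out / weight


def nearest_x2(patch):
    """Stand-in for the model: nearest-neighbour X2 upscaling."""
    return patch.repeat(2, axis=2).repeat(2, axis=3)
```

With a real model, `upscale_fn` would wrap `ort_session.run(...)`. A model exported with a fixed input size would also need images smaller than one tile to be padded, which this sketch does not handle.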

Run inference for a single image

python infer_onnx.py --onnx_path RCAN_int8_NHWC.onnx --image_path /Path/To/Your/Image --ipu --provider_config Path/To/vaip_config.json
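
With `--ipu`, the script requests `VitisAIExecutionProvider`; without it, it requests CUDA first and then CPU. As a minimal sketch (not part of infer_onnx.py), the hypothetical helper below filters that preference list against the providers that are actually present, so that a missing provider is reported up front. The function name `pick_providers` and its arguments are assumptions made for illustration.

```python
def pick_providers(available, use_ipu=False):
    """Return the execution providers to request, in priority order.

    available: provider names present in the installed ONNX Runtime build.
    use_ipu:   mirrors the --ipu flag of infer_onnx.py.
    """
    if use_ipu:
        preferred = ["VitisAIExecutionProvider"]
    else:
        preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]

    # Keep the preference order, dropping providers that are not installed.
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError(
            f"None of {preferred} is available; found {list(available)}"
        )
    return chosen
```

With ONNX Runtime installed, `onnxruntime.get_available_providers()` returns the list to pass as `available`.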

Test accuracy of the quantized model

python eval_onnx.py --onnx_path RCAN_int8_NHWC.onnx --data_test Set5 --ipu --provider_config Path/To/vaip_config.json
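
Accuracy for super resolution is commonly reported as PSNR, often together with SSIM. The sketch below shows the plain PSNR definition for 8-bit images, `10 * log10(MAX^2 / MSE)`, as a reference for interpreting such numbers. It is an illustration rather than the code in eval_onnx.py: evaluation scripts often crop image borders or measure only the luminance channel, so values computed this way are not guaranteed to match the reported ones. The function name `psnr` and the `border` parameter are assumptions.

```python
import numpy as np


def psnr(sr, hr, max_value=255.0, border=0):
    """Peak signal-to-noise ratio in dB between two same-sized images.

    border: number of pixels to crop from each edge before comparing.
    """
    sr = np.asarray(sr, dtype=np.float64)
    hr = np.asarray(hr, dtype=np.float64)
    if sr.shape != hr.shape:
        raise ValueError("sr and hr must have the same shape")
    if border > 0:
        sr = sr[border:-border, border:-border]
        hr = hr[border:-border, border:-border]

    mse = np.mean((sr - hr) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```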

Performance

Method	Scale	FLOPs	Set5 (PSNR / SSIM)
RCAN-S (float)	X2	24.5G	37.531 / 0.958
RCAN-S (INT8)	X2	24.5G	37.150 / 0.955

Note: FLOPs are calculated for an output resolution of 360x640.

@inproceedings{zhang2018image,
  title={Image super-resolution using very deep residual channel attention networks},
  author={Zhang, Yulun and Li, Kunpeng and Li, Kai and Wang, Lichen and Zhong, Bineng and Fu, Yun},
  booktitle={Proceedings of the European conference on computer vision (ECCV)},
  pages={286--301},
  year={2018}
}