Evaluation results

#7, opened by Kubuxu

I ran an evaluation on the new version of your VAE vs others from StabilityAI.

Evaluation on COCO val-2017, 256x256, RandomCrop with padding
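The "RandomCrop with padding" step isn't described further. Here is a minimal NumPy sketch of one reading of it. It assumes zero padding, split evenly between the two edges, whenever an image side is shorter than 256 px, followed by a uniformly random 256x256 crop. The function name `random_crop_with_padding` is hypothetical, and torchvision's `RandomCrop(..., pad_if_needed=True)` may pad differently in detail, so treat this as illustrative only.

```python
import numpy as np

def random_crop_with_padding(img: np.ndarray, size: int = 256, rng=None) -> np.ndarray:
    """Hypothetical helper: pad an H x W x C image up to `size` on each side, then crop.

    Assumes zero padding split evenly between the two edges of a short side;
    the benchmark's exact padding mode is not stated.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    pad_h = max(size - h, 0)
    pad_w = max(size - w, 0)
    if pad_h or pad_w:
        img = np.pad(
            img,
            ((pad_h // 2, pad_h - pad_h // 2),   # top, bottom
             (pad_w // 2, pad_w - pad_w // 2),   # left, right
             (0, 0)),                            # channels untouched
            mode="constant",
        )
        h, w = img.shape[:2]
    # Pick a uniformly random top-left corner for the size x size window.
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return img[top:top + size, left:left + size]

# A 480x200 image gets 28 px of zero padding on the left and right before cropping.
crop = random_crop_with_padding(np.ones((480, 200, 3), dtype=np.uint8),
                                rng=np.random.default_rng(0))
print(crop.shape)  # (256, 256, 3)
```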
Metrics: LPIPS (https://github.com/richzhang/PerceptualSimilarity/, lower is better) and SSIM, the structural similarity index measure, computed via skimage.metrics (higher is better).
Values are given as: mean [79% credibility interval]
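The thread doesn't say how the bracketed interval is computed. One plausible reading, and it is only an assumption, is the mean over images plus the central 79% of the per-image scores (10.5th to 89.5th percentile). The sketch below covers per-image MSE and that summary format. LPIPS and SSIM would come from the linked LPIPS repo and skimage.metrics and are not reimplemented here. The helper names `mse`, `summarize` and `fmt` are hypothetical.

```python
import numpy as np

def mse(x: np.ndarray, y: np.ndarray) -> float:
    """Per-image mean squared error; assumes both images are float arrays in [0, 1]."""
    return float(np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2))

def summarize(scores, mass: float = 0.79):
    """Return (mean, lo, hi), where [lo, hi] holds the central `mass` share of scores.

    Assumption: an equal-tailed percentile interval over per-image scores,
    i.e. the 10.5th to 89.5th percentile when mass = 0.79.
    """
    s = np.asarray(scores, dtype=np.float64)
    tail = (1.0 - mass) / 2.0 * 100.0
    lo, hi = np.percentile(s, [tail, 100.0 - tail])
    return float(s.mean()), float(lo), float(hi)

def fmt(mean: float, lo: float, hi: float, digits: int = 3) -> str:
    """Format a summary as 'mean [lo, hi]', matching the layout used in this thread."""
    return f"{mean:.{digits}f} [{lo:.{digits}f}, {hi:.{digits}f}]"
```

For example, `fmt(*summarize(per_image_mse), digits=4)`, with `per_image_mse` a hypothetical list of per-image `mse` values, yields a string in the same shape as the MSE lines further down.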

| Model | Revision / step | Format | LPIPS ↓ | SSIM ↑ |
|---|---|---|---|---|
| madebyollin/sdxl-vae-fp16-fix | 6d10734 | float32 | 0.056 [0.021, 0.101] | 0.73 [0.54, 0.88] |
| madebyollin/sdxl-vae-fp16-fix | 6d10734 | bfloat16 | 0.058 [0.022, 0.103] | 0.72 [0.53, 0.88] |
| madebyollin/sdxl-vae-fp16-fix | 6d10734 | float16 | 0.056 [0.021, 0.101] | 0.73 [0.54, 0.88] |
| stabilityai/sdxl-vae 0.9 | c1b803c | float32 | 0.055 [0.021, 0.097] | 0.73 [0.54, 0.89] |
| stabilityai/sdxl-vae 0.9 | c1b803c | bfloat16 | 0.058 [0.023, 0.101] | 0.72 [0.53, 0.88] |
| stabilityai/sdxl-vae 0.9 | c1b803c | float16 | nan | nan |
| stabilityai/sdxl-vae "1.0" | 0cbce97 | float32 | 0.055 [0.021, 0.097] | 0.71 [0.53, 0.87] |
| stabilityai/sdxl-vae "1.0" | 0cbce97 | bfloat16 | 0.058 [0.023, 0.101] | 0.70 [0.52, 0.86] |
| stabilityai/sdxl-vae "1.0" | 0cbce97 | float16 | nan | nan |
| stabilityai/sd-vae-ft-mse-original | 840000 | float32 | 0.057 [0.022, 0.101] | 0.70 [0.51, 0.87] |
| stabilityai/sd-vae-ft-mse-original | 840000 | bfloat16 | 0.057 [0.022, 0.101] | 0.70 [0.51, 0.86] |
| stabilityai/sd-vae-ft-mse-original | 840000 | float16 | 0.057 [0.022, 0.101] | 0.70 [0.51, 0.87] |
| stabilityai/sd-vae-ft-ena-original | 560000 | float32 | 0.055 [0.021, 0.096] | 0.69 [0.49, 0.86] |
| stabilityai/sd-vae-ft-ena-original | 560000 | bfloat16 | 0.055 [0.021, 0.095] | 0.69 [0.49, 0.86] |
| stabilityai/sd-vae-ft-ena-original | 560000 | float16 | 0.055 [0.021, 0.096] | 0.69 [0.49, 0.86] |
| kl-f8 | - | float32 | 0.063 [0.028, 0.105] | 0.67 [0.48, 0.84] |
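The nan rows for stabilityai/sdxl-vae in float16 are consistent with internal activations exceeding float16's largest finite value (65504). This is the issue the fp16-fix VAE is meant to avoid. The snippet below only demonstrates the numeric mechanism: a value that is fine in float32 becomes inf in float16, and further arithmetic on inf gives nan. The activation value 70000 and the helper `overflows_in_fp16` are hypothetical.

```python
import numpy as np

# Largest finite float16 value (65504).
FP16_MAX = float(np.finfo(np.float16).max)

def overflows_in_fp16(values) -> bool:
    """Hypothetical check: True if any value becomes inf when cast to float16."""
    v = np.asarray(values, dtype=np.float32)
    with np.errstate(over="ignore"):
        return bool(np.isinf(v.astype(np.float16)).any())

# An activation of 70000 fits in float32 but overflows to inf in float16;
# subsequent arithmetic such as inf - inf then produces nan.
act32 = np.float32(70000.0)
with np.errstate(over="ignore", invalid="ignore"):
    act16 = act32.astype(np.float16)
    result = act16 - act16

print(FP16_MAX, np.isinf(act16), np.isnan(result))
```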

On MSE it also compares favourably:
stabilityai/sdxl-vae c1b803c: 0.0048 [0.0008, 0.0105] (f32), 0.0049 [0.0008, 0.0108] (bf16)
madebyollin/sdxl-vae-fp16-fix 6d10734: 0.0048 [0.0008, 0.0105] (f32, bf16, f16)

The previous version left something to be desired:
madebyollin/sdxl-vae-fp16-fix (old 2023-07-12 dd10706):
f32: LPIPS: 0.0691 [0.0291, 0.1181], MSE: 0.0054 [0.0011, 0.0114]
bf16: LPIPS: 0.0699 [0.0298, 0.1188], MSE: 0.0053 [0.0010, 0.0114]
f16: LPIPS: 0.0692 [0.0290, 0.1184], MSE: 0.0054 [0.0011, 0.0114]
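To put the old-vs-new gap in numbers, here is a small relative-change calculation using only the float32 means quoted in this thread (old dd10706 vs new 6d10734). The new LPIPS mean is rounded to three decimals in the table, so the percentage is approximate. `relative_change` is a hypothetical helper.

```python
def relative_change(old: float, new: float) -> float:
    """Fractional change from old to new; negative means the metric went down."""
    return (new - old) / old

# float32 means copied from this thread (both metrics: lower is better).
old_lpips, new_lpips = 0.0691, 0.056
old_mse, new_mse = 0.0054, 0.0048

print(f"LPIPS: {relative_change(old_lpips, new_lpips):+.1%}")  # LPIPS: -19.0%
print(f"MSE:   {relative_change(old_mse, new_mse):+.1%}")      # MSE:   -11.1%
```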

Great job with the new one!!
And the speed and VRAM gains from not having to upcast the VAE to float32 are very significant.
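Part of the VRAM saving from skipping the float32 upcast is plain bytes per element: float16 uses 2 bytes where float32 uses 4. The sketch below sizes a single dense tensor in each format. The shape is hypothetical and not taken from the SDXL VAE. Real peak memory also depends on how many intermediates are alive at once, so this shows the ratio, not actual measurements.

```python
import numpy as np

def tensor_mib(shape, dtype) -> float:
    """Memory for one dense tensor of `shape` stored as `dtype`, in MiB."""
    return float(np.prod(shape)) * np.dtype(dtype).itemsize / 2**20

# Hypothetical feature map: batch 1, 128 channels, 1024x1024 spatial.
shape = (1, 128, 1024, 1024)
fp32 = tensor_mib(shape, np.float32)
fp16 = tensor_mib(shape, np.float16)
print(fp32, fp16)  # 512.0 256.0
```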

ty for the benchmark

Kubuxu changed discussion title from Basic evaluation results to Evaluation results
