7. Results

Reconstruction quality was evaluated on a curated set of test images covering photographs, book covers, and documents. Flux.1 VAE (patch 8, 16 channels) is included as a reference because it has the same 12x compression ratio as the c64 variant.
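
The 12x figure follows from counting input values per latent value. The sketch below works through that arithmetic for the reference VAE (patch 8, 16 channels, RGB input). The patch size of the c64 variant is not stated in this section, so the value of 16 used here is an assumption chosen because it reproduces the same ratio with 64 channels; `compression_ratio` is a hypothetical helper, not part of the model code.

```python
def compression_ratio(patch: int, latent_channels: int, image_channels: int = 3) -> float:
    """Number of input values represented by each latent value.

    One latent position covers a patch x patch block of pixels, each with
    `image_channels` values, and stores `latent_channels` values.
    """
    values_in = patch * patch * image_channels
    return values_in / latent_channels


# Reference VAE as described above: 8 * 8 * 3 = 192 input values -> 16 latent values.
reference = compression_ratio(patch=8, latent_channels=16)

# c64 variant: 64 latent channels; patch 16 is an ASSUMPTION (16 * 16 * 3 = 768).
c64_assumed = compression_ratio(patch=16, latent_channels=64)

print(reference, c64_assumed)
```

Both calls evaluate to 12.0, which is what "same compression ratio" means here: the two models store the same number of latent values per image, even though they arrange them differently.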

7.1 Interactive Viewer

Open full-resolution comparison viewer — side-by-side reconstructions, RGB deltas, and latent PCA with adjustable image size.

7.2 Inference Settings

| Setting | Value |
|---|---|
| Sampler | ddim |
| Steps | 1 |
| Schedule | linear |
| Seed | 42 |
| PDG | no_path_dropg |
| Batch size (timing) | 4 |

All models were run in bfloat16. Timings were measured on an NVIDIA RTX Pro 6000 (Blackwell).
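
The exact timing procedure is not described here. As a rough sketch of how a per-image figure can be derived from batched runs, the helper below times repeated calls and divides by the number of images processed. The name `ms_per_image`, the warmup and iteration counts, and the `sync` hook are all assumptions for illustration.

```python
import time
from typing import Callable, Optional


def ms_per_image(
    fn: Callable[[], object],
    batch_size: int,
    warmup: int = 2,
    iters: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Average wall-clock milliseconds per image for a batched call `fn`."""
    if batch_size <= 0 or iters <= 0:
        raise ValueError("batch_size and iters must be positive")
    # Warmup runs are excluded from the measurement (caches, lazy init).
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # wait for pending asynchronous work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # make sure all timed work has actually finished
    elapsed_s = time.perf_counter() - start
    return elapsed_s * 1000.0 / (iters * batch_size)


# Stand-in workload: pretend one batch of 4 images takes about 2 ms.
def fake_decode_batch() -> None:
    time.sleep(0.002)


print(f"{ms_per_image(fake_decode_batch, batch_size=4, iters=5):.2f} ms/image")
```

On a GPU the `sync` argument matters because kernel launches are asynchronous; with PyTorch, passing `torch.cuda.synchronize` would be the usual choice.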

7.3 Global Metrics

| Metric | semdisdiffae (1 step) | Flux.2 VAE |
|---|---|---|
| Avg PSNR (dB) | 35.78 | 34.16 |
| Avg encode (ms/image) | 2.5 | 46.1 |
| Avg decode (ms/image) | 5.5 | 91.8 |
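
For readers unfamiliar with the metric, PSNR is 10 · log10(MAX² / MSE), where MSE is the mean squared error between the original and the reconstruction. The sketch below is a minimal pure-Python version. It assumes pixel values scaled to [0, 1] (so MAX = 1) and an MSE taken over all values; the evaluation code behind the numbers in this section may use different conventions.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstruction: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(reconstruction) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_psnr(values: Sequence[float]) -> float:
    """Arithmetic mean of per-image PSNR values (ASSUMED aggregation)."""
    return sum(values) / len(values)


# Toy example: every value is off by 0.01, so MSE = 1e-4 and PSNR is about 40 dB.
original = [0.00, 0.50, 1.00, 0.25]
decoded = [0.01, 0.51, 0.99, 0.26]
print(round(psnr(original, decoded), 2))
```

Averaging per-image PSNR values is consistent with the tables in this section: the 39 semdisdiffae entries in 7.4 average to 35.78 dB, matching the global figure.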

7.4 Per-Image PSNR (dB)

| Image | semdisdiffae (1 step) | Flux.2 VAE |
|---|---|---|
| p640x1536:94623 | 35.44 | 33.50 |
| p640x1536:94624 | 31.33 | 30.03 |
| p640x1536:94625 | 35.05 | 33.98 |
| p640x1536:94626 | 33.21 | 31.53 |
| p640x1536:94627 | 32.54 | 30.53 |
| p640x1536:94628 | 29.80 | 28.88 |
| p960x1024:216264 | 46.37 | 45.39 |
| p960x1024:216265 | 29.70 | 27.80 |
| p960x1024:216266 | 47.15 | 46.20 |
| p960x1024:216267 | 40.99 | 39.23 |
| p960x1024:216268 | 38.47 | 36.13 |
| p960x1024:216269 | 32.74 | 30.24 |
| p960x1024:216270 | 36.23 | 34.18 |
| p960x1024:216271 | 44.41 | 42.18 |
| p704x1472:94699 | 43.80 | 41.79 |
| p704x1472:94700 | 32.83 | 32.08 |
| p704x1472:94701 | 39.00 | 37.90 |
| p704x1472:94702 | 34.52 | 32.50 |
| p704x1472:94703 | 32.81 | 31.35 |
| p704x1472:94704 | 33.38 | 31.84 |
| p704x1472:94705 | 39.70 | 37.44 |
| p704x1472:94706 | 35.12 | 33.66 |
| r256_p1344x704:15577 | 31.02 | 29.98 |
| r256_p1344x704:15578 | 32.38 | 30.79 |
| r256_p1344x704:15579 | 33.27 | 31.83 |
| r256_p1344x704:15580 | 37.84 | 36.03 |
| r256_p1344x704:15581 | 38.57 | 36.94 |
| r256_p1344x704:15582 | 33.41 | 32.10 |
| r256_p1344x704:15583 | 36.67 | 34.54 |
| r256_p1344x704:15584 | 33.23 | 31.76 |
| r256_p896x1152:144131 | 35.30 | 33.60 |
| r256_p896x1152:144132 | 36.99 | 35.32 |
| r256_p896x1152:144133 | 39.69 | 37.33 |
| r256_p896x1152:144134 | 36.01 | 34.47 |
| r256_p896x1152:144135 | 31.20 | 29.87 |
| r256_p896x1152:144136 | 37.51 | 35.68 |
| r256_p896x1152:144137 | 33.83 | 32.86 |
| r256_p896x1152:144138 | 27.39 | 25.63 |
| VAE_accuracy_test_image | 36.64 | 35.25 |