Update README.md
README.md
CHANGED
@@ -29,15 +29,15 @@ encoder or decoder, enabling efficient inference at any resolution.
 
 ## Key Features
 
-- **Fast**: ~3 ms/img encode, ~6 ms/img decode (1 step) on
+- **Fast**: ~3 ms/img encode, ~6 ms/img decode (1 step) on Blackwell RTX Pro 6000 — significantly
 faster than Flux.2 VAE
 - **High fidelity**: 38.6 dB mean PSNR (2k images), exceeding Flux.2 VAE (37.0 dB)
 - **Semantically structured latents**: DINOv2-aligned, producing latents with
 clear semantic segmentation visible in PCA projections
 - **Comparable downstream convergence**: empirically matches the downstream
-diffusion training convergence speed of Flux.2 and PS-VAE
+diffusion training convergence speed of Flux.2 and PS-VAE
 - **Pure convolutional**: no attention in encoder/decoder, O(n) in spatial resolution
-- **VP diffusion decoder**: single-step DDIM for PSNR-optimal, multi-step
+- **VP diffusion decoder**: single-step DDIM for PSNR-optimal, optional multi-step
 with PDG for perceptual sharpening
 
 ## Architecture