data-archetype/semdisdiffae

image-reconstruction

image-tokenizer

semantic-alignment

data-archetype committed 4 days ago

Commit cd3fa86 (verified) · 1 parent: 6f1ba8d

Update README.md

Files changed (1)

README.md +3 -3

@@ -29,15 +29,15 @@ encoder or decoder, enabling efficient inference at any resolution.
 ## Key Features
-- **Fast**: ~3 ms/img encode, ~6 ms/img decode (1 step) on H200 — significantly
   faster than Flux.2 VAE
 - **High fidelity**: 38.6 dB mean PSNR (2k images), exceeding Flux.2 VAE (37.0 dB)
 - **Semantically structured latents**: DINOv2-aligned, producing latents with
   clear semantic segmentation visible in PCA projections
 - **Comparable downstream convergence**: empirically matches the downstream
-  diffusion training convergence speed of Flux.2 and PS-VAE v2
 - **Pure convolutional**: no attention in encoder/decoder, O(n) in spatial resolution
-- **VP diffusion decoder**: single-step DDIM for PSNR-optimal, multi-step
   with PDG for perceptual sharpening
 ## Architecture

 ## Key Features
+- **Fast**: ~3 ms/img encode, ~6 ms/img decode (1 step) on Blackwell RTX Pro 6000 — significantly
   faster than Flux.2 VAE
 - **High fidelity**: 38.6 dB mean PSNR (2k images), exceeding Flux.2 VAE (37.0 dB)
 - **Semantically structured latents**: DINOv2-aligned, producing latents with
   clear semantic segmentation visible in PCA projections
 - **Comparable downstream convergence**: empirically matches the downstream
+  diffusion training convergence speed of Flux.2 and PS-VAE
 - **Pure convolutional**: no attention in encoder/decoder, O(n) in spatial resolution
+- **VP diffusion decoder**: single-step DDIM for PSNR-optimal, optional multi-step
   with PDG for perceptual sharpening
 ## Architecture