Upload technical_report_semantic.md with huggingface_hub
technical_report_semantic.md +20 -8
technical_report_semantic.md
CHANGED
|
@@ -378,16 +378,14 @@ main encoder/decoder remain purely convolutional.
|
|
| 378 |
### 4.3 Noisy Alignment
|
| 379 |
|
| 380 |
Unlike standard representation alignment which operates on clean latents,
|
| 381 |
-
we align **noisy** latent versions. The noise level τ is sampled from a
|
| 382 |
-
Beta(2,2) distribution (concentrated around τ = 0.5) using
|
| 383 |
-
linear interpolation:
|
| 384 |
|
| 385 |
-
|
| 386 |
-
z_noisy = (1 - τ) · z + τ · ε, ε ~ N(0, I), τ ~ Beta(2, 2)
|
| 387 |
-
```
|
| 388 |
|
| 389 |
-
The projection head receives both the noisy latents and the noise level
|
| 390 |
-
τ (via its AdaLN conditioning). This trains the head to extract semantic
|
| 391 |
information even from partially corrupted latents, improving robustness
|
| 392 |
for downstream diffusion models which operate on noised latent inputs.
|
| 393 |
|
|
@@ -580,6 +578,20 @@ z_sampled = posterior.sample()
|
|
| 580 |
|
| 581 |
---
|
| 582 |
|
| 583 |
## 9. Results
|
| 584 |
|
| 585 |
## 7. Results
|
|
|
|
| 378 |
### 4.3 Noisy Alignment
|
| 379 |
|
| 380 |
Unlike standard representation alignment which operates on clean latents,
|
| 381 |
+
we align **noisy** latent versions. The noise level \\(\tau\\) is sampled from a
|
| 382 |
+
\\(\text{Beta}(2,2)\\) distribution (concentrated around \\(\tau = 0.5\\)) using
|
| 383 |
+
flow matching linear interpolation:
|
| 384 |
|
| 385 |
+
$$z_\text{noisy} = (1 - \tau) \, z + \tau \, \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I), \quad \tau \sim \text{Beta}(2, 2)$$
|
| 386 |
|
| 387 |
+
The projection head receives both the noisy latents and the noise level
|
| 388 |
+
\\(\tau\\) (via its AdaLN conditioning). This trains the head to extract semantic
|
| 389 |
information even from partially corrupted latents, improving robustness
|
| 390 |
for downstream diffusion models which operate on noised latent inputs.
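
The interpolation described in this section can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's implementation: `make_noisy_latent` is a hypothetical helper, the latent is treated as a flat list of floats rather than a tensor, and only the standard library is used.

```python
import random


def make_noisy_latent(z, rng=None):
    """Return (z_noisy, tau) for a flat list of latent values z.

    Hypothetical helper illustrating
        z_noisy = (1 - tau) * z + tau * eps,
    with eps ~ N(0, I) and tau ~ Beta(2, 2).
    """
    rng = rng or random.Random()
    # Noise level in [0, 1]; Beta(2, 2) is symmetric with mean 0.5.
    tau = rng.betavariate(2.0, 2.0)
    # One independent standard normal sample per latent element.
    eps = [rng.gauss(0.0, 1.0) for _ in z]
    # Linear interpolation between the clean latent and pure noise.
    z_noisy = [(1.0 - tau) * zi + tau * ei for zi, ei in zip(z, eps)]
    return z_noisy, tau


# Example call with a seeded generator for reproducibility.
z = [0.5, -1.0, 2.0, 0.0]
z_noisy, tau = make_noisy_latent(z, random.Random(0))
```

In a training loop, both `z_noisy` and `tau` would presumably be passed to the projection head, with `tau` feeding its AdaLN conditioning as the text describes.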
|
| 391 |
|
|
|
|
| 578 |
|
| 579 |
---
|
| 580 |
|
| 581 |
+
## Citation
|
| 582 |
+
|
| 583 |
+
```bibtex
|
| 584 |
+
@misc{semdisdiffae,
|
| 585 |
+
title = {SemDisDiffAE: A Semantically Disentangled Diffusion Autoencoder with FCDM Blocks},
|
| 586 |
+
author = {data-archetype},
|
| 587 |
+
year = {2026},
|
| 588 |
+
month = apr,
|
| 589 |
+
url = {https://huggingface.co/data-archetype/semdisdiffae},
|
| 590 |
+
}
|
| 591 |
+
```
|
| 592 |
+
|
| 593 |
+
---
|
| 594 |
+
|
| 595 |
## 9. Results
|
| 596 |
|
| 597 |
## 7. Results
|