Update README with VAE integration and verified datasets
### Key Properties

- ✅ **Zero attention layers** — fully convolutional + liquid time-gating
- ✅ **Fully parallelizable** — no ODE solvers, no sequential scanning, no recurrence
- ✅ **Latent-space training** — uses a pretrained SD-VAE (stabilityai/sd-vae-ft-mse, 83.7M frozen)
- ✅ **Fits 16GB VRAM** — tiny config runs 256px at batch=8 on a T4 GPU
- ✅ **Simple training** — Rectified Flow (MSE velocity prediction, no noise schedule)
- ✅ **6 verified datasets** — all tested and working with streaming support

## Quick Start (Colab)

1. Open `LiquidDiffusion_Training.ipynb` in Colab
2. Select a GPU runtime (T4)
3. Pick a dataset from the dropdown (default: huggan/AFHQv2 — animal faces)
4. Run all cells — training starts, and samples are generated every 500 steps

## Architecture

```
Pixel Image (3×256×256)
  ↓ [Frozen SD-VAE Encode]  → Latent (4×32×32)
  ↓ [LiquidDiffusion U-Net] → Velocity prediction (4×32×32)
  ↓ [Frozen SD-VAE Decode]  → Generated Image (3×256×256)
```

Each **LiquidDiffusionBlock** contains:

1. **AdaLN** — timestep conditioning via learned scale/shift
2. **ParallelCfCBlock** — the core liquid neural network layer (CfC Eq.10)
3. **MultiScaleSpatialMix** — 3×3 + 5×5 + 7×7 depthwise conv + global pooling (replaces attention)
4. **FeedForward** — channel mixing via 1×1 conv
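As a concrete illustration of the AdaLN step, here is a minimal NumPy sketch. The shapes, projection weights, and the choice to normalize over channels are made up for the example; the real block is a learned module inside the model.

```python
import numpy as np

def adaln(x, t_emb, w_scale, w_shift, eps=1e-5):
    """AdaLN sketch: normalize across channels at each spatial position,
    then apply a timestep-conditioned scale/shift. x: (C, H, W), t_emb: (D,)."""
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    x_norm = (x - mu) / np.sqrt(var + eps)
    scale = w_scale @ t_emb  # (C,) projection of the timestep embedding
    shift = w_shift @ t_emb  # (C,)
    return x_norm * (1 + scale)[:, None, None] + shift[:, None, None]

rng = np.random.default_rng(0)
C, D, H, W = 8, 16, 4, 4
x = rng.normal(size=(C, H, W))
t_emb = rng.normal(size=D)
out = adaln(x, t_emb, 0.1 * rng.normal(size=(C, D)), 0.1 * rng.normal(size=(C, D)))
print(out.shape)  # (8, 4, 4)
```

With zero projection weights the block reduces to a plain normalization, which is the usual sanity check for this kind of conditioning.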

### The ParallelCfC Block

```python
# CfC Eq.10 adapted for images:
gate = σ(time_a(t_emb) · f(features) - time_b(t_emb))  # liquid time-gating
out = gate · g(features) + (1 - gate) · h(features)    # CfC interpolation
α = exp(-λ · |t_emb|)                                  # liquid relaxation
output = α · input + (1 - α) · out                     # time-aware residual
```
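The gating math above can be exercised end to end. In this minimal NumPy sketch the branches `f`, `g`, `h` and the time projections are throwaway stand-ins for the model's learned convolutions; only the gate/interpolation/relaxation structure matches the equations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def parallel_cfc(x, t_emb, lam=1.0):
    """CfC Eq.10-style gating on a feature map x: (C, H, W).
    f, g, h are placeholder elementwise maps; the real block uses learned convs."""
    f, g, h = np.tanh(x), x, -x                  # placeholder feature branches
    time_a, time_b = t_emb.mean(), t_emb.std()   # placeholder time projections
    gate = sigmoid(time_a * f - time_b)          # liquid time-gating
    out = gate * g + (1 - gate) * h              # CfC interpolation
    alpha = np.exp(-lam * np.abs(t_emb).mean())  # liquid relaxation
    return alpha * x + (1 - alpha) * out         # time-aware residual

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))
y = parallel_cfc(x, t_emb=rng.normal(size=16))
print(y.shape)  # (4, 8, 8)
```

Note the residual behavior: as `|t_emb| → 0`, `α → 1` and the block passes its input through unchanged.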

## Verified Datasets

All tested and working (with streaming support):

| Dataset | Images | Description | Native Resolution |
|---------|--------|-------------|-------------------|
| `huggan/AFHQv2` | 16K | Animal faces (cats, dogs, wildlife) | 512×512 |
| `nielsr/CelebA-faces` | 202K | Celebrity faces | 178×218 |
| `huggan/flowers-102-categories` | 8K | Flower photographs | Variable |
| `reach-vb/pokemon-blip-captions` | 833 | Pokemon illustrations | 1280×1280 |
| `huggan/anime-faces` | 63K | Anime faces | 64×64 |
| `Norod78/cartoon-blip-captions` | ~3K | Cartoon characters | 512×512 |

## VAE

Uses **stabilityai/sd-vae-ft-mse** (83.7M params, frozen during training):

- 4 latent channels, 8× spatial downscale
- PSNR 27.3 on LAION-Aesthetics (excellent reconstruction)
- ~160MB VRAM in fp16
- Scaling factor: 0.18215
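The numbers above amount to simple shape bookkeeping; a tiny sketch (illustration only, not the diffusers API):

```python
SCALE = 0.18215  # multiply encoded latents by this; divide before decoding

def latent_shape(img_shape, channels=4, downscale=8):
    """Map a pixel-space (C, H, W) shape to the SD-VAE latent shape."""
    _, h, w = img_shape
    return (channels, h // downscale, w // downscale)

print(latent_shape((3, 256, 256)))  # (4, 32, 32)
print(latent_shape((3, 512, 512)))  # (4, 64, 64)
```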

## Model Configs

| Config | Params | 256px VRAM (w/ VAE) | 512px VRAM |
|--------|--------|---------------------|------------|
| tiny | ~23M | ~6 GB | ~12 GB |
| small | ~69M | ~10 GB | ~20 GB |
| base | ~154M | ~16 GB | ~30 GB |

## Training

**Objective**: Rectified Flow — simple MSE on velocity

```python
x_t = (1 - t) · x0 + t · noise       # linear interpolation
v_target = noise - x0                # constant velocity
loss = MSE(model(x_t, t), v_target)  # that's it!
```

**Sampling**: Euler ODE integration, 25-50 steps
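To see why the objective and sampler are this simple, here is a self-contained NumPy sketch of both, using an oracle velocity in place of a trained model; since the rectified-flow velocity is constant along each path, Euler integration from pure noise recovers the data point exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 32, 32))  # a "clean" latent sample
noise = rng.normal(size=x0.shape)

# Training pair: linear interpolation between data and noise;
# the velocity target is the constant (noise - x0).
t = 0.3
x_t = (1 - t) * x0 + t * noise
v_target = noise - x0
model_pred = v_target                         # oracle "model" for illustration
loss = np.mean((model_pred - v_target) ** 2)  # MSE velocity loss; 0 here

# Sampling: Euler ODE integration from t=1 (noise) back to t=0 (data).
def euler_sample(v_fn, x1, steps=25):
    x, dt = x1.copy(), 1.0 / steps
    for i in range(steps):
        t_i = 1.0 - i * dt
        x = x - dt * v_fn(x, t_i)  # step against the velocity field
    return x

x_gen = euler_sample(lambda x, t: noise - x0, noise)  # oracle velocity
print(float(np.abs(x_gen - x0).max()))  # ≈ 0: Euler is exact for a constant field
```

A real run replaces the oracle with the trained network, which is why more steps (25-50) are needed: the learned field is only approximately straight.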

## References

| Paper | Contribution |
|-------|-------------|
| [CfC Networks (Nature MI 2022)](https://arxiv.org/abs/2106.13898) | CfC Eq.10, parallelizable closed-form |
| [LTC Networks (AAAI 2021)](https://arxiv.org/abs/2006.04439) | Liquid time-constant ODE |
| [LiquidTAD (2024)](https://arxiv.org/abs/2604.18274) | Parallel liquid relaxation |
| [USM (CVPR 2025)](https://arxiv.org/abs/2504.13499) | U-Net + SSM for diffusion |
| [DiffuSSM (2023)](https://arxiv.org/abs/2311.18257) | SSM replaces attention in diffusion |
| [Rectified Flow (ICLR 2023)](https://arxiv.org/abs/2209.03003) | Simple velocity training |

## Files

```
├── liquid_diffusion/
│   ├── __init__.py
│   ├── model.py                    # Full model architecture
│   └── trainer.py                  # Trainer + dataset utilities
├── LiquidDiffusion_Training.ipynb  # Complete Colab notebook
├── test_model.py
└── README.md
```

## License

MIT