krystv committed on
Commit 20523ee · verified · 1 Parent(s): c2b4760

Update README with VAE integration and verified datasets

Files changed (1)
  1. README.md +58 -51
README.md CHANGED
@@ -9,84 +9,92 @@ LiquidDiffusion is a **first-of-its-kind** image generation model that replaces
  ### Key Properties
  - ✅ **Zero attention layers** — fully convolutional + liquid time-gating
  - ✅ **Fully parallelizable** — no ODE solvers, no sequential scanning, no recurrence
- - ✅ **Pretrained VAE** — uses `stabilityai/sd-vae-ft-mse` for efficient latent-space training
  - ✅ **Fits 16GB VRAM** — tiny config runs 256px at batch=8 on T4 GPU
  - ✅ **Simple training** — Rectified Flow (MSE velocity prediction, no noise schedule)
- - ✅ **6 verified datasets** ready to use

- ## Quick Start

- Open the Colab notebook, pick your dataset from the dropdown, and run all cells:
-
- **`LiquidDiffusion_Training.ipynb`**
-
- ### Verified Datasets (all tested ✓)
-
- | Dataset | Size | Content |
- |---------|------|---------|
- | `nielsr/CelebA-faces` | 202K | Celebrity faces |
- | `huggan/flowers-102-categories` | 8K | Flowers |
- | `reach-vb/pokemon-blip-captions` | 833 | Pokemon art |
- | `huggan/anime-faces` | 21K | Anime faces |
- | `huggan/AFHQv2` | 16K | Cat/dog/wild animals |
- | `Norod78/cartoon-blip-captions` | 2K | Cartoon characters |
 
  ## Architecture

  ```
- Input (noisy latent, 4ch) → Conv Stem
- → Encoder [LiquidDiffusionBlock × N, with downsampling]
- → Bottleneck [LiquidDiffusionBlock × 2]
- → Decoder [LiquidDiffusionBlock × N, with upsampling + skip fusion]
- → Conv Head → Velocity prediction
  ```

- ### VAE Integration
- - **Encoder**: `stabilityai/sd-vae-ft-mse` (83M params, frozen)
- - **Latent space**: 4 channels, 8× spatial downscale
- - **256px image → 32×32×4 latent** (64× fewer pixels to process!)
- - **Pre-caching**: encode the dataset once, then train without the VAE on the GPU (saves ~160MB VRAM)

- ### ParallelCfCBlock (Novel Contribution)
-
- Based on CfC Eq.10: `x(t) = σ(-f·t) ⊙ g + (1 - σ(-f·t)) ⊙ h`

  ```python
- # Three CfC heads from a shared backbone
- gate = sigmoid(time_a(t_emb) * f(features) - time_b(t_emb))
- cfc_out = gate * g(features) + (1 - gate) * h(features)
-
- # Liquid relaxation residual
- α = exp(-softplus(ρ) * |t_emb_mean|)
- output = α * input + (1 - α) * cfc_out
  ```

- **Key insight**: the diffusion timestep `t` *is* the liquid time constant, so the CfC gate naturally adapts to the noise level.

  ## Model Configs

- | Config | Channels | Blocks | Params | 256px VRAM | Best For |
- |--------|----------|--------|--------|------------|----------|
- | tiny | [64, 128, 256] | [2, 2, 4] | ~23M | ~6 GB | Quick experiments, T4 |
- | small | [96, 192, 384] | [2, 3, 6] | ~69M | ~10 GB | Quality 256px, T4/A10G |

- ## Training Objective: Rectified Flow

  ```python
- x_t = (1 - t) * x0 + t * noise        # linear interpolation
- v_target = noise - x0                 # constant velocity
- loss = MSE(model(x_t, t), v_target)   # simple MSE — no noise schedule!
  ```

  ## References

  | Paper | Contribution |
  |-------|-------------|
  | [CfC Networks (Nature MI 2022)](https://arxiv.org/abs/2106.13898) | CfC Eq.10, parallelizable closed-form |
- | [LTC Networks (AAAI 2021)](https://arxiv.org/abs/2006.04439) | Liquid time-constant ODE, stability |
  | [LiquidTAD (2024)](https://arxiv.org/abs/2604.18274) | Parallel liquid relaxation |
  | [USM (CVPR 2025)](https://arxiv.org/abs/2504.13499) | U-Net + SSM for diffusion |
- | [DiffuSSM (2023)](https://arxiv.org/abs/2311.18257) | SSM beats attention in diffusion |
  | [Rectified Flow (ICLR 2023)](https://arxiv.org/abs/2209.03003) | Simple velocity training |

  ## Files
@@ -95,12 +103,11 @@ loss = MSE(model(x_t, t), v_target)   # simple MSE — no noise schedule!
  ├── liquid_diffusion/
  │   ├── __init__.py
  │   ├── model.py                        # Full model architecture
- │   └── trainer.py                      # Rectified Flow trainer + dataset utils
- ├── LiquidDiffusion_Training.ipynb      # Complete Colab notebook (VAE + 6 datasets)
  ├── test_model.py
  └── README.md
  ```

  ## License
-
  MIT

  ### Key Properties
  - ✅ **Zero attention layers** — fully convolutional + liquid time-gating
  - ✅ **Fully parallelizable** — no ODE solvers, no sequential scanning, no recurrence
+ - ✅ **Latent-space training** — uses a pretrained SD-VAE (`stabilityai/sd-vae-ft-mse`, 83.7M params, frozen)
  - ✅ **Fits 16GB VRAM** — tiny config runs 256px at batch=8 on T4 GPU
  - ✅ **Simple training** — Rectified Flow (MSE velocity prediction, no noise schedule)
+ - ✅ **6 verified datasets** — all tested and working, with streaming support

+ ## Quick Start (Colab)

+ 1. Open `LiquidDiffusion_Training.ipynb` in Colab
+ 2. Select a GPU runtime (T4)
+ 3. Pick a dataset from the dropdown (default: `huggan/AFHQv2` — animal faces)
+ 4. Run all cells → training starts, and samples are generated every 500 steps

  ## Architecture

  ```
+ Pixel Image (3×256×256)
+ → [Frozen SD-VAE Encode] → Latent (4×32×32)
+ → [LiquidDiffusion U-Net] → Velocity prediction (4×32×32)
+ → [Frozen SD-VAE Decode] → Generated Image (3×256×256)
  ```

+ Each **LiquidDiffusionBlock** contains:
+ 1. **AdaLN** — timestep conditioning via learned scale/shift
+ 2. **ParallelCfCBlock** — the core liquid neural network layer (CfC Eq.10)
+ 3. **MultiScaleSpatialMix** — 3×3 + 5×5 + 7×7 depthwise convs + global pooling (replaces attention)
+ 4. **FeedForward** — channel mixing via 1×1 conv
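
As an illustrative aside (not the repository's actual code), the AdaLN step in item 1 can be sketched in NumPy: normalize the feature map, then modulate it with a timestep-conditioned scale and shift. All names and shapes below are hypothetical.

```python
import numpy as np

def adaln_modulate(x, t_emb, W_scale, W_shift, eps=1e-5):
    """Illustrative AdaLN: normalize across channels, then apply a
    timestep-conditioned scale/shift (all names here are hypothetical)."""
    mu = x.mean(axis=0, keepdims=True)           # per-pixel channel mean
    var = x.var(axis=0, keepdims=True)           # per-pixel channel variance
    x_norm = (x - mu) / np.sqrt(var + eps)
    scale = W_scale @ t_emb                      # (C,) scale derived from t_emb
    shift = W_shift @ t_emb                      # (C,) shift derived from t_emb
    return x_norm * (1 + scale)[:, None, None] + shift[:, None, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4, 4))                   # (C, H, W) feature map
t_emb = rng.normal(size=16)                      # timestep embedding
out = adaln_modulate(x, t_emb,
                     W_scale=rng.normal(size=(8, 16)) * 0.01,
                     W_shift=rng.normal(size=(8, 16)) * 0.01)
print(out.shape)  # (8, 4, 4)
```

With `W_scale` and `W_shift` zero-initialized, the layer reduces to plain normalization at the start of training.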

+ ### The ParallelCfC Block

  ```python
+ # CfC Eq.10 adapted for images:
+ gate = σ(time_a(t_emb) · f(features) - time_b(t_emb))   # liquid time-gating
+ out = gate · g(features) + (1 - gate) · h(features)     # CfC interpolation
+ α = exp(-λ · |t_emb|)                                   # liquid relaxation
+ output = α · input + (1 - α) · out                      # time-aware residual
  ```
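
To make the pseudocode above concrete, here is a minimal NumPy sketch (an illustration, not the repo's implementation): `f`, `g`, `h` are taken as per-channel linear maps, and `time_a`, `time_b`, `lam` as fixed vectors rather than learned functions of the timestep embedding.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def parallel_cfc(x, t_emb, f, g, h, time_a, time_b, lam):
    """CfC Eq.10 gating over image features (illustrative shapes).
    x: (C, H, W) features; t_emb: scalar summary of the timestep embedding;
    f, g, h: (C, C) channel-mixing maps; time_a, time_b, lam: (C,) vectors."""
    def mix(M, v):  # apply a (C, C) channel-mixing map at every pixel
        return np.einsum("oc,chw->ohw", M, v)
    gate = sigmoid(time_a[:, None, None] * mix(f, x) - time_b[:, None, None])
    cfc_out = gate * mix(g, x) + (1.0 - gate) * mix(h, x)  # CfC interpolation
    alpha = np.exp(-lam * abs(t_emb))[:, None, None]       # liquid relaxation
    return alpha * x + (1.0 - alpha) * cfc_out             # time-aware residual

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
x = rng.normal(size=(C, H, W))
out = parallel_cfc(
    x, t_emb=0.5,
    f=rng.normal(size=(C, C)), g=rng.normal(size=(C, C)), h=rng.normal(size=(C, C)),
    time_a=rng.normal(size=C), time_b=rng.normal(size=C),
    lam=np.abs(rng.normal(size=C)),  # kept positive so alpha stays in (0, 1]
)
print(out.shape)  # (4, 8, 8)
```

Every operation is elementwise or a 1×1 channel mix, which is what makes the block fully parallel across pixels and timesteps.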

+ ## Verified Datasets

+ All tested and working, with streaming support:

+ | Dataset | Images | Description | Native Resolution |
+ |---------|--------|-------------|-------------------|
+ | `huggan/AFHQv2` | 16K | Animal faces (cats, dogs, wildlife) | 512×512 |
+ | `nielsr/CelebA-faces` | 202K | Celebrity faces | 178×218 |
+ | `huggan/flowers-102-categories` | 8K | Flower photographs | Variable |
+ | `reach-vb/pokemon-blip-captions` | 833 | Pokemon illustrations | 1280×1280 |
+ | `huggan/anime-faces` | 63K | Anime faces | 64×64 |
+ | `Norod78/cartoon-blip-captions` | ~3K | Cartoon characters | 512×512 |

+ ## VAE

+ Uses **stabilityai/sd-vae-ft-mse** (83.7M params, frozen during training):
+ - 4 latent channels, 8× spatial downscale
+ - PSNR 27.3 on LAION-Aesthetics (strong reconstruction quality)
+ - ~160MB VRAM in fp16
+ - Scaling factor: 0.18215
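
A quick sanity check on these numbers (a standalone sketch, not repo code):

```python
# SD-VAE bookkeeping: 4 latent channels, 8x spatial downscale.
img_h = img_w = 256
downscale = 8
lat_ch, lat_h, lat_w = 4, img_h // downscale, img_w // downscale
print((lat_ch, lat_h, lat_w))              # (4, 32, 32)
print((img_h * img_w) // (lat_h * lat_w))  # 64 (64x fewer spatial positions)

# Per Stable Diffusion convention, latents are multiplied by the scaling
# factor after encoding and divided by it again before decoding.
scaling_factor = 0.18215
```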

  ## Model Configs

+ | Config | Params | 256px VRAM (w/ VAE) | 512px VRAM |
+ |--------|--------|---------------------|------------|
+ | tiny | ~23M | ~6 GB | ~12 GB |
+ | small | ~69M | ~10 GB | ~20 GB |
+ | base | ~154M | ~16 GB | ~30 GB |

+ ## Training

+ **Objective**: Rectified Flow — simple MSE on velocity

  ```python
+ x_t = (1 - t) · x0 + t · noise        # linear interpolation
+ v_target = noise - x0                 # constant velocity
+ loss = MSE(model(x_t, t), v_target)   # that's it!
  ```
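
The snippet above runs almost verbatim; here is a self-contained NumPy version with a zero-predicting placeholder in place of the network (the `model` below is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=(2, 4, 32, 32))   # clean latents (batch of 2)
noise = rng.normal(size=x0.shape)
t = rng.uniform(size=(2, 1, 1, 1))     # per-sample timestep in [0, 1]

x_t = (1 - t) * x0 + t * noise         # linear interpolation
v_target = noise - x0                  # constant velocity along the path

def model(x, t):
    # Placeholder network (illustrative): always predicts zero velocity.
    return np.zeros_like(x)

loss = np.mean((model(x_t, t) - v_target) ** 2)   # plain MSE, no schedule
print(loss > 0)  # True (the zero model is wrong everywhere)
```

A network that output exactly `noise - x0` would drive this loss to zero.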

+ **Sampling**: Euler ODE integration, 25-50 steps
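
A minimal Euler integrator for this ODE might look as follows (a sketch under the assumption that the model predicts the velocity `noise - x0`; the notebook's actual sampler may differ). Because rectified-flow paths are straight, Euler with the exact constant velocity recovers the data regardless of step count:

```python
import numpy as np

def euler_sample(velocity, x1, steps=25):
    """Integrate dx/dt = v(x, t) from t=1 (pure noise) down to t=0 (data)."""
    x = x1.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        x = x - dt * velocity(x, t)   # one Euler step toward t=0
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 32, 32))     # the "data" latent we hope to recover
x1 = rng.normal(size=x0.shape)        # pure noise at t=1

oracle = lambda x, t: x1 - x0         # exact rectified-flow velocity
x_hat = euler_sample(oracle, x1, steps=25)
print(np.allclose(x_hat, x0))  # True: straight paths make Euler exact
```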

  ## References

  | Paper | Contribution |
  |-------|-------------|
  | [CfC Networks (Nature MI 2022)](https://arxiv.org/abs/2106.13898) | CfC Eq.10, parallelizable closed-form |
+ | [LTC Networks (AAAI 2021)](https://arxiv.org/abs/2006.04439) | Liquid time-constant ODE |
  | [LiquidTAD (2024)](https://arxiv.org/abs/2604.18274) | Parallel liquid relaxation |
  | [USM (CVPR 2025)](https://arxiv.org/abs/2504.13499) | U-Net + SSM for diffusion |
+ | [DiffuSSM (2023)](https://arxiv.org/abs/2311.18257) | SSM replaces attention in diffusion |
  | [Rectified Flow (ICLR 2023)](https://arxiv.org/abs/2209.03003) | Simple velocity training |

  ## Files

  ├── liquid_diffusion/
  │   ├── __init__.py
  │   ├── model.py                        # Full model architecture
+ │   └── trainer.py                      # Trainer + dataset utilities
+ ├── LiquidDiffusion_Training.ipynb      # Complete Colab notebook
  ├── test_model.py
  └── README.md
  ```

  ## License
  MIT