Pacific-Prime committed
Commit d1eb833 · verified · 1 parent: f2165c0

Update README.md

Files changed (1)
  1. README.md +90 -79
README.md CHANGED
@@ -1,79 +1,90 @@
- ---
- license: cc-by-nc-4.0
- tags:
- - vae
- - image-generation
- - diffusion
- - inl-diffusion
- library_name: pytorch
- pipeline_tag: image-to-image
- ---
-
- # INL-Diffusion VAE
-
- Variational Autoencoder for INL-Diffusion image generation pipeline.
-
- ## Architecture
-
- **89M parameters** | 256x256 images | 4-channel latent space
-
- ### Encoder
- $$z = \mathcal{E}(x) \in \mathbb{R}^{32 \times 32 \times 4}$$
-
- Compresses 256x256x3 images to 32x32x4 latents (8x spatial compression).
-
- ### Decoder
- $$\hat{x} = \mathcal{D}(z) \in \mathbb{R}^{256 \times 256 \times 3}$$
-
- ### Loss Function
- $$\mathcal{L} = \mathcal{L}_{\text{recon}} + \beta \cdot D_{KL}(q(z|x) \| p(z)) + \lambda \cdot \mathcal{L}_{\text{perceptual}}$$
-
- Where:
- - $\mathcal{L}_{\text{recon}} = \|x - \hat{x}\|_1$ (L1 reconstruction)
- - $D_{KL}$ regularizes latent to $\mathcal{N}(0, I)$
- - $\mathcal{L}_{\text{perceptual}}$ uses VGG features
-
- ## Config
-
- | Parameter | Value |
- |-----------|-------|
- | Image size | 256x256 |
- | Latent dim | 4 |
- | Base channels | 128 |
- | Channel mult | [1, 2, 4, 4] |
- | Res blocks | 2 |
-
- ## Usage
-
- ```python
- from safetensors.torch import load_file
- from inl_diffusion.vae import INLVAE
-
- # Load
- state_dict = load_file("model.safetensors")
- vae = INLVAE(image_size=256, base_channels=128, latent_dim=4)
- vae.load_state_dict(state_dict)
-
- # Encode
- latents = vae.encode(images) # [B, 4, 32, 32]
-
- # Decode
- reconstructed = vae.decode(latents) # [B, 3, 256, 256]
- ```
-
- ## Training
-
- Trained on WikiArt (81K images) for 15K steps with:
- - Batch size: 16
- - Learning rate: 1e-4
- - Mixed precision: bf16
-
- ### Training Curves
-
- ![Training Curves](training_curves.png)
-
- ## License
-
- CC BY-NC 4.0 - Attribution-NonCommercial
-
- Commercial use requires explicit permission from the author.
+ ---
+ license: cc-by-nc-4.0
+ tags:
+ - vae
+ - image-generation
+ - diffusion
+ - complexity-diffusion
+ library_name: pytorch
+ pipeline_tag: image-to-image
+ ---
+
+ # Complexity-Diffusion VAE
+
+ A Variational Autoencoder for the Complexity-Diffusion image generation pipeline.
+
+ ## Architecture
+
+ **89M parameters** | 256x256 images | 4-channel latent space
+
+ ### Encoder
+ $$z = \mathcal{E}(x) \in \mathbb{R}^{32 \times 32 \times 4}$$
+
+ Compresses 256x256x3 images to 32x32x4 latents (8x spatial compression).
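Worked out from the shapes above (no extra assumptions): the spatial compression is $256 / 32 = 8$ per axis, and the overall reduction in stored values per image is

$$\frac{256 \times 256 \times 3}{32 \times 32 \times 4} = \frac{196608}{4096} = 48.$$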
+
+ ### Decoder
+ $$\hat{x} = \mathcal{D}(z) \in \mathbb{R}^{256 \times 256 \times 3}$$
+
+ ### Loss Function
+ $$\mathcal{L} = \mathcal{L}_{\text{recon}} + \beta \cdot D_{KL}(q(z|x) \| p(z)) + \lambda \cdot \mathcal{L}_{\text{perceptual}}$$
+
+ Where (a computation sketch follows this list):
+ - $\mathcal{L}_{\text{recon}} = \|x - \hat{x}\|_1$ (L1 reconstruction)
+ - $D_{KL}$ regularizes latent to $\mathcal{N}(0, I)$
+ - $\mathcal{L}_{\text{perceptual}}$ uses VGG features
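A minimal sketch of how these three terms combine, assuming the encoder returns the mean `mu` and log-variance `logvar` of a diagonal Gaussian posterior and that `perceptual` is a stand-in for the VGG-feature distance (the repository's actual function names are not shown in this README):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar, perceptual, beta=1.0, lam=1.0):
    # L1 reconstruction term ||x - x_hat||_1, averaged over batch and pixels
    recon = F.l1_loss(x_hat, x)

    # KL divergence of N(mu, diag(sigma^2)) from the standard normal prior N(0, I),
    # averaged over batch and latent dimensions
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    # Perceptual term: distance between VGG features of x_hat and x
    # (`perceptual` is a placeholder callable, not a confirmed API)
    percep = perceptual(x_hat, x)

    return recon + beta * kl + lam * percep
```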
+
+ ## Config
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Image size | 256x256 |
+ | Latent dim | 4 |
+ | Base channels | 128 |
+ | Channel mult | [1, 2, 4, 4] |
+ | Res blocks | 2 |
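For convenience, the same hyperparameters as a plain Python dict. Only `image_size`, `base_channels`, and `latent_dim` are confirmed constructor arguments (see Usage below); `channel_mult` and `num_res_blocks` are assumed names for the last two rows and may differ in the actual package:

```python
# Hyperparameters from the config table above.
vae_config = {
    "image_size": 256,
    "base_channels": 128,
    "latent_dim": 4,
    "channel_mult": [1, 2, 4, 4],  # assumed key name
    "num_res_blocks": 2,           # assumed key name
}

# vae = ComplexityVAE(**vae_config)  # only valid if the class accepts all of these
```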
+
+ ## Usage
+
+ ```python
+ from safetensors.torch import load_file
+ from complexity_diffusion.vae import ComplexityVAE
+
+ # Load
+ state_dict = load_file("model.safetensors")
+ vae = ComplexityVAE(image_size=256, base_channels=128, latent_dim=4)
+ vae.load_state_dict(state_dict)
+
+ # Encode
+ latents = vae.encode(images) # [B, 4, 32, 32]
+
+ # Decode
+ reconstructed = vae.decode(latents) # [B, 3, 256, 256]
+ ```
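As a quick shape check, the snippet below continues from the block above and runs a dummy batch through the round trip. It assumes `ComplexityVAE` is a standard `torch.nn.Module` and that `encode` returns the latent tensor directly, as the comments above suggest:

```python
import torch

vae.eval()
with torch.no_grad():
    # Dummy batch of 2 RGB images standing in for real, preprocessed inputs
    images = torch.randn(2, 3, 256, 256)

    latents = vae.encode(images)         # expected shape: [2, 4, 32, 32]
    reconstructed = vae.decode(latents)  # expected shape: [2, 3, 256, 256]

print(latents.shape, reconstructed.shape)
```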
+
+ ## Training
+
+ Trained on WikiArt (81K images) for 15K steps with the following settings (a sketch of one training step follows the list):
+ - Batch size: 16
+ - Learning rate: 1e-4
+ - Mixed precision: bf16
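A minimal sketch of one training step under these settings, reusing `vae` from Usage and `vae_loss` from the Loss Function section. The optimizer choice, loss weights, and the VAE's forward return signature are assumptions; only the batch size, learning rate, and bf16 autocast come from the list above:

```python
import torch

# Placeholder for the VGG-feature perceptual loss; replace with a real
# feature-based distance in practice.
def perceptual(x_hat, x):
    return torch.zeros((), device=x_hat.device)

optimizer = torch.optim.AdamW(vae.parameters(), lr=1e-4)  # AdamW is an assumption

def train_step(batch):
    # batch: [16, 3, 256, 256], matching the stated batch size and image size
    optimizer.zero_grad()
    # bf16 mixed precision as listed above (no GradScaler needed for bf16)
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        x_hat, mu, logvar = vae(batch)  # assumed forward return signature
        loss = vae_loss(batch, x_hat, mu, logvar, perceptual)
    loss.backward()
    optimizer.step()
    return loss.item()
```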
+
+ ### Training Curves
+
+ ![Training Curves](training_curves.png)
+
+ ## Part of Complexity Deep Ecosystem
+
+ This VAE is designed to work with the Complexity-Diffusion pipeline, leveraging:
+ - **INL Dynamics** for stable latent space training
+ - **Token-Routed architecture** for efficient processing
+
+ ## Links
+
+ - [Complexity Deep](https://huggingface.co/Pacific-Prime)
+ - [PyPI Package](https://pypi.org/project/complexity-deep/)
+
+ ## License
+
+ CC BY-NC 4.0 - Attribution-NonCommercial
+
+ Commercial use requires explicit permission from the author.