AxionLab-official committed commit 40b58e8 (verified · parent: 21cc58b)

Update README.md

Files changed (1): README.md (+134 −3)
README.md CHANGED
---
license: mit
datasets:
- svjack/pokemon-blip-captions-en-zh
pipeline_tag: unconditional-image-generation
tags:
- diffusion
- tiny
- pokemon
- U-Net
- from_scratch
- 9m
- pokepixels
- pixels
- diff
- diffusers
---

# PokéPixels1-9M (CPU)

A minimal diffusion model trained **from scratch on CPU**.

This project explores the lower limits of diffusion models:
**How small and simple can a diffusion model be while still producing recognizable images?**

---

## 🧠 Overview

TinyPokemonDiffusion (the PokéPixels1-9M model) is a lightweight DDPM-based generative model trained on Pokémon images.

Despite its small size and CPU-only training, the model learns:
- Color distributions
- Basic shapes
- Early-stage object structure

---
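The DDPM training setup described above rests on a closed-form forward process: any clean image can be noised directly to an arbitrary timestep. A minimal numpy sketch follows; the linear beta schedule and timestep count are illustrative assumptions, not this repo's actual config.

```python
import numpy as np

# Closed-form DDPM forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
# The linear beta schedule below is an illustrative assumption, not this repo's config.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Noise a clean image x0 directly to timestep t in one shot."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps, eps

rng = np.random.default_rng(42)
x0 = rng.uniform(-1.0, 1.0, size=(3, 64, 64))   # stand-in 64x64 RGB image in [-1, 1]
x_noisy, eps = q_sample(x0, t=T - 1, rng=rng)   # at the last step, almost pure noise
```

At `t = T - 1` the cumulative `alphas_bar` is nearly zero, so the signal is essentially destroyed; training teaches the network to predict `eps` from `x_noisy` at every `t`.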

## ⚙️ Specifications

| Component       | Value                 |
|-----------------|-----------------------|
| Parameters      | ~9M                   |
| Resolution      | 64×64                 |
| Training Device | CPU (Ryzen 5 5600G)   |
| Training Time   | ~5.5 hours            |
| Dataset         | pokemon-blip-captions |
| Architecture    | Custom U-Net          |
| Precision       | float32               |

---

## 🧪 Features

- Full DDPM implementation from scratch
- Custom U-Net with attention blocks
- CPU-optimized training
- Deterministic sampling (seed support)
- Config-driven architecture

---
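The "full DDPM implementation" above centers on the ancestral sampling step. A numpy sketch of that step is below; the lambda standing in for the U-Net is a placeholder assumption (the real model predicts the noise from the noisy image and the timestep).

```python
import numpy as np

# One DDPM ancestral sampling step. The lambda below is a placeholder for the
# trained U-Net, which would predict the noise eps from (x_t, t).
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # illustrative schedule, not the repo's config
alphas = 1.0 - betas
alphas_bar = np.cumprod(alphas)

def ddpm_step(x_t, t, eps_pred, rng):
    """Compute x_{t-1} from x_t given the predicted noise eps_pred."""
    coef = betas[t] / np.sqrt(1.0 - alphas_bar[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean                      # no noise is added at the final step
    z = rng.standard_normal(x_t.shape)
    return mean + np.sqrt(betas[t]) * z  # sigma_t^2 = beta_t variance choice

rng = np.random.default_rng(0)
fake_unet = lambda x, t: np.zeros_like(x)   # placeholder noise predictor
x = rng.standard_normal((3, 64, 64))        # start from pure Gaussian noise
for t in reversed(range(T)):
    x = ddpm_step(x, t, fake_unet(x, t), rng)
```

Running all T steps on CPU is what makes the "CPU-optimized training" and small 64×64 resolution choices matter: the per-step cost is tiny, so the full chain stays tractable.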

## 🖼️ Results

The model generates:

- Coherent color palettes
- Recognizable Pokémon-like silhouettes
- Early-stage structure formation

Limitations:

- Blurry outputs
- Weak spatial consistency
- No semantic understanding

---

## 🚀 Usage

### Generate images

```bash
python generate.py \
  --checkpoint model.pt \
  --n_images 8 \
  --steps 50 \
  --seed 42
```
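Why `--seed 42` gives reproducible output: seeding the RNG fixes every noise draw, and the rest of the sampler is deterministic given the weights. A small numpy sketch of the idea (generate.py's actual internals are not shown in this card):

```python
import numpy as np

# Seeding fixes the initial x_T and every subsequent noise draw, so two runs
# with the same seed produce identical samples. Sketch only; not generate.py.
def sample_noise(seed, shape=(8, 3, 64, 64)):
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape)   # initial noise for a batch of 8 images

a = sample_noise(42)
b = sample_noise(42)   # same seed -> bit-identical starting noise
c = sample_noise(7)    # different seed -> different samples
```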

## 📁 Output

Generated images are saved as a horizontal grid:

```
outputs/generated.png
```
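Assembling such a horizontal grid is a single concatenation along the width axis. A numpy sketch with random stand-in images (the real script saves actual samples to the path above):

```python
import numpy as np

# Tile a batch of images side by side into one horizontal strip.
# Random stand-ins here; generate.py would use its sampled images.
n_images = 8
images = np.random.rand(n_images, 64, 64, 3)   # batch in HWC layout
grid = np.concatenate(list(images), axis=1)    # 64 x (8*64) x 3 strip
```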

## ⚠️ Limitations

- Unconditional model (no prompts)
- Limited dataset diversity
- Early training stage
- No DDIM (yet)

## 🔬 Research Direction

This project demonstrates that:

> Diffusion models can learn meaningful visual structure even at extremely small scales.

Future work:

- Conditional generation (class-based)
- Text-to-image (v2.0)
- DDIM sampling
- Larger model variants
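DDIM sampling, listed above as future work (and explicitly not yet in this repo), would replace the stochastic update with a deterministic one that can skip timesteps. A hedged numpy sketch of the eta = 0 DDIM step:

```python
import numpy as np

# Deterministic DDIM step (eta = 0): not yet implemented in this repo.
# Same trained noise predictor, far fewer steps, no added noise.
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # illustrative schedule
alphas_bar = np.cumprod(1.0 - betas)

def ddim_step(x_t, t, t_prev, eps_pred):
    """Jump deterministically from timestep t to t_prev."""
    ab_t, ab_prev = alphas_bar[t], alphas_bar[t_prev]
    x0_pred = (x_t - np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(ab_t)
    return np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps_pred

# 50 evenly spaced timesteps instead of all 1000
timesteps = np.linspace(T - 1, 0, 50).astype(int)
x = np.random.default_rng(42).standard_normal((3, 64, 64))
for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
    x = ddim_step(x, int(t), int(t_prev), np.zeros_like(x))  # placeholder eps
```

This is why DDIM pairs well with the `--steps 50` style of sampling shown in Usage: the same weights can be reused with a much shorter, deterministic chain.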

## 💡 Motivation

Most diffusion research focuses on scaling up.

This project explores the opposite direction:
**What is the minimum viable diffusion model?**

## 📜 License

MIT

## 🙌 Acknowledgments

- Hugging Face datasets
- PyTorch
- The open-source AI community

⭐ If you like this project, give it a star and follow the evolution to v2.0 (conditional) 🚀