AxionLab-Co/PokePixels1-9M

Unconditional Image Generation


AxionLab-official committed 12 days ago

Commit 40b58e8 · verified · 1 Parent(s): 21cc58b

Update README.md

Files changed (1)

README.md +134 -3


@@ -1,3 +1,134 @@
----
-license: mit
----

+---
+license: mit
+datasets:
+- svjack/pokemon-blip-captions-en-zh
+pipeline_tag: unconditional-image-generation
+tags:
+- diffusion
+- tiny
+- pokemon
+- U-Net
+- from_scratch
+- 9m
+- pokepixels
+- pixels
+- diff
+- diffusers
+---
+# PokéPixels1-9M (CPU)
+A minimal diffusion model trained **from scratch on CPU**.
+This project explores the lower limits of diffusion models:
+**How small and simple can a diffusion model be while still producing recognizable images?**
+---
+## 🧠 Overview
+TinyPokemonDiffusion is a lightweight DDPM-based generative model trained on Pokémon images.
+Despite its small size and CPU-only training, the model learns:
+- Color distributions
+- Basic shapes
+- Early-stage object structure
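The README does not spell out the noise schedule, so the following is only a minimal sketch of the DDPM forward (noising) process that a DDPM-based model is trained against. The linear schedule, its endpoints, the 1000 steps, and the helper names (`linear_betas`, `alpha_bars`, `q_sample`) are assumptions for illustration, not the project's actual code. Pure Python is used so the sketch runs without PyTorch.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule. The values are assumed defaults, not taken from this repo."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I). Returns (x_t, noise)."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    scale = math.sqrt(abar[t])
    sigma = math.sqrt(1.0 - abar[t])
    xt = [scale * x + sigma * e for x, e in zip(x0, noise)]
    return xt, noise


betas = linear_betas()
abar = alpha_bars(betas)
rng = random.Random(42)
x0 = [0.5, -0.25, 1.0, 0.0]  # stand-in for flattened pixel values in [-1, 1]
x_early, _ = q_sample(x0, 10, abar, rng)    # still close to x0
x_late, _ = q_sample(x0, 999, abar, rng)    # almost pure Gaussian noise
```

During training, a network would be asked to predict `noise` from `x_t` and `t`; by the last step almost none of the original signal remains, which is why sampling can start from pure noise.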
+---
+## ⚙️ Specifications
+| Component        | Value |
+|------------------|------|
+| Parameters       | ~9M |
+| Resolution       | 64x64 |
+| Training Device  | CPU (Ryzen 5 5600G) |
+| Training Time    | ~5.5 hours |
+| Dataset          | svjack/pokemon-blip-captions-en-zh |
+| Architecture     | Custom UNet |
+| Precision        | float32 |
+---
+## 🧪 Features
+- Full DDPM implementation from scratch
+- Custom UNet with attention blocks
+- CPU-optimized training
+- Reproducible sampling (fixed-seed support)
+- Config-driven architecture
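To illustrate how seed support makes DDPM sampling repeatable, here is a hedged sketch of an ancestral sampling loop driven by a seeded random generator. The function `ddpm_sample`, the stand-in `toy_predictor` (used in place of the UNet), the 50-step linear schedule, and the choice of variance `sigma_t^2 = beta_t` are all assumptions for illustration, not the repository's implementation.

```python
import math
import random


def ddpm_sample(predict_noise, dim, betas, seed):
    """Run DDPM ancestral sampling with all randomness drawn from one seeded RNG."""
    rng = random.Random(seed)
    # Cumulative products abar_t = prod_{s<=t} (1 - beta_s).
    abars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        abars.append(prod)
    # Start from pure Gaussian noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(len(betas))):
        beta = betas[t]
        alpha = 1.0 - beta
        eps = predict_noise(x, t)
        coef = beta / math.sqrt(1.0 - abars[t])
        # Posterior mean: (x_t - coef * eps) / sqrt(alpha_t).
        mean = [(xi - coef * ei) / math.sqrt(alpha) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(beta)  # assumed variance choice sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    return x


def toy_predictor(x, t):
    """Hypothetical stand-in for the trained UNet's noise prediction."""
    return [0.1 * v for v in x]


betas = [1e-4 + i * (0.02 - 1e-4) / 49 for i in range(50)]
sample_a = ddpm_sample(toy_predictor, 4, betas, seed=42)
sample_b = ddpm_sample(toy_predictor, 4, betas, seed=42)
sample_c = ddpm_sample(toy_predictor, 4, betas, seed=43)
```

The sampler itself stays stochastic; fixing the seed only fixes the sequence of random draws, so `sample_a` and `sample_b` match while `sample_c` differs.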
+---
+## 🖼️ Results
+The model generates:
+- Coherent color palettes
+- Recognizable Pokémon-like silhouettes
+- Early-stage structure formation
+Limitations:
+- Blurry outputs
+- Weak spatial consistency
+- No semantic understanding
+---
+## 🚀 Usage
+### Generate images
+```bash
+python generate.py \
+  --checkpoint model.pt \
+  --n_images 8 \
+  --steps 50 \
+  --seed 42
+```
+### 📁 Output
+Generated images are saved as a horizontal grid:
+`outputs/generated.png`
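A horizontal grid can be pictured as joining each image's pixel rows side by side. The sketch below uses nested lists as a stand-in for image tensors; the helper name `hstack_images` and the single-channel 64x64 dummy images are assumptions, and the actual script may build its grid differently.

```python
def hstack_images(images):
    """Concatenate equally tall images (lists of pixel rows) from left to right."""
    if not images:
        raise ValueError("need at least one image")
    height = len(images[0])
    if any(len(img) != height for img in images):
        raise ValueError("all images must share the same height")
    # Row r of the grid is row r of every image, joined in order.
    return [[px for img in images for px in img[r]] for r in range(height)]


# Eight dummy 64x64 single-channel images, each filled with its own index.
images = [[[i] * 64 for _ in range(64)] for i in range(8)]
grid = hstack_images(images)
```

With `--n_images 8` at 64x64 resolution, a grid built this way would be 64 pixels tall and 512 pixels wide.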
+---
+## ⚠️ Limitations
+- Unconditional model (no prompts)
+- Limited dataset diversity
+- Early training stage
+- No DDIM (yet)
+---
+## 🔬 Research Direction
+This project demonstrates that:
+Diffusion models can learn meaningful visual structure even at extremely small scales.
+Future work:
+- Conditional generation (class-based)
+- Text-to-image (v2.0)
+- DDIM sampling
+- Larger model variants
+---
+## 💡 Motivation
+Most diffusion research focuses on scaling up.
+This project explores the opposite direction:
+What is the minimum viable diffusion model?
+---
+## 📜 License
+MIT
+---
+## 🙌 Acknowledgments
+- Hugging Face datasets
+- PyTorch
+- The open-source AI community
+---
+⭐ If you like this project:
+Give it a star and follow the evolution to v2.0 (conditional) 🚀