harsh99 committed on
Commit b9e9532 · 1 Parent(s): 16d759f

readme file updates

Files changed (1):
  README.md +126 −143

README.md CHANGED
@@ -1,13 +1,11 @@
- # stable-diffusion
-
  # 🎨 Stable Diffusion & CatVTON Implementation

  <div align="center">

- ![Stable Diffusion](https://img.shields.io/badge/Stable%20Diffusion-From%20Scratch-blue?style=for-the-badge&logo=pytorch)
  ![CatVTON](https://img.shields.io/badge/CatVTON-Virtual%20Try--On-purple?style=for-the-badge)
- ![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white)
- ![Python](https://img.shields.io/badge/Python-3.10.9-green?style=for-the-badge&logo=python&logoColor=white)

  *A comprehensive implementation of Stable Diffusion from scratch with CatVTON virtual try-on capabilities*
@@ -15,193 +13,199 @@

  ---

- ## 📋 Table of Contents

- - [🌟 Overview](#-overview)
- - [🏗️ Project Structure](#️-project-structure)
- - [🚀 Features](#-features)
- - [⚙️ Setup & Installation](#️-setup--installation)
- - [📥 Model Downloads](#-model-downloads)
- - [🎯 CatVTON Integration](#-catvton-integration)
- - [📚 References](#-references)
- - [👤 Author](#-author)
- - [📜 License](#-license)

  ---

- ## 🌟 Overview

- This project implements **Stable Diffusion from scratch** using PyTorch, with an additional **CatVTON (Virtual Cloths Try-On)** model built on top of stable-diffusion. The implementation includes:

- - ✨ Complete Stable Diffusion pipeline built from ground up **(Branch: Main)**
- - 🎭 CatVTON model for virtual garment try-on **(Branch: CatVTON)**
- - 🧠 Custom attention mechanisms and CLIP integration
- - 🔄 DDPM (Denoising Diffusion Probabilistic Models) implementation
- - 🖼️ Inpainting capabilities using pretrained weights

  ---

- ## 🏗️ Project Structure

- ```
  stable-diffusion/
- ├── 📁 Core Components
  │ ├── attention.py # Attention mechanisms
- │ ├── clip.py # CLIP model implementation
- │ ├── ddpm.py # DDPM sampler
- │ ├── decoder.py # VAE decoder
- │ ├── encoder.py # VAE encoder
- │ ├── diffusion.py # Diffusion process
- │ ├── model.py # Defining model & loading pre-trained weights
- │ └── pipeline.py # Main pipeline
  │
- ├── 📁 Utilities & Interface
- │ ├── interface.py # User interface
- │ ├── model_converter.py # Model conversion utilities
- │ └── requirements.txt # Dependencies
  │
- ├── 📁 Data & Models
- │ ├── vocab.json # Tokenizer vocabulary
- │ ├── merges.txt # BPE merges
- │ ├── inkpunk-diffusion-v1.ckpt # Inkpunk model weights
- │ └── sd-v1-5-inpainting.ckpt # Inpainting model weights
  │
- ├── 📁 Sample Data
- │ ├── person.jpg # Person image for try-on
- │ ├── garment.jpg # Garment image
- │ ├── agnostic_mask.png # Segmentation mask
- │ ├── dog.jpg # Test image
- │ ├── image.png # Generated sample
- │ └── zalando-hd-resized.zip # Dataset
  │
- └── 📁 Notebooks & Documentation
- ├── test.ipynb # Testing notebook
- └── README.md # This file
  ```

  ---

- ## 🚀 Features

- ### 🎨 Stable Diffusion Core
- - **From-scratch implementation** of Stable Diffusion architecture
- - **Custom CLIP** text encoder integration
- - **VAE encoder/decoder** for latent space operations
- - **DDPM sampling** with configurable steps
- - **Attention mechanisms** optimized for diffusion

- ### 👕 CatVTON Capabilities
- - **Virtual garment try-on** using inpainting
- - **Person-garment alignment** and fitting
- - **Mask-based inpainting** for realistic results

  ---

- ## ⚙️ Setup & Installation

  ### Prerequisites
- - Python 3.10.9
- - CUDA-compatible GPU (recommended)
- - Git

- ### 1. Clone Repository
  ```bash
  git clone https://github.com/Harsh-Kesharwani/stable-diffusion.git
  cd stable-diffusion
- git checkout CatVTON # Switch to CatVTON branch to use virtual-try-on model
  ```

- ### 2. Create Virtual Environment
  ```bash
  conda create -n stable-diffusion python=3.10.9
  conda activate stable-diffusion
  ```

- ### 3. Install Dependencies
  ```bash
  pip install -r requirements.txt
  ```

- ### 4. Verify Installation
  ```bash
- python -c "import torch; print(f'PyTorch version: {torch.__version__}')"
- python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
  ```

  ---

- ## 📥 Model Downloads

- ### Required Files

- #### 1. Tokenizer Files
- Download from [Stable Diffusion v1.4 Tokenizer](https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/main/tokenizer):
- - `vocab.json`
- - `merges.txt`

- #### 2. Model Checkpoints
- - **Inkpunk Diffusion**: Download `inkpunk-diffusion-v1.ckpt` from [Envvi/Inkpunk-Diffusion](https://huggingface.co/Envvi/Inkpunk-Diffusion/tree/main)
- - **Inpainting Model**: Download from [Stable Diffusion v1.5 Inpainting](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-inpainting)

  ### Download Script
  ```bash
- # Create data directory if it doesn't exist
  mkdir -p data
-
- # Download tokenizer files
- wget -O vocab.json "https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer/vocab.json"
- wget -O merges.txt "https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer/merges.txt"
-
- # Note: Large model files need to be downloaded manually from HuggingFace
  ```

  ---

- ### Interactive Interface
  ```bash
  python interface.py
  ```

  ---

- ## 🎯 CatVTON Integration

- The CatVTON model extends the base Stable Diffusion with specialized capabilities for virtual garment try-on:

- ### Key Components
- 1. **Inpainting Pipeline**: Uses `sd-v1-5-inpainting.ckpt` for mask-based generation
- 2. **Garment Alignment**: Automatic alignment of garments to person pose
- 3. **Mask Generation**: Automated or manual mask creation for try-on regions
- ---

- ## 📚 References

- ### 📖 Implementation Guides
- - [Implementing Stable Diffusion from Scratch - Medium](https://medium.com/@sayedebad.777/implementing-stable-diffusion-from-scratch-using-pytorch-f07d50efcd97)
- - [Stable Diffusion Implementation - YouTube](https://www.youtube.com/watch?v=ZBKpAp_6TGI)

- ### 🤗 HuggingFace Resources
- - [Diffusers: Adapt a Model](https://huggingface.co/docs/diffusers/training/adapt_a_model)
- - [Stable Diffusion v1.5 Inpainting](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-inpainting)
- - [CompVis Stable Diffusion v1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4)
- - [Inkpunk Diffusion](https://huggingface.co/Envvi/Inkpunk-Diffusion)

- ### 📄 Academic Papers
- - Stable Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models
- - DDPM: Denoising Diffusion Probabilistic Models
- - CatVTON: Category-aware Virtual Try-On Network

  ---

- ## 👤 Author

  <div align="center">

  **Harsh Kesharwani**

- [![GitHub](https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Harsh-Kesharwani)
- [![LinkedIn](https://img.shields.io/badge/LinkedIn-0A66C2?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/harsh-kesharwani/)
- [![Email](https://img.shields.io/badge/Email-D14836?style=for-the-badge&logo=gmail&logoColor=white)](mailto:harshkesharwani777@gmail.com)

  *Passionate about AI, Computer Vision, and Generative Models*

@@ -209,46 +213,25 @@ The CatVTON model extends the base Stable Diffusion with specialized capabilitie

  ---

- ## 🤝 Contributing
-
- Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
-
- ### Development Setup
- 1. Fork the repository
- 2. Create your feature branch (`git checkout -b feature/amazing-feature`)
- 3. Commit your changes (`git commit -m 'Add some amazing feature'`)
- 4. Push to the branch (`git push origin feature/amazing-feature`)
- 5. Open a Pull Request
-
- ---
-
- ## 📜 License

- This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

  ---

- ## 🙏 Acknowledgments

- - CompVis team for the original Stable Diffusion implementation
- - HuggingFace for providing pre-trained weights, dataset and references.
- - The open-source community for various implementations and tutorials
- - Zalando Research for the fashion dataset

  ---

  <div align="center">

- **⭐ Star this repository if you find it helpful!**

  *Built with ❤️ by [Harsh Kesharwani](https://www.linkedin.com/in/harsh-kesharwani/)*

  </div>
-
-
- <!-- 1. Download `vocab.json` and `merges.txt` from https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/main/tokenizer and save them in the `data` folder
- 1. Download `inkpunk-diffusion-v1.ckpt` from https://huggingface.co/Envvi/Inkpunk-Diffusion/tree/main and save it in the `data` folder -->
-
- <!-- IMPORTANT REFRRENCE
- 3. https://huggingface.co/docs/diffusers/training/adapt_a_model -->
- <!-- 4. https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-inpainting -->
 
 
# 🎨 Stable Diffusion & CatVTON Implementation

<div align="center">

![Stable Diffusion](https://img.shields.io/badge/Stable%20Diffusion-From%20Scratch-blue?style=for-the-badge&logo=pytorch) <br>
![CatVTON](https://img.shields.io/badge/CatVTON-Virtual%20Try--On-purple?style=for-the-badge)
![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white)
![Python](https://img.shields.io/badge/Python-3.10.9-green?style=for-the-badge&logo=python&logoColor=white)

*A comprehensive implementation of Stable Diffusion from scratch with CatVTON virtual try-on capabilities*

</div>

---

## Table of Contents

* [Overview](#overview)
* [Project Structure](#project-structure)
* [Features](#features)
* [Setup & Installation](#setup--installation)
* [Model Downloads](#model-downloads)
* [CatVTON Integration](#catvton-integration)
* [References](#references)
* [Author](#author)
* [License](#license)

---

## Overview

This project implements **Stable Diffusion from scratch** using PyTorch, extended with **CatVTON (Virtual Cloth Try-On)** for realistic fashion try-on.

* Complete Stable Diffusion pipeline (Branch: `main`)
* CatVTON virtual try-on extension (Branch: `CatVTON`)
* DDPM-based denoising, VAE, and custom attention
* Inpainting and text-to-image capabilities

---

## Project Structure

```text
stable-diffusion/
├── Core Components
│   ├── attention.py           # Attention mechanisms
│   ├── clip.py                # CLIP model
│   ├── ddpm.py                # DDPM sampler
│   ├── decoder.py             # VAE decoder
│   ├── encoder.py             # VAE encoder
│   ├── diffusion.py           # Diffusion logic
│   ├── model.py               # Weight loading
│   └── pipeline.py            # Main pipeline logic
│
├── Utilities & Interface
│   ├── interface.py           # Interactive script
│   ├── model_converter.py     # Weight conversion utilities
│   └── requirements.txt       # Python dependencies
│
├── Data & Models
│   ├── vocab.json
│   ├── merges.txt
│   ├── inkpunk-diffusion-v1.ckpt
│   └── sd-v1-5-inpainting.ckpt
│
├── Sample Data
│   ├── person.jpg
│   ├── garment.jpg
│   ├── agnostic_mask.png
│   ├── dog.jpg
│   ├── image.png
│   └── zalando-hd-resized.zip
│
└── Notebooks & Docs
    ├── test.ipynb
    └── README.md
```

---

## Features

### Stable Diffusion Core

* From-scratch implementation with modular architecture
* Custom CLIP encoder integration
* Latent space generation using VAE
* DDPM sampling process
* Self-attention mechanisms for denoising
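
The DDPM sampling listed above follows the standard reverse update: the predicted noise is subtracted from the current latent, rescaled, and a small amount of fresh noise is re-added at every step except the last. A minimal pure-Python sketch of one reverse step, for intuition only; the repository's actual sampler is `ddpm.py`, and `predicted_noise` here stands in for the U-Net output:

```python
import math
import random

def ddpm_step(x_t, predicted_noise, t, betas, seed=0):
    """One DDPM reverse step x_t -> x_{t-1} over a list of scalar 'pixels'.

    betas is the noise schedule; alpha_bar is the cumulative product of
    (1 - beta) up to timestep t.
    """
    alpha_t = 1.0 - betas[t]
    alpha_bar_t = 1.0
    for s in range(t + 1):
        alpha_bar_t *= 1.0 - betas[s]

    rng = random.Random(seed)
    sigma_t = math.sqrt(betas[t])
    out = []
    for x, eps in zip(x_t, predicted_noise):
        # Posterior mean: remove the predicted noise, then rescale.
        mean = (x - (1 - alpha_t) / math.sqrt(1 - alpha_bar_t) * eps) / math.sqrt(alpha_t)
        z = rng.gauss(0.0, 1.0) if t > 0 else 0.0  # no noise at the final step
        out.append(mean + sigma_t * z)
    return out

# Toy usage: 4 "pixels", a linear beta schedule, final step t=0 (deterministic).
betas = [0.0001 + i * 0.002 for i in range(10)]
x0_est = ddpm_step([0.5, -0.2, 0.1, 0.9], [0.1, 0.0, -0.1, 0.2], t=0, betas=betas)
```

In a full sampler this step runs in a loop from t = T−1 down to 0, with the noise prediction recomputed by the model at each timestep.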

### CatVTON Capabilities

* Virtual try-on using inpainting
* Pose-aligned garment fitting
* Segmentation-mask-based garment overlay

---

## Setup & Installation

### Prerequisites

* Python 3.10.9
* CUDA-compatible GPU
* Git and Conda (or venv)

### Clone Repository

```bash
git clone https://github.com/Harsh-Kesharwani/stable-diffusion.git
cd stable-diffusion
git checkout CatVTON  # for try-on features
```

### Create Environment

```bash
conda create -n stable-diffusion python=3.10.9
conda activate stable-diffusion
```

### Install Requirements

```bash
pip install -r requirements.txt
```

### Test Installation

```bash
python -c "import torch; print(torch.__version__)"
python -c "import torch; print(torch.cuda.is_available())"
```

---

## Model Downloads

### Tokenizer Files (from SD v1.4)

* `vocab.json`
* `merges.txt`

Download from: [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/main/tokenizer)
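
`vocab.json` maps token strings to integer ids, and `merges.txt` lists byte-pair (BPE) merge rules in priority order. A toy, in-memory illustration of how such merge rules combine characters into tokens; this is not the repository's tokenizer code, and the merge rules below are invented for the example:

```python
def bpe_tokenize(word, merges):
    """Greedily apply BPE merge rules (lowest rank = highest priority)."""
    tokens = list(word)
    ranks = {pair: i for i, pair in enumerate(merges)}
    while len(tokens) > 1:
        # Find the adjacent pair with the best (lowest) merge rank.
        pairs = [(ranks.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(tokens, tokens[1:]))]
        rank, i = min(pairs)
        if rank == float("inf"):
            break  # no applicable merge rule remains
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens

# Hypothetical merge rules, in the priority order a merges.txt would give.
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
print(bpe_tokenize("lower", merges))  # ['low', 'er']
```

The real files work the same way at much larger scale: tens of thousands of merge rules, applied in order, with the resulting tokens looked up in `vocab.json`.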

### Model Checkpoints

* `inkpunk-diffusion-v1.ckpt`: [Inkpunk Model](https://huggingface.co/Envvi/Inkpunk-Diffusion/tree/main)
* `sd-v1-5-inpainting.ckpt`: [Inpainting Weights](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-inpainting)

### Download Script

```bash
mkdir -p data
wget -O data/vocab.json "https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer/vocab.json"
wget -O data/merges.txt "https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer/merges.txt"
```

The large model checkpoints are not fetched by this script; download them manually from HuggingFace and place them in `data/`.
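
A quick sanity check that the tokenizer files actually landed in `data/` and are non-empty can save debugging time later. A small helper sketch; the paths assume the `data/` layout used above:

```python
import os

def check_downloads(paths):
    """Return the subset of expected files that are missing or empty."""
    return [p for p in paths if not (os.path.isfile(p) and os.path.getsize(p) > 0)]

missing = check_downloads(["data/vocab.json", "data/merges.txt"])
if missing:
    print("Re-download needed:", ", ".join(missing))
else:
    print("All tokenizer files present.")
```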

---

## CatVTON Integration

The CatVTON extension enables realistic garment try-on using Stable Diffusion inpainting.

### Highlights

* `sd-v1-5-inpainting.ckpt` for image completion
* Garment alignment to human pose
* Agnostic segmentation mask usage
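
The agnostic mask is what drives the inpainting: generated content fills the masked try-on region, while unmasked pixels of the person image are preserved. A framework-free toy sketch of that masked compositing, for intuition only; the actual logic lives in the repository's pipeline:

```python
def masked_blend(person, generated, mask):
    """Keep person pixels where mask == 0; take generated pixels where mask == 1."""
    return [g if m == 1 else p for p, g, m in zip(person, generated, mask)]

# Toy 1-D "image": only the middle region (mask == 1) is replaced
# by generated content; the rest of the person image is untouched.
person    = [10, 20, 30, 40, 50]
generated = [99, 99, 99, 99, 99]
mask      = [0, 1, 1, 1, 0]
print(masked_blend(person, generated, mask))  # [10, 99, 99, 99, 50]
```

In the real pipeline this blend happens in latent space at each denoising step, so the generated garment region stays consistent with the preserved surroundings.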

Run the interface:

```bash
python interface.py
```

---

## References

### Articles & Guides

* [Stable Diffusion from Scratch (Medium)](https://medium.com/@sayedebad.777/implementing-stable-diffusion-from-scratch-using-pytorch-f07d50efcd97)
* [YouTube: Diffusion Implementation](https://www.youtube.com/watch?v=ZBKpAp_6TGI)

### HuggingFace Resources

* [Stable Diffusion v1.5 Inpainting](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-inpainting)
* [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4)
* [Inkpunk Diffusion](https://huggingface.co/Envvi/Inkpunk-Diffusion)

### Papers

* Stable Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models
* DDPM: Denoising Diffusion Probabilistic Models
* CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models

---

## Author

<div align="center">

**Harsh Kesharwani**

[![GitHub](https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Harsh-Kesharwani)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-0A66C2?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/harsh-kesharwani/)
[![Email](https://img.shields.io/badge/Email-D14836?style=for-the-badge&logo=gmail&logoColor=white)](mailto:harshkesharwani777@gmail.com)

*Passionate about AI, Computer Vision, and Generative Models*

</div>

---

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

---

## Acknowledgments

* CompVis team for Stable Diffusion
* HuggingFace for models and APIs
* Zalando Research for the fashion dataset
* Open-source contributors and educators

---

<div align="center">

**⭐ Star this repo if you found it helpful!**

*Built with ❤️ by [Harsh Kesharwani](https://www.linkedin.com/in/harsh-kesharwani/)*

</div>