# VQGAN-CLIP Overview

A repo for running VQGAN+CLIP locally. It started out as a Google Colab notebook derived from Katherine Crowson's VQGAN+CLIP work.

<a href="https://replicate.ai/nerdyrodent/vqgan-clip"><img src="https://img.shields.io/static/v1?label=Replicate&message=Demo and Docker Image&color=blue"></a>

Original notebook: [![Open In Colab][colab-badge]][colab-notebook]

[colab-notebook]: <https://colab.research.google.com/drive/1ZAus_gn2RhTZWzOWUpPERNC0Q8OhZRTZ>
[colab-badge]: <https://colab.research.google.com/assets/colab-badge.svg>

Some example images:

<img src="./samples/Cartoon3.png" width="256px"></img><img src="./samples/Cartoon.png" width="256px"></img><img src="./samples/Cartoon2.png" width="256px"></img>
<img src="./samples/Bedroom.png" width="256px"></img><img src="./samples/DemonBiscuits.png" width="256px"></img><img src="./samples/Football.png" width="256px"></img>
<img src="./samples/Fractal_Landscape3.png" width="256px"></img><img src="./samples/Games_5.png" width="256px"></img>

Environment:

* Tested on Ubuntu 20.04
* GPU: Nvidia RTX 3090
* Typical VRAM requirements:
  * 24 GB for a 900x900 image
  * 10 GB for a 512x512 image
  * 8 GB for a 380x380 image
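
To see how much VRAM your own card has (and how much is currently free), `nvidia-smi` can report it if the Nvidia driver is installed:

```sh
# Report total and free VRAM per GPU
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
```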

You may also be interested in [CLIP Guided Diffusion](https://github.com/nerdyrodent/CLIP-Guided-Diffusion).

## Set up

This example uses [Anaconda](https://www.anaconda.com/products/individual#Downloads) to manage virtual Python environments.

Create a new virtual Python environment for VQGAN-CLIP:

```sh
conda create --name vqgan python=3.9
conda activate vqgan
```

Install PyTorch in the new environment:

Note: This installs the CUDA version of PyTorch. If you want to use an AMD graphics card, read the [AMD section below](#using-an-amd-graphics-card).

```sh
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
```
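
To confirm that PyTorch can see your GPU before going further, a quick sanity check is:

```sh
# Should print True, then the GPU name (e.g. GeForce RTX 3090)
python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"
```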

Install other required Python packages:

```sh
pip install ftfy regex tqdm omegaconf pytorch-lightning IPython kornia imageio imageio-ffmpeg einops torch_optimizer
```

Or use the `requirements.txt` file, which includes version numbers.

Clone the required repositories:

```sh
git clone 'https://github.com/nerdyrodent/VQGAN-CLIP'
cd VQGAN-CLIP
git clone 'https://github.com/openai/CLIP'
git clone 'https://github.com/CompVis/taming-transformers'
```

Note: In my development environment both CLIP and taming-transformers are present in the local directory, and so aren't listed in the `requirements.txt` or `vqgan.yml` files.

As an alternative, you can also pip install taming-transformers and CLIP.
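
If you take the pip route, a sketch of the usual commands (verify the package name and URL against your setup) is:

```sh
# taming-transformers is published on PyPI; CLIP is commonly installed from GitHub
pip install taming-transformers
pip install git+https://github.com/openai/CLIP.git
```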

You will also need at least one VQGAN pretrained model, e.g.:

```sh
mkdir checkpoints

curl -L -o checkpoints/vqgan_imagenet_f16_16384.yaml -C - 'https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fconfigs%2Fmodel.yaml&dl=1' #ImageNet 16384
curl -L -o checkpoints/vqgan_imagenet_f16_16384.ckpt -C - 'https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fckpts%2Flast.ckpt&dl=1' #ImageNet 16384
```

Note that users of `curl` on Microsoft Windows should use double quotes.
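
Once downloaded, a specific model can be selected with the `-conf` and `-ckpt` options described under Advanced options; the file names below match the download above:

```sh
python generate.py -p "An apple" -conf checkpoints/vqgan_imagenet_f16_16384.yaml -ckpt checkpoints/vqgan_imagenet_f16_16384.ckpt
```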

The `download_models.sh` script is an optional way to download a number of models. By default, it will download just one model.

See <https://github.com/CompVis/taming-transformers#overview-of-pretrained-models> for more information about VQGAN pre-trained models, including download links.

By default, the model .yaml and .ckpt files are expected in the `checkpoints` directory.
See <https://github.com/CompVis/taming-transformers> for more information on datasets and models.

Video guides are also available:

* Linux - <https://www.youtube.com/watch?v=1Esb-ZjO7tw>
* Windows - <https://www.youtube.com/watch?v=XH7ZP0__FXs>

### Using an AMD graphics card

Note: This hasn't been tested yet.

ROCm can be used for AMD graphics cards instead of CUDA. You can check if your card is supported here:
<https://github.com/RadeonOpenCompute/ROCm#supported-gpus>

Install ROCm according to the instructions, and don't forget to add the user to the video group:
<https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html>

The usage and set up instructions above are the same, except for the line where you install PyTorch.
Instead of `pip install torch==1.9.0+cu111 ...`, use the one or two lines which are displayed here (select Pip -> Python -> ROCm):
<https://pytorch.org/get-started/locally/>

### Using the CPU

If no graphics card can be found, the CPU is automatically used and a warning is displayed.

Regardless of whether a graphics card is available, the CPU can also be used by adding this command line argument: `-cd cpu`

This works with the CUDA version of PyTorch, even without CUDA drivers installed, but doesn't seem to work with ROCm as of now.
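
For example, a small CPU render (the size and iteration count here are illustrative; CPU rendering is slow, so keep both low):

```sh
python generate.py -p "A painting of an apple in a fruit bowl" -cd cpu -s 256 256 -i 100
```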

### Uninstalling

Remove the Python environment:

```sh
conda remove --name vqgan --all
```

and delete the `VQGAN-CLIP` directory.

## Run

To generate images from text, specify your text prompt as shown in the example below:

```sh
python generate.py -p "A painting of an apple in a fruit bowl"
```

<img src="./samples/A_painting_of_an_apple_in_a_fruitbowl.png" width="256px"></img>
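
The output filename can be set with the `-o` option (documented under Advanced options below):

```sh
python generate.py -p "A painting of an apple in a fruit bowl" -o apple.png
```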

## Multiple prompts

Text and image prompts can be split using the pipe symbol in order to allow multiple prompts.
You can also use a colon followed by a number to set a weight for that prompt. For example:

```sh
python generate.py -p "A painting of an apple in a fruit bowl | psychedelic | surreal:0.5 | weird:0.25"
```

<img src="./samples/Apple_weird.png" width="256px"></img>
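
Weights may also be negative to steer the image away from a concept, as the story-mode example below does with `photo:-1`. For instance:

```sh
# The negative weight pushes the result away from photorealism
python generate.py -p "A painting of an apple in a fruit bowl | photo:-1"
```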

Image prompts can be split in the same way. For example:

```sh
python generate.py -p "A picture of a bedroom with a portrait of Van Gogh" -ip "samples/VanGogh.jpg | samples/Bedroom.png"
```

### Story mode

Sets of text prompts can be created using the caret symbol, in order to generate a sort of story mode. For example:

```sh
python generate.py -p "A painting of a sunflower|photo:-1 ^ a painting of a rose ^ a painting of a tulip ^ a painting of a daisy flower ^ a photograph of daffodil" -cpe 1500 -zvid -i 6000 -zse 10 -vl 20 -zsc 1.005 -opt Adagrad -lr 0.15 -se 6000
```

Here `-cpe 1500` (`--change_prompt_every`) switches to the next prompt in the sequence every 1500 iterations.

## "Style Transfer"

An input image with style text and a low number of iterations can be used to create a sort of "style transfer" effect. For example:

```sh
python generate.py -p "A painting in the style of Picasso" -ii samples/VanGogh.jpg -i 80 -se 10 -opt AdamW -lr 0.25
```

| Output                                                         | Style       |
| -------------------------------------------------------------- | ----------- |
| <img src="./samples/vvg_picasso.png" width="256px"></img>      | Picasso     |
| <img src="./samples/vvg_sketch.png" width="256px"></img>       | Sketch      |
| <img src="./samples/vvg_psychedelic.png" width="256px"></img>  | Psychedelic |

A video style transfer effect can be achieved by specifying a directory of video frames in `video_style_dir`. Output will be saved in the steps directory, using the original video frame filenames. You can also use this as a sort of "batch mode" if you have a directory of images you want to apply a style to. This can also be combined with Story Mode if you don't wish to apply the same style to every image, but instead roll through a list of styles.
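
For example, you might extract frames with `ffmpeg` and then style them. The paths below are illustrative, and you should check `python generate.py -h` for the exact spelling of the `video_style_dir` flag in your copy:

```sh
# Extract a source video into individual frames (paths are illustrative)
mkdir video_frames
ffmpeg -i input.mp4 video_frames/frame_%04d.png

# Apply the style to every frame; results are written to the steps directory
python generate.py -p "A painting in the style of Picasso" --video_style_dir video_frames
```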

## Feedback example

By feeding back the generated images and making slight changes, some interesting effects can be created.

The example `zoom.sh` shows this by applying a zoom and rotate to generated images, before feeding them back in again.
To use `zoom.sh`, specify a text prompt, output filename and number of frames. E.g.

```sh
./zoom.sh "A painting of a red telephone box spinning through a time vortex" Telephone.png 150
```

If you don't have ImageMagick installed, you can install it with `sudo apt install imagemagick`.

<img src="./samples/zoom.gif" width="256px"></img>
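
For reference, the heart of such a feedback loop is small. This is a minimal sketch of the idea, not the actual `zoom.sh`; the zoom factor, rotation angle, and iteration counts are illustrative:

```sh
#!/usr/bin/env bash
# Sketch of an image feedback loop: generate, zoom and rotate with
# ImageMagick, then feed the result back in as the init image.
mkdir -p frames
python generate.py -p "A painting of a red telephone box" -i 100 -o frame.png
for i in $(seq 1 150); do
  convert frame.png -distort SRT '1.01 1' frame.png   # zoom 1%, rotate 1 degree
  python generate.py -p "A painting of a red telephone box" -ii frame.png -i 30 -o frame.png
  cp frame.png "frames/frame_$(printf '%04d' "$i").png"
done
```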

There is also a simple zoom video creation option available. For example:

```sh
python generate.py -p "The inside of a sphere" -zvid -i 4500 -zse 20 -vl 10 -zsc 0.97 -opt Adagrad -lr 0.15 -se 4500
```

## Random text example

Use `random.sh` to make a batch of images from random text. Edit the text and number of generated images to your taste!

```sh
./random.sh
```
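
If you'd rather roll your own batch script, here is a minimal sketch along the same lines; the word lists and image count are illustrative, not the ones in `random.sh`:

```sh
#!/usr/bin/env bash
# Generate 5 images from randomly combined words (illustrative word lists)
subjects=("apple" "castle" "forest" "robot")
styles=("painting" "photograph" "sketch")
for i in $(seq 1 5); do
  s=${subjects[RANDOM % ${#subjects[@]}]}
  t=${styles[RANDOM % ${#styles[@]}]}
  python generate.py -p "A $t of a $s" -o "random_$i.png"
done
```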

## Advanced options

To view the available options, use `-h`:

```sh
python generate.py -h
```

```sh
usage: generate.py [-h] [-p PROMPTS] [-ip IMAGE_PROMPTS] [-i MAX_ITERATIONS] [-se DISPLAY_FREQ]
                   [-s SIZE SIZE] [-ii INIT_IMAGE] [-in INIT_NOISE] [-iw INIT_WEIGHT] [-m CLIP_MODEL]
                   [-conf VQGAN_CONFIG] [-ckpt VQGAN_CHECKPOINT] [-nps [NOISE_PROMPT_SEEDS ...]]
                   [-npw [NOISE_PROMPT_WEIGHTS ...]] [-lr STEP_SIZE] [-cuts CUTN] [-cutp CUT_POW] [-sd SEED]
                   [-opt {Adam,AdamW,Adagrad,Adamax,DiffGrad,AdamP,RAdam,RMSprop}] [-o OUTPUT] [-vid] [-zvid]
                   [-zs ZOOM_START] [-zse ZOOM_FREQUENCY] [-zsc ZOOM_SCALE] [-cpe PROMPT_FREQUENCY]
                   [-vl VIDEO_LENGTH] [-ofps OUTPUT_VIDEO_FPS] [-ifps INPUT_VIDEO_FPS] [-d]
                   [-aug {Ji,Sh,Gn,Pe,Ro,Af,Et,Ts,Cr,Er,Re} [{Ji,Sh,Gn,Pe,Ro,Af,Et,Ts,Cr,Er,Re} ...]]
                   [-cd CUDA_DEVICE]
```

```sh
optional arguments:
  -h, --help            show this help message and exit
  -p PROMPTS, --prompts PROMPTS
                        Text prompts
  -ip IMAGE_PROMPTS, --image_prompts IMAGE_PROMPTS
                        Image prompts / target image
  -i MAX_ITERATIONS, --iterations MAX_ITERATIONS
                        Number of iterations
  -se DISPLAY_FREQ, --save_every DISPLAY_FREQ
                        Save image iterations
  -s SIZE SIZE, --size SIZE SIZE
                        Image size (width height) (default: [512, 512])
  -ii INIT_IMAGE, --init_image INIT_IMAGE
                        Initial image
  -in INIT_NOISE, --init_noise INIT_NOISE
                        Initial noise image (pixels or gradient)
  -iw INIT_WEIGHT, --init_weight INIT_WEIGHT
                        Initial weight
  -m CLIP_MODEL, --clip_model CLIP_MODEL
                        CLIP model (e.g. ViT-B/32, ViT-B/16)
  -conf VQGAN_CONFIG, --vqgan_config VQGAN_CONFIG
                        VQGAN config
  -ckpt VQGAN_CHECKPOINT, --vqgan_checkpoint VQGAN_CHECKPOINT
                        VQGAN checkpoint
  -nps [NOISE_PROMPT_SEEDS ...], --noise_prompt_seeds [NOISE_PROMPT_SEEDS ...]
                        Noise prompt seeds
  -npw [NOISE_PROMPT_WEIGHTS ...], --noise_prompt_weights [NOISE_PROMPT_WEIGHTS ...]
                        Noise prompt weights
  -lr STEP_SIZE, --learning_rate STEP_SIZE
                        Learning rate
  -cuts CUTN, --num_cuts CUTN
                        Number of cuts
  -cutp CUT_POW, --cut_power CUT_POW
                        Cut power
  -sd SEED, --seed SEED
                        Seed
  -opt, --optimiser {Adam,AdamW,Adagrad,Adamax,DiffGrad,AdamP,RAdam,RMSprop}
                        Optimiser
  -o OUTPUT, --output OUTPUT
                        Output file
  -vid, --video         Create video frames?
  -zvid, --zoom_video   Create zoom video?
  -zs ZOOM_START, --zoom_start ZOOM_START
                        Zoom start iteration
  -zse ZOOM_FREQUENCY, --zoom_save_every ZOOM_FREQUENCY
                        Save zoom image iterations
  -zsc ZOOM_SCALE, --zoom_scale ZOOM_SCALE
                        Zoom scale
  -cpe PROMPT_FREQUENCY, --change_prompt_every PROMPT_FREQUENCY
                        Prompt change frequency
  -vl VIDEO_LENGTH, --video_length VIDEO_LENGTH
                        Video length in seconds
  -ofps OUTPUT_VIDEO_FPS, --output_video_fps OUTPUT_VIDEO_FPS
                        Create an interpolated video (Nvidia GPU only) with this fps (min 10. best set to 30 or 60)
  -ifps INPUT_VIDEO_FPS, --input_video_fps INPUT_VIDEO_FPS
                        When creating an interpolated video, use this as the input fps to interpolate from (>0 & <ofps)
  -d, --deterministic   Enable cudnn.deterministic?
  -aug, --augments {Ji,Sh,Gn,Pe,Ro,Af,Et,Ts,Cr,Er,Re} [{Ji,Sh,Gn,Pe,Ro,Af,Et,Ts,Cr,Er,Re} ...]
                        Enabled augments
  -cd CUDA_DEVICE, --cuda_device CUDA_DEVICE
                        Cuda device to use
```

## Troubleshooting

### CUSOLVER_STATUS_INTERNAL_ERROR

For example:

`RuntimeError: cusolver error: CUSOLVER_STATUS_INTERNAL_ERROR, when calling cusolverDnCreate(handle)`

Make sure you have specified the correct size for the image.

### RuntimeError: CUDA out of memory

For example:

`RuntimeError: CUDA out of memory. Tried to allocate 150.00 MiB (GPU 0; 23.70 GiB total capacity; 21.31 GiB already allocated; 78.56 MiB free; 21.70 GiB reserved in total by PyTorch)`

Your request doesn't fit into your GPU's VRAM. Reduce the image size and/or number of cuts.
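
For example, dropping below the 512x512 default size and reducing the cut count (the values here are just a starting point):

```sh
python generate.py -p "A painting of an apple in a fruit bowl" -s 380 380 -cuts 16
```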

## Citations

```bibtex
@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}
```

```bibtex
@misc{esser2020taming,
    title={Taming Transformers for High-Resolution Image Synthesis},
    author={Patrick Esser and Robin Rombach and Björn Ommer},
    year={2020},
    eprint={2012.09841},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```

Katherine Crowson - <https://github.com/crowsonkb>

Public Domain images from Open Access Images at the Art Institute of Chicago - <https://www.artic.edu/open-access/open-access-images>