shaw committed
Commit 7f59441
1 Parent(s): 865a0e4

revert readme

Files changed (1):
  1. README.md (+178 -3)
README.md CHANGED
@@ -1,3 +1,178 @@
- ---
- license: mit
- ---
# Stable-Dreamfusion

A PyTorch implementation of the text-to-3D model **Dreamfusion**, powered by the [Stable Diffusion](https://github.com/CompVis/stable-diffusion) text-to-2D model.

The original paper's project page: [_DreamFusion: Text-to-3D using 2D Diffusion_](https://dreamfusion3d.github.io/).

Colab notebook for usage: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1MXT3yfOFvO0ooKEfiUUvTKwUkrrlCHpF?usp=sharing)

Examples generated from the text prompt `a high quality photo of a pineapple`, viewed with the GUI in real time:

https://user-images.githubusercontent.com/25863658/194241493-f3e68f78-aefe-479e-a4a8-001424a61b37.mp4

### [Gallery](https://github.com/ashawkey/stable-dreamfusion/issues/1) | [Update Logs](assets/update_logs.md)

# Important Notice
This project is a **work-in-progress** and differs from the paper in many ways; many features are not yet implemented. **The current generation quality cannot match the results from the original paper, and many prompts still fail badly!**


## Notable differences from the paper
* Since the Imagen model is not publicly available, we use [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in its place (implementation from [diffusers](https://github.com/huggingface/diffusers)). Unlike Imagen, Stable Diffusion is a latent diffusion model, which diffuses in a latent space instead of the original image space. Therefore, the loss also needs to propagate back through the VAE's encoder, which adds training time. Currently, 10,000 training steps take about 3 hours on a V100.
* We use the [multi-resolution grid encoder](https://github.com/NVlabs/instant-ngp/) to implement the NeRF backbone (implementation from [torch-ngp](https://github.com/ashawkey/torch-ngp)), which enables much faster rendering (~10 FPS at 800x800).
* We use the Adam optimizer with a larger initial learning rate.


## TODOs
* Alleviate the multi-face [Janus problem](https://twitter.com/poolio/status/1578045212236034048).
* Better meshes (improve the surface quality).

# Install

```bash
git clone https://github.com/ashawkey/stable-dreamfusion.git
cd stable-dreamfusion
```

**Important**: To download the Stable Diffusion model checkpoint, you need to provide your [access token](https://huggingface.co/settings/tokens). You can do this in either of the following ways:
* Run `huggingface-cli login` and enter your token.
* Create a file called `TOKEN` under this directory (i.e., `stable-dreamfusion/TOKEN`) and copy your token into it.
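
For reference, here is a minimal, hypothetical sketch of how such a token could be passed to `diffusers` (the model id and file handling below are illustrative; the repo's actual loading logic lives in `./nerf/sd.py` and may differ):

```python
# Hypothetical sketch, not the repo's code: read the token from ./TOKEN if it exists,
# otherwise fall back to the credential cached by `huggingface-cli login`.
from pathlib import Path
from diffusers import StableDiffusionPipeline

token_file = Path("./TOKEN")
access_token = token_file.read_text().strip() if token_file.exists() else True

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # model id is illustrative
    use_auth_token=access_token,      # a token string, or True to use the cached login
)
```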

### Install with pip
```bash
pip install -r requirements.txt

# (optional) install nvdiffrast for exporting textured mesh (--save_mesh)
pip install git+https://github.com/NVlabs/nvdiffrast/

# (optional) install the tcnn backbone if using --tcnn
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch

# (optional) install CLIP guidance for the dreamfields setting
pip install git+https://github.com/openai/CLIP.git
```

### Build extension (optional)
By default, we use [`load`](https://pytorch.org/docs/stable/cpp_extension.html#torch.utils.cpp_extension.load) to build the extensions at runtime.
We also provide `setup.py` to build each extension ahead of time:
```bash
# install all extension modules
bash scripts/install_ext.sh

# if you want to install manually, here is an example:
pip install ./raymarching # install to python path (you still need the raymarching/ folder, since this only installs the built extension)
```
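
For context, the default runtime build with `load` looks roughly like the sketch below; the module name, source paths, and flags are illustrative, not the repo's exact build code:

```python
# Illustrative sketch of building a CUDA extension at runtime with torch.utils.cpp_extension.load;
# the real extension names and source lists live inside each extension folder.
from torch.utils.cpp_extension import load

_backend = load(
    name="_raymarching",              # name of the compiled module (illustrative)
    sources=[                         # source files to compile (illustrative paths)
        "raymarching/src/raymarching.cu",
        "raymarching/src/bindings.cpp",
    ],
    extra_cuda_cflags=["-O3", "--use_fast_math"],
    verbose=True,                     # print the build log during the first compilation
)
```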

### Tested environments
* Ubuntu 22 with torch 1.12 & CUDA 11.6 on a V100.


# Usage

The first run will take some time to compile the CUDA extensions.

```bash
### stable-dreamfusion setting
## train with a text prompt (with the default settings)
# `-O` equals `--cuda_ray --fp16 --dir_text`
# `--cuda_ray` enables instant-ngp-like occupancy-grid-based acceleration.
# `--fp16` enables half-precision training.
# `--dir_text` enables view-dependent prompting.
python main.py --text "a hamburger" --workspace trial -O

# if the above command fails to generate anything (learns an empty scene), maybe try:
# 1. disable random lambertian shading and simply use albedo as color:
python main.py --text "a hamburger" --workspace trial -O --albedo_iters 10000 # i.e., set --albedo_iters >= --iters, which defaults to 10000
# 2. use a smaller density regularization weight:
python main.py --text "a hamburger" --workspace trial -O --lambda_entropy 1e-5

# you can also train in a GUI to visualize the training progress:
python main.py --text "a hamburger" --workspace trial -O --gui

# a Gradio GUI is also available (with fewer options):
python gradio_app.py # open in web browser

## after training is finished:
# test (export a 360-degree video)
python main.py --workspace trial -O --test
# also save a mesh (with obj, mtl, and png texture)
python main.py --workspace trial -O --test --save_mesh
# test with a GUI (free view control!)
python main.py --workspace trial -O --test --gui

### dreamfields (CLIP) setting
python main.py --text "a hamburger" --workspace trial_clip -O --guidance clip
python main.py --text "a hamburger" --workspace trial_clip -O --test --gui --guidance clip
```

# Code organization & Advanced tips

This is a brief description of the most important implementation details.
If you are interested in improving this repo, it might be a good starting point.
Any contribution would be greatly appreciated!

* The SDS loss is located at `./nerf/sd.py > StableDiffusion > train_step`:
  ```python
  # 1. we need to interpolate the NeRF rendering to 512x512 to feed it to SD's VAE.
  pred_rgb_512 = F.interpolate(pred_rgb, (512, 512), mode='bilinear', align_corners=False)
  # 2. image (512x512) --- VAE --> latents (64x64); this is where SD differs from Imagen.
  latents = self.encode_imgs(pred_rgb_512)
  ... # timestep sampling, noise adding, and UNet noise prediction
  # 3. the SDS loss: since the UNet part is skipped and we cannot simply autodiff through it, we manually set the gradient for the latents.
  w = self.alphas[t] ** 0.5 * (1 - self.alphas[t])
  grad = w * (noise_pred - noise)
  latents.backward(gradient=grad, retain_graph=True)
  ```
* Other regularizations are in `./nerf/utils.py > Trainer > train_step`.
  * The generation seems quite sensitive to the regularization on weights_sum (the accumulated alpha of each ray). The original opacity loss tends to make the NeRF disappear (zero density everywhere), so we replace it with an entropy loss for now, which encourages each ray's alpha to be either 0 or 1; a sketch of the idea is given below.
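
    A minimal sketch of such a binary entropy regularization (the function and variable names are illustrative, not the exact code in `./nerf/utils.py`):

    ```python
    import torch

    def alpha_entropy_loss(weights_sum: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
        # weights_sum: accumulated alpha per ray, in [0, 1], shape [N_rays]
        alphas = weights_sum.clamp(eps, 1 - eps)
        # binary entropy peaks at alpha = 0.5 and vanishes at alpha in {0, 1},
        # so minimizing it pushes each ray towards fully transparent or fully opaque.
        entropy = -alphas * torch.log(alphas) - (1 - alphas) * torch.log(1 - alphas)
        return entropy.mean()

    # e.g. added to the total loss with a small weight (cf. the --lambda_entropy flag):
    # loss = loss_sds + lambda_entropy * alpha_entropy_loss(weights_sum)
    ```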
* NeRF rendering core function: `./nerf/renderer.py > NeRFRenderer > run_cuda`.
  * The occupancy-grid-based training acceleration (instant-ngp-like, enabled by `--cuda_ray`) may harm the generation progress, since once a grid cell is marked as empty, rays won't pass through it later...
  * Not using `--cuda_ray` also works now:
    ```bash
    # `-O2` equals `--fp16 --dir_text`
    python main.py --text "a hamburger" --workspace trial -O2 # faster training, but slower rendering
    ```
    Training is faster if we only sample 128 points uniformly per ray (5h --> 2.5h).
    More testing is needed...
* Shading & normal evaluation: `./nerf/network*.py > NeRFNetwork > forward`. The current implementation harms training and is disabled.
  * Light direction: the current implementation uses a plane (directional) light source instead of a point light source... (a simplified shading sketch is given below.)
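
    A simplified sketch of Lambertian shading with such a directional light (the names and ambient ratio below are illustrative; the actual logic is in `./nerf/network*.py > NeRFNetwork > forward`):

    ```python
    import torch

    def lambertian_shade(albedo, normal, light_dir, ambient_ratio=0.1):
        # albedo:    [N, 3] base color predicted by the network
        # normal:    [N, 3] surface normals (e.g. from the density gradient), assumed normalized
        # light_dir: [3]    direction towards the light (directional/"plane" light, not a point light)
        lambertian = (normal * light_dir).sum(-1, keepdim=True).clamp(min=0)  # cosine term, >= 0
        shading = ambient_ratio + (1 - ambient_ratio) * lambertian
        return albedo * shading
    ```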
* View-dependent prompting: `./nerf/provider.py > get_view_direction`.
  * Use `--angle_overhead` and `--angle_front` to set the borders. How to better divide the front/back/side regions? A rough sketch of the idea is given below.
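
    Roughly, the camera's azimuth and elevation are bucketed into a view label that is appended to the text prompt when `--dir_text` is enabled; the thresholds, defaults, and suffixes below are illustrative, not the exact logic of `get_view_direction`:

    ```python
    def view_suffix(azimuth_deg: float, elevation_deg: float,
                    angle_overhead: float = 30.0, angle_front: float = 60.0) -> str:
        # elevation measured up from the horizon; azimuth 0 taken as the front view (illustrative convention)
        if elevation_deg > 90.0 - angle_overhead:
            return "overhead view"
        azimuth_deg = azimuth_deg % 360.0
        if azimuth_deg < angle_front / 2 or azimuth_deg > 360.0 - angle_front / 2:
            return "front view"
        if abs(azimuth_deg - 180.0) < angle_front / 2:
            return "back view"
        return "side view"

    # e.g. "a hamburger" -> "a hamburger, front view"
    ```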
* Network backbone (`./nerf/network*.py`) can be chosen with the `--backbone` option, but `tcnn` and `vanilla` are not well tested.
* Spatial density bias (gaussian density blob): `./nerf/network*.py > NeRFNetwork > gaussian`. A sketch of this bias is given below.
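
  Roughly, a Gaussian bump centered at the origin is added to the network's raw density so that early training starts from a blob of matter rather than empty space; the scale and radius below are illustrative, not the exact values in `NeRFNetwork > gaussian`:

  ```python
  import torch

  def density_blob(x: torch.Tensor, blob_density: float = 10.0, blob_radius: float = 0.5) -> torch.Tensor:
      # x: [N, 3] query positions; returns a [N, 1] density bias added to the network's raw density
      d2 = (x ** 2).sum(-1, keepdim=True)
      return blob_density * torch.exp(-d2 / (2 * blob_radius ** 2))
  ```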

# Acknowledgement

* The amazing original work: [_DreamFusion: Text-to-3D using 2D Diffusion_](https://dreamfusion3d.github.io/).
  ```
  @article{poole2022dreamfusion,
    author = {Poole, Ben and Jain, Ajay and Barron, Jonathan T. and Mildenhall, Ben},
    title = {DreamFusion: Text-to-3D using 2D Diffusion},
    journal = {arXiv},
    year = {2022},
  }
  ```

* Huge thanks to [Stable Diffusion](https://github.com/CompVis/stable-diffusion) and the [diffusers](https://github.com/huggingface/diffusers) library.

  ```
  @misc{rombach2021highresolution,
    title = {High-Resolution Image Synthesis with Latent Diffusion Models},
    author = {Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
    year = {2021},
    eprint = {2112.10752},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
  }

  @misc{von-platen-etal-2022-diffusers,
    author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Thomas Wolf},
    title = {Diffusers: State-of-the-art diffusion models},
    year = {2022},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/huggingface/diffusers}}
  }
  ```

* The GUI is developed with [DearPyGui](https://github.com/hoffstadt/DearPyGui).