shaw committed
Commit 0b18231
2 Parent(s): ea3ddc8 47d0083

Merge branch 'ashawkey:main' into main

assets/update_logs.md CHANGED
@@ -1,3 +1,7 @@
+### 2022.10.9
+* The shading (partially) starts to work, at least it won't make scene empty. For some prompts, it shows better results (less severe Janus problem). The textureless rendering mode is still disabled.
+* Enable shading by default (--albedo_iters 1000).
+
 ### 2022.10.5
 * Basic reproduction finished.
 * Non --cuda_ray, --tcnn are not working, need to fix.
docker/Dockerfile ADDED
@@ -0,0 +1,53 @@
+FROM nvidia/cuda:11.6.2-cudnn8-devel-ubuntu20.04
+
+# Remove any third-party apt sources to avoid issues with expiring keys.
+RUN rm -f /etc/apt/sources.list.d/*.list
+
+RUN apt-get update
+
+RUN DEBIAN_FRONTEND=noninteractive TZ=Europe/MADRID apt-get install -y tzdata
+
+# Install some basic utilities
+RUN apt-get install -y \
+    curl \
+    ca-certificates \
+    sudo \
+    git \
+    bzip2 \
+    libx11-6 \
+    python3 \
+    python3-pip \
+    libglfw3-dev \
+    libgles2-mesa-dev \
+    libglib2.0-0 \
+ && rm -rf /var/lib/apt/lists/*
+
+
+# Create a working directory
+RUN mkdir /app
+WORKDIR /app
+
+RUN cd /app
+RUN git clone https://github.com/ashawkey/stable-dreamfusion.git
+
+
+RUN pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
+
+WORKDIR /app/stable-dreamfusion
+
+RUN pip3 install -r requirements.txt
+RUN pip3 install git+https://github.com/NVlabs/nvdiffrast/
+
+# Needs nvidia runtime, if you have "No CUDA runtime is found" error: https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime, first answer
+RUN pip3 install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
+
+RUN pip3 install git+https://github.com/openai/CLIP.git
+RUN bash scripts/install_ext.sh
+
+
+
+
+
+# Set the default command to python3
+#CMD ["python3"]
+
docker/README.md ADDED
@@ -0,0 +1,80 @@
+### Docker installation
+
+## Build image
+To build the docker image on your own machine, which may take 15-30 mins:
+```
+docker build -t stable-dreamfusion:latest .
+```
+
+If you have the error **No CUDA runtime is found** when building the wheels for tiny-cuda-nn you need to setup the nvidia-runtime for docker.
+```
+sudo apt-get install nvidia-container-runtime
+```
+Then edit `/etc/docker/daemon.json` and add the default-runtime:
+```
+{
+    "runtimes": {
+        "nvidia": {
+            "path": "nvidia-container-runtime",
+            "runtimeArgs": []
+        }
+    },
+    "default-runtime": "nvidia"
+}
+```
+And restart docker:
+```
+sudo systemctl restart docker
+```
+Now you can build tiny-cuda-nn inside docker.
+
+## Download image
+To download the image (~6GB) instead:
+```
+docker pull supercabb/stable-dreamfusion:3080_0.0.1
+docker tag supercabb/stable-dreamfusion:3080_0.0.1 stable-dreamfusion
+```
+
+## Use image
+
+You can launch an interactive shell inside the container:
+
+```
+docker run --gpus all -it --rm -v $(cd ~ && pwd):/mnt stable-dreamfusion /bin/bash
+```
+From this shell, all the code in the repo should work.
+
+To run any single command `<command...>` inside the docker container:
+```
+docker run --gpus all -it --rm -v $(cd ~ && pwd):/mnt stable-dreamfusion /bin/bash -c "<command...>"
+```
+To train:
+```
+export TOKEN="#HUGGING FACE ACCESS TOKEN#"
+docker run --gpus all -it --rm -v $(cd ~ && pwd):/mnt stable-dreamfusion /bin/bash -c "echo ${TOKEN} > TOKEN \
+&& python3 main.py --text \"a hamburger\" --workspace trial -O"
+
+```
+Run test without gui:
+```
+export PATH_TO_WORKSPACE="#PATH_TO_WORKSPACE#"
+docker run --gpus all -it --rm -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix:ro -v $(cd ~ && pwd):/mnt \
+-v $(cd ${PATH_TO_WORKSPACE} && pwd):/app/stable-dreamfusion/trial stable-dreamfusion /bin/bash -c "python3 \
+main.py --workspace trial -O --test"
+```
+Run test with gui:
+```
+export PATH_TO_WORKSPACE="#PATH_TO_WORKSPACE#"
+xhost +
+docker run --gpus all -it --rm -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix:ro -v $(cd ~ && pwd):/mnt \
+-v $(cd ${PATH_TO_WORKSPACE} && pwd):/app/stable-dreamfusion/trial stable-dreamfusion /bin/bash -c "python3 \
+main.py --workspace trial -O --test --gui"
+xhost -
+```
+
+
+
+
+
+
+
gradio_app.py ADDED
@@ -0,0 +1,227 @@
+import torch
+import argparse
+
+from nerf.provider import NeRFDataset
+from nerf.utils import *
+
+import gradio as gr
+import gc
+
+print(f'[INFO] loading options..')
+
+# fake config object, this should not be used in CMD, only allow change from gradio UI.
+parser = argparse.ArgumentParser()
+parser.add_argument('--text', default=None, help="text prompt")
+# parser.add_argument('-O', action='store_true', help="equals --fp16 --cuda_ray --dir_text")
+# parser.add_argument('-O2', action='store_true', help="equals --fp16 --dir_text")
+parser.add_argument('--test', action='store_true', help="test mode")
+parser.add_argument('--save_mesh', action='store_true', help="export an obj mesh with texture")
+parser.add_argument('--eval_interval', type=int, default=10, help="evaluate on the valid set every interval epochs")
+parser.add_argument('--workspace', type=str, default='trial_gradio')
+parser.add_argument('--guidance', type=str, default='stable-diffusion', help='choose from [stable-diffusion, clip]')
+parser.add_argument('--seed', type=int, default=0)
+
+### training options
+parser.add_argument('--iters', type=int, default=10000, help="training iters")
+parser.add_argument('--lr', type=float, default=1e-3, help="initial learning rate")
+parser.add_argument('--ckpt', type=str, default='latest')
+parser.add_argument('--cuda_ray', action='store_true', help="use CUDA raymarching instead of pytorch")
+parser.add_argument('--max_steps', type=int, default=1024, help="max num steps sampled per ray (only valid when using --cuda_ray)")
+parser.add_argument('--num_steps', type=int, default=64, help="num steps sampled per ray (only valid when not using --cuda_ray)")
+parser.add_argument('--upsample_steps', type=int, default=64, help="num steps up-sampled per ray (only valid when not using --cuda_ray)")
+parser.add_argument('--update_extra_interval', type=int, default=16, help="iter interval to update extra status (only valid when using --cuda_ray)")
+parser.add_argument('--max_ray_batch', type=int, default=4096, help="batch size of rays at inference to avoid OOM (only valid when not using --cuda_ray)")
+parser.add_argument('--albedo_iters', type=int, default=1000, help="training iters that only use albedo shading")
+# model options
+parser.add_argument('--bg_radius', type=float, default=1.4, help="if positive, use a background model at sphere(bg_radius)")
+parser.add_argument('--density_thresh', type=float, default=10, help="threshold for density grid to be occupied")
+# network backbone
+parser.add_argument('--fp16', action='store_true', help="use amp mixed precision training")
+parser.add_argument('--backbone', type=str, default='grid', help="nerf backbone, choose from [grid, tcnn, vanilla]")
+# rendering resolution in training, decrease this if CUDA OOM.
+parser.add_argument('--w', type=int, default=64, help="render width for NeRF in training")
+parser.add_argument('--h', type=int, default=64, help="render height for NeRF in training")
+parser.add_argument('--jitter_pose', action='store_true', help="add jitters to the randomly sampled camera poses")
+
+### dataset options
+parser.add_argument('--bound', type=float, default=1, help="assume the scene is bounded in box(-bound, bound)")
+parser.add_argument('--dt_gamma', type=float, default=0, help="dt_gamma (>=0) for adaptive ray marching. set to 0 to disable, >0 to accelerate rendering (but usually with worse quality)")
+parser.add_argument('--min_near', type=float, default=0.1, help="minimum near distance for camera")
+parser.add_argument('--radius_range', type=float, nargs='*', default=[1.0, 1.5], help="training camera radius range")
+parser.add_argument('--fovy_range', type=float, nargs='*', default=[40, 70], help="training camera fovy range")
+parser.add_argument('--dir_text', action='store_true', help="direction-encode the text prompt, by appending front/side/back/overhead view")
+parser.add_argument('--angle_overhead', type=float, default=30, help="[0, angle_overhead] is the overhead region")
+parser.add_argument('--angle_front', type=float, default=60, help="[0, angle_front] is the front region, [180, 180+angle_front] the back region, otherwise the side region.")
+
+parser.add_argument('--lambda_entropy', type=float, default=1e-4, help="loss scale for alpha entropy")
+parser.add_argument('--lambda_opacity', type=float, default=0, help="loss scale for alpha value")
+parser.add_argument('--lambda_orient', type=float, default=1e-2, help="loss scale for orientation")
+
+### GUI options
+parser.add_argument('--gui', action='store_true', help="start a GUI")
+parser.add_argument('--W', type=int, default=800, help="GUI width")
+parser.add_argument('--H', type=int, default=800, help="GUI height")
+parser.add_argument('--radius', type=float, default=3, help="default GUI camera radius from center")
+parser.add_argument('--fovy', type=float, default=60, help="default GUI camera fovy")
+parser.add_argument('--light_theta', type=float, default=60, help="default GUI light direction in [0, 180], corresponding to elevation [90, -90]")
+parser.add_argument('--light_phi', type=float, default=0, help="default GUI light direction in [0, 360), azimuth")
+parser.add_argument('--max_spp', type=int, default=1, help="GUI rendering max sample per pixel")
+
+opt = parser.parse_args()
+
+# default to use -O !!!
+opt.fp16 = True
+opt.dir_text = True
+opt.cuda_ray = True
+# opt.lambda_entropy = 1e-4
+# opt.lambda_opacity = 0
+
+if opt.backbone == 'vanilla':
+    from nerf.network import NeRFNetwork
+elif opt.backbone == 'tcnn':
+    from nerf.network_tcnn import NeRFNetwork
+elif opt.backbone == 'grid':
+    from nerf.network_grid import NeRFNetwork
+else:
+    raise NotImplementedError(f'--backbone {opt.backbone} is not implemented!')
+
+print(opt)
+
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+
+print(f'[INFO] loading models..')
+
+if opt.guidance == 'stable-diffusion':
+    from nerf.sd import StableDiffusion
+    guidance = StableDiffusion(device)
+elif opt.guidance == 'clip':
+    from nerf.clip import CLIP
+    guidance = CLIP(device)
+else:
+    raise NotImplementedError(f'--guidance {opt.guidance} is not implemented.')
+
+train_loader = NeRFDataset(opt, device=device, type='train', H=opt.h, W=opt.w, size=100).dataloader()
+valid_loader = NeRFDataset(opt, device=device, type='val', H=opt.H, W=opt.W, size=5).dataloader()
+test_loader = NeRFDataset(opt, device=device, type='test', H=opt.H, W=opt.W, size=100).dataloader()
+
+print(f'[INFO] everything loaded!')
+
+trainer = None
+model = None
+
+# define UI
+
+with gr.Blocks(css=".gradio-container {max-width: 512px; margin: auto;}") as demo:
+
+    # title
+    gr.Markdown('[Stable-DreamFusion](https://github.com/ashawkey/stable-dreamfusion) Text-to-3D Example')
+
+    # inputs
+    prompt = gr.Textbox(label="Prompt", max_lines=1, value="a DSLR photo of a koi fish")
+    iters = gr.Slider(label="Iters", minimum=1000, maximum=20000, value=5000, step=100)
+    seed = gr.Slider(label="Seed", minimum=0, maximum=2147483647, step=1, randomize=True)
+    button = gr.Button('Generate')
+
+    # outputs
+    image = gr.Image(label="image", visible=True)
+    video = gr.Video(label="video", visible=False)
+    logs = gr.Textbox(label="logging")
+
+    # gradio main func
+    def submit(text, iters, seed):
+
+        global trainer, model
+
+        # seed
+        opt.seed = seed
+        opt.text = text
+        opt.iters = iters
+
+        seed_everything(seed)
+
+        # clean up
+        if trainer is not None:
+            del model
+            del trainer
+            gc.collect()
+            torch.cuda.empty_cache()
+            print('[INFO] clean up!')
+
+        # simply reload everything...
+        model = NeRFNetwork(opt)
+        optimizer = lambda model: torch.optim.Adam(model.get_params(opt.lr), betas=(0.9, 0.99), eps=1e-15)
+        scheduler = lambda optimizer: optim.lr_scheduler.LambdaLR(optimizer, lambda iter: 0.1 ** min(iter / opt.iters, 1))
+
+        trainer = Trainer('df', opt, model, guidance, device=device, workspace=opt.workspace, optimizer=optimizer, ema_decay=0.95, fp16=opt.fp16, lr_scheduler=scheduler, use_checkpoint=opt.ckpt, eval_interval=opt.eval_interval, scheduler_update_every_step=True)
+
+        # train (every ep only contain 8 steps, so we can get some vis every ~10s)
+        STEPS = 8
+        max_epochs = np.ceil(opt.iters / STEPS).astype(np.int32)
+
+        # we have to get the explicit training loop out here to yield progressive results...
+        loader = iter(valid_loader)
+
+        start_t = time.time()
+
+        for epoch in range(max_epochs):
+
+            trainer.train_gui(train_loader, step=STEPS)
+
+            # manual test and get intermediate results
+            try:
+                data = next(loader)
+            except StopIteration:
+                loader = iter(valid_loader)
+                data = next(loader)
+
+            trainer.model.eval()
+
+            if trainer.ema is not None:
+                trainer.ema.store()
+                trainer.ema.copy_to()
+
+            with torch.no_grad():
+                with torch.cuda.amp.autocast(enabled=trainer.fp16):
+                    preds, preds_depth = trainer.test_step(data, perturb=False)
+
+            if trainer.ema is not None:
+                trainer.ema.restore()
+
+            pred = preds[0].detach().cpu().numpy()
+            # pred_depth = preds_depth[0].detach().cpu().numpy()
+
+            pred = (pred * 255).astype(np.uint8)
+
+            yield {
+                image: gr.update(value=pred, visible=True),
+                video: gr.update(visible=False),
+                logs: f"training iters: {epoch * STEPS} / {iters}, lr: {trainer.optimizer.param_groups[0]['lr']:.6f}",
+            }
+
+
+        # test
+        trainer.test(test_loader)
+
+        results = glob.glob(os.path.join(opt.workspace, 'results', '*rgb*.mp4'))
+        assert results is not None, "cannot retrieve results!"
+        results.sort(key=lambda x: os.path.getmtime(x)) # sort by mtime
+
+        end_t = time.time()
+
+        yield {
+            image: gr.update(visible=False),
+            video: gr.update(value=results[-1], visible=True),
+            logs: f"Generation Finished in {(end_t - start_t)/ 60:.4f} minutes!",
+        }
+
+
+    button.click(
+        submit,
+        [prompt, iters, seed],
+        [image, video, logs]
+    )
+
+# concurrency_count: only allow ONE running progress, else GPU will OOM.
+demo.queue(concurrency_count=1)
+
+demo.launch()
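For orientation, gradio_app.py streams intermediate renders by using a Python generator as the button callback and serialising jobs with `demo.queue(concurrency_count=1)`. A minimal, self-contained sketch of just that pattern, assuming the Gradio 3.x API used above (the `fake_train` function and its random frames are placeholders, not part of the repo):

```python
import time

import gradio as gr
import numpy as np

# Stand-in for the training loop: yield a preview image every "step".
def fake_train(steps):
    for _ in range(int(steps)):
        frame = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)  # placeholder render
        time.sleep(0.2)
        yield frame  # each yield refreshes the Image output while the callback keeps running

with gr.Blocks() as demo:
    steps = gr.Slider(1, 20, value=5, step=1, label="Steps")
    image = gr.Image(label="preview")
    gr.Button("Run").click(fake_train, inputs=steps, outputs=image)

# Generator callbacks require the queue; a single concurrent job avoids GPU OOM, as the comment above notes.
demo.queue(concurrency_count=1)
demo.launch()
```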
main.py CHANGED
@@ -23,16 +23,16 @@ if __name__ == '__main__':
     parser.add_argument('--seed', type=int, default=0)
 
     ### training options
-    parser.add_argument('--iters', type=int, default=15000, help="training iters")
+    parser.add_argument('--iters', type=int, default=10000, help="training iters")
     parser.add_argument('--lr', type=float, default=1e-3, help="initial learning rate")
     parser.add_argument('--ckpt', type=str, default='latest')
     parser.add_argument('--cuda_ray', action='store_true', help="use CUDA raymarching instead of pytorch")
     parser.add_argument('--max_steps', type=int, default=1024, help="max num steps sampled per ray (only valid when using --cuda_ray)")
-    parser.add_argument('--num_steps', type=int, default=128, help="num steps sampled per ray (only valid when not using --cuda_ray)")
-    parser.add_argument('--upsample_steps', type=int, default=0, help="num steps up-sampled per ray (only valid when not using --cuda_ray)")
+    parser.add_argument('--num_steps', type=int, default=64, help="num steps sampled per ray (only valid when not using --cuda_ray)")
+    parser.add_argument('--upsample_steps', type=int, default=64, help="num steps up-sampled per ray (only valid when not using --cuda_ray)")
     parser.add_argument('--update_extra_interval', type=int, default=16, help="iter interval to update extra status (only valid when using --cuda_ray)")
     parser.add_argument('--max_ray_batch', type=int, default=4096, help="batch size of rays at inference to avoid OOM (only valid when not using --cuda_ray)")
-    parser.add_argument('--albedo_iters', type=int, default=15000, help="training iters that only use albedo shading")
+    parser.add_argument('--albedo_iters', type=int, default=1000, help="training iters that only use albedo shading")
     # model options
     parser.add_argument('--bg_radius', type=float, default=1.4, help="if positive, use a background model at sphere(bg_radius)")
     parser.add_argument('--density_thresh', type=float, default=10, help="threshold for density grid to be occupied")
@@ -40,8 +40,9 @@ if __name__ == '__main__':
     parser.add_argument('--fp16', action='store_true', help="use amp mixed precision training")
     parser.add_argument('--backbone', type=str, default='grid', help="nerf backbone, choose from [grid, tcnn, vanilla]")
     # rendering resolution in training, decrease this if CUDA OOM.
-    parser.add_argument('--w', type=int, default=128, help="render width for NeRF in training")
-    parser.add_argument('--h', type=int, default=128, help="render height for NeRF in training")
+    parser.add_argument('--w', type=int, default=64, help="render width for NeRF in training")
+    parser.add_argument('--h', type=int, default=64, help="render height for NeRF in training")
+    parser.add_argument('--jitter_pose', action='store_true', help="add jitters to the randomly sampled camera poses")
 
     ### dataset options
     parser.add_argument('--bound', type=float, default=1, help="assume the scene is bounded in box(-bound, bound)")
@@ -51,9 +52,10 @@ if __name__ == '__main__':
     parser.add_argument('--fovy_range', type=float, nargs='*', default=[40, 70], help="training camera fovy range")
     parser.add_argument('--dir_text', action='store_true', help="direction-encode the text prompt, by appending front/side/back/overhead view")
     parser.add_argument('--angle_overhead', type=float, default=30, help="[0, angle_overhead] is the overhead region")
-    parser.add_argument('--angle_front', type=float, default=30, help="[0, angle_front] is the front region, [180, 180+angle_front] the back region, otherwise the side region.")
+    parser.add_argument('--angle_front', type=float, default=60, help="[0, angle_front] is the front region, [180, 180+angle_front] the back region, otherwise the side region.")
 
     parser.add_argument('--lambda_entropy', type=float, default=1e-4, help="loss scale for alpha entropy")
+    parser.add_argument('--lambda_opacity', type=float, default=0, help="loss scale for alpha value")
     parser.add_argument('--lambda_orient', type=float, default=1e-2, help="loss scale for orientation")
 
     ### GUI options
@@ -71,10 +73,16 @@ if __name__ == '__main__':
     if opt.O:
         opt.fp16 = True
         opt.dir_text = True
+        # use occupancy grid to prune ray sampling, faster rendering.
        opt.cuda_ray = True
+        # opt.lambda_entropy = 1e-4
+        # opt.lambda_opacity = 0
+
     elif opt.O2:
         opt.fp16 = True
         opt.dir_text = True
+        opt.lambda_entropy = 1e-4 # necessary to keep non-empty
+        opt.lambda_opacity = 3e-3 # no occupancy grid, so use a stronger opacity loss.
 
     if opt.backbone == 'vanilla':
         from nerf.network import NeRFNetwork
@@ -98,7 +106,7 @@ if __name__ == '__main__':
     if opt.test:
         guidance = None # no need to load guidance model at test
 
-        trainer = Trainer('ngp', opt, model, guidance, device=device, workspace=opt.workspace, fp16=opt.fp16, use_checkpoint=opt.ckpt)
+        trainer = Trainer('df', opt, model, guidance, device=device, workspace=opt.workspace, fp16=opt.fp16, use_checkpoint=opt.ckpt)
 
         if opt.gui:
             gui = NeRFGUI(opt, trainer)
@@ -127,10 +135,10 @@ if __name__ == '__main__':
 
         train_loader = NeRFDataset(opt, device=device, type='train', H=opt.h, W=opt.w, size=100).dataloader()
 
-        # decay to 0.01 * init_lr at last iter step
-        scheduler = lambda optimizer: optim.lr_scheduler.LambdaLR(optimizer, lambda iter: 0.01 ** min(iter / opt.iters, 1))
+        scheduler = lambda optimizer: optim.lr_scheduler.LambdaLR(optimizer, lambda iter: 0.1 ** min(iter / opt.iters, 1))
+        # scheduler = lambda optimizer: optim.lr_scheduler.OneCycleLR(optimizer, max_lr=opt.lr, total_steps=opt.iters, pct_start=0.1)
 
-        trainer = Trainer('ngp', opt, model, guidance, device=device, workspace=opt.workspace, optimizer=optimizer, ema_decay=0.95, fp16=opt.fp16, lr_scheduler=scheduler, use_checkpoint=opt.ckpt, eval_interval=opt.eval_interval)
+        trainer = Trainer('df', opt, model, guidance, device=device, workspace=opt.workspace, optimizer=optimizer, ema_decay=None, fp16=opt.fp16, lr_scheduler=scheduler, use_checkpoint=opt.ckpt, eval_interval=opt.eval_interval, scheduler_update_every_step=True)
 
         if opt.gui:
             trainer.train_loader = train_loader # attach dataloader to trainer
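One change above is easy to miss: the LambdaLR factor moves from `0.01 ** min(iter / opt.iters, 1)` to `0.1 ** min(iter / opt.iters, 1)`, so the learning rate now decays to 0.1x (rather than 0.01x) of its initial value by the last iteration, and the scheduler is stepped every training step. A quick standalone check of that decay (the `torch.nn.Linear` model is just a placeholder):

```python
import torch
from torch import optim

iters = 10000  # matches the new --iters default
model = torch.nn.Linear(2, 2)  # placeholder parameters
optimizer = optim.Adam(model.parameters(), lr=1e-3)
scheduler = optim.lr_scheduler.LambdaLR(optimizer, lambda it: 0.1 ** min(it / iters, 1))

for _ in range(iters):
    optimizer.step()   # normally preceded by a backward pass
    scheduler.step()   # mirrors scheduler_update_every_step=True in the Trainer call above

print(optimizer.param_groups[0]['lr'])  # ~1e-4, i.e. 0.1 * the initial 1e-3
```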
nerf/network.py CHANGED
@@ -52,7 +52,7 @@ class NeRFNetwork(NeRFRenderer):
         if self.bg_radius > 0:
             self.num_layers_bg = num_layers_bg
             self.hidden_dim_bg = hidden_dim_bg
-            self.encoder_bg, self.in_dim_bg = get_encoder('frequency', input_dim=2)
+            self.encoder_bg, self.in_dim_bg = get_encoder('frequency', input_dim=3)
             self.bg_net = MLP(self.in_dim_bg, 3, hidden_dim_bg, num_layers_bg, bias=True)
 
         else:
@@ -80,7 +80,7 @@ class NeRFNetwork(NeRFRenderer):
         return sigma, albedo
 
     # ref: https://github.com/zhaofuq/Instant-NSR/blob/main/nerf/network_sdf.py#L192
-    def finite_difference_normal(self, x, epsilon=5e-4):
+    def finite_difference_normal(self, x, epsilon=1e-2):
         # x: [N, 3]
         dx_pos, _ = self.common_forward((x + torch.tensor([[epsilon, 0.00, 0.00]], device=x.device)).clamp(-self.bound, self.bound))
         dx_neg, _ = self.common_forward((x + torch.tensor([[-epsilon, 0.00, 0.00]], device=x.device)).clamp(-self.bound, self.bound))
@@ -148,10 +148,9 @@ class NeRFNetwork(NeRFRenderer):
         }
 
 
-    def background(self, x, d):
-        # x: [N, 2], in [-1, 1]
+    def background(self, d):
 
-        h = self.encoder_bg(x) # [N, C]
+        h = self.encoder_bg(d) # [N, C]
 
         h = self.bg_net(h)
 
nerf/network_grid.py CHANGED
@@ -57,7 +57,7 @@ class NeRFNetwork(NeRFRenderer):
 
             # use a very simple network to avoid it learning the prompt...
             # self.encoder_bg, self.in_dim_bg = get_encoder('tiledgrid', input_dim=2, num_levels=4, desired_resolution=2048)
-            self.encoder_bg, self.in_dim_bg = get_encoder('frequency', input_dim=2)
+            self.encoder_bg, self.in_dim_bg = get_encoder('frequency', input_dim=3)
 
             self.bg_net = MLP(self.in_dim_bg, 3, hidden_dim_bg, num_layers_bg, bias=True)
 
@@ -87,7 +87,7 @@ class NeRFNetwork(NeRFRenderer):
         return sigma, albedo
 
     # ref: https://github.com/zhaofuq/Instant-NSR/blob/main/nerf/network_sdf.py#L192
-    def finite_difference_normal(self, x, epsilon=5e-4):
+    def finite_difference_normal(self, x, epsilon=1e-2):
         # x: [N, 3]
         dx_pos, _ = self.common_forward((x + torch.tensor([[epsilon, 0.00, 0.00]], device=x.device)).clamp(-self.bound, self.bound))
         dx_neg, _ = self.common_forward((x + torch.tensor([[-epsilon, 0.00, 0.00]], device=x.device)).clamp(-self.bound, self.bound))
@@ -155,10 +155,9 @@ class NeRFNetwork(NeRFRenderer):
         }
 
 
-    def background(self, x, d):
-        # x: [N, 2], in [-1, 1]
+    def background(self, d):
 
-        h = self.encoder_bg(x) # [N, C]
+        h = self.encoder_bg(d) # [N, C]
 
         h = self.bg_net(h)
 
nerf/network_tcnn.py CHANGED
@@ -4,6 +4,7 @@ import torch.nn.functional as F
 
 from activation import trunc_exp
 from .renderer import NeRFRenderer
+from encoding import get_encoder
 
 import numpy as np
 import tinycudann as tcnn
@@ -65,19 +66,9 @@ class NeRFNetwork(NeRFRenderer):
             self.num_layers_bg = num_layers_bg
             self.hidden_dim_bg = hidden_dim_bg
 
-            self.encoder_bg = tcnn.Encoding(
-                n_input_dims=2,
-                encoding_config={
-                    "otype": "HashGrid",
-                    "n_levels": 4,
-                    "n_features_per_level": 2,
-                    "log2_hashmap_size": 16,
-                    "base_resolution": 16,
-                    "per_level_scale": 1.5,
-                },
-            )
-
-            self.bg_net = MLP(8, 3, hidden_dim_bg, num_layers_bg, bias=True)
+            self.encoder_bg, self.in_dim_bg = get_encoder('frequency', input_dim=3)
+
+            self.bg_net = MLP(self.in_dim_bg, 3, hidden_dim_bg, num_layers_bg, bias=True)
 
         else:
             self.bg_net = None
@@ -156,11 +147,10 @@ class NeRFNetwork(NeRFRenderer):
         }
 
 
-    def background(self, x, d):
+    def background(self, d):
         # x: [N, 2], in [-1, 1]
 
-        h = (x + 1) / (2 * 1) # to [0, 1]
-        h = self.encoder_bg(h) # [N, C]
+        h = self.encoder_bg(d) # [N, C]
 
         h = self.bg_net(h)
 
nerf/provider.py CHANGED
@@ -55,7 +55,7 @@ def get_view_direction(thetas, phis, overhead, front):
     return res
 
 
-def rand_poses(size, device, radius_range=[1, 1.5], theta_range=[0, 150], phi_range=[0, 360], return_dirs=False, angle_overhead=30, angle_front=60):
+def rand_poses(size, device, radius_range=[1, 1.5], theta_range=[0, 100], phi_range=[0, 360], return_dirs=False, angle_overhead=30, angle_front=60, jitter=False):
     ''' generate random poses from an orbit camera
     Args:
         size: batch size of generated poses.
@@ -82,16 +82,23 @@ def rand_poses(size, device, radius_range=[1, 1.5], theta_range=[0, 150], phi_ra
         radius * torch.sin(thetas) * torch.cos(phis),
     ], dim=-1) # [B, 3]
 
+    targets = 0
+
     # jitters
-    centers = centers + (torch.rand_like(centers) * 0.2 - 0.1)
-    targets = torch.randn_like(centers) * 0.2
+    if jitter:
+        centers = centers + (torch.rand_like(centers) * 0.2 - 0.1)
+        targets = targets + torch.randn_like(centers) * 0.2
 
     # lookat
     forward_vector = safe_normalize(targets - centers)
     up_vector = torch.FloatTensor([0, -1, 0]).to(device).unsqueeze(0).repeat(size, 1)
     right_vector = safe_normalize(torch.cross(forward_vector, up_vector, dim=-1))
+
+    if jitter:
+        up_noise = torch.randn_like(up_vector) * 0.02
+    else:
+        up_noise = 0
 
-    up_noise = torch.randn_like(up_vector) * 0.02
     up_vector = safe_normalize(torch.cross(right_vector, forward_vector, dim=-1) + up_noise)
 
     poses = torch.eye(4, dtype=torch.float, device=device).unsqueeze(0).repeat(size, 1, 1)
@@ -170,7 +177,7 @@ class NeRFDataset:
 
         if self.training:
             # random pose on the fly
-            poses, dirs = rand_poses(B, self.device, radius_range=self.radius_range, return_dirs=self.opt.dir_text, angle_overhead=self.opt.angle_overhead, angle_front=self.opt.angle_front)
+            poses, dirs = rand_poses(B, self.device, radius_range=self.radius_range, return_dirs=self.opt.dir_text, angle_overhead=self.opt.angle_overhead, angle_front=self.opt.angle_front, jitter=self.opt.jitter_pose)
 
             # random focal
             fov = random.random() * (self.fovy_range[1] - self.fovy_range[0]) + self.fovy_range[0]
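The net effect of the provider change: camera perturbations are now opt-in via `--jitter_pose` (off by default), and the default elevation range tightens from [0, 150] to [0, 100] degrees. A condensed, standalone sketch of the new jitter branch (not the repo function itself):

```python
import torch

def jitter_cameras(centers, jitter=False):
    # centers: [B, 3] camera positions on the orbit sphere; targets default to the origin.
    targets = torch.zeros_like(centers)
    up_noise = 0
    if jitter:
        centers = centers + (torch.rand_like(centers) * 0.2 - 0.1)  # uniform +-0.1 jitter of the camera center
        targets = targets + torch.randn_like(centers) * 0.2         # gaussian jitter of the look-at point
        up_noise = torch.randn_like(centers) * 0.02                 # small tilt added to the up vector later
    return centers, targets, up_noise
```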
nerf/renderer.py CHANGED
@@ -420,8 +420,8 @@ class NeRFRenderer(nn.Module):
         # mix background color
         if self.bg_radius > 0:
             # use the bg model to calculate bg_color
-            sph = raymarching.sph_from_ray(rays_o, rays_d, self.bg_radius) # [N, 2] in [-1, 1]
-            bg_color = self.background(sph, rays_d.reshape(-1, 3)) # [N, 3]
+            # sph = raymarching.sph_from_ray(rays_o, rays_d, self.bg_radius) # [N, 2] in [-1, 1]
+            bg_color = self.background(rays_d.reshape(-1, 3)) # [N, 3]
         elif bg_color is None:
             bg_color = 1
 
@@ -526,8 +526,8 @@
         if self.bg_radius > 0:
 
             # use the bg model to calculate bg_color
-            sph = raymarching.sph_from_ray(rays_o, rays_d, self.bg_radius) # [N, 2] in [-1, 1]
-            bg_color = self.background(sph, rays_d) # [N, 3]
+            # sph = raymarching.sph_from_ray(rays_o, rays_d, self.bg_radius) # [N, 2] in [-1, 1]
+            bg_color = self.background(rays_d) # [N, 3]
 
         elif bg_color is None:
             bg_color = 1
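Taken together with the nerf/network*.py edits above, the background model now conditions only on the ray direction: the 2D sphere-intersection input (`sph_from_ray`) is dropped in favour of a frequency encoding of the 3D direction. A rough sketch of the new data flow, with hypothetical stand-ins for the repo's `get_encoder('frequency', input_dim=3)` and `MLP` helpers (the sigmoid output activation is an assumption):

```python
import torch

def frequency_encode(d, n_freqs=6):
    # d: [N, 3] ray directions -> [N, 3 + 3 * 2 * n_freqs]; stand-in for the frequency encoder
    outs = [d]
    for i in range(n_freqs):
        outs += [torch.sin((2 ** i) * d), torch.cos((2 ** i) * d)]
    return torch.cat(outs, dim=-1)

def background(bg_net, rays_d):
    h = frequency_encode(rays_d)  # direction-only input, no sph_from_ray any more
    h = bg_net(h)                 # small MLP -> [N, 3]
    return torch.sigmoid(h)       # assumed RGB activation

rays_d = torch.nn.functional.normalize(torch.randn(8, 3), dim=-1)
bg_net = torch.nn.Linear(3 + 3 * 2 * 6, 3)  # placeholder for the repo's MLP
print(background(bg_net, rays_d).shape)      # torch.Size([8, 3])
```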
nerf/sd.py CHANGED
@@ -17,10 +17,10 @@ class StableDiffusion(nn.Module):
         try:
             with open('./TOKEN', 'r') as f:
                 self.token = f.read().replace('\n', '') # remove the last \n!
-                print(f'[INFO] successfully loaded hugging face user token!')
+                print(f'[INFO] loaded hugging face access token from ./TOKEN!')
         except FileNotFoundError as e:
-            print(e)
-            print(f'[INFO] Please first create a file called TOKEN and copy your hugging face access token into it to download stable diffusion checkpoints.')
+            self.token = True
+            print(f'[INFO] try to load hugging face access token from the default place, make sure you have run `huggingface-cli login`.')
 
         self.device = device
         self.num_train_timesteps = 1000
@@ -94,9 +94,9 @@ class StableDiffusion(nn.Module):
         noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
         noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
 
-        # w(t), alpha_t * sigma_t^2
-        # w = (1 - self.alphas[t])
-        w = self.alphas[t] ** 0.5 * (1 - self.alphas[t])
+        # w(t), sigma_t^2
+        w = (1 - self.alphas[t])
+        # w = self.alphas[t] ** 0.5 * (1 - self.alphas[t])
         grad = w * (noise_pred - noise)
 
         # clip grad for stable training?
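The other substantive change in nerf/sd.py is the SDS gradient weighting: w(t) switches from `alphas[t] ** 0.5 * (1 - alphas[t])` to `1 - alphas[t]` (proportional to sigma_t^2). A small standalone comparison of the two schedules; the scaled-linear beta schedule below mirrors the usual Stable Diffusion defaults, and `alphas` here plays the role of the cumulative product the guidance class stores (both are assumptions made for illustration):

```python
import torch

num_train_timesteps = 1000
betas = torch.linspace(0.00085 ** 0.5, 0.012 ** 0.5, num_train_timesteps) ** 2  # assumed scaled-linear schedule
alphas = torch.cumprod(1.0 - betas, dim=0)  # cumulative alpha_bar_t

t = torch.tensor([50, 500, 950])
w_new = 1 - alphas[t]                       # weighting enabled by this commit
w_old = alphas[t] ** 0.5 * (1 - alphas[t])  # previous weighting, now commented out
print(w_new)
print(w_old)  # smaller everywhere, with the gap widest at late (noisy) timesteps
```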
nerf/utils.py CHANGED
@@ -195,9 +195,6 @@ class Trainer(object):
         self.scheduler_update_every_step = scheduler_update_every_step
         self.device = device if device is not None else torch.device(f'cuda:{local_rank}' if torch.cuda.is_available() else 'cpu')
         self.console = Console()
-
-        # text prompt
-        ref_text = self.opt.text
 
         model.to(self.device)
         if self.world_size > 1:
@@ -208,20 +205,13 @@ class Trainer(object):
         # guide model
         self.guidance = guidance
 
+        # text prompt
         if self.guidance is not None:
-            assert ref_text is not None, 'Training must provide a text prompt!'
-
+
             for p in self.guidance.parameters():
                 p.requires_grad = False
 
-            if not self.opt.dir_text:
-                self.text_z = self.guidance.get_text_embeds([ref_text])
-            else:
-                self.text_z = []
-                for d in ['front', 'side', 'back', 'side', 'overhead', 'bottom']:
-                    text = f"{ref_text}, {d} view"
-                    text_z = self.guidance.get_text_embeds([text])
-                    self.text_z.append(text_z)
+            self.prepare_text_embeddings()
 
         else:
             self.text_z = None
@@ -257,7 +247,7 @@ class Trainer(object):
             "results": [], # metrics[0], or valid_loss
             "checkpoints": [], # record path of saved ckpt, to automatically remove old ckpt
             "best_result": None,
-        }
+        }
 
         # auto fix
         if len(metrics) == 0 or self.use_loss_as_metric:
@@ -297,6 +287,23 @@ class Trainer(object):
             self.log(f"[INFO] Loading {self.use_checkpoint} ...")
             self.load_checkpoint(self.use_checkpoint)
 
+    # calculate the text embs.
+    def prepare_text_embeddings(self):
+
+        if self.opt.text is None:
+            self.log(f"[WARN] text prompt is not provided.")
+            self.text_z = None
+            return
+
+        if not self.opt.dir_text:
+            self.text_z = self.guidance.get_text_embeds([self.opt.text])
+        else:
+            self.text_z = []
+            for d in ['front', 'side', 'back', 'side', 'overhead', 'bottom']:
+                text = f"{self.opt.text}, {d} view"
+                text_z = self.guidance.get_text_embeds([text])
+                self.text_z.append(text_z)
+
     def __del__(self):
         if self.log_ptr:
             self.log_ptr.close()
@@ -330,11 +337,11 @@ class Trainer(object):
         if rand > 0.8:
             shading = 'albedo'
             ambient_ratio = 1.0
-        elif rand > 0.4:
-            shading = 'lambertian'
-            ambient_ratio = 0.1
+        # elif rand > 0.4:
+        #     shading = 'textureless'
+        #     ambient_ratio = 0.1
         else:
-            shading = 'textureless'
+            shading = 'lambertian'
             ambient_ratio = 0.1
 
         # _t = time.time()
@@ -343,6 +350,9 @@ class Trainer(object):
         pred_rgb = outputs['image'].reshape(B, H, W, 3).permute(0, 3, 1, 2).contiguous() # [1, 3, H, W]
         # torch.cuda.synchronize(); print(f'[TIME] nerf render {time.time() - _t:.4f}s')
 
+        # print(shading)
+        # torch_vis_2d(pred_rgb[0])
+
         # text embeddings
         if self.opt.dir_text:
             dirs = data['dir'] # [B,]
@@ -352,22 +362,24 @@ class Trainer(object):
 
         # encode pred_rgb to latents
         # _t = time.time()
-        loss_guidance = self.guidance.train_step(text_z, pred_rgb)
+        loss = self.guidance.train_step(text_z, pred_rgb)
        # torch.cuda.synchronize(); print(f'[TIME] total guiding {time.time() - _t:.4f}s')
 
         # occupancy loss
         pred_ws = outputs['weights_sum'].reshape(B, 1, H, W)
-        # mask_ws = outputs['mask'].reshape(B, 1, H, W) # near < far
 
-        # loss_ws = (pred_ws ** 2 + 0.01).sqrt().mean()
+        if self.opt.lambda_opacity > 0:
+            loss_opacity = (pred_ws ** 2).mean()
+            loss = loss + self.opt.lambda_opacity * loss_opacity
 
-        alphas = (pred_ws).clamp(1e-5, 1 - 1e-5)
-        # alphas = alphas ** 2 # skewed entropy, favors 0 over 1
-        loss_entropy = (- alphas * torch.log2(alphas) - (1 - alphas) * torch.log2(1 - alphas)).mean()
-
-        loss = loss_guidance + self.opt.lambda_entropy * loss_entropy
+        if self.opt.lambda_entropy > 0:
+            alphas = (pred_ws).clamp(1e-5, 1 - 1e-5)
+            # alphas = alphas ** 2 # skewed entropy, favors 0 over 1
+            loss_entropy = (- alphas * torch.log2(alphas) - (1 - alphas) * torch.log2(1 - alphas)).mean()
+
+            loss = loss + self.opt.lambda_entropy * loss_entropy
 
-        if 'loss_orient' in outputs:
+        if self.opt.lambda_orient > 0 and 'loss_orient' in outputs:
             loss_orient = outputs['loss_orient']
             loss = loss + self.opt.lambda_orient * loss_orient
 
@@ -442,6 +454,9 @@ class Trainer(object):
     ### ------------------------------
 
     def train(self, train_loader, valid_loader, max_epochs):
+
+        assert self.text_z is not None, 'Training must provide a text prompt!'
+
         if self.use_tensorboardX and self.local_rank == 0:
             self.writer = tensorboardX.SummaryWriter(os.path.join(self.workspace, "run", self.name))
 
raymarching/src/raymarching.cu CHANGED
@@ -905,7 +905,7 @@ __global__ void kernel_composite_rays(
 }
 
 
-void composite_rays(const uint32_t n_alive, const uint32_t n_step, const float T_thresh, at::Tensor rays_alive, at::Tensor rays_t, const at::Tensor sigmas, const at::Tensor rgbs, const at::Tensor deltas, at::Tensor weights, at::Tensor depth, at::Tensor image) {
+void composite_rays(const uint32_t n_alive, const uint32_t n_step, const float T_thresh, at::Tensor rays_alive, at::Tensor rays_t, at::Tensor sigmas, at::Tensor rgbs, at::Tensor deltas, at::Tensor weights, at::Tensor depth, at::Tensor image) {
     static constexpr uint32_t N_THREAD = 128;
     AT_DISPATCH_FLOATING_TYPES_AND_HALF(
         image.scalar_type(), "composite_rays", ([&] {
readme.md CHANGED
@@ -17,13 +17,13 @@ This project is a **work-in-progress**, and contains lots of differences from th
 
 
 ## Notable differences from the paper
-* Since the Imagen model is not publicly available, we use [Stable Diffusion](https://github.com/CompVis/stable-diffusion) to replace it (implementation from [diffusers](https://github.com/huggingface/diffusers)). Different from Imagen, Stable-Diffusion is a latent diffusion model, which diffuses in a latent space instead of the original image space. Therefore, we need the loss to propagate back from the VAE's encoder part too, which introduces extra time cost in training. Currently, 15000 training steps take about 5 hours to train on a V100.
+* Since the Imagen model is not publicly available, we use [Stable Diffusion](https://github.com/CompVis/stable-diffusion) to replace it (implementation from [diffusers](https://github.com/huggingface/diffusers)). Different from Imagen, Stable-Diffusion is a latent diffusion model, which diffuses in a latent space instead of the original image space. Therefore, we need the loss to propagate back from the VAE's encoder part too, which introduces extra time cost in training. Currently, 10000 training steps take about 3 hours to train on a V100.
 * We use the [multi-resolution grid encoder](https://github.com/NVlabs/instant-ngp/) to implement the NeRF backbone (implementation from [torch-ngp](https://github.com/ashawkey/torch-ngp)), which enables much faster rendering (~10FPS at 800x800).
 * We use the Adam optimizer with a larger initial learning rate.
 
 
 ## TODOs
-* The normal evaluation & shading part.
+* Alleviate the multi-face [Janus problem](https://twitter.com/poolio/status/1578045212236034048).
 * Better mesh (improve the surface quality).
 
 # Install
@@ -33,7 +33,9 @@ git clone https://github.com/ashawkey/stable-dreamfusion.git
 cd stable-dreamfusion
 ```
 
-**Important**: To download the Stable Diffusion model checkpoint, you should create a file called `TOKEN` under this directory (i.e., `stable-dreamfusion/TOKEN`) and copy your hugging face [access token](https://huggingface.co/docs/hub/security-tokens) into it.
+**Important**: To download the Stable Diffusion model checkpoint, you should provide your [access token](https://huggingface.co/settings/tokens). You could choose either of the following ways:
+* Run `huggingface-cli login` and enter your token.
+* Create a file called `TOKEN` under this directory (i.e., `stable-dreamfusion/TOKEN`) and copy your token into it.
 
 ### Install with pip
 ```bash
@@ -71,14 +73,30 @@ First time running will take some time to compile the CUDA extensions.
 
 ```bash
 ### stable-dreamfusion setting
-## train with text prompt
+## train with text prompt (with the default settings)
 # `-O` equals `--cuda_ray --fp16 --dir_text`
+# `--cuda_ray` enables instant-ngp-like occupancy grid based acceleration.
+# `--fp16` enables half-precision training.
+# `--dir_text` enables view-dependent prompting.
 python main.py --text "a hamburger" --workspace trial -O
 
+# if the above command fails to generate things (learns an empty scene), maybe try:
+# 1. disable random lambertian shading, simply use albedo as color:
+python main.py --text "a hamburger" --workspace trial -O --albedo_iters 10000 # i.e., set --albedo_iters >= --iters, which is default to 10000
+# 2. use a smaller density regularization weight:
+python main.py --text "a hamburger" --workspace trial -O --lambda_entropy 1e-5
+
+# you can also train in a GUI to visualize the training progress:
+python main.py --text "a hamburger" --workspace trial -O --gui
+
+# A Gradio GUI is also possible (with less options):
+python gradio_app.py # open in web browser
+
 ## after the training is finished:
-# test (exporting 360 video, and an obj mesh with png texture)
+# test (exporting 360 video)
 python main.py --workspace trial -O --test
-
+# also save a mesh (with obj, mtl, and png texture)
+python main.py --workspace trial -O --test --save_mesh
 # test with a GUI (free view control!)
 python main.py --workspace trial -O --test --gui
 
@@ -101,7 +119,7 @@ pred_rgb_512 = F.interpolate(pred_rgb, (512, 512), mode='bilinear', align_corner
 latents = self.encode_imgs(pred_rgb_512)
 ... # timestep sampling, noise adding and UNet noise predicting
 # 3. the SDS loss, since UNet part is ignored and cannot simply audodiff, we manually set the grad for latents.
-w = (1 - self.scheduler.alphas_cumprod[t]).to(self.device)
+w = self.alphas[t] ** 0.5 * (1 - self.alphas[t])
 grad = w * (noise_pred - noise)
 latents.backward(gradient=grad, retain_graph=True)
 ```
@@ -117,7 +135,6 @@ latents.backward(gradient=grad, retain_graph=True)
   Training is faster if only sample 128 points uniformly per ray (5h --> 2.5h).
   More testing is needed...
 * Shading & normal evaluation: `./nerf/network*.py > NeRFNetwork > forward`. Current implementation harms training and is disabled.
-    * use `--albedo_iters 1000` to enable random shading mode after 1000 steps from albedo, lambertian, and textureless.
 * light direction: current implementation use a plane light source, instead of a point light source...
 * View-dependent prompting: `./nerf/provider.py > get_view_direction`.
   * ues `--angle_overhead, --angle_front` to set the border. How to better divide front/back/side regions?
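Tying the readme tips back to the nerf/utils.py change: the training loss is now the SDS guidance term plus regularizers that are individually gated by their weights (`--lambda_opacity`, `--lambda_entropy`, `--lambda_orient`), which is why lowering `--lambda_entropy` is suggested when a scene collapses to empty. A compact sketch of that composition (shapes follow the comments in `train_step`; `loss_guidance` stands in for `self.guidance.train_step(...)`):

```python
import torch

def compose_loss(loss_guidance, pred_ws, lambda_opacity=0.0, lambda_entropy=1e-4,
                 lambda_orient=1e-2, loss_orient=None):
    # pred_ws: [B, 1, H, W] accumulated ray weights (per-pixel opacity)
    loss = loss_guidance

    if lambda_opacity > 0:
        loss = loss + lambda_opacity * (pred_ws ** 2).mean()  # push stray opacity towards zero

    if lambda_entropy > 0:
        alphas = pred_ws.clamp(1e-5, 1 - 1e-5)
        entropy = (-alphas * torch.log2(alphas) - (1 - alphas) * torch.log2(1 - alphas)).mean()
        loss = loss + lambda_entropy * entropy                # favour opacities near 0 or 1

    if lambda_orient > 0 and loss_orient is not None:
        loss = loss + lambda_orient * loss_orient             # normal-vs-view orientation penalty

    return loss
```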