ashawkey committed
Commit e81396d
1 Parent(s): 3de5f93

enable random lambertian shading in training

Files changed (4)
  1. assets/update_logs.md +4 -0
  2. main.py +5 -5
  3. nerf/provider.py +1 -1
  4. readme.md +14 -5
assets/update_logs.md CHANGED
@@ -1,3 +1,7 @@
+### 2022.10.9
+* The shading (partially) starts to work; at least it won't make the scene empty. For some prompts it shows better results (a less severe Janus problem). The textureless rendering mode is still disabled.
+* Enable shading by default (--albedo_iters 1000).
+
 ### 2022.10.5
 * Basic reproduction finished.
 * Non --cuda_ray, --tcnn are not working, need to fix.
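To make the changelog entry above concrete, here is a minimal sketch (not the repository's exact code; `choose_shading` and `shade` are hypothetical names) of how the per-step shading mode could be picked once `--albedo_iters` steps have passed: pure albedo at first, then a random choice between albedo and lambertian, with the still-disabled textureless mode left out.

```python
import random

def choose_shading(global_step, albedo_iters=1000):
    # Warm up with plain albedo rendering, then randomly mix in
    # lambertian shading, as described in the 2022.10.9 entry above.
    if global_step < albedo_iters:
        return 'albedo'
    # textureless mode is still disabled in this commit, so only two choices.
    return random.choice(['albedo', 'lambertian'])

def shade(albedo, normal, light_dir, mode, ambient=0.1):
    # Simple lambertian term on top of the predicted albedo;
    # `normal` and `light_dir` are assumed to be unit-length 3-vectors.
    if mode == 'albedo':
        return albedo
    n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    return [c * (ambient + (1 - ambient) * n_dot_l) for c in albedo]
```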
main.py CHANGED
@@ -32,7 +32,7 @@ if __name__ == '__main__':
     parser.add_argument('--upsample_steps', type=int, default=64, help="num steps up-sampled per ray (only valid when not using --cuda_ray)")
     parser.add_argument('--update_extra_interval', type=int, default=16, help="iter interval to update extra status (only valid when using --cuda_ray)")
     parser.add_argument('--max_ray_batch', type=int, default=4096, help="batch size of rays at inference to avoid OOM (only valid when not using --cuda_ray)")
-    parser.add_argument('--albedo_iters', type=int, default=15000, help="training iters that only use albedo shading")
+    parser.add_argument('--albedo_iters', type=int, default=1000, help="training iters that only use albedo shading")
     # model options
     parser.add_argument('--bg_radius', type=float, default=1.4, help="if positive, use a background model at sphere(bg_radius)")
     parser.add_argument('--density_thresh', type=float, default=10, help="threshold for density grid to be occupied")
@@ -75,14 +75,14 @@ if __name__ == '__main__':
         opt.dir_text = True
         # use occupancy grid to prune ray sampling, faster rendering.
         opt.cuda_ray = True
-        opt.lambda_entropy = 1e-4
-        opt.lambda_opacity = 0
+        # opt.lambda_entropy = 1e-4
+        # opt.lambda_opacity = 0
 
     elif opt.O2:
         opt.fp16 = True
         opt.dir_text = True
-        opt.lambda_entropy = 1e-3
-        opt.lambda_opacity = 1e-3 # no occupancy grid, so use a stronger opacity loss.
+        opt.lambda_entropy = 1e-4 # necessary to keep non-empty
+        opt.lambda_opacity = 3e-3 # no occupancy grid, so use a stronger opacity loss.
 
     if opt.backbone == 'vanilla':
         from nerf.network import NeRFNetwork
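For context on the `lambda_entropy` / `lambda_opacity` weights changed above, a rough sketch of the kind of regularizers such weights typically scale, written against the rendered per-ray opacity. This is an assumption about the loss shape, not the repository's implementation; `density_regularizers` is a hypothetical helper.

```python
import torch

def density_regularizers(alphas, lambda_entropy=1e-4, lambda_opacity=3e-3, eps=1e-5):
    # `alphas`: accumulated opacity per ray in [0, 1], shape [N].
    alphas = alphas.clamp(eps, 1 - eps)
    # Binary entropy pushes each ray towards being fully empty or fully opaque.
    entropy = -(alphas * torch.log(alphas) + (1 - alphas) * torch.log(1 - alphas)).mean()
    # Mean opacity penalizes filling the whole volume; weighted more strongly in -O2,
    # where no occupancy grid prunes empty space (per the comment in the diff above).
    opacity = alphas.mean()
    return lambda_entropy * entropy + lambda_opacity * opacity
```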
nerf/provider.py CHANGED
@@ -55,7 +55,7 @@ def get_view_direction(thetas, phis, overhead, front):
     return res
 
 
-def rand_poses(size, device, radius_range=[1, 1.5], theta_range=[0, 150], phi_range=[0, 360], return_dirs=False, angle_overhead=30, angle_front=60, jitter=False):
+def rand_poses(size, device, radius_range=[1, 1.5], theta_range=[0, 100], phi_range=[0, 360], return_dirs=False, angle_overhead=30, angle_front=60, jitter=False):
     ''' generate random poses from an orbit camera
     Args:
         size: batch size of generated poses.
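The only change above narrows `theta_range` from [0, 150] to [0, 100] degrees, so random cameras no longer dip far below the object. A minimal sketch of how positions can be drawn from these ranges (assumed and simplified from `rand_poses`: only camera centers, no pose matrices; `sample_orbit_positions` is a hypothetical name):

```python
import numpy as np

def sample_orbit_positions(size, radius_range=(1.0, 1.5), theta_range=(0, 100), phi_range=(0, 360)):
    # theta: polar angle measured from the up axis, phi: azimuth, both in degrees.
    radius = np.random.uniform(*radius_range, size)
    theta = np.deg2rad(np.random.uniform(*theta_range, size))
    phi = np.deg2rad(np.random.uniform(*phi_range, size))
    # Spherical -> Cartesian with y as the up axis; cameras orbit the origin.
    x = radius * np.sin(theta) * np.sin(phi)
    y = radius * np.cos(theta)
    z = radius * np.sin(theta) * np.cos(phi)
    return np.stack([x, y, z], axis=-1)  # [size, 3] camera centers, all looking at the origin
```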
readme.md CHANGED
@@ -73,14 +73,24 @@ First time running will take some time to compile the CUDA extensions.
 
 ```bash
 ### stable-dreamfusion setting
-## train with text prompt
+## train with text prompt (with the default settings)
 # `-O` equals `--cuda_ray --fp16 --dir_text`
+# `--cuda_ray` enables instant-ngp-like occupancy grid based acceleration.
+# `--fp16` enables half-precision training.
+# `--dir_text` enables view-dependent prompting.
 python main.py --text "a hamburger" --workspace trial -O
 
+# if the above command fails to generate things (learns an empty scene), maybe try:
+# 1. disable random lambertian shading, simply use albedo as color:
+python main.py --text "a hamburger" --workspace trial -O --albedo_iters 15000 # i.e., set --albedo_iters >= --iters, which defaults to 15000
+# 2. use a smaller density regularization weight:
+python main.py --text "a hamburger" --workspace trial -O --lambda_entropy 1e-5
+
 ## after the training is finished:
-# test (exporting 360 video, and an obj mesh with png texture)
+# test (exporting 360 video)
 python main.py --workspace trial -O --test
-
+# also save a mesh (with obj, mtl, and png texture)
+python main.py --workspace trial -O --test --save_mesh
 # test with a GUI (free view control!)
 python main.py --workspace trial -O --test --gui
 
@@ -103,7 +113,7 @@ pred_rgb_512 = F.interpolate(pred_rgb, (512, 512), mode='bilinear', align_corner
 latents = self.encode_imgs(pred_rgb_512)
 ... # timestep sampling, noise adding and UNet noise predicting
 # 3. the SDS loss, since UNet part is ignored and cannot simply autodiff, we manually set the grad for latents.
-w = (1 - self.scheduler.alphas_cumprod[t]).to(self.device)
+w = self.alphas[t] ** 0.5 * (1 - self.alphas[t])
 grad = w * (noise_pred - noise)
 latents.backward(gradient=grad, retain_graph=True)
 ```
@@ -119,7 +129,6 @@ latents.backward(gradient=grad, retain_graph=True)
 Training is faster if only sample 128 points uniformly per ray (5h --> 2.5h).
 More testing is needed...
 * Shading & normal evaluation: `./nerf/network*.py > NeRFNetwork > forward`. Current implementation harms training and is disabled.
-    * use `--albedo_iters 1000` to enable random shading mode after 1000 steps from albedo, lambertian, and textureless.
 * light direction: current implementation uses a plane light source, instead of a point light source...
 * View-dependent prompting: `./nerf/provider.py > get_view_direction`.
     * use `--angle_overhead, --angle_front` to set the border. How to better divide front/back/side regions?
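The readme hunk above changes the SDS weighting from w = 1 - alpha_bar_t to w = sqrt(alpha_bar_t) * (1 - alpha_bar_t). A small self-contained sketch of that gradient step (hypothetical `sds_grad` helper; `alphas_cumprod` stands in for the scheduler's cumulative alphas, i.e. `self.alphas` in the snippet):

```python
import torch

def sds_grad(noise_pred, noise, alphas_cumprod, t):
    # Updated weighting from the diff above: w(t) = sqrt(alpha_bar_t) * (1 - alpha_bar_t).
    # (The previous version used w(t) = 1 - alpha_bar_t.)
    alpha_bar = alphas_cumprod[t]
    w = alpha_bar ** 0.5 * (1 - alpha_bar)
    # The UNet is treated as frozen, so the weighted residual is injected
    # directly as the gradient of the latents (no autodiff through the UNet).
    return w * (noise_pred - noise)

# plugged in exactly where the readme snippet calls latents.backward:
# latents.backward(gradient=sds_grad(noise_pred, noise, alphas_cumprod, t), retain_graph=True)
```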