amankishore committed on
Commit
c255c40
1 Parent(s): 5426b53

Added subpixel rendering!

Files changed (5)
  1. README-orig.md +27 -22
  2. README.md +9 -1
  3. app.py +7 -3
  4. highres_final_vis.py +124 -0
  5. voxnerf/vox.py +0 -3
README-orig.md CHANGED
@@ -9,26 +9,35 @@
9
 
10
  TTI-Chicago, †Purdue University
11
 
12
- The repository contains Pytorch implementation of Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation.
13
 
14
- > We introduce a method that converts a pretrained 2D diffusion generative model on images into a 3D generative model of radiance fields, without requiring access to any 3D data. The key insight is to interpret diffusion models as learned predictors of a gradient field, often referred to as the score function of the data log-likelihood. We apply the chain rule on the estimated score, hence the name Score Jacobian Chaining (SJC).
15
 
16
  <a href="https://arxiv.org/abs/2212.00774"><img src="https://img.shields.io/badge/arXiv-2212.00774-b31b1b.svg" height=22.5></a>
17
- <a href="https://colab.research.google.com/drive/1zixo66UYGl70VOPy053o7IV_YkQt5lCZ?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=22.5></a>
18
- <a href="https://pals.ttic.edu/p/score-jacobian-chaining"><img src="https://img.shields.io/website?down_color=lightgrey&down_message=offline&label=Project%20Page&up_color=lightgreen&up_message=online&url=https%3A%2F%2Fpals.ttic.edu%2Fp%2Fscore-jacobian-chaining" height=22.5></a>
19
 
20
  <!-- [ [arxiv](https://arxiv.org/abs/2212.00774) | [project page](https://pals.ttic.edu/p/score-jacobian-chaining) | [colab](https://colab.research.google.com/drive/1zixo66UYGl70VOPy053o7IV_YkQt5lCZ?usp=sharing ) ] -->
21
 
22
  Many thanks to [dvschultz](https://github.com/dvschultz) for the colab.
23
 
24
  ## License
25
- Since we use Stable Diffusion, we are releasing under their OpenRAIL license. Otherwise we do not
26
- identify any components or upstream code that carry restrictive licensing requirements.
27
 
28
- ## Structure
29
- In addition to SJC, the repo also contains an implementation of [Karras sampler](https://arxiv.org/abs/2206.00364),
30
- and a customized, simple voxel nerf. We provide the abstract parent class based on Karras et. al. and include
31
- a few types of diffusion model here. See adapt.py.
32
 
33
  ## Installation
34
 
@@ -46,8 +55,8 @@ git clone --depth 1 git@github.com:CompVis/taming-transformers.git && pip instal
46
 
47
  ## Downloading checkpoints
48
  We have bundled a minimal set of things you need to download (SD v1.5 ckpt, gddpm ckpt for LSUN and FFHQ)
49
- in a tar file, made available at our download server [here](https://dl.ttic.edu/pals/sjc/release.tar).
50
- It is a single file of 12GB, and you can use wget or curl.
51
 
52
  Remember to __update__ `env.json` to point at the new checkpoint root where you have uncompressed the files.
53
 
@@ -57,7 +66,7 @@ Make a new directory to run experiments (the script generates many logging files
57
  mkdir exp
58
  cd exp
59
  ```
60
- Run the following command to generate a new 3D asset. It takes about 25 minutes on a single A5000 GPU for 10000 steps of optimization.
61
  ```bash
62
  python /path/to/sjc/run_sjc.py \
63
  --sd.prompt "A zoomed out high quality photo of Temple of Heaven" \
@@ -86,15 +95,11 @@ python /path/to/sjc/run_sjc.py \
86
 
87
  `depth_weight` the weighting factor of the center depth loss
88
 
89
- `var_red` whether to use Eq. 16 vs Eq. 15. For some prompts such as Obama we actually see better results with Eq. 15.
90
 
91
  Visualization results are stored in the current directory. In directories named `test_*` there are images (under `view`) and videos (under `view_seq`) rendered at different iterations.
92
 
93
 
94
- ## TODOs
95
- - [ ] add sub-pixel rendering script for high quality visualization such as in the teaser.
96
- - [ ] add script to reproduce 2D experiments in Fig 4. The Fig might need change once it's tied to seeds. Note that for a simple aligned domain like faces, simple scheduling like using a single σ=1.5 could already generate some nice images. But not so for bedrooms; it's too diverse and annealing seems still needed.
97
-
98
  ## To Reproduce the Results in the Paper
99
  First create a clean directory for your experiment, then run one of the following scripts from that folder:
100
  ### Trump
@@ -200,19 +205,19 @@ python /path/to/sjc/run_sjc.py --sd.prompt "A pig" --n_steps 10000 --lr 0.05 --s
200
  ```
201
  python /path/to/sjc/run_nerf.py
202
  ```
203
- Our bundle contains a tar ball for the lego bulldozer dataset. Untar it and it will work.
204
 
205
  ## To Sample 2D images with the Karras Sampler
206
  ```
207
  python /path/to/sjc/run_img_sampling.py
208
  ```
209
- Use help -h to see the options available. Will expand the details later.
210
 
211
 
212
- ## Bib
213
  ```
214
  @article{sjc,
215
- title={Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation},
216
  author={Wang, Haochen and Du, Xiaodan and Li, Jiahao and Yeh, Raymond A. and Shakhnarovich, Greg},
217
  journal={arXiv preprint arXiv:2212.00774},
218
  year={2022},
9
 
10
  TTI-Chicago, &dagger;Purdue University
11
 
12
+ Abstract: *A diffusion model learns to predict a vector field of gradients. We propose to apply chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field. This setup aggregates 2D scores at multiple camera viewpoints into a 3D score, and repurposes a pretrained 2D model for 3D data generation. We identify a technical challenge of distribution mismatch that arises in this application, and propose a novel estimation mechanism to resolve it. We run our algorithm on several off-the-shelf diffusion image generative models, including the recently released Stable Diffusion trained on the large-scale LAION dataset.*
13
 
 
14
 
15
  <a href="https://arxiv.org/abs/2212.00774"><img src="https://img.shields.io/badge/arXiv-2212.00774-b31b1b.svg" height=22.5></a>
16
+ <a href="https://colab.research.google.com/drive/1zixo66UYGl70VOPy053o7IV_YkQt5lCZ?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=22.5></a>
17
+ <a href="https://pals.ttic.edu/p/score-jacobian-chaining"><img src="https://img.shields.io/website?down_color=lightgrey&down_message=offline&label=Project%20Page&up_color=lightgreen&up_message=online&url=https%3A%2F%2Fpals.ttic.edu%2Fp%2Fscore-jacobian-chaining" height=22.5></a>
18
 
19
  <!-- [ [arxiv](https://arxiv.org/abs/2212.00774) | [project page](https://pals.ttic.edu/p/score-jacobian-chaining) | [colab](https://colab.research.google.com/drive/1zixo66UYGl70VOPy053o7IV_YkQt5lCZ?usp=sharing ) ] -->
20
 
21
  Many thanks to [dvschultz](https://github.com/dvschultz) for the colab.
22
 
23
+ ## Updates
24
+ - We have added a subpixel rendering script for final high-quality visualization. The jittery videos you may have seen before should look significantly better now. Please run `python /path/to/sjc/highres_final_vis.py` in the exp folder after training is complete. There are a few toggles in the script you can play with, but the defaults are fine. It takes about 5 minutes / 11 GB on an A5000; the extra time is mainly due to the SD decoder.
25
+ - If you are running SJC with a DreamBooth fine-tuned model: the model's output distribution is already significantly narrowed, so it may help to use a lower guidance scale, e.g. `--sd.scale 50.0` (see the sketch below). Intense mode-seeking is one cause of the multi-face problem. We have internally tried DreamBooth with view-dependent prompt fine-tuning, but by and large DreamBooth integration is not ready.
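For illustration, a minimal sketch of where such a lower guidance scale would go; the prompt, step count, and learning rate below are placeholders copied from the stock examples in this README, and wiring up the DreamBooth weights themselves is not covered here:

```bash
# hypothetical invocation: only --sd.scale differs from the stock SJC examples
python /path/to/sjc/run_sjc.py \
    --sd.prompt "a photo of sks toy" \
    --sd.scale 50.0 \
    --n_steps 10000 --lr 0.05
```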
26
+
27
+
28
+ ## TODOs
29
+ - [ ] make seeds configurable. So far all seeds are hardcoded to 0.
30
+ - [ ] add a script to reproduce the 2D experiments in Fig. 4. The figure might need to change once it is tied to seeds. Note that for a simple, aligned domain like faces, a simple schedule such as a single σ=1.5 can already generate some nice images. But not so for bedrooms; that domain is too diverse and annealing still seems needed.
31
+ - [ ] the main paper figures did not use subpixel rendering, while the appendix figures did. Replace the main paper figures to make them consistent.
32
+
33
  ## License
34
+ Since we use Stable Diffusion, we are releasing under their OpenRAIL license. Otherwise we do not
35
+ identify any components or upstream code that carry restrictive licensing requirements.
36
 
37
+ ## Structure
38
+ In addition to SJC, the repo also contains an implementation of the [Karras sampler](https://arxiv.org/abs/2206.00364),
39
+ and a customized, simple voxel NeRF. We provide an abstract parent class based on Karras et al. and include
40
+ a few types of diffusion models here. See `adapt.py`.
41
 
42
  ## Installation
43
 
55
 
56
  ## Downloading checkpoints
57
  We have bundled a minimal set of things you need to download (SD v1.5 ckpt, gddpm ckpt for LSUN and FFHQ)
58
+ in a tar file, made available at our download server [here](https://dl.ttic.edu/pals/sjc/release.tar).
59
+ It is a single 12 GB file, which you can fetch with wget or curl.
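A minimal sketch of the download step (the checkpoint root below is a placeholder; use whatever directory you like):

```bash
# fetch the ~12 GB bundle and unpack it into a checkpoint root of your choice
wget https://dl.ttic.edu/pals/sjc/release.tar
mkdir -p /path/to/ckpt_root
tar -xf release.tar -C /path/to/ckpt_root
# then point env.json at /path/to/ckpt_root (see the reminder below)
```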
60
 
61
  Remember to __update__ `env.json` to point at the new checkpoint root where you have uncompressed the files.
62
 
66
  mkdir exp
67
  cd exp
68
  ```
69
+ Run the following command to generate a new 3D asset. It takes about 25 minutes and roughly 10 GB of GPU memory on a single A5000 for 10,000 steps of optimization.
70
  ```bash
71
  python /path/to/sjc/run_sjc.py \
72
  --sd.prompt "A zoomed out high quality photo of Temple of Heaven" \
95
 
96
  `depth_weight` the weighting factor of the center depth loss
97
 
98
+ `var_red` whether to use Eq. 16 vs Eq. 15. For some prompts such as Obama we actually see better results with Eq. 15.
99
 
100
  Visualization results are stored in the current directory. In directories named `test_*` there are images (under `view`) and videos (under `view_seq`) rendered at different iterations.
101
 
102
 
103
  ## To Reproduce the Results in the Paper
104
  First create a clean directory for your experiment, then run one of the following scripts from that folder:
105
  ### Trump
205
  ```
206
  python /path/to/sjc/run_nerf.py
207
  ```
208
+ Our bundle contains a tarball of the Lego bulldozer dataset. Untar it and it will work.
209
 
210
  ## To Sample 2D images with the Karras Sampler
211
  ```
212
  python /path/to/sjc/run_img_sampling.py
213
  ```
214
+ Use `-h` to see the available options. We will expand the details later.
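For example:

```bash
python /path/to/sjc/run_img_sampling.py -h
```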
215
 
216
 
217
+ ## Bib
218
  ```
219
  @article{sjc,
220
+ title={Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation},
221
  author={Wang, Haochen and Du, Xiaodan and Li, Jiahao and Yeh, Raymond A. and Shakhnarovich, Greg},
222
  journal={arXiv preprint arXiv:2212.00774},
223
  year={2022},
README.md CHANGED
@@ -10,4 +10,12 @@ pinned: false
10
  license: creativeml-openrail-m
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
10
  license: creativeml-openrail-m
11
  ---
12
 
13
+ ## Bib
14
+ ```
15
+ @article{sjc,
16
+ title={Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation},
17
+ author={Wang, Haochen and Du, Xiaodan and Li, Jiahao and Yeh, Raymond A. and Shakhnarovich, Greg},
18
+ journal={arXiv preprint arXiv:2212.00774},
19
+ year={2022},
20
+ }
21
+ ```
app.py CHANGED
@@ -16,6 +16,7 @@ from voxnerf.utils import every
16
  from voxnerf.vis import stitch_vis, bad_vis as nerf_vis
17
 
18
  from run_sjc import render_one_view, tsr_stats
 
19
 
20
  import gradio as gr
21
  import gc
@@ -167,22 +168,25 @@ with gr.Blocks(css=css) as demo:
167
 
168
  # TODO: Save Checkpoint
169
  with torch.no_grad():
170
  ckpt = vox.state_dict()
171
  H, W = poser.H, poser.W
172
  vox.eval()
173
- K, poses = poser.sample_test(100)
 
 
175
  aabb = vox.aabb.T.cpu().numpy()
176
  vox = vox.to(device_glb)
177
 
178
  num_imgs = len(poses)
179
-
180
  all_images = []
181
 
182
  for i in (pbar := tqdm(range(num_imgs))):
183
 
184
  pose = poses[i]
185
- y, depth = render_one_view(vox, aabb, H, W, K, pose)
186
  if isinstance(model, StableDiffusion):
187
  y = model.decode(y)
188
  pane, img, depth = vis_routine(y, depth)
16
  from voxnerf.vis import stitch_vis, bad_vis as nerf_vis
17
 
18
  from run_sjc import render_one_view, tsr_stats
19
+ from highres_final_vis import highres_render_one_view
20
 
21
  import gradio as gr
22
  import gc
168
 
169
  # TODO: Save Checkpoint
170
  with torch.no_grad():
171
+ n_frames = 200
172
+ factor = 4
173
  ckpt = vox.state_dict()
174
  H, W = poser.H, poser.W
175
  vox.eval()
176
+ K, poses = poser.sample_test(n_frames)
177
+ del n_frames
178
+ poses = poses[60:] # skip the full overhead view; not interesting
179
 
180
  aabb = vox.aabb.T.cpu().numpy()
181
  vox = vox.to(device_glb)
182
 
183
  num_imgs = len(poses)
 
184
  all_images = []
185
 
186
  for i in (pbar := tqdm(range(num_imgs))):
187
 
188
  pose = poses[i]
189
+ y, depth = highres_render_one_view(vox, aabb, H, W, K, pose, f=factor)
190
  if isinstance(model, StableDiffusion):
191
  y = model.decode(y)
192
  pane, img, depth = vis_routine(y, depth)
highres_final_vis.py ADDED
@@ -0,0 +1,124 @@
1
+ import numpy as np
2
+ import torch
3
+ from einops import rearrange
4
+
5
+ from voxnerf.render import subpixel_rays_from_img
6
+
7
+ from run_sjc import (
8
+ SJC, ScoreAdapter, StableDiffusion,
9
+ tqdm, EventStorage, HeartBeat, EarlyLoopBreak, get_event_storage, get_heartbeat, optional_load_config, read_stats,
10
+ vis_routine, stitch_vis, latest_ckpt,
11
+ scene_box_filter, render_ray_bundle, as_torch_tsrs,
12
+ device_glb
13
+ )
14
+
15
+
16
+ # the SD decoder is very memory hungry; the latent image cannot be too large
17
+ # for a graphics card with < 12 GB memory, set this to 128; quality already good
18
+ # if your card has 12 to 24 GB memory, you can set this to 200;
19
+ # but visually it won't help beyond a certain point. Our teaser is done with 128.
20
+ decoder_bottleneck_hw = 128
21
+
22
+
23
+ def final_vis():
24
+ cfg = optional_load_config(fname="full_config.yml")
25
+ assert len(cfg) > 0, "can't find cfg file"
26
+ mod = SJC(**cfg)
27
+
28
+ family = cfg.pop("family")
29
+ model: ScoreAdapter = getattr(mod, family).make()
30
+ vox = mod.vox.make()
31
+ poser = mod.pose.make()
32
+
33
+ pbar = tqdm(range(1))
34
+
35
+ with EventStorage(), HeartBeat(pbar):
36
+ ckpt_fname = latest_ckpt()
37
+ state = torch.load(ckpt_fname, map_location="cpu")
38
+ vox.load_state_dict(state)
39
+ vox.to(device_glb)
40
+
41
+ with EventStorage("highres"):
42
+ # what dominates the speed is NOT the factor here.
43
+ # you can try from 2 to 8, and the speed is about the same.
44
+ # the dominating factor in the pipeline I believe is the SD decoder.
45
+ evaluate(model, vox, poser, n_frames=200, factor=4)
46
+
47
+
48
+ @torch.no_grad()
49
+ def evaluate(score_model, vox, poser, n_frames=200, factor=4):
50
+ H, W = poser.H, poser.W
51
+ vox.eval()
52
+ K, poses = poser.sample_test(n_frames)
53
+ del n_frames
54
+ poses = poses[60:] # skip the full overhead view; not interesting
55
+
56
+ fuse = EarlyLoopBreak(5)
57
+ metric = get_event_storage()
58
+ hbeat = get_heartbeat()
59
+
60
+ aabb = vox.aabb.T.cpu().numpy()
61
+ vox = vox.to(device_glb)
62
+
63
+ num_imgs = len(poses)
64
+
65
+ for i in (pbar := tqdm(range(num_imgs))):
66
+ if fuse.on_break():
67
+ break
68
+
69
+ pose = poses[i]
70
+ y, depth = highres_render_one_view(vox, aabb, H, W, K, pose, f=factor)
71
+ if isinstance(score_model, StableDiffusion):
72
+ y = score_model.decode(y)
73
+ vis_routine(metric, y, depth)
74
+
75
+ metric.step()
76
+ hbeat.beat()
77
+
78
+ metric.flush_history()
79
+
80
+ metric.put_artifact(
81
+ "movie_im_and_depth", ".mp4",
82
+ lambda fn: stitch_vis(fn, read_stats(metric.output_dir, "view")[1])
83
+ )
84
+
85
+ metric.put_artifact(
86
+ "movie_im_only", ".mp4",
87
+ lambda fn: stitch_vis(fn, read_stats(metric.output_dir, "img")[1])
88
+ )
89
+
90
+ metric.step()
91
+
92
+
93
+ def highres_render_one_view(vox, aabb, H, W, K, pose, f=4):
94
+ bs = 4096
95
+
96
+ ro, rd = subpixel_rays_from_img(H, W, K, pose, f=f)
97
+ ro, rd, t_min, t_max = scene_box_filter(ro, rd, aabb)
98
+ n = len(ro)
99
+ ro, rd, t_min, t_max = as_torch_tsrs(vox.device, ro, rd, t_min, t_max)
100
+
101
+ rgbs = torch.zeros(n, 4, device=vox.device)
102
+ depth = torch.zeros(n, 1, device=vox.device)
103
+
104
+ with torch.no_grad():
105
+ for i in range(int(np.ceil(n / bs))):
106
+ s = i * bs
107
+ e = min(n, s + bs)
108
+ _rgbs, _depth, _ = render_ray_bundle(
109
+ vox, ro[s:e], rd[s:e], t_min[s:e], t_max[s:e]
110
+ )
111
+ rgbs[s:e] = _rgbs
112
+ depth[s:e] = _depth
113
+
114
+ rgbs = rearrange(rgbs, "(h w) c -> 1 c h w", h=H*f, w=W*f)
115
+ depth = rearrange(depth, "(h w) 1 -> h w", h=H*f, w=W*f)
116
+ rgbs = torch.nn.functional.interpolate(
117
+ rgbs, (decoder_bottleneck_hw, decoder_bottleneck_hw),
118
+ mode='bilinear', antialias=True
119
+ )
120
+ return rgbs, depth
121
+
122
+
123
+ if __name__ == "__main__":
124
+ final_vis()
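A usage sketch for the new script (assuming the layout the loading code above expects: a `full_config.yml` and a saved checkpoint sitting in the experiment folder created during training):

```bash
# run from inside the experiment directory used for training
cd /path/to/exp
python /path/to/sjc/highres_final_vis.py
# the movie_im_and_depth / movie_im_only artifacts should appear under the "highres" event storage
```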
voxnerf/vox.py CHANGED
@@ -169,9 +169,6 @@ class VoxRF(nn.Module):
169
 
170
  @VOXRF_REGISTRY.register()
171
  class V_SJC(VoxRF):
172
- """
173
- For SJC, when sampling density σ, add a gaussian ball offset
174
- """
175
  def __init__(self, *args, **kwargs):
176
  super().__init__(*args, **kwargs)
177
  # rendering color in [-1, 1] range, since score models all operate on centered img
169
 
170
  @VOXRF_REGISTRY.register()
171
  class V_SJC(VoxRF):
172
  def __init__(self, *args, **kwargs):
173
  super().__init__(*args, **kwargs)
174
  # rendering color in [-1, 1] range, since score models all operate on centered img