jiawei011 commited on
Commit
12b7f59
1 Parent(s): 5f58ec6
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ *.png* filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,20 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ __pycache__/
2
+ build/
3
+ *.egg-info/
4
+ *.so
5
+ venv_*/
6
+ .vs/
7
+ .vscode/
8
+ .idea/
9
+
10
+ tmp_*
11
+ data?
12
+ data??
13
+ scripts2
14
+
15
+ model_cache
16
+
17
+ logs
18
+ videos
19
+ images
20
+ *.mp4
LICENSE ADDED
@@ -0,0 +1,21 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ MIT License
2
+
3
+ Copyright (c) 2023 dreamgaussian
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
LICENSE_GAUSSIAN_SPLATTING.md ADDED
@@ -0,0 +1,83 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ Gaussian-Splatting License
2
+ ===========================
3
+
4
+ **Inria** and **the Max Planck Institut for Informatik (MPII)** hold all the ownership rights on the *Software* named **gaussian-splatting**.
5
+ The *Software* is in the process of being registered with the Agence pour la Protection des
6
+ Programmes (APP).
7
+
8
+ The *Software* is still being developed by the *Licensor*.
9
+
10
+ *Licensor*'s goal is to allow the research community to use, test and evaluate
11
+ the *Software*.
12
+
13
+ ## 1. Definitions
14
+
15
+ *Licensee* means any person or entity that uses the *Software* and distributes
16
+ its *Work*.
17
+
18
+ *Licensor* means the owners of the *Software*, i.e Inria and MPII
19
+
20
+ *Software* means the original work of authorship made available under this
21
+ License ie gaussian-splatting.
22
+
23
+ *Work* means the *Software* and any additions to or derivative works of the
24
+ *Software* that are made available under this License.
25
+
26
+
27
+ ## 2. Purpose
28
+ This license is intended to define the rights granted to the *Licensee* by
29
+ Licensors under the *Software*.
30
+
31
+ ## 3. Rights granted
32
+
33
+ For the above reasons Licensors have decided to distribute the *Software*.
34
+ Licensors grant non-exclusive rights to use the *Software* for research purposes
35
+ to research users (both academic and industrial), free of charge, without right
36
+ to sublicense.. The *Software* may be used "non-commercially", i.e., for research
37
+ and/or evaluation purposes only.
38
+
39
+ Subject to the terms and conditions of this License, you are granted a
40
+ non-exclusive, royalty-free, license to reproduce, prepare derivative works of,
41
+ publicly display, publicly perform and distribute its *Work* and any resulting
42
+ derivative works in any form.
43
+
44
+ ## 4. Limitations
45
+
46
+ **4.1 Redistribution.** You may reproduce or distribute the *Work* only if (a) you do
47
+ so under this License, (b) you include a complete copy of this License with
48
+ your distribution, and (c) you retain without modification any copyright,
49
+ patent, trademark, or attribution notices that are present in the *Work*.
50
+
51
+ **4.2 Derivative Works.** You may specify that additional or different terms apply
52
+ to the use, reproduction, and distribution of your derivative works of the *Work*
53
+ ("Your Terms") only if (a) Your Terms provide that the use limitation in
54
+ Section 2 applies to your derivative works, and (b) you identify the specific
55
+ derivative works that are subject to Your Terms. Notwithstanding Your Terms,
56
+ this License (including the redistribution requirements in Section 3.1) will
57
+ continue to apply to the *Work* itself.
58
+
59
+ **4.3** Any other use without of prior consent of Licensors is prohibited. Research
60
+ users explicitly acknowledge having received from Licensors all information
61
+ allowing to appreciate the adequacy between of the *Software* and their needs and
62
+ to undertake all necessary precautions for its execution and use.
63
+
64
+ **4.4** The *Software* is provided both as a compiled library file and as source
65
+ code. In case of using the *Software* for a publication or other results obtained
66
+ through the use of the *Software*, users are strongly encouraged to cite the
67
+ corresponding publications as explained in the documentation of the *Software*.
68
+
69
+ ## 5. Disclaimer
70
+
71
+ THE USER CANNOT USE, EXPLOIT OR DISTRIBUTE THE *SOFTWARE* FOR COMMERCIAL PURPOSES
72
+ WITHOUT PRIOR AND EXPLICIT CONSENT OF LICENSORS. YOU MUST CONTACT INRIA FOR ANY
73
+ UNAUTHORIZED USE: stip-sophia.transfert@inria.fr . ANY SUCH ACTION WILL
74
+ CONSTITUTE A FORGERY. THIS *SOFTWARE* IS PROVIDED "AS IS" WITHOUT ANY WARRANTIES
75
+ OF ANY NATURE AND ANY EXPRESS OR IMPLIED WARRANTIES, WITH REGARDS TO COMMERCIAL
76
+ USE, PROFESSIONNAL USE, LEGAL OR NOT, OR OTHER, OR COMMERCIALISATION OR
77
+ ADAPTATION. UNLESS EXPLICITLY PROVIDED BY LAW, IN NO EVENT, SHALL INRIA OR THE
78
+ AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
79
+ CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
80
+ GOODS OR SERVICES, LOSS OF USE, DATA, OR PROFITS OR BUSINESS INTERRUPTION)
81
+ HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
82
+ LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING FROM, OUT OF OR
83
+ IN CONNECTION WITH THE *SOFTWARE* OR THE USE OR OTHER DEALINGS IN THE *SOFTWARE*.
README.md DELETED
@@ -1,13 +0,0 @@
1
- ---
2
- title: Dreamgaussian
3
- emoji: 🌍
4
- colorFrom: red
5
- colorTo: purple
6
- sdk: gradio
7
- sdk_version: 3.47.1
8
- app_file: app.py
9
- pinned: false
10
- license: mit
11
- ---
12
-
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
app.py ADDED
@@ -0,0 +1,105 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import gradio as gr
2
+ import os
3
+ from PIL import Image
4
+ import subprocess
5
+
6
+
7
+ # check if there is a picture uploaded or selected
8
+ def check_img_input(control_image):
9
+ if control_image is None:
10
+ raise gr.Error("Please select or upload an input image")
11
+
12
+
13
+ def optimize_stage_1(image_block: Image.Image, preprocess_chk: bool, elevation_slider: float):
14
+ if not os.path.exists('tmp_data'):
15
+ os.makedirs('tmp_data')
16
+ if preprocess_chk:
17
+ # save image to a designated path
18
+ image_block.save('tmp_data/tmp.png')
19
+
20
+ # preprocess image
21
+ subprocess.run([f'python process.py tmp_data/tmp.png'], shell=True)
22
+ else:
23
+ image_block.save('tmp_data/tmp_rgba.png')
24
+
25
+ # stage 1
26
+ subprocess.run([
27
+ f'python main.py --config configs/image.yaml input=tmp_data/tmp_rgba.png save_path=tmp mesh_format=glb elevation={elevation_slider} force_cuda_rast=True'],
28
+ shell=True)
29
+
30
+ return f'logs/tmp_mesh.glb'
31
+
32
+
33
+ def optimize_stage_2(elevation_slider: float):
34
+ # stage 2
35
+ subprocess.run([
36
+ f'python main2.py --config configs/image.yaml input=tmp_data/tmp_rgba.png save_path=tmp mesh_format=glb elevation={elevation_slider} force_cuda_rast=True'],
37
+ shell=True)
38
+
39
+ return f'logs/tmp.glb'
40
+
41
+
42
+ if __name__ == "__main__":
43
+ _TITLE = '''DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation'''
44
+
45
+ _DESCRIPTION = '''
46
+ <div>
47
+ <a style="display:inline-block" href="https://dreamgaussian.github.io"><img src='https://img.shields.io/badge/public_website-8A2BE2'></a>
48
+ <a style="display:inline-block; margin-left: .5em" href="https://arxiv.org/abs/2309.16653"><img src="https://img.shields.io/badge/2306.16928-f9f7f7?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADcAAABMCAYAAADJPi9EAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAuIwAALiMBeKU/dgAAABl0RVh0U29mdHdhcmUAd3d3Lmlua3NjYXBlLm9yZ5vuPBoAAAa2SURBVHja3Zt7bBRFGMAXUCDGF4rY7m7bAwuhlggKStFgLBgFEkCIIRJEEoOBYHwRFYKilUgEReVNJEGCJJpehHI3M9vZvd3bUP1DjNhEIRQQsQgSHiJgQZ5dv7krWEvvdmZ7d7vHJN+ft/f99pv5XvOtJMFCqvoCUpTdIEeRLC+L9Ox5i3Q9LACaCeK0kXoSChVcD3C/tQPHpAEsquQ73IkUcEz2kcLCknyGW5MGjkljRFVL8xJOKyi4CwCOuQAeAkfTP1+tNxLkogvgEbDgffkJqKqvuMA5ifOpqg/5qWecRstNg7xoUTI1Fovdxg8oy2s5AP8CGeYHmGngeZaOL4I4LXLcpHg4149/GDz4xqgsb+UAbMKKUpkrqHA43MUyyJpWUK0EHeG2YKRXr7tB+QMcgGewLD+ebTDbtrtbBt7UPlhS4rV4IvcDI7J8P1OeA/AcAI7LHljN7aB8XTowJmZt9EFRD/o0SDMH4HlwMhMyDWZZSAHFf3YDs3RS49WDLuaAY3IJq+qzmQKLxXAZKN7oDoYbdV3v5elPqiSpMyiOuAEVZVqHXb1OhloUH+MA+ztO0cAO/RkrfyBE7OAEbAZvO8vzVtTRWFD6DAfY5biBM3PWiaL0a4lvXICwnV8WjmE6ntYmhqX2jjp5LbMZjCw/wbYeN6CizOa2GMVzQOlmHjB4Ceuyk6LJ8huccEmR5Xddg7OOV/NAtchW+E3XbOag60QA4Qwuarca0bRuEJyr+cFQwzcY98huxhAKdQelt4kAQpj4qJ3gvFXAYn+aJumXk1yPlpQUgtIHhbYoFMUstNRRWgjnpl4A7IKlayNymqFHFaWCpV9CFry3LGxR1CgA5kB5M8OX2goApwpaz6mdOMGxtAgXWJySxb4WuQD4qTDgU+N5AAnzpr7ChSWpCyisiQJqY0Y7FtmSKpbV23b45kC0KHBxcQ9QeI8w4KgnHRPVtIU7rOtbioLVg5Hl/qDwSVFAMqLSMSObroCdZYlzIJtMRFVHCaRo/wFWPgaAXzdbBpkc2A4aKzCNd97+URQuESYGDDhIVfWOQIKZJu4D2+oXlgDTV1865gUQZDts756BArMNMoR1oa46BYqbyPixZz1ZUFV3sgwoGBajuBKATl3btIn8QYYMuezRgrsiRUWyr2BxA40EkPMpA/Hm6gbUu7fjEXA3azP6AsbKD9bxdUuhjM9W7fII52BF+daRpE4+WA3P501+jbfmHvQKyFqMuXf7Ot4mkN2fr50y+bRH61X7AXdUpHSxaPQ4GVbR5AGw3g+434XgQGKfr72I+vQRhfsu92dOx7WicInzt3CBg1RVpMm0NveWo2SqFzgmdNZMbriILD+S+zoueWf2vSdAipzacWN5nMl6XxNlUHa/J8DoJodUDE0HR8Ll5V0lPxcrLEHZPV4AzS83OLis7FowVa3RSku7BSNxJqQAlN3hBTC2apmDSkpaw22wJemGQFUG7J4MlP3JC6A+f96V7vRyX9It3nzT/GrjIU8edM7rMSnIi10f476lzbE1K7yEiEuWro0OJBguLCwDuFOJc1Na6sRWL/cCeMIwUN9ggSVbe3v/5/EgzTKWLvEAiBrYRUkgwNI2ZaFQNT75UDxEUEx97zYnzpmiLEmbaYCbNxYtFAb0/Z4AztgUrhyxuNgxPnhfHFDHz/vTgFWUQZxTRkkJhQ6YNdVUEPAfO6ZV5BRss6LcCVb7VaAma9giy0XJZBt9IQh42NY0NSdgbLIPlLUF6rEdrdt0CUCK1wsCbkcI3ZSLc7ZSwGLbmJXbPsNxnE5xilYKAobZ77LpGZ8TAIun+/iCKQoF71IxQDI3K2CCd+ARNvXg9sykBcnHAoCZG4u66hlDoQLe6QV4CRtFSxZQ+D0BwNO2jgdkzoGoah1nj3FVlSR19taTSYxI8QLut23U8dsgzqHulJNCQpcqBnpTALCuQ6NSYLHpmR5i42gZzuIdcrMMvMJbQlxe3jXxyZnLACl7ARm/FjPIDOY8ODtpM71sxwfcZpvBeUzKWmfNINM5AS+wO0Khh7dMqKccu4+qatarZjYAwDlgetzStHtEt+XedsBOQtU9XMrRgjg4KTnc5nr+dmqadit/4C4uLm8DuA9koJTj1TL7fI5nDL+qqoo/FLGAzL7dYT17PzvAcQONYSUQRxW/QMrHZVIyik0ZuQA2mzp+Ji8BW4YM3Mbzm9inaHkJCGfrUZZjujiYailfFwA8DHIy3acwUj4v9vUVa+SmgNsl5fuyDTKovW9/IAmfLV0Pi2UncA515kjYdrwC9i9rpuHiq3JwtAAAAABJRU5ErkJggg=="></a>
49
+ <a style="display:inline-block; margin-left: .5em" href='https://github.com/dreamgaussian/dreamgaussian'><img src='https://img.shields.io/github/stars/dreamgaussian/dreamgaussian?style=social'/></a>
50
+ </div>
51
+ We present DreamGausssion, a 3D content generation framework that significantly improves the efficiency of 3D content creation.
52
+ '''
53
+ _IMG_USER_GUIDE = "Please upload an image in the block above (or choose an example above) and click **Generate 3D**."
54
+
55
+ # load images in 'data' folder as examples
56
+ example_folder = os.path.join(os.path.dirname(__file__), 'data')
57
+ example_fns = os.listdir(example_folder)
58
+ example_fns.sort()
59
+ examples_full = [os.path.join(example_folder, x) for x in example_fns if x.endswith('.png')]
60
+
61
+ # Compose demo layout & data flow
62
+ with gr.Blocks(title=_TITLE, theme=gr.themes.Soft()) as demo:
63
+ with gr.Row():
64
+ with gr.Column(scale=1):
65
+ gr.Markdown('# ' + _TITLE)
66
+ gr.Markdown(_DESCRIPTION)
67
+
68
+ # Image-to-3D
69
+ with gr.Row(variant='panel'):
70
+ with gr.Column(scale=5):
71
+ image_block = gr.Image(type='pil', image_mode='RGBA', height=290, label='Input image', tool=None)
72
+
73
+ elevation_slider = gr.Slider(-90, 90, value=0, step=1, label='Estimated elevation angle')
74
+ gr.Markdown(
75
+ "default to 0 (horizontal), range from [-90, 90]. If you upload a look-down image, try a value like -30")
76
+
77
+ preprocess_chk = gr.Checkbox(True,
78
+ label='Preprocess image automatically (remove background and recenter object)')
79
+
80
+ gr.Examples(
81
+ examples=examples_full, # NOTE: elements must match inputs list!
82
+ inputs=[image_block],
83
+ outputs=[image_block],
84
+ cache_examples=False,
85
+ label='Examples (click one of the images below to start)',
86
+ examples_per_page=40
87
+ )
88
+ img_run_btn = gr.Button("Generate 3D")
89
+ img_guide_text = gr.Markdown(_IMG_USER_GUIDE, visible=True)
90
+
91
+ with gr.Column(scale=5):
92
+ obj3d_stage1 = gr.Model3D(clear_color=[0.0, 0.0, 0.0, 0.0], label="3D Model (Stage 1)")
93
+ obj3d = gr.Model3D(clear_color=[0.0, 0.0, 0.0, 0.0], label="3D Model (Final)")
94
+
95
+ # if there is an input image, continue with inference
96
+ # else display an error message
97
+ img_run_btn.click(check_img_input, inputs=[image_block], queue=False).success(optimize_stage_1,
98
+ inputs=[image_block,
99
+ preprocess_chk,
100
+ elevation_slider],
101
+ outputs=[
102
+ obj3d_stage1]).success(
103
+ optimize_stage_2, inputs=[elevation_slider], outputs=[obj3d])
104
+
105
+ demo.queue().launch(share=True)
cam_utils.py ADDED
@@ -0,0 +1,146 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import numpy as np
2
+ from scipy.spatial.transform import Rotation as R
3
+
4
+ import torch
5
+
6
+ def dot(x, y):
7
+ if isinstance(x, np.ndarray):
8
+ return np.sum(x * y, -1, keepdims=True)
9
+ else:
10
+ return torch.sum(x * y, -1, keepdim=True)
11
+
12
+
13
+ def length(x, eps=1e-20):
14
+ if isinstance(x, np.ndarray):
15
+ return np.sqrt(np.maximum(np.sum(x * x, axis=-1, keepdims=True), eps))
16
+ else:
17
+ return torch.sqrt(torch.clamp(dot(x, x), min=eps))
18
+
19
+
20
+ def safe_normalize(x, eps=1e-20):
21
+ return x / length(x, eps)
22
+
23
+
24
+ def look_at(campos, target, opengl=True):
25
+ # campos: [N, 3], camera/eye position
26
+ # target: [N, 3], object to look at
27
+ # return: [N, 3, 3], rotation matrix
28
+ if not opengl:
29
+ # camera forward aligns with -z
30
+ forward_vector = safe_normalize(target - campos)
31
+ up_vector = np.array([0, 1, 0], dtype=np.float32)
32
+ right_vector = safe_normalize(np.cross(forward_vector, up_vector))
33
+ up_vector = safe_normalize(np.cross(right_vector, forward_vector))
34
+ else:
35
+ # camera forward aligns with +z
36
+ forward_vector = safe_normalize(campos - target)
37
+ up_vector = np.array([0, 1, 0], dtype=np.float32)
38
+ right_vector = safe_normalize(np.cross(up_vector, forward_vector))
39
+ up_vector = safe_normalize(np.cross(forward_vector, right_vector))
40
+ R = np.stack([right_vector, up_vector, forward_vector], axis=1)
41
+ return R
42
+
43
+
44
+ # elevation & azimuth to pose (cam2world) matrix
45
+ def orbit_camera(elevation, azimuth, radius=1, is_degree=True, target=None, opengl=True):
46
+ # radius: scalar
47
+ # elevation: scalar, in (-90, 90), from +y to -y is (-90, 90)
48
+ # azimuth: scalar, in (-180, 180), from +z to +x is (0, 90)
49
+ # return: [4, 4], camera pose matrix
50
+ if is_degree:
51
+ elevation = np.deg2rad(elevation)
52
+ azimuth = np.deg2rad(azimuth)
53
+ x = radius * np.cos(elevation) * np.sin(azimuth)
54
+ y = - radius * np.sin(elevation)
55
+ z = radius * np.cos(elevation) * np.cos(azimuth)
56
+ if target is None:
57
+ target = np.zeros([3], dtype=np.float32)
58
+ campos = np.array([x, y, z]) + target # [3]
59
+ T = np.eye(4, dtype=np.float32)
60
+ T[:3, :3] = look_at(campos, target, opengl)
61
+ T[:3, 3] = campos
62
+ return T
63
+
64
+
65
+ class OrbitCamera:
66
+ def __init__(self, W, H, r=2, fovy=60, near=0.01, far=100):
67
+ self.W = W
68
+ self.H = H
69
+ self.radius = r # camera distance from center
70
+ self.fovy = np.deg2rad(fovy) # deg 2 rad
71
+ self.near = near
72
+ self.far = far
73
+ self.center = np.array([0, 0, 0], dtype=np.float32) # look at this point
74
+ self.rot = R.from_matrix(np.eye(3))
75
+ self.up = np.array([0, 1, 0], dtype=np.float32) # need to be normalized!
76
+
77
+ @property
78
+ def fovx(self):
79
+ return 2 * np.arctan(np.tan(self.fovy / 2) * self.W / self.H)
80
+
81
+ @property
82
+ def campos(self):
83
+ return self.pose[:3, 3]
84
+
85
+ # pose (c2w)
86
+ @property
87
+ def pose(self):
88
+ # first move camera to radius
89
+ res = np.eye(4, dtype=np.float32)
90
+ res[2, 3] = self.radius # opengl convention...
91
+ # rotate
92
+ rot = np.eye(4, dtype=np.float32)
93
+ rot[:3, :3] = self.rot.as_matrix()
94
+ res = rot @ res
95
+ # translate
96
+ res[:3, 3] -= self.center
97
+ return res
98
+
99
+ # view (w2c)
100
+ @property
101
+ def view(self):
102
+ return np.linalg.inv(self.pose)
103
+
104
+ # projection (perspective)
105
+ @property
106
+ def perspective(self):
107
+ y = np.tan(self.fovy / 2)
108
+ aspect = self.W / self.H
109
+ return np.array(
110
+ [
111
+ [1 / (y * aspect), 0, 0, 0],
112
+ [0, -1 / y, 0, 0],
113
+ [
114
+ 0,
115
+ 0,
116
+ -(self.far + self.near) / (self.far - self.near),
117
+ -(2 * self.far * self.near) / (self.far - self.near),
118
+ ],
119
+ [0, 0, -1, 0],
120
+ ],
121
+ dtype=np.float32,
122
+ )
123
+
124
+ # intrinsics
125
+ @property
126
+ def intrinsics(self):
127
+ focal = self.H / (2 * np.tan(self.fovy / 2))
128
+ return np.array([focal, focal, self.W // 2, self.H // 2], dtype=np.float32)
129
+
130
+ @property
131
+ def mvp(self):
132
+ return self.perspective @ np.linalg.inv(self.pose) # [4, 4]
133
+
134
+ def orbit(self, dx, dy):
135
+ # rotate along camera up/side axis!
136
+ side = self.rot.as_matrix()[:3, 0]
137
+ rotvec_x = self.up * np.radians(-0.05 * dx)
138
+ rotvec_y = side * np.radians(-0.05 * dy)
139
+ self.rot = R.from_rotvec(rotvec_x) * R.from_rotvec(rotvec_y) * self.rot
140
+
141
+ def scale(self, delta):
142
+ self.radius *= 1.1 ** (-delta)
143
+
144
+ def pan(self, dx, dy, dz=0):
145
+ # pan in camera coordinate system (careful on the sensitivity!)
146
+ self.center += 0.0005 * self.rot.as_matrix()[:3, :3] @ np.array([-dx, -dy, dz])
configs/image.yaml ADDED
@@ -0,0 +1,69 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ### Input
2
+ # input rgba image path (default to None, can be load in GUI too)
3
+ input:
4
+ # input text prompt (default to None, can be input in GUI too)
5
+ prompt:
6
+ # input mesh for stage 2 (auto-search from stage 1 output path if None)
7
+ mesh:
8
+ # estimated elevation angle for input image
9
+ elevation: 0
10
+ # reference image resolution
11
+ ref_size: 256
12
+ # density thresh for mesh extraction
13
+ density_thresh: 1
14
+
15
+ ### Output
16
+ outdir: logs
17
+ mesh_format: obj
18
+ save_path: ???
19
+
20
+ ### Training
21
+ # guidance loss weights (0 to disable)
22
+ lambda_sd: 0
23
+ lambda_zero123: 1
24
+ # training batch size per iter
25
+ batch_size: 1
26
+ # training iterations for stage 1
27
+ iters: 500
28
+ # training iterations for stage 2
29
+ iters_refine: 50
30
+ # training camera radius
31
+ radius: 2
32
+ # training camera fovy
33
+ fovy: 49.1 # align with zero123 rendering setting (ref: https://github.com/cvlab-columbia/zero123/blob/main/objaverse-rendering/scripts/blender_script.py#L61
34
+ # checkpoint to load for stage 1 (should be a ply file)
35
+ load:
36
+ # whether allow geom training in stage 2
37
+ train_geo: False
38
+ # prob to invert background color during training (0 = always black, 1 = always white)
39
+ invert_bg_prob: 0.5
40
+
41
+
42
+ ### GUI
43
+ gui: False
44
+ force_cuda_rast: False
45
+ # GUI resolution
46
+ H: 800
47
+ W: 800
48
+
49
+ ### Gaussian splatting
50
+ num_pts: 5000
51
+ sh_degree: 0
52
+ position_lr_init: 0.001
53
+ position_lr_final: 0.00002
54
+ position_lr_delay_mult: 0.02
55
+ position_lr_max_steps: 500
56
+ feature_lr: 0.01
57
+ opacity_lr: 0.05
58
+ scaling_lr: 0.005
59
+ rotation_lr: 0.005
60
+ percent_dense: 0.1
61
+ density_start_iter: 100
62
+ density_end_iter: 3000
63
+ densification_interval: 100
64
+ opacity_reset_interval: 700
65
+ densify_grad_threshold: 0.5
66
+
67
+ ### Textured Mesh
68
+ geom_lr: 0.0001
69
+ texture_lr: 0.2
configs/text.yaml ADDED
@@ -0,0 +1,68 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ### Input
2
+ # input rgba image path (default to None, can be load in GUI too)
3
+ input:
4
+ # input text prompt (default to None, can be input in GUI too)
5
+ prompt:
6
+ # input mesh for stage 2 (auto-search from stage 1 output path if None)
7
+ mesh:
8
+ # estimated elevation angle for input image
9
+ elevation: 0
10
+ # reference image resolution
11
+ ref_size: 256
12
+ # density thresh for mesh extraction
13
+ density_thresh: 1
14
+
15
+ ### Output
16
+ outdir: logs
17
+ mesh_format: obj
18
+ save_path: ???
19
+
20
+ ### Training
21
+ # guidance loss weights (0 to disable)
22
+ lambda_sd: 1
23
+ lambda_zero123: 0
24
+ # training batch size per iter
25
+ batch_size: 1
26
+ # training iterations for stage 1
27
+ iters: 500
28
+ # training iterations for stage 2
29
+ iters_refine: 50
30
+ # training camera radius
31
+ radius: 2.5
32
+ # training camera fovy
33
+ fovy: 49.1
34
+ # checkpoint to load for stage 1 (should be a ply file)
35
+ load:
36
+ # whether allow geom training in stage 2
37
+ train_geo: False
38
+ # prob to invert background color during training (0 = always black, 1 = always white)
39
+ invert_bg_prob: 0.5
40
+
41
+ ### GUI
42
+ gui: False
43
+ force_cuda_rast: False
44
+ # GUI resolution
45
+ H: 800
46
+ W: 800
47
+
48
+ ### Gaussian splatting
49
+ num_pts: 1000
50
+ sh_degree: 0
51
+ position_lr_init: 0.001
52
+ position_lr_final: 0.00002
53
+ position_lr_delay_mult: 0.02
54
+ position_lr_max_steps: 500
55
+ feature_lr: 0.01
56
+ opacity_lr: 0.05
57
+ scaling_lr: 0.005
58
+ rotation_lr: 0.005
59
+ percent_dense: 0.1
60
+ density_start_iter: 100
61
+ density_end_iter: 3000
62
+ densification_interval: 50
63
+ opacity_reset_interval: 700
64
+ densify_grad_threshold: 0.01
65
+
66
+ ### Textured Mesh
67
+ geom_lr: 0.0001
68
+ texture_lr: 0.2
data/anya_rgba.png ADDED

Git LFS Details

  • SHA256: b8c3e8fe7fb51c4ae7f8b561e3780a50f1f25a9cb8c838d7fce4b38d773473f8
  • Pointer size: 130 Bytes
  • Size of remote file: 32.9 kB
data/catstatue_rgba.png ADDED

Git LFS Details

  • SHA256: 6a571efb23ff05f92d7363d32a4027c08137d84e9bde863c7dfca5086bd3005d
  • Pointer size: 130 Bytes
  • Size of remote file: 45.5 kB
data/csm_luigi_rgba.png ADDED

Git LFS Details

  • SHA256: 538fd1c3d1be3f0ef0cbdbf60d3e77821cb304dd68e3fbd62229191d5d050186
  • Pointer size: 130 Bytes
  • Size of remote file: 35.4 kB
data/test.png ADDED

Git LFS Details

  • SHA256: 479f4fa9a5d2fcbf81240533f347a0d080050162757702317c8d7e06401bb958
  • Pointer size: 132 Bytes
  • Size of remote file: 1.05 MB
data/zelda_rgba.png ADDED

Git LFS Details

  • SHA256: b5e5004f1c64cbb9aceaf47c3594cfb89dfee64fbdf1a5a10faa5f51e87f0c4f
  • Pointer size: 130 Bytes
  • Size of remote file: 44.9 kB
grid_put.py ADDED
@@ -0,0 +1,300 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import torch
2
+ import torch.nn.functional as F
3
+
4
+ def stride_from_shape(shape):
5
+ stride = [1]
6
+ for x in reversed(shape[1:]):
7
+ stride.append(stride[-1] * x)
8
+ return list(reversed(stride))
9
+
10
+
11
+ def scatter_add_nd(input, indices, values):
12
+ # input: [..., C], D dimension + C channel
13
+ # indices: [N, D], long
14
+ # values: [N, C]
15
+
16
+ D = indices.shape[-1]
17
+ C = input.shape[-1]
18
+ size = input.shape[:-1]
19
+ stride = stride_from_shape(size)
20
+
21
+ assert len(size) == D
22
+
23
+ input = input.view(-1, C) # [HW, C]
24
+ flatten_indices = (indices * torch.tensor(stride, dtype=torch.long, device=indices.device)).sum(-1) # [N]
25
+
26
+ input.scatter_add_(0, flatten_indices.unsqueeze(1).repeat(1, C), values)
27
+
28
+ return input.view(*size, C)
29
+
30
+
31
+ def scatter_add_nd_with_count(input, count, indices, values, weights=None):
32
+ # input: [..., C], D dimension + C channel
33
+ # count: [..., 1], D dimension
34
+ # indices: [N, D], long
35
+ # values: [N, C]
36
+
37
+ D = indices.shape[-1]
38
+ C = input.shape[-1]
39
+ size = input.shape[:-1]
40
+ stride = stride_from_shape(size)
41
+
42
+ assert len(size) == D
43
+
44
+ input = input.view(-1, C) # [HW, C]
45
+ count = count.view(-1, 1)
46
+
47
+ flatten_indices = (indices * torch.tensor(stride, dtype=torch.long, device=indices.device)).sum(-1) # [N]
48
+
49
+ if weights is None:
50
+ weights = torch.ones_like(values[..., :1])
51
+
52
+ input.scatter_add_(0, flatten_indices.unsqueeze(1).repeat(1, C), values)
53
+ count.scatter_add_(0, flatten_indices.unsqueeze(1), weights)
54
+
55
+ return input.view(*size, C), count.view(*size, 1)
56
+
57
+ def nearest_grid_put_2d(H, W, coords, values, return_count=False):
58
+ # coords: [N, 2], float in [-1, 1]
59
+ # values: [N, C]
60
+
61
+ C = values.shape[-1]
62
+
63
+ indices = (coords * 0.5 + 0.5) * torch.tensor(
64
+ [H - 1, W - 1], dtype=torch.float32, device=coords.device
65
+ )
66
+ indices = indices.round().long() # [N, 2]
67
+
68
+ result = torch.zeros(H, W, C, device=values.device, dtype=values.dtype) # [H, W, C]
69
+ count = torch.zeros(H, W, 1, device=values.device, dtype=values.dtype) # [H, W, 1]
70
+ weights = torch.ones_like(values[..., :1]) # [N, 1]
71
+
72
+ result, count = scatter_add_nd_with_count(result, count, indices, values, weights)
73
+
74
+ if return_count:
75
+ return result, count
76
+
77
+ mask = (count.squeeze(-1) > 0)
78
+ result[mask] = result[mask] / count[mask].repeat(1, C)
79
+
80
+ return result
81
+
82
+
83
+ def linear_grid_put_2d(H, W, coords, values, return_count=False):
84
+ # coords: [N, 2], float in [-1, 1]
85
+ # values: [N, C]
86
+
87
+ C = values.shape[-1]
88
+
89
+ indices = (coords * 0.5 + 0.5) * torch.tensor(
90
+ [H - 1, W - 1], dtype=torch.float32, device=coords.device
91
+ )
92
+ indices_00 = indices.floor().long() # [N, 2]
93
+ indices_00[:, 0].clamp_(0, H - 2)
94
+ indices_00[:, 1].clamp_(0, W - 2)
95
+ indices_01 = indices_00 + torch.tensor(
96
+ [0, 1], dtype=torch.long, device=indices.device
97
+ )
98
+ indices_10 = indices_00 + torch.tensor(
99
+ [1, 0], dtype=torch.long, device=indices.device
100
+ )
101
+ indices_11 = indices_00 + torch.tensor(
102
+ [1, 1], dtype=torch.long, device=indices.device
103
+ )
104
+
105
+ h = indices[..., 0] - indices_00[..., 0].float()
106
+ w = indices[..., 1] - indices_00[..., 1].float()
107
+ w_00 = (1 - h) * (1 - w)
108
+ w_01 = (1 - h) * w
109
+ w_10 = h * (1 - w)
110
+ w_11 = h * w
111
+
112
+ result = torch.zeros(H, W, C, device=values.device, dtype=values.dtype) # [H, W, C]
113
+ count = torch.zeros(H, W, 1, device=values.device, dtype=values.dtype) # [H, W, 1]
114
+ weights = torch.ones_like(values[..., :1]) # [N, 1]
115
+
116
+ result, count = scatter_add_nd_with_count(result, count, indices_00, values * w_00.unsqueeze(1), weights* w_00.unsqueeze(1))
117
+ result, count = scatter_add_nd_with_count(result, count, indices_01, values * w_01.unsqueeze(1), weights* w_01.unsqueeze(1))
118
+ result, count = scatter_add_nd_with_count(result, count, indices_10, values * w_10.unsqueeze(1), weights* w_10.unsqueeze(1))
119
+ result, count = scatter_add_nd_with_count(result, count, indices_11, values * w_11.unsqueeze(1), weights* w_11.unsqueeze(1))
120
+
121
+ if return_count:
122
+ return result, count
123
+
124
+ mask = (count.squeeze(-1) > 0)
125
+ result[mask] = result[mask] / count[mask].repeat(1, C)
126
+
127
+ return result
128
+
129
+ def mipmap_linear_grid_put_2d(H, W, coords, values, min_resolution=32, return_count=False):
130
+ # coords: [N, 2], float in [-1, 1]
131
+ # values: [N, C]
132
+
133
+ C = values.shape[-1]
134
+
135
+ result = torch.zeros(H, W, C, device=values.device, dtype=values.dtype) # [H, W, C]
136
+ count = torch.zeros(H, W, 1, device=values.device, dtype=values.dtype) # [H, W, 1]
137
+
138
+ cur_H, cur_W = H, W
139
+
140
+ while min(cur_H, cur_W) > min_resolution:
141
+
142
+ # try to fill the holes
143
+ mask = (count.squeeze(-1) == 0)
144
+ if not mask.any():
145
+ break
146
+
147
+ cur_result, cur_count = linear_grid_put_2d(cur_H, cur_W, coords, values, return_count=True)
148
+ result[mask] = result[mask] + F.interpolate(cur_result.permute(2,0,1).unsqueeze(0).contiguous(), (H, W), mode='bilinear', align_corners=False).squeeze(0).permute(1,2,0).contiguous()[mask]
149
+ count[mask] = count[mask] + F.interpolate(cur_count.view(1, 1, cur_H, cur_W), (H, W), mode='bilinear', align_corners=False).view(H, W, 1)[mask]
150
+ cur_H //= 2
151
+ cur_W //= 2
152
+
153
+ if return_count:
154
+ return result, count
155
+
156
+ mask = (count.squeeze(-1) > 0)
157
+ result[mask] = result[mask] / count[mask].repeat(1, C)
158
+
159
+ return result
160
+
161
+ def nearest_grid_put_3d(H, W, D, coords, values, return_count=False):
162
+ # coords: [N, 3], float in [-1, 1]
163
+ # values: [N, C]
164
+
165
+ C = values.shape[-1]
166
+
167
+ indices = (coords * 0.5 + 0.5) * torch.tensor(
168
+ [H - 1, W - 1, D - 1], dtype=torch.float32, device=coords.device
169
+ )
170
+ indices = indices.round().long() # [N, 2]
171
+
172
+ result = torch.zeros(H, W, D, C, device=values.device, dtype=values.dtype) # [H, W, C]
173
+ count = torch.zeros(H, W, D, 1, device=values.device, dtype=values.dtype) # [H, W, 1]
174
+ weights = torch.ones_like(values[..., :1]) # [N, 1]
175
+
176
+ result, count = scatter_add_nd_with_count(result, count, indices, values, weights)
177
+
178
+ if return_count:
179
+ return result, count
180
+
181
+ mask = (count.squeeze(-1) > 0)
182
+ result[mask] = result[mask] / count[mask].repeat(1, C)
183
+
184
+ return result
185
+
186
+
187
+ def linear_grid_put_3d(H, W, D, coords, values, return_count=False):
188
+ # coords: [N, 3], float in [-1, 1]
189
+ # values: [N, C]
190
+
191
+ C = values.shape[-1]
192
+
193
+ indices = (coords * 0.5 + 0.5) * torch.tensor(
194
+ [H - 1, W - 1, D - 1], dtype=torch.float32, device=coords.device
195
+ )
196
+ indices_000 = indices.floor().long() # [N, 3]
197
+ indices_000[:, 0].clamp_(0, H - 2)
198
+ indices_000[:, 1].clamp_(0, W - 2)
199
+ indices_000[:, 2].clamp_(0, D - 2)
200
+
201
+ indices_001 = indices_000 + torch.tensor([0, 0, 1], dtype=torch.long, device=indices.device)
202
+ indices_010 = indices_000 + torch.tensor([0, 1, 0], dtype=torch.long, device=indices.device)
203
+ indices_011 = indices_000 + torch.tensor([0, 1, 1], dtype=torch.long, device=indices.device)
204
+ indices_100 = indices_000 + torch.tensor([1, 0, 0], dtype=torch.long, device=indices.device)
205
+ indices_101 = indices_000 + torch.tensor([1, 0, 1], dtype=torch.long, device=indices.device)
206
+ indices_110 = indices_000 + torch.tensor([1, 1, 0], dtype=torch.long, device=indices.device)
207
+ indices_111 = indices_000 + torch.tensor([1, 1, 1], dtype=torch.long, device=indices.device)
208
+
209
+ h = indices[..., 0] - indices_000[..., 0].float()
210
+ w = indices[..., 1] - indices_000[..., 1].float()
211
+ d = indices[..., 2] - indices_000[..., 2].float()
212
+
213
+ w_000 = (1 - h) * (1 - w) * (1 - d)
214
+ w_001 = (1 - h) * w * (1 - d)
215
+ w_010 = h * (1 - w) * (1 - d)
216
+ w_011 = h * w * (1 - d)
217
+ w_100 = (1 - h) * (1 - w) * d
218
+ w_101 = (1 - h) * w * d
219
+ w_110 = h * (1 - w) * d
220
+ w_111 = h * w * d
221
+
222
+ result = torch.zeros(H, W, D, C, device=values.device, dtype=values.dtype) # [H, W, D, C]
223
+ count = torch.zeros(H, W, D, 1, device=values.device, dtype=values.dtype) # [H, W, D, 1]
224
+ weights = torch.ones_like(values[..., :1]) # [N, 1]
225
+
226
+ result, count = scatter_add_nd_with_count(result, count, indices_000, values * w_000.unsqueeze(1), weights * w_000.unsqueeze(1))
227
+ result, count = scatter_add_nd_with_count(result, count, indices_001, values * w_001.unsqueeze(1), weights * w_001.unsqueeze(1))
228
+ result, count = scatter_add_nd_with_count(result, count, indices_010, values * w_010.unsqueeze(1), weights * w_010.unsqueeze(1))
229
+ result, count = scatter_add_nd_with_count(result, count, indices_011, values * w_011.unsqueeze(1), weights * w_011.unsqueeze(1))
230
+ result, count = scatter_add_nd_with_count(result, count, indices_100, values * w_100.unsqueeze(1), weights * w_100.unsqueeze(1))
231
+ result, count = scatter_add_nd_with_count(result, count, indices_101, values * w_101.unsqueeze(1), weights * w_101.unsqueeze(1))
232
+ result, count = scatter_add_nd_with_count(result, count, indices_110, values * w_110.unsqueeze(1), weights * w_110.unsqueeze(1))
233
+ result, count = scatter_add_nd_with_count(result, count, indices_111, values * w_111.unsqueeze(1), weights * w_111.unsqueeze(1))
234
+
235
+ if return_count:
236
+ return result, count
237
+
238
+ mask = (count.squeeze(-1) > 0)
239
+ result[mask] = result[mask] / count[mask].repeat(1, C)
240
+
241
+ return result
242
+
243
+ def mipmap_linear_grid_put_3d(H, W, D, coords, values, min_resolution=32, return_count=False):
244
+ # coords: [N, 3], float in [-1, 1]
245
+ # values: [N, C]
246
+
247
+ C = values.shape[-1]
248
+
249
+ result = torch.zeros(H, W, D, C, device=values.device, dtype=values.dtype) # [H, W, D, C]
250
+ count = torch.zeros(H, W, D, 1, device=values.device, dtype=values.dtype) # [H, W, D, 1]
251
+ cur_H, cur_W, cur_D = H, W, D
252
+
253
+ while min(min(cur_H, cur_W), cur_D) > min_resolution:
254
+
255
+ # try to fill the holes
256
+ mask = (count.squeeze(-1) == 0)
257
+ if not mask.any():
258
+ break
259
+
260
+ cur_result, cur_count = linear_grid_put_3d(cur_H, cur_W, cur_D, coords, values, return_count=True)
261
+ result[mask] = result[mask] + F.interpolate(cur_result.permute(3,0,1,2).unsqueeze(0).contiguous(), (H, W, D), mode='trilinear', align_corners=False).squeeze(0).permute(1,2,3,0).contiguous()[mask]
262
+ count[mask] = count[mask] + F.interpolate(cur_count.view(1, 1, cur_H, cur_W, cur_D), (H, W, D), mode='trilinear', align_corners=False).view(H, W, D, 1)[mask]
263
+ cur_H //= 2
264
+ cur_W //= 2
265
+ cur_D //= 2
266
+
267
+ if return_count:
268
+ return result, count
269
+
270
+ mask = (count.squeeze(-1) > 0)
271
+ result[mask] = result[mask] / count[mask].repeat(1, C)
272
+
273
+ return result
274
+
275
+
276
+ def grid_put(shape, coords, values, mode='linear-mipmap', min_resolution=32, return_raw=False):
277
+ # shape: [D], list/tuple
278
+ # coords: [N, D], float in [-1, 1]
279
+ # values: [N, C]
280
+
281
+ D = len(shape)
282
+ assert D in [2, 3], f'only support D == 2 or 3, but got D == {D}'
283
+
284
+ if mode == 'nearest':
285
+ if D == 2:
286
+ return nearest_grid_put_2d(*shape, coords, values, return_raw)
287
+ else:
288
+ return nearest_grid_put_3d(*shape, coords, values, return_raw)
289
+ elif mode == 'linear':
290
+ if D == 2:
291
+ return linear_grid_put_2d(*shape, coords, values, return_raw)
292
+ else:
293
+ return linear_grid_put_3d(*shape, coords, values, return_raw)
294
+ elif mode == 'linear-mipmap':
295
+ if D == 2:
296
+ return mipmap_linear_grid_put_2d(*shape, coords, values, min_resolution, return_raw)
297
+ else:
298
+ return mipmap_linear_grid_put_3d(*shape, coords, values, min_resolution, return_raw)
299
+ else:
300
+ raise NotImplementedError(f"got mode {mode}")
gs_renderer.py ADDED
@@ -0,0 +1,820 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import math
3
+ import numpy as np
4
+ from typing import NamedTuple
5
+ from plyfile import PlyData, PlyElement
6
+
7
+ import torch
8
+ from torch import nn
9
+
10
+ from diff_gaussian_rasterization import (
11
+ GaussianRasterizationSettings,
12
+ GaussianRasterizer,
13
+ )
14
+ from simple_knn._C import distCUDA2
15
+
16
+ from sh_utils import eval_sh, SH2RGB, RGB2SH
17
+ from mesh import Mesh
18
+ from mesh_utils import decimate_mesh, clean_mesh
19
+
20
+ import kiui
21
+
22
+ def inverse_sigmoid(x):
23
+ return torch.log(x/(1-x))
24
+
25
+ def get_expon_lr_func(
26
+ lr_init, lr_final, lr_delay_steps=0, lr_delay_mult=1.0, max_steps=1000000
27
+ ):
28
+
29
+ def helper(step):
30
+ if lr_init == lr_final:
31
+ # constant lr, ignore other params
32
+ return lr_init
33
+ if step < 0 or (lr_init == 0.0 and lr_final == 0.0):
34
+ # Disable this parameter
35
+ return 0.0
36
+ if lr_delay_steps > 0:
37
+ # A kind of reverse cosine decay.
38
+ delay_rate = lr_delay_mult + (1 - lr_delay_mult) * np.sin(
39
+ 0.5 * np.pi * np.clip(step / lr_delay_steps, 0, 1)
40
+ )
41
+ else:
42
+ delay_rate = 1.0
43
+ t = np.clip(step / max_steps, 0, 1)
44
+ log_lerp = np.exp(np.log(lr_init) * (1 - t) + np.log(lr_final) * t)
45
+ return delay_rate * log_lerp
46
+
47
+ return helper
48
+
49
+
50
+ def strip_lowerdiag(L):
51
+ uncertainty = torch.zeros((L.shape[0], 6), dtype=torch.float, device="cuda")
52
+
53
+ uncertainty[:, 0] = L[:, 0, 0]
54
+ uncertainty[:, 1] = L[:, 0, 1]
55
+ uncertainty[:, 2] = L[:, 0, 2]
56
+ uncertainty[:, 3] = L[:, 1, 1]
57
+ uncertainty[:, 4] = L[:, 1, 2]
58
+ uncertainty[:, 5] = L[:, 2, 2]
59
+ return uncertainty
60
+
61
+ def strip_symmetric(sym):
62
+ return strip_lowerdiag(sym)
63
+
64
+ def gaussian_3d_coeff(xyzs, covs):
65
+ # xyzs: [N, 3]
66
+ # covs: [N, 6]
67
+ x, y, z = xyzs[:, 0], xyzs[:, 1], xyzs[:, 2]
68
+ a, b, c, d, e, f = covs[:, 0], covs[:, 1], covs[:, 2], covs[:, 3], covs[:, 4], covs[:, 5]
69
+
70
+ # eps must be small enough !!!
71
+ inv_det = 1 / (a * d * f + 2 * e * c * b - e**2 * a - c**2 * d - b**2 * f + 1e-24)
72
+ inv_a = (d * f - e**2) * inv_det
73
+ inv_b = (e * c - b * f) * inv_det
74
+ inv_c = (e * b - c * d) * inv_det
75
+ inv_d = (a * f - c**2) * inv_det
76
+ inv_e = (b * c - e * a) * inv_det
77
+ inv_f = (a * d - b**2) * inv_det
78
+
79
+ power = -0.5 * (x**2 * inv_a + y**2 * inv_d + z**2 * inv_f) - x * y * inv_b - x * z * inv_c - y * z * inv_e
80
+
81
+ power[power > 0] = -1e10 # abnormal values... make weights 0
82
+
83
+ return torch.exp(power)
84
+
85
+ def build_rotation(r):
86
+ norm = torch.sqrt(r[:,0]*r[:,0] + r[:,1]*r[:,1] + r[:,2]*r[:,2] + r[:,3]*r[:,3])
87
+
88
+ q = r / norm[:, None]
89
+
90
+ R = torch.zeros((q.size(0), 3, 3), device='cuda')
91
+
92
+ r = q[:, 0]
93
+ x = q[:, 1]
94
+ y = q[:, 2]
95
+ z = q[:, 3]
96
+
97
+ R[:, 0, 0] = 1 - 2 * (y*y + z*z)
98
+ R[:, 0, 1] = 2 * (x*y - r*z)
99
+ R[:, 0, 2] = 2 * (x*z + r*y)
100
+ R[:, 1, 0] = 2 * (x*y + r*z)
101
+ R[:, 1, 1] = 1 - 2 * (x*x + z*z)
102
+ R[:, 1, 2] = 2 * (y*z - r*x)
103
+ R[:, 2, 0] = 2 * (x*z - r*y)
104
+ R[:, 2, 1] = 2 * (y*z + r*x)
105
+ R[:, 2, 2] = 1 - 2 * (x*x + y*y)
106
+ return R
107
+
108
+ def build_scaling_rotation(s, r):
109
+ L = torch.zeros((s.shape[0], 3, 3), dtype=torch.float, device="cuda")
110
+ R = build_rotation(r)
111
+
112
+ L[:,0,0] = s[:,0]
113
+ L[:,1,1] = s[:,1]
114
+ L[:,2,2] = s[:,2]
115
+
116
+ L = R @ L
117
+ return L
118
+
119
+ class BasicPointCloud(NamedTuple):
120
+ points: np.array
121
+ colors: np.array
122
+ normals: np.array
123
+
124
+
125
+ class GaussianModel:
126
+
127
+ def setup_functions(self):
128
+ def build_covariance_from_scaling_rotation(scaling, scaling_modifier, rotation):
129
+ L = build_scaling_rotation(scaling_modifier * scaling, rotation)
130
+ actual_covariance = L @ L.transpose(1, 2)
131
+ symm = strip_symmetric(actual_covariance)
132
+ return symm
133
+
134
+ self.scaling_activation = torch.exp
135
+ self.scaling_inverse_activation = torch.log
136
+
137
+ self.covariance_activation = build_covariance_from_scaling_rotation
138
+
139
+ self.opacity_activation = torch.sigmoid
140
+ self.inverse_opacity_activation = inverse_sigmoid
141
+
142
+ self.rotation_activation = torch.nn.functional.normalize
143
+
144
+
145
+ def __init__(self, sh_degree : int):
146
+ self.active_sh_degree = 0
147
+ self.max_sh_degree = sh_degree
148
+ self._xyz = torch.empty(0)
149
+ self._features_dc = torch.empty(0)
150
+ self._features_rest = torch.empty(0)
151
+ self._scaling = torch.empty(0)
152
+ self._rotation = torch.empty(0)
153
+ self._opacity = torch.empty(0)
154
+ self.max_radii2D = torch.empty(0)
155
+ self.xyz_gradient_accum = torch.empty(0)
156
+ self.denom = torch.empty(0)
157
+ self.optimizer = None
158
+ self.percent_dense = 0
159
+ self.spatial_lr_scale = 0
160
+ self.setup_functions()
161
+
162
+ def capture(self):
163
+ return (
164
+ self.active_sh_degree,
165
+ self._xyz,
166
+ self._features_dc,
167
+ self._features_rest,
168
+ self._scaling,
169
+ self._rotation,
170
+ self._opacity,
171
+ self.max_radii2D,
172
+ self.xyz_gradient_accum,
173
+ self.denom,
174
+ self.optimizer.state_dict(),
175
+ self.spatial_lr_scale,
176
+ )
177
+
178
+ def restore(self, model_args, training_args):
179
+ (self.active_sh_degree,
180
+ self._xyz,
181
+ self._features_dc,
182
+ self._features_rest,
183
+ self._scaling,
184
+ self._rotation,
185
+ self._opacity,
186
+ self.max_radii2D,
187
+ xyz_gradient_accum,
188
+ denom,
189
+ opt_dict,
190
+ self.spatial_lr_scale) = model_args
191
+ self.training_setup(training_args)
192
+ self.xyz_gradient_accum = xyz_gradient_accum
193
+ self.denom = denom
194
+ self.optimizer.load_state_dict(opt_dict)
195
+
196
+ @property
197
+ def get_scaling(self):
198
+ return self.scaling_activation(self._scaling)
199
+
200
+ @property
201
+ def get_rotation(self):
202
+ return self.rotation_activation(self._rotation)
203
+
204
+ @property
205
+ def get_xyz(self):
206
+ return self._xyz
207
+
208
+ @property
209
+ def get_features(self):
210
+ features_dc = self._features_dc
211
+ features_rest = self._features_rest
212
+ return torch.cat((features_dc, features_rest), dim=1)
213
+
214
+ @property
215
+ def get_opacity(self):
216
+ return self.opacity_activation(self._opacity)
217
+
218
+ @torch.no_grad()
219
+ def extract_fields(self, resolution=128, num_blocks=16, relax_ratio=1.5):
220
+ # resolution: resolution of field
221
+
222
+ block_size = 2 / num_blocks
223
+
224
+ assert resolution % block_size == 0
225
+ split_size = resolution // num_blocks
226
+
227
+ opacities = self.get_opacity
228
+
229
+ # pre-filter low opacity gaussians to save computation
230
+ mask = (opacities > 0.005).squeeze(1)
231
+
232
+ opacities = opacities[mask]
233
+ xyzs = self.get_xyz[mask]
234
+ stds = self.get_scaling[mask]
235
+
236
+ # normalize to ~ [-1, 1]
237
+ mn, mx = xyzs.amin(0), xyzs.amax(0)
238
+ self.center = (mn + mx) / 2
239
+ self.scale = 1.8 / (mx - mn).amax().item()
240
+
241
+ xyzs = (xyzs - self.center) * self.scale
242
+ stds = stds * self.scale
243
+
244
+ covs = self.covariance_activation(stds, 1, self._rotation[mask])
245
+
246
+ # tile
247
+ device = opacities.device
248
+ occ = torch.zeros([resolution] * 3, dtype=torch.float32, device=device)
249
+
250
+ X = torch.linspace(-1, 1, resolution).split(split_size)
251
+ Y = torch.linspace(-1, 1, resolution).split(split_size)
252
+ Z = torch.linspace(-1, 1, resolution).split(split_size)
253
+
254
+
255
+ # loop blocks (assume max size of gaussian is small than relax_ratio * block_size !!!)
256
+ for xi, xs in enumerate(X):
257
+ for yi, ys in enumerate(Y):
258
+ for zi, zs in enumerate(Z):
259
+ xx, yy, zz = torch.meshgrid(xs, ys, zs)
260
+ # sample points [M, 3]
261
+ pts = torch.cat([xx.reshape(-1, 1), yy.reshape(-1, 1), zz.reshape(-1, 1)], dim=-1).to(device)
262
+ # in-tile gaussians mask
263
+ vmin, vmax = pts.amin(0), pts.amax(0)
264
+ vmin -= block_size * relax_ratio
265
+ vmax += block_size * relax_ratio
266
+ mask = (xyzs < vmax).all(-1) & (xyzs > vmin).all(-1)
267
+ # if hit no gaussian, continue to next block
268
+ if not mask.any():
269
+ continue
270
+ mask_xyzs = xyzs[mask] # [L, 3]
271
+ mask_covs = covs[mask] # [L, 6]
272
+ mask_opas = opacities[mask].view(1, -1) # [L, 1] --> [1, L]
273
+
274
+ # query per point-gaussian pair.
275
+ g_pts = pts.unsqueeze(1).repeat(1, mask_covs.shape[0], 1) - mask_xyzs.unsqueeze(0) # [M, L, 3]
276
+ g_covs = mask_covs.unsqueeze(0).repeat(pts.shape[0], 1, 1) # [M, L, 6]
277
+
278
+ # batch on gaussian to avoid OOM
279
+ batch_g = 1024
280
+ val = 0
281
+ for start in range(0, g_covs.shape[1], batch_g):
282
+ end = min(start + batch_g, g_covs.shape[1])
283
+ w = gaussian_3d_coeff(g_pts[:, start:end].reshape(-1, 3), g_covs[:, start:end].reshape(-1, 6)).reshape(pts.shape[0], -1) # [M, l]
284
+ val += (mask_opas[:, start:end] * w).sum(-1)
285
+
286
+ # kiui.lo(val, mask_opas, w)
287
+
288
+ occ[xi * split_size: xi * split_size + len(xs),
289
+ yi * split_size: yi * split_size + len(ys),
290
+ zi * split_size: zi * split_size + len(zs)] = val.reshape(len(xs), len(ys), len(zs))
291
+
292
+ kiui.lo(occ, verbose=1)
293
+
294
+ return occ
295
+
296
+ def extract_mesh(self, path, density_thresh=1, resolution=128, decimate_target=1e5):
297
+
298
+ os.makedirs(os.path.dirname(path), exist_ok=True)
299
+
300
+ occ = self.extract_fields(resolution).detach().cpu().numpy()
301
+
302
+ import mcubes
303
+ vertices, triangles = mcubes.marching_cubes(occ, density_thresh)
304
+ vertices = vertices / (resolution - 1.0) * 2 - 1
305
+
306
+ # transform back to the original space
307
+ vertices = vertices / self.scale + self.center.detach().cpu().numpy()
308
+
309
+ vertices, triangles = clean_mesh(vertices, triangles, remesh=True, remesh_size=0.015)
310
+ if decimate_target > 0 and triangles.shape[0] > decimate_target:
311
+ vertices, triangles = decimate_mesh(vertices, triangles, decimate_target)
312
+
313
+ v = torch.from_numpy(vertices.astype(np.float32)).contiguous().cuda()
314
+ f = torch.from_numpy(triangles.astype(np.int32)).contiguous().cuda()
315
+
316
+ print(
317
+ f"[INFO] marching cubes result: {v.shape} ({v.min().item()}-{v.max().item()}), {f.shape}"
318
+ )
319
+
320
+ mesh = Mesh(v=v, f=f, device='cuda')
321
+
322
+ return mesh
323
+
324
+ def get_covariance(self, scaling_modifier = 1):
325
+ return self.covariance_activation(self.get_scaling, scaling_modifier, self._rotation)
326
+
327
+ def oneupSHdegree(self):
328
+ if self.active_sh_degree < self.max_sh_degree:
329
+ self.active_sh_degree += 1
330
+
331
+ def create_from_pcd(self, pcd : BasicPointCloud, spatial_lr_scale : float = 1):
332
+ self.spatial_lr_scale = spatial_lr_scale
333
+ fused_point_cloud = torch.tensor(np.asarray(pcd.points)).float().cuda()
334
+ fused_color = RGB2SH(torch.tensor(np.asarray(pcd.colors)).float().cuda())
335
+ features = torch.zeros((fused_color.shape[0], 3, (self.max_sh_degree + 1) ** 2)).float().cuda()
336
+ features[:, :3, 0 ] = fused_color
337
+ features[:, 3:, 1:] = 0.0
338
+
339
+ print("Number of points at initialisation : ", fused_point_cloud.shape[0])
340
+
341
+ dist2 = torch.clamp_min(distCUDA2(torch.from_numpy(np.asarray(pcd.points)).float().cuda()), 0.0000001)
342
+ scales = torch.log(torch.sqrt(dist2))[...,None].repeat(1, 3)
343
+ rots = torch.zeros((fused_point_cloud.shape[0], 4), device="cuda")
344
+ rots[:, 0] = 1
345
+
346
+ opacities = inverse_sigmoid(0.1 * torch.ones((fused_point_cloud.shape[0], 1), dtype=torch.float, device="cuda"))
347
+
348
+ self._xyz = nn.Parameter(fused_point_cloud.requires_grad_(True))
349
+ self._features_dc = nn.Parameter(features[:,:,0:1].transpose(1, 2).contiguous().requires_grad_(True))
350
+ self._features_rest = nn.Parameter(features[:,:,1:].transpose(1, 2).contiguous().requires_grad_(True))
351
+ self._scaling = nn.Parameter(scales.requires_grad_(True))
352
+ self._rotation = nn.Parameter(rots.requires_grad_(True))
353
+ self._opacity = nn.Parameter(opacities.requires_grad_(True))
354
+ self.max_radii2D = torch.zeros((self.get_xyz.shape[0]), device="cuda")
355
+
356
+ def training_setup(self, training_args):
357
+ self.percent_dense = training_args.percent_dense
358
+ self.xyz_gradient_accum = torch.zeros((self.get_xyz.shape[0], 1), device="cuda")
359
+ self.denom = torch.zeros((self.get_xyz.shape[0], 1), device="cuda")
360
+
361
+ l = [
362
+ {'params': [self._xyz], 'lr': training_args.position_lr_init * self.spatial_lr_scale, "name": "xyz"},
363
+ {'params': [self._features_dc], 'lr': training_args.feature_lr, "name": "f_dc"},
364
+ {'params': [self._features_rest], 'lr': training_args.feature_lr / 20.0, "name": "f_rest"},
365
+ {'params': [self._opacity], 'lr': training_args.opacity_lr, "name": "opacity"},
366
+ {'params': [self._scaling], 'lr': training_args.scaling_lr, "name": "scaling"},
367
+ {'params': [self._rotation], 'lr': training_args.rotation_lr, "name": "rotation"}
368
+ ]
369
+
370
+ self.optimizer = torch.optim.Adam(l, lr=0.0, eps=1e-15)
371
+ self.xyz_scheduler_args = get_expon_lr_func(lr_init=training_args.position_lr_init*self.spatial_lr_scale,
372
+ lr_final=training_args.position_lr_final*self.spatial_lr_scale,
373
+ lr_delay_mult=training_args.position_lr_delay_mult,
374
+ max_steps=training_args.position_lr_max_steps)
375
+
376
+ def update_learning_rate(self, iteration):
377
+ ''' Learning rate scheduling per step '''
378
+ for param_group in self.optimizer.param_groups:
379
+ if param_group["name"] == "xyz":
380
+ lr = self.xyz_scheduler_args(iteration)
381
+ param_group['lr'] = lr
382
+ return lr
383
+
384
+ def construct_list_of_attributes(self):
385
+ l = ['x', 'y', 'z', 'nx', 'ny', 'nz']
386
+ # All channels except the 3 DC
387
+ for i in range(self._features_dc.shape[1]*self._features_dc.shape[2]):
388
+ l.append('f_dc_{}'.format(i))
389
+ for i in range(self._features_rest.shape[1]*self._features_rest.shape[2]):
390
+ l.append('f_rest_{}'.format(i))
391
+ l.append('opacity')
392
+ for i in range(self._scaling.shape[1]):
393
+ l.append('scale_{}'.format(i))
394
+ for i in range(self._rotation.shape[1]):
395
+ l.append('rot_{}'.format(i))
396
+ return l
397
+
398
+ def save_ply(self, path):
399
+ os.makedirs(os.path.dirname(path), exist_ok=True)
400
+
401
+ xyz = self._xyz.detach().cpu().numpy()
402
+ normals = np.zeros_like(xyz)
403
+ f_dc = self._features_dc.detach().transpose(1, 2).flatten(start_dim=1).contiguous().cpu().numpy()
404
+ f_rest = self._features_rest.detach().transpose(1, 2).flatten(start_dim=1).contiguous().cpu().numpy()
405
+ opacities = self._opacity.detach().cpu().numpy()
406
+ scale = self._scaling.detach().cpu().numpy()
407
+ rotation = self._rotation.detach().cpu().numpy()
408
+
409
+ dtype_full = [(attribute, 'f4') for attribute in self.construct_list_of_attributes()]
410
+
411
+ elements = np.empty(xyz.shape[0], dtype=dtype_full)
412
+ attributes = np.concatenate((xyz, normals, f_dc, f_rest, opacities, scale, rotation), axis=1)
413
+ elements[:] = list(map(tuple, attributes))
414
+ el = PlyElement.describe(elements, 'vertex')
415
+ PlyData([el]).write(path)
416
+
417
+ def reset_opacity(self):
418
+ opacities_new = inverse_sigmoid(torch.min(self.get_opacity, torch.ones_like(self.get_opacity)*0.01))
419
+ optimizable_tensors = self.replace_tensor_to_optimizer(opacities_new, "opacity")
420
+ self._opacity = optimizable_tensors["opacity"]
421
+
422
+ def load_ply(self, path):
423
+ plydata = PlyData.read(path)
424
+
425
+ xyz = np.stack((np.asarray(plydata.elements[0]["x"]),
426
+ np.asarray(plydata.elements[0]["y"]),
427
+ np.asarray(plydata.elements[0]["z"])), axis=1)
428
+ opacities = np.asarray(plydata.elements[0]["opacity"])[..., np.newaxis]
429
+
430
+ print("Number of points at loading : ", xyz.shape[0])
431
+
432
+ features_dc = np.zeros((xyz.shape[0], 3, 1))
433
+ features_dc[:, 0, 0] = np.asarray(plydata.elements[0]["f_dc_0"])
434
+ features_dc[:, 1, 0] = np.asarray(plydata.elements[0]["f_dc_1"])
435
+ features_dc[:, 2, 0] = np.asarray(plydata.elements[0]["f_dc_2"])
436
+
437
+ extra_f_names = [p.name for p in plydata.elements[0].properties if p.name.startswith("f_rest_")]
438
+ assert len(extra_f_names)==3*(self.max_sh_degree + 1) ** 2 - 3
439
+ features_extra = np.zeros((xyz.shape[0], len(extra_f_names)))
440
+ for idx, attr_name in enumerate(extra_f_names):
441
+ features_extra[:, idx] = np.asarray(plydata.elements[0][attr_name])
442
+ # Reshape (P,F*SH_coeffs) to (P, F, SH_coeffs except DC)
443
+ features_extra = features_extra.reshape((features_extra.shape[0], 3, (self.max_sh_degree + 1) ** 2 - 1))
444
+
445
+ scale_names = [p.name for p in plydata.elements[0].properties if p.name.startswith("scale_")]
446
+ scales = np.zeros((xyz.shape[0], len(scale_names)))
447
+ for idx, attr_name in enumerate(scale_names):
448
+ scales[:, idx] = np.asarray(plydata.elements[0][attr_name])
449
+
450
+ rot_names = [p.name for p in plydata.elements[0].properties if p.name.startswith("rot")]
451
+ rots = np.zeros((xyz.shape[0], len(rot_names)))
452
+ for idx, attr_name in enumerate(rot_names):
453
+ rots[:, idx] = np.asarray(plydata.elements[0][attr_name])
454
+
455
+ self._xyz = nn.Parameter(torch.tensor(xyz, dtype=torch.float, device="cuda").requires_grad_(True))
456
+ self._features_dc = nn.Parameter(torch.tensor(features_dc, dtype=torch.float, device="cuda").transpose(1, 2).contiguous().requires_grad_(True))
457
+ self._features_rest = nn.Parameter(torch.tensor(features_extra, dtype=torch.float, device="cuda").transpose(1, 2).contiguous().requires_grad_(True))
458
+ self._opacity = nn.Parameter(torch.tensor(opacities, dtype=torch.float, device="cuda").requires_grad_(True))
459
+ self._scaling = nn.Parameter(torch.tensor(scales, dtype=torch.float, device="cuda").requires_grad_(True))
460
+ self._rotation = nn.Parameter(torch.tensor(rots, dtype=torch.float, device="cuda").requires_grad_(True))
461
+
462
+ self.active_sh_degree = self.max_sh_degree
463
+
464
+ def replace_tensor_to_optimizer(self, tensor, name):
465
+ optimizable_tensors = {}
466
+ for group in self.optimizer.param_groups:
467
+ if group["name"] == name:
468
+ stored_state = self.optimizer.state.get(group['params'][0], None)
469
+ stored_state["exp_avg"] = torch.zeros_like(tensor)
470
+ stored_state["exp_avg_sq"] = torch.zeros_like(tensor)
471
+
472
+ del self.optimizer.state[group['params'][0]]
473
+ group["params"][0] = nn.Parameter(tensor.requires_grad_(True))
474
+ self.optimizer.state[group['params'][0]] = stored_state
475
+
476
+ optimizable_tensors[group["name"]] = group["params"][0]
477
+ return optimizable_tensors
478
+
479
+ def _prune_optimizer(self, mask):
480
+ optimizable_tensors = {}
481
+ for group in self.optimizer.param_groups:
482
+ stored_state = self.optimizer.state.get(group['params'][0], None)
483
+ if stored_state is not None:
484
+ stored_state["exp_avg"] = stored_state["exp_avg"][mask]
485
+ stored_state["exp_avg_sq"] = stored_state["exp_avg_sq"][mask]
486
+
487
+ del self.optimizer.state[group['params'][0]]
488
+ group["params"][0] = nn.Parameter((group["params"][0][mask].requires_grad_(True)))
489
+ self.optimizer.state[group['params'][0]] = stored_state
490
+
491
+ optimizable_tensors[group["name"]] = group["params"][0]
492
+ else:
493
+ group["params"][0] = nn.Parameter(group["params"][0][mask].requires_grad_(True))
494
+ optimizable_tensors[group["name"]] = group["params"][0]
495
+ return optimizable_tensors
496
+
497
+ def prune_points(self, mask):
498
+ valid_points_mask = ~mask
499
+ optimizable_tensors = self._prune_optimizer(valid_points_mask)
500
+
501
+ self._xyz = optimizable_tensors["xyz"]
502
+ self._features_dc = optimizable_tensors["f_dc"]
503
+ self._features_rest = optimizable_tensors["f_rest"]
504
+ self._opacity = optimizable_tensors["opacity"]
505
+ self._scaling = optimizable_tensors["scaling"]
506
+ self._rotation = optimizable_tensors["rotation"]
507
+
508
+ self.xyz_gradient_accum = self.xyz_gradient_accum[valid_points_mask]
509
+
510
+ self.denom = self.denom[valid_points_mask]
511
+ self.max_radii2D = self.max_radii2D[valid_points_mask]
512
+
513
+ def cat_tensors_to_optimizer(self, tensors_dict):
514
+ optimizable_tensors = {}
515
+ for group in self.optimizer.param_groups:
516
+ assert len(group["params"]) == 1
517
+ extension_tensor = tensors_dict[group["name"]]
518
+ stored_state = self.optimizer.state.get(group['params'][0], None)
519
+ if stored_state is not None:
520
+
521
+ stored_state["exp_avg"] = torch.cat((stored_state["exp_avg"], torch.zeros_like(extension_tensor)), dim=0)
522
+ stored_state["exp_avg_sq"] = torch.cat((stored_state["exp_avg_sq"], torch.zeros_like(extension_tensor)), dim=0)
523
+
524
+ del self.optimizer.state[group['params'][0]]
525
+ group["params"][0] = nn.Parameter(torch.cat((group["params"][0], extension_tensor), dim=0).requires_grad_(True))
526
+ self.optimizer.state[group['params'][0]] = stored_state
527
+
528
+ optimizable_tensors[group["name"]] = group["params"][0]
529
+ else:
530
+ group["params"][0] = nn.Parameter(torch.cat((group["params"][0], extension_tensor), dim=0).requires_grad_(True))
531
+ optimizable_tensors[group["name"]] = group["params"][0]
532
+
533
+ return optimizable_tensors
534
+
535
+ def densification_postfix(self, new_xyz, new_features_dc, new_features_rest, new_opacities, new_scaling, new_rotation):
536
+ d = {"xyz": new_xyz,
537
+ "f_dc": new_features_dc,
538
+ "f_rest": new_features_rest,
539
+ "opacity": new_opacities,
540
+ "scaling" : new_scaling,
541
+ "rotation" : new_rotation}
542
+
543
+ optimizable_tensors = self.cat_tensors_to_optimizer(d)
544
+ self._xyz = optimizable_tensors["xyz"]
545
+ self._features_dc = optimizable_tensors["f_dc"]
546
+ self._features_rest = optimizable_tensors["f_rest"]
547
+ self._opacity = optimizable_tensors["opacity"]
548
+ self._scaling = optimizable_tensors["scaling"]
549
+ self._rotation = optimizable_tensors["rotation"]
550
+
551
+ self.xyz_gradient_accum = torch.zeros((self.get_xyz.shape[0], 1), device="cuda")
552
+ self.denom = torch.zeros((self.get_xyz.shape[0], 1), device="cuda")
553
+ self.max_radii2D = torch.zeros((self.get_xyz.shape[0]), device="cuda")
554
+
555
+ def densify_and_split(self, grads, grad_threshold, scene_extent, N=2):
556
+ n_init_points = self.get_xyz.shape[0]
557
+ # Extract points that satisfy the gradient condition
558
+ padded_grad = torch.zeros((n_init_points), device="cuda")
559
+ padded_grad[:grads.shape[0]] = grads.squeeze()
560
+ selected_pts_mask = torch.where(padded_grad >= grad_threshold, True, False)
561
+ selected_pts_mask = torch.logical_and(selected_pts_mask,
562
+ torch.max(self.get_scaling, dim=1).values > self.percent_dense*scene_extent)
563
+
564
+ stds = self.get_scaling[selected_pts_mask].repeat(N,1)
565
+ means = torch.zeros((stds.size(0), 3), device="cuda")
566
+ samples = torch.normal(mean=means, std=stds)
567
+ rots = build_rotation(self._rotation[selected_pts_mask]).repeat(N,1,1)
568
+ new_xyz = torch.bmm(rots, samples.unsqueeze(-1)).squeeze(-1) + self.get_xyz[selected_pts_mask].repeat(N, 1)
569
+ new_scaling = self.scaling_inverse_activation(self.get_scaling[selected_pts_mask].repeat(N,1) / (0.8*N))
570
+ new_rotation = self._rotation[selected_pts_mask].repeat(N,1)
571
+ new_features_dc = self._features_dc[selected_pts_mask].repeat(N,1,1)
572
+ new_features_rest = self._features_rest[selected_pts_mask].repeat(N,1,1)
573
+ new_opacity = self._opacity[selected_pts_mask].repeat(N,1)
574
+
575
+ self.densification_postfix(new_xyz, new_features_dc, new_features_rest, new_opacity, new_scaling, new_rotation)
576
+
577
+ prune_filter = torch.cat((selected_pts_mask, torch.zeros(N * selected_pts_mask.sum(), device="cuda", dtype=bool)))
578
+ self.prune_points(prune_filter)
579
+
580
+ def densify_and_clone(self, grads, grad_threshold, scene_extent):
581
+ # Extract points that satisfy the gradient condition
582
+ selected_pts_mask = torch.where(torch.norm(grads, dim=-1) >= grad_threshold, True, False)
583
+ selected_pts_mask = torch.logical_and(selected_pts_mask,
584
+ torch.max(self.get_scaling, dim=1).values <= self.percent_dense*scene_extent)
585
+
586
+ new_xyz = self._xyz[selected_pts_mask]
587
+ new_features_dc = self._features_dc[selected_pts_mask]
588
+ new_features_rest = self._features_rest[selected_pts_mask]
589
+ new_opacities = self._opacity[selected_pts_mask]
590
+ new_scaling = self._scaling[selected_pts_mask]
591
+ new_rotation = self._rotation[selected_pts_mask]
592
+
593
+ self.densification_postfix(new_xyz, new_features_dc, new_features_rest, new_opacities, new_scaling, new_rotation)
594
+
595
+ def densify_and_prune(self, max_grad, min_opacity, extent, max_screen_size):
596
+ grads = self.xyz_gradient_accum / self.denom
597
+ grads[grads.isnan()] = 0.0
598
+
599
+ self.densify_and_clone(grads, max_grad, extent)
600
+ self.densify_and_split(grads, max_grad, extent)
601
+
602
+ prune_mask = (self.get_opacity < min_opacity).squeeze()
603
+ if max_screen_size:
604
+ big_points_vs = self.max_radii2D > max_screen_size
605
+ big_points_ws = self.get_scaling.max(dim=1).values > 0.1 * extent
606
+ prune_mask = torch.logical_or(torch.logical_or(prune_mask, big_points_vs), big_points_ws)
607
+ self.prune_points(prune_mask)
608
+
609
+ torch.cuda.empty_cache()
610
+
611
+ def prune(self, min_opacity, extent, max_screen_size):
612
+
613
+ prune_mask = (self.get_opacity < min_opacity).squeeze()
614
+ if max_screen_size:
615
+ big_points_vs = self.max_radii2D > max_screen_size
616
+ big_points_ws = self.get_scaling.max(dim=1).values > 0.1 * extent
617
+ prune_mask = torch.logical_or(torch.logical_or(prune_mask, big_points_vs), big_points_ws)
618
+ self.prune_points(prune_mask)
619
+
620
+ torch.cuda.empty_cache()
621
+
622
+
623
+ def add_densification_stats(self, viewspace_point_tensor, update_filter):
624
+ self.xyz_gradient_accum[update_filter] += torch.norm(viewspace_point_tensor.grad[update_filter,:2], dim=-1, keepdim=True)
625
+ self.denom[update_filter] += 1
626
+
627
+ def getProjectionMatrix(znear, zfar, fovX, fovY):
628
+ tanHalfFovY = math.tan((fovY / 2))
629
+ tanHalfFovX = math.tan((fovX / 2))
630
+
631
+ P = torch.zeros(4, 4)
632
+
633
+ z_sign = 1.0
634
+
635
+ P[0, 0] = 1 / tanHalfFovX
636
+ P[1, 1] = 1 / tanHalfFovY
637
+ P[3, 2] = z_sign
638
+ P[2, 2] = z_sign * zfar / (zfar - znear)
639
+ P[2, 3] = -(zfar * znear) / (zfar - znear)
640
+ return P
641
+
642
+
643
+ class MiniCam:
644
+ def __init__(self, c2w, width, height, fovy, fovx, znear, zfar):
645
+ # c2w (pose) should be in NeRF convention.
646
+
647
+ self.image_width = width
648
+ self.image_height = height
649
+ self.FoVy = fovy
650
+ self.FoVx = fovx
651
+ self.znear = znear
652
+ self.zfar = zfar
653
+
654
+ w2c = np.linalg.inv(c2w)
655
+
656
+ # rectify: flip the y/z camera axes and negate the translation so the w2c matrix matches the camera convention expected by the rasterizer
657
+ w2c[1:3, :3] *= -1
658
+ w2c[:3, 3] *= -1
659
+
660
+ self.world_view_transform = torch.tensor(w2c).transpose(0, 1).cuda()
661
+ self.projection_matrix = (
662
+ getProjectionMatrix(
663
+ znear=self.znear, zfar=self.zfar, fovX=self.FoVx, fovY=self.FoVy
664
+ )
665
+ .transpose(0, 1)
666
+ .cuda()
667
+ )
668
+ self.full_proj_transform = self.world_view_transform @ self.projection_matrix
669
+ self.camera_center = -torch.tensor(c2w[:3, 3]).cuda()
670
+
671
+
672
+ class Renderer:
673
+ def __init__(self, sh_degree=3, white_background=True, radius=1):
674
+
675
+ self.sh_degree = sh_degree
676
+ self.white_background = white_background
677
+ self.radius = radius
678
+
679
+ self.gaussians = GaussianModel(sh_degree)
680
+
681
+ self.bg_color = torch.tensor(
682
+ [1, 1, 1] if white_background else [0, 0, 0],
683
+ dtype=torch.float32,
684
+ device="cuda",
685
+ )
686
+
687
+ def initialize(self, input=None, num_pts=5000, radius=0.5):
688
+ # initialize gaussians: random sphere, from a provided point cloud, or from a saved ply
689
+ if input is None:
690
+ # init from random point cloud
691
+
692
+ phis = np.random.random((num_pts,)) * 2 * np.pi
693
+ costheta = np.random.random((num_pts,)) * 2 - 1
694
+ thetas = np.arccos(costheta)
695
+ mu = np.random.random((num_pts,))
696
+ radius = radius * np.cbrt(mu)
697
+ x = radius * np.sin(thetas) * np.cos(phis)
698
+ y = radius * np.sin(thetas) * np.sin(phis)
699
+ z = radius * np.cos(thetas)
700
+ xyz = np.stack((x, y, z), axis=1)
701
+ # xyz = np.random.random((num_pts, 3)) * 2.6 - 1.3
702
+
703
+ shs = np.random.random((num_pts, 3)) / 255.0
704
+ pcd = BasicPointCloud(
705
+ points=xyz, colors=SH2RGB(shs), normals=np.zeros((num_pts, 3))
706
+ )
707
+ self.gaussians.create_from_pcd(pcd, 10)
708
+ elif isinstance(input, BasicPointCloud):
709
+ # load from a provided pcd
710
+ self.gaussians.create_from_pcd(input, 1)
711
+ else:
712
+ # load from saved ply
713
+ self.gaussians.load_ply(input)
714
+
715
+ def render(
716
+ self,
717
+ viewpoint_camera,
718
+ scaling_modifier=1.0,
719
+ invert_bg_color=False,
720
+ override_color=None,
721
+ compute_cov3D_python=False,
722
+ convert_SHs_python=False,
723
+ ):
724
+ # Create zero tensor. We will use it to make pytorch return gradients of the 2D (screen-space) means
725
+ screenspace_points = (
726
+ torch.zeros_like(
727
+ self.gaussians.get_xyz,
728
+ dtype=self.gaussians.get_xyz.dtype,
729
+ requires_grad=True,
730
+ device="cuda",
731
+ )
732
+ + 0
733
+ )
734
+ try:
735
+ screenspace_points.retain_grad()
736
+ except:
737
+ pass
738
+
739
+ # Set up rasterization configuration
740
+ tanfovx = math.tan(viewpoint_camera.FoVx * 0.5)
741
+ tanfovy = math.tan(viewpoint_camera.FoVy * 0.5)
742
+
743
+ raster_settings = GaussianRasterizationSettings(
744
+ image_height=int(viewpoint_camera.image_height),
745
+ image_width=int(viewpoint_camera.image_width),
746
+ tanfovx=tanfovx,
747
+ tanfovy=tanfovy,
748
+ bg=self.bg_color if not invert_bg_color else 1 - self.bg_color,
749
+ scale_modifier=scaling_modifier,
750
+ viewmatrix=viewpoint_camera.world_view_transform,
751
+ projmatrix=viewpoint_camera.full_proj_transform,
752
+ sh_degree=self.gaussians.active_sh_degree,
753
+ campos=viewpoint_camera.camera_center,
754
+ prefiltered=False,
755
+ debug=False,
756
+ )
757
+
758
+ rasterizer = GaussianRasterizer(raster_settings=raster_settings)
759
+
760
+ means3D = self.gaussians.get_xyz
761
+ means2D = screenspace_points
762
+ opacity = self.gaussians.get_opacity
763
+
764
+ # If precomputed 3d covariance is provided, use it. If not, then it will be computed from
765
+ # scaling / rotation by the rasterizer.
766
+ scales = None
767
+ rotations = None
768
+ cov3D_precomp = None
769
+ if compute_cov3D_python:
770
+ cov3D_precomp = self.gaussians.get_covariance(scaling_modifier)
771
+ else:
772
+ scales = self.gaussians.get_scaling
773
+ rotations = self.gaussians.get_rotation
774
+
775
+ # If precomputed colors are provided, use them. Otherwise, if it is desired to precompute colors
776
+ # from SHs in Python, do it. If not, then SH -> RGB conversion will be done by rasterizer.
777
+ shs = None
778
+ colors_precomp = None
779
+ if colors_precomp is None:
780
+ if convert_SHs_python:
781
+ shs_view = self.gaussians.get_features.transpose(1, 2).view(
782
+ -1, 3, (self.gaussians.max_sh_degree + 1) ** 2
783
+ )
784
+ dir_pp = self.gaussians.get_xyz - viewpoint_camera.camera_center.repeat(
785
+ self.gaussians.get_features.shape[0], 1
786
+ )
787
+ dir_pp_normalized = dir_pp / dir_pp.norm(dim=1, keepdim=True)
788
+ sh2rgb = eval_sh(
789
+ self.gaussians.active_sh_degree, shs_view, dir_pp_normalized
790
+ )
791
+ colors_precomp = torch.clamp_min(sh2rgb + 0.5, 0.0)
792
+ else:
793
+ shs = self.gaussians.get_features
794
+ else:
795
+ colors_precomp = override_color
796
+
797
+ # Rasterize visible Gaussians to image, obtain their radii (on screen).
798
+ rendered_image, radii, rendered_depth, rendered_alpha = rasterizer(
799
+ means3D=means3D,
800
+ means2D=means2D,
801
+ shs=shs,
802
+ colors_precomp=colors_precomp,
803
+ opacities=opacity,
804
+ scales=scales,
805
+ rotations=rotations,
806
+ cov3D_precomp=cov3D_precomp,
807
+ )
808
+
809
+ rendered_image = rendered_image.clamp(0, 1)
810
+
811
+ # Those Gaussians that were frustum culled or had a radius of 0 were not visible.
812
+ # They will be excluded from value updates used in the splitting criteria.
813
+ return {
814
+ "image": rendered_image,
815
+ "depth": rendered_depth,
816
+ "alpha": rendered_alpha,
817
+ "viewspace_points": screenspace_points,
818
+ "visibility_filter": radii > 0,
819
+ "radii": radii,
820
+ }
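
A minimal usage sketch of the renderer defined above (not part of this commit). It assumes a CUDA GPU with the diff-gaussian-rasterization extension installed, and that orbit_camera from cam_utils returns a 4x4 camera-to-world pose in NeRF convention, as main.py below calls it; the FoV and camera values are illustrative.

import numpy as np
from cam_utils import orbit_camera          # same helper main.py imports
from gs_renderer import Renderer, MiniCam

renderer = Renderer(sh_degree=3, white_background=True)
renderer.initialize(num_pts=5000)           # random spherical blob of Gaussians

fov = np.deg2rad(49.1)                      # illustrative vertical/horizontal FoV (radians)
pose = orbit_camera(0, 30, 2.0)             # elevation, azimuth (degrees), radius
cam = MiniCam(pose, 512, 512, fov, fov, znear=0.01, zfar=100)

out = renderer.render(cam)
image = out["image"]                        # [3, H, W] in [0, 1]
alpha = out["alpha"]                        # [1, H, W] accumulated opacity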
guidance/sd_utils.py ADDED
@@ -0,0 +1,334 @@
1
+ from transformers import CLIPTextModel, CLIPTokenizer, logging
2
+ from diffusers import (
3
+ AutoencoderKL,
4
+ UNet2DConditionModel,
5
+ PNDMScheduler,
6
+ DDIMScheduler,
7
+ StableDiffusionPipeline,
8
+ )
9
+ from diffusers.utils.import_utils import is_xformers_available
10
+
11
+ # suppress partial model loading warning
12
+ logging.set_verbosity_error()
13
+
14
+ import numpy as np
15
+ import torch
16
+ import torch.nn as nn
17
+ import torch.nn.functional as F
18
+
19
+
20
+ def seed_everything(seed):
21
+ torch.manual_seed(seed)
22
+ torch.cuda.manual_seed(seed)
23
+ # torch.backends.cudnn.deterministic = True
24
+ # torch.backends.cudnn.benchmark = True
25
+
26
+
27
+ class StableDiffusion(nn.Module):
28
+ def __init__(
29
+ self,
30
+ device,
31
+ fp16=True,
32
+ vram_O=False,
33
+ sd_version="2.1",
34
+ hf_key=None,
35
+ t_range=[0.02, 0.98],
36
+ ):
37
+ super().__init__()
38
+
39
+ self.device = device
40
+ self.sd_version = sd_version
41
+
42
+ if hf_key is not None:
43
+ print(f"[INFO] using hugging face custom model key: {hf_key}")
44
+ model_key = hf_key
45
+ elif self.sd_version == "2.1":
46
+ model_key = "stabilityai/stable-diffusion-2-1-base"
47
+ elif self.sd_version == "2.0":
48
+ model_key = "stabilityai/stable-diffusion-2-base"
49
+ elif self.sd_version == "1.5":
50
+ model_key = "runwayml/stable-diffusion-v1-5"
51
+ else:
52
+ raise ValueError(
53
+ f"Stable-diffusion version {self.sd_version} not supported."
54
+ )
55
+
56
+ self.dtype = torch.float16 if fp16 else torch.float32
57
+
58
+ # Create model
59
+ pipe = StableDiffusionPipeline.from_pretrained(
60
+ model_key, torch_dtype=self.dtype
61
+ )
62
+
63
+ if vram_O:
64
+ pipe.enable_sequential_cpu_offload()
65
+ pipe.enable_vae_slicing()
66
+ pipe.unet.to(memory_format=torch.channels_last)
67
+ pipe.enable_attention_slicing(1)
68
+ # pipe.enable_model_cpu_offload()
69
+ else:
70
+ pipe.to(device)
71
+
72
+ self.vae = pipe.vae
73
+ self.tokenizer = pipe.tokenizer
74
+ self.text_encoder = pipe.text_encoder
75
+ self.unet = pipe.unet
76
+
77
+ self.scheduler = DDIMScheduler.from_pretrained(
78
+ model_key, subfolder="scheduler", torch_dtype=self.dtype
79
+ )
80
+
81
+ del pipe
82
+
83
+ self.num_train_timesteps = self.scheduler.config.num_train_timesteps
84
+ self.min_step = int(self.num_train_timesteps * t_range[0])
85
+ self.max_step = int(self.num_train_timesteps * t_range[1])
86
+ self.alphas = self.scheduler.alphas_cumprod.to(self.device) # for convenience
87
+
88
+ self.embeddings = None
89
+
90
+ @torch.no_grad()
91
+ def get_text_embeds(self, prompts, negative_prompts):
92
+ pos_embeds = self.encode_text(prompts) # [1, 77, 768]
93
+ neg_embeds = self.encode_text(negative_prompts)
94
+ self.embeddings = torch.cat([neg_embeds, pos_embeds], dim=0) # [2, 77, 768]
95
+
96
+ def encode_text(self, prompt):
97
+ # prompt: [str]
98
+ inputs = self.tokenizer(
99
+ prompt,
100
+ padding="max_length",
101
+ max_length=self.tokenizer.model_max_length,
102
+ return_tensors="pt",
103
+ )
104
+ embeddings = self.text_encoder(inputs.input_ids.to(self.device))[0]
105
+ return embeddings
106
+
107
+ @torch.no_grad()
108
+ def refine(self, pred_rgb,
109
+ guidance_scale=100, steps=50, strength=0.8,
110
+ ):
111
+
112
+ batch_size = pred_rgb.shape[0]
113
+ pred_rgb_512 = F.interpolate(pred_rgb, (512, 512), mode='bilinear', align_corners=False)
114
+ latents = self.encode_imgs(pred_rgb_512.to(self.dtype))
115
+ # latents = torch.randn((1, 4, 64, 64), device=self.device, dtype=self.dtype)
116
+
117
+ self.scheduler.set_timesteps(steps)
118
+ init_step = int(steps * strength)
119
+ latents = self.scheduler.add_noise(latents, torch.randn_like(latents), self.scheduler.timesteps[init_step])
120
+
121
+ for i, t in enumerate(self.scheduler.timesteps[init_step:]):
122
+
123
+ latent_model_input = torch.cat([latents] * 2)
124
+
125
+ noise_pred = self.unet(
126
+ latent_model_input, t, encoder_hidden_states=self.embeddings,
127
+ ).sample
128
+
129
+ noise_pred_uncond, noise_pred_cond = noise_pred.chunk(2)
130
+ noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_cond - noise_pred_uncond)
131
+
132
+ latents = self.scheduler.step(noise_pred, t, latents).prev_sample
133
+
134
+ imgs = self.decode_latents(latents) # [1, 3, 512, 512]
135
+ return imgs
136
+
137
+ def train_step(
138
+ self,
139
+ pred_rgb,
140
+ step_ratio=None,
141
+ guidance_scale=100,
142
+ as_latent=False,
143
+ ):
144
+
145
+ batch_size = pred_rgb.shape[0]
146
+ pred_rgb = pred_rgb.to(self.dtype)
147
+
148
+ if as_latent:
149
+ latents = F.interpolate(pred_rgb, (64, 64), mode="bilinear", align_corners=False) * 2 - 1
150
+ else:
151
+ # interp to 512x512 to be fed into vae.
152
+ pred_rgb_512 = F.interpolate(pred_rgb, (512, 512), mode="bilinear", align_corners=False)
153
+ # encode image into latents with vae, requires grad!
154
+ latents = self.encode_imgs(pred_rgb_512)
155
+
156
+ if step_ratio is not None:
157
+ # dreamtime-like
158
+ # t = self.max_step - (self.max_step - self.min_step) * np.sqrt(step_ratio)
159
+ t = np.round((1 - step_ratio) * self.num_train_timesteps).clip(self.min_step, self.max_step)
160
+ t = torch.full((batch_size,), t, dtype=torch.long, device=self.device)
161
+ else:
162
+ t = torch.randint(self.min_step, self.max_step + 1, (batch_size,), dtype=torch.long, device=self.device)
163
+
164
+ # w(t), sigma_t^2
165
+ w = (1 - self.alphas[t]).view(batch_size, 1, 1, 1)
166
+
167
+ # predict the noise residual with unet, NO grad!
168
+ with torch.no_grad():
169
+ # add noise
170
+ noise = torch.randn_like(latents)
171
+ latents_noisy = self.scheduler.add_noise(latents, noise, t)
172
+ # pred noise
173
+ latent_model_input = torch.cat([latents_noisy] * 2)
174
+ tt = torch.cat([t] * 2)
175
+
176
+ noise_pred = self.unet(
177
+ latent_model_input, tt, encoder_hidden_states=self.embeddings.repeat(batch_size, 1, 1)
178
+ ).sample
179
+
180
+ # perform guidance (high scale from paper!)
181
+ noise_pred_uncond, noise_pred_pos = noise_pred.chunk(2)
182
+ noise_pred = noise_pred_uncond + guidance_scale * (
183
+ noise_pred_pos - noise_pred_uncond
184
+ )
185
+
186
+ grad = w * (noise_pred - noise)
187
+ grad = torch.nan_to_num(grad)
188
+
189
+ # seems important to avoid NaN...
190
+ # grad = grad.clamp(-1, 1)
191
+
192
+ target = (latents - grad).detach()
193
+ loss = 0.5 * F.mse_loss(latents.float(), target, reduction='sum') / latents.shape[0]
194
+
195
+ return loss
196
+
197
+ @torch.no_grad()
198
+ def produce_latents(
199
+ self,
200
+ height=512,
201
+ width=512,
202
+ num_inference_steps=50,
203
+ guidance_scale=7.5,
204
+ latents=None,
205
+ ):
206
+ if latents is None:
207
+ latents = torch.randn(
208
+ (
209
+ self.embeddings.shape[0] // 2,
210
+ self.unet.in_channels,
211
+ height // 8,
212
+ width // 8,
213
+ ),
214
+ device=self.device,
215
+ )
216
+
217
+ self.scheduler.set_timesteps(num_inference_steps)
218
+
219
+ for i, t in enumerate(self.scheduler.timesteps):
220
+ # expand the latents if we are doing classifier-free guidance to avoid doing two forward passes.
221
+ latent_model_input = torch.cat([latents] * 2)
222
+ # predict the noise residual
223
+ noise_pred = self.unet(
224
+ latent_model_input, t, encoder_hidden_states=self.embeddings
225
+ ).sample
226
+
227
+ # perform guidance
228
+ noise_pred_uncond, noise_pred_cond = noise_pred.chunk(2)
229
+ noise_pred = noise_pred_uncond + guidance_scale * (
230
+ noise_pred_cond - noise_pred_uncond
231
+ )
232
+
233
+ # compute the previous noisy sample x_t -> x_t-1
234
+ latents = self.scheduler.step(noise_pred, t, latents).prev_sample
235
+
236
+ return latents
237
+
238
+ def decode_latents(self, latents):
239
+ latents = 1 / self.vae.config.scaling_factor * latents
240
+
241
+ imgs = self.vae.decode(latents).sample
242
+ imgs = (imgs / 2 + 0.5).clamp(0, 1)
243
+
244
+ return imgs
245
+
246
+ def encode_imgs(self, imgs):
247
+ # imgs: [B, 3, H, W]
248
+
249
+ imgs = 2 * imgs - 1
250
+
251
+ posterior = self.vae.encode(imgs).latent_dist
252
+ latents = posterior.sample() * self.vae.config.scaling_factor
253
+
254
+ return latents
255
+
256
+ def prompt_to_img(
257
+ self,
258
+ prompts,
259
+ negative_prompts="",
260
+ height=512,
261
+ width=512,
262
+ num_inference_steps=50,
263
+ guidance_scale=7.5,
264
+ latents=None,
265
+ ):
266
+ if isinstance(prompts, str):
267
+ prompts = [prompts]
268
+
269
+ if isinstance(negative_prompts, str):
270
+ negative_prompts = [negative_prompts]
271
+
272
+ # Prompts -> text embeds
273
+ self.get_text_embeds(prompts, negative_prompts)
274
+
275
+ # Text embeds -> img latents
276
+ latents = self.produce_latents(
277
+ height=height,
278
+ width=width,
279
+ latents=latents,
280
+ num_inference_steps=num_inference_steps,
281
+ guidance_scale=guidance_scale,
282
+ ) # [1, 4, 64, 64]
283
+
284
+ # Img latents -> imgs
285
+ imgs = self.decode_latents(latents) # [1, 3, 512, 512]
286
+
287
+ # Img to Numpy
288
+ imgs = imgs.detach().cpu().permute(0, 2, 3, 1).numpy()
289
+ imgs = (imgs * 255).round().astype("uint8")
290
+
291
+ return imgs
292
+
293
+
294
+ if __name__ == "__main__":
295
+ import argparse
296
+ import matplotlib.pyplot as plt
297
+
298
+ parser = argparse.ArgumentParser()
299
+ parser.add_argument("prompt", type=str)
300
+ parser.add_argument("--negative", default="", type=str)
301
+ parser.add_argument(
302
+ "--sd_version",
303
+ type=str,
304
+ default="2.1",
305
+ choices=["1.5", "2.0", "2.1"],
306
+ help="stable diffusion version",
307
+ )
308
+ parser.add_argument(
309
+ "--hf_key",
310
+ type=str,
311
+ default=None,
312
+ help="hugging face Stable diffusion model key",
313
+ )
314
+ parser.add_argument("--fp16", action="store_true", help="use float16 for training")
315
+ parser.add_argument(
316
+ "--vram_O", action="store_true", help="optimization for low VRAM usage"
317
+ )
318
+ parser.add_argument("-H", type=int, default=512)
319
+ parser.add_argument("-W", type=int, default=512)
320
+ parser.add_argument("--seed", type=int, default=0)
321
+ parser.add_argument("--steps", type=int, default=50)
322
+ opt = parser.parse_args()
323
+
324
+ seed_everything(opt.seed)
325
+
326
+ device = torch.device("cuda")
327
+
328
+ sd = StableDiffusion(device, opt.fp16, opt.vram_O, opt.sd_version, opt.hf_key)
329
+
330
+ imgs = sd.prompt_to_img(opt.prompt, opt.negative, opt.H, opt.W, opt.steps)
331
+
332
+ # visualize image
333
+ plt.imshow(imgs[0])
334
+ plt.show()
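
The train_step above is score distillation sampling (SDS): because target = latents - grad is detached, 0.5 * mse_loss(latents, target, reduction='sum') backpropagates w(t) * (noise_pred - noise) into the latents (up to the division by batch size) without differentiating through the UNet. A minimal sketch of driving it from a rendered image, with a placeholder prompt and a random tensor standing in for the renderer output:

import torch
from guidance.sd_utils import StableDiffusion

device = torch.device("cuda")
guidance = StableDiffusion(device, fp16=True)
guidance.get_text_embeds(["a photo of a hamburger"], [""])

# pred_rgb would normally be the differentiable Gaussian rendering in [0, 1]
pred_rgb = torch.rand(1, 3, 512, 512, device=device, requires_grad=True)

loss = guidance.train_step(pred_rgb, step_ratio=0.5, guidance_scale=100)
loss.backward()   # gradients reach pred_rgb through the VAE encoder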
guidance/zero123_utils.py ADDED
@@ -0,0 +1,226 @@
1
+ from transformers import CLIPTextModel, CLIPTokenizer, logging
2
+ from diffusers import (
3
+ AutoencoderKL,
4
+ UNet2DConditionModel,
5
+ DDIMScheduler,
6
+ StableDiffusionPipeline,
7
+ )
8
+ import torchvision.transforms.functional as TF
9
+
10
+ import numpy as np
11
+ import torch
12
+ import torch.nn as nn
13
+ import torch.nn.functional as F
14
+
15
+ import sys
16
+ sys.path.append('./')
17
+
18
+ from zero123 import Zero123Pipeline
19
+
20
+
21
+ class Zero123(nn.Module):
22
+ def __init__(self, device, fp16=True, t_range=[0.02, 0.98]):
23
+ super().__init__()
24
+
25
+ self.device = device
26
+ self.fp16 = fp16
27
+ self.dtype = torch.float16 if fp16 else torch.float32
28
+
29
+ self.pipe = Zero123Pipeline.from_pretrained(
30
+ # "bennyguo/zero123-diffusers",
31
+ "bennyguo/zero123-xl-diffusers",
32
+ # './model_cache/zero123_xl',
33
+ variant="fp16_ema" if self.fp16 else None,
34
+ torch_dtype=self.dtype,
35
+ ).to(self.device)
36
+
37
+ # for param in self.pipe.parameters():
38
+ # param.requires_grad = False
39
+
40
+ self.pipe.image_encoder.eval()
41
+ self.pipe.vae.eval()
42
+ self.pipe.unet.eval()
43
+ self.pipe.clip_camera_projection.eval()
44
+
45
+ self.vae = self.pipe.vae
46
+ self.unet = self.pipe.unet
47
+
48
+ self.pipe.set_progress_bar_config(disable=True)
49
+
50
+ self.scheduler = DDIMScheduler.from_config(self.pipe.scheduler.config)
51
+ self.num_train_timesteps = self.scheduler.config.num_train_timesteps
52
+
53
+ self.min_step = int(self.num_train_timesteps * t_range[0])
54
+ self.max_step = int(self.num_train_timesteps * t_range[1])
55
+ self.alphas = self.scheduler.alphas_cumprod.to(self.device) # for convenience
56
+
57
+ self.embeddings = None
58
+
59
+ @torch.no_grad()
60
+ def get_img_embeds(self, x):
61
+ # x: image tensor in [0, 1]
62
+ x = F.interpolate(x, (256, 256), mode='bilinear', align_corners=False)
63
+ x_pil = [TF.to_pil_image(image) for image in x]
64
+ x_clip = self.pipe.feature_extractor(images=x_pil, return_tensors="pt").pixel_values.to(device=self.device, dtype=self.dtype)
65
+ c = self.pipe.image_encoder(x_clip).image_embeds
66
+ v = self.encode_imgs(x.to(self.dtype)) / self.vae.config.scaling_factor
67
+ self.embeddings = [c, v]
68
+
69
+ @torch.no_grad()
70
+ def refine(self, pred_rgb, polar, azimuth, radius,
71
+ guidance_scale=5, steps=50, strength=0.8,
72
+ ):
73
+
74
+ batch_size = pred_rgb.shape[0]
75
+
76
+ self.scheduler.set_timesteps(steps)
77
+
78
+ if strength == 0:
79
+ init_step = 0
80
+ latents = torch.randn((1, 4, 32, 32), device=self.device, dtype=self.dtype)
81
+ else:
82
+ init_step = int(steps * strength)
83
+ pred_rgb_256 = F.interpolate(pred_rgb, (256, 256), mode='bilinear', align_corners=False)
84
+ latents = self.encode_imgs(pred_rgb_256.to(self.dtype))
85
+ latents = self.scheduler.add_noise(latents, torch.randn_like(latents), self.scheduler.timesteps[init_step])
86
+
87
+ T = np.stack([np.deg2rad(polar), np.sin(np.deg2rad(azimuth)), np.cos(np.deg2rad(azimuth)), radius], axis=-1)
88
+ T = torch.from_numpy(T).unsqueeze(1).to(self.dtype).to(self.device) # [8, 1, 4]
89
+ cc_emb = torch.cat([self.embeddings[0].repeat(batch_size, 1, 1), T], dim=-1)
90
+ cc_emb = self.pipe.clip_camera_projection(cc_emb)
91
+ cc_emb = torch.cat([cc_emb, torch.zeros_like(cc_emb)], dim=0)
92
+
93
+ vae_emb = self.embeddings[1].repeat(batch_size, 1, 1, 1)
94
+ vae_emb = torch.cat([vae_emb, torch.zeros_like(vae_emb)], dim=0)
95
+
96
+ for i, t in enumerate(self.scheduler.timesteps[init_step:]):
97
+
98
+ x_in = torch.cat([latents] * 2)
99
+ t_in = torch.cat([t.view(1)] * 2).to(self.device)
100
+
101
+ noise_pred = self.unet(
102
+ torch.cat([x_in, vae_emb], dim=1),
103
+ t_in.to(self.unet.dtype),
104
+ encoder_hidden_states=cc_emb,
105
+ ).sample
106
+
107
+ noise_pred_cond, noise_pred_uncond = noise_pred.chunk(2)
108
+ noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_cond - noise_pred_uncond)
109
+
110
+ latents = self.scheduler.step(noise_pred, t, latents).prev_sample
111
+
112
+ imgs = self.decode_latents(latents) # [1, 3, 256, 256]
113
+ return imgs
114
+
115
+ def train_step(self, pred_rgb, polar, azimuth, radius, step_ratio=None, guidance_scale=5, as_latent=False):
116
+ # pred_rgb: tensor [1, 3, H, W] in [0, 1]
117
+
118
+ batch_size = pred_rgb.shape[0]
119
+
120
+ if as_latent:
121
+ latents = F.interpolate(pred_rgb, (32, 32), mode='bilinear', align_corners=False) * 2 - 1
122
+ else:
123
+ pred_rgb_256 = F.interpolate(pred_rgb, (256, 256), mode='bilinear', align_corners=False)
124
+ latents = self.encode_imgs(pred_rgb_256.to(self.dtype))
125
+
126
+ if step_ratio is not None:
127
+ # dreamtime-like
128
+ # t = self.max_step - (self.max_step - self.min_step) * np.sqrt(step_ratio)
129
+ t = np.round((1 - step_ratio) * self.num_train_timesteps).clip(self.min_step, self.max_step)
130
+ t = torch.full((batch_size,), t, dtype=torch.long, device=self.device)
131
+ else:
132
+ t = torch.randint(self.min_step, self.max_step + 1, (batch_size,), dtype=torch.long, device=self.device)
133
+
134
+ w = (1 - self.alphas[t]).view(batch_size, 1, 1, 1)
135
+
136
+ with torch.no_grad():
137
+ noise = torch.randn_like(latents)
138
+ latents_noisy = self.scheduler.add_noise(latents, noise, t)
139
+
140
+ x_in = torch.cat([latents_noisy] * 2)
141
+ t_in = torch.cat([t] * 2)
142
+
143
+ T = np.stack([np.deg2rad(polar), np.sin(np.deg2rad(azimuth)), np.cos(np.deg2rad(azimuth)), radius], axis=-1)
144
+ T = torch.from_numpy(T).unsqueeze(1).to(self.dtype).to(self.device) # [8, 1, 4]
145
+ cc_emb = torch.cat([self.embeddings[0].repeat(batch_size, 1, 1), T], dim=-1)
146
+ cc_emb = self.pipe.clip_camera_projection(cc_emb)
147
+ cc_emb = torch.cat([cc_emb, torch.zeros_like(cc_emb)], dim=0)
148
+
149
+ vae_emb = self.embeddings[1].repeat(batch_size, 1, 1, 1)
150
+ vae_emb = torch.cat([vae_emb, torch.zeros_like(vae_emb)], dim=0)
151
+
152
+ noise_pred = self.unet(
153
+ torch.cat([x_in, vae_emb], dim=1),
154
+ t_in.to(self.unet.dtype),
155
+ encoder_hidden_states=cc_emb,
156
+ ).sample
157
+
158
+ noise_pred_cond, noise_pred_uncond = noise_pred.chunk(2)
159
+ noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_cond - noise_pred_uncond)
160
+
161
+ grad = w * (noise_pred - noise)
162
+ grad = torch.nan_to_num(grad)
163
+
164
+ target = (latents - grad).detach()
165
+ loss = 0.5 * F.mse_loss(latents.float(), target, reduction='sum')
166
+
167
+ return loss
168
+
169
+
170
+ def decode_latents(self, latents):
171
+ latents = 1 / self.vae.config.scaling_factor * latents
172
+
173
+ imgs = self.vae.decode(latents).sample
174
+ imgs = (imgs / 2 + 0.5).clamp(0, 1)
175
+
176
+ return imgs
177
+
178
+ def encode_imgs(self, imgs, mode=False):
179
+ # imgs: [B, 3, H, W]
180
+
181
+ imgs = 2 * imgs - 1
182
+
183
+ posterior = self.vae.encode(imgs).latent_dist
184
+ if mode:
185
+ latents = posterior.mode()
186
+ else:
187
+ latents = posterior.sample()
188
+ latents = latents * self.vae.config.scaling_factor
189
+
190
+ return latents
191
+
192
+
193
+ if __name__ == '__main__':
194
+ import cv2
195
+ import argparse
196
+ import numpy as np
197
+ import matplotlib.pyplot as plt
198
+
199
+ parser = argparse.ArgumentParser()
200
+
201
+ parser.add_argument('input', type=str)
202
+ parser.add_argument('--polar', type=float, default=0, help='delta polar angle in [-90, 90]')
203
+ parser.add_argument('--azimuth', type=float, default=0, help='delta azimuth angle in [-180, 180]')
204
+ parser.add_argument('--radius', type=float, default=0, help='delta camera radius multiplier in [-0.5, 0.5]')
205
+
206
+ opt = parser.parse_args()
207
+
208
+ device = torch.device('cuda')
209
+
210
+ print(f'[INFO] loading image from {opt.input} ...')
211
+ image = cv2.imread(opt.input, cv2.IMREAD_UNCHANGED)
212
+ image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
213
+ image = cv2.resize(image, (256, 256), interpolation=cv2.INTER_AREA)
214
+ image = image.astype(np.float32) / 255.0
215
+ image = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0).contiguous().to(device)
216
+
217
+ print(f'[INFO] loading model ...')
218
+ zero123 = Zero123(device)
219
+
220
+ print(f'[INFO] running model ...')
221
+ zero123.get_img_embeds(image)
222
+
223
+ while True:
224
+ outputs = zero123.refine(image, polar=[opt.polar], azimuth=[opt.azimuth], radius=[opt.radius], strength=0)
225
+ plt.imshow(outputs.float().cpu().numpy().transpose(0, 2, 3, 1)[0])
226
+ plt.show()
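
As with the SD module, a minimal sketch of the Zero123 guidance for novel-view score distillation (random tensors stand in for the reference photo and the rendered view; main.py below wires this up to the actual renderer):

import torch
from guidance.zero123_utils import Zero123

device = torch.device("cuda")
guidance = Zero123(device)

# reference image [1, 3, H, W] in [0, 1]: normally the preprocessed input photo
ref = torch.rand(1, 3, 256, 256, device=device)
guidance.get_img_embeds(ref)

# rendering of a random novel view (would come from the Gaussian renderer)
pred_rgb = torch.rand(1, 3, 256, 256, device=device, requires_grad=True)

# delta polar / azimuth (degrees) and radius of the novel view w.r.t. the reference
loss = guidance.train_step(pred_rgb, polar=[-20], azimuth=[45], radius=[0], step_ratio=0.5)
loss.backward()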
main.py ADDED
@@ -0,0 +1,882 @@
1
+ import os
2
+ import cv2
3
+ import time
4
+ import tqdm
5
+ import numpy as np
6
+ import dearpygui.dearpygui as dpg
7
+
8
+ import torch
9
+ import torch.nn.functional as F
10
+
11
+ import rembg
12
+
13
+ from cam_utils import orbit_camera, OrbitCamera
14
+ from gs_renderer import Renderer, MiniCam
15
+
16
+ from grid_put import mipmap_linear_grid_put_2d
17
+ from mesh import Mesh, safe_normalize
18
+
19
+ class GUI:
20
+ def __init__(self, opt):
21
+ self.opt = opt # shared with the trainer's opt to support in-place modification of rendering parameters.
22
+ self.gui = opt.gui # enable gui
23
+ self.W = opt.W
24
+ self.H = opt.H
25
+ self.cam = OrbitCamera(opt.W, opt.H, r=opt.radius, fovy=opt.fovy)
26
+
27
+ self.mode = "image"
28
+ self.seed = "random"
29
+
30
+ self.buffer_image = np.ones((self.W, self.H, 3), dtype=np.float32)
31
+ self.need_update = True # update buffer_image
32
+
33
+ # models
34
+ self.device = torch.device("cuda")
35
+ self.bg_remover = None
36
+
37
+ self.guidance_sd = None
38
+ self.guidance_zero123 = None
39
+
40
+ self.enable_sd = False
41
+ self.enable_zero123 = False
42
+
43
+ # renderer
44
+ self.renderer = Renderer(sh_degree=self.opt.sh_degree)
45
+ self.gaussain_scale_factor = 1
46
+
47
+ # input image
48
+ self.input_img = None
49
+ self.input_mask = None
50
+ self.input_img_torch = None
51
+ self.input_mask_torch = None
52
+ self.overlay_input_img = False
53
+ self.overlay_input_img_ratio = 0.5
54
+
55
+ # input text
56
+ self.prompt = ""
57
+ self.negative_prompt = ""
58
+
59
+ # training stuff
60
+ self.training = False
61
+ self.optimizer = None
62
+ self.step = 0
63
+ self.train_steps = 1 # steps per rendering loop
64
+
65
+ # load input data from cmdline
66
+ if self.opt.input is not None:
67
+ self.load_input(self.opt.input)
68
+
69
+ # override prompt from cmdline
70
+ if self.opt.prompt is not None:
71
+ self.prompt = self.opt.prompt
72
+
73
+ # override if provide a checkpoint
74
+ if self.opt.load is not None:
75
+ self.renderer.initialize(self.opt.load)
76
+ else:
77
+ # initialize gaussians to a blob
78
+ self.renderer.initialize(num_pts=self.opt.num_pts)
79
+
80
+ if self.gui:
81
+ dpg.create_context()
82
+ self.register_dpg()
83
+ self.test_step()
84
+
85
+ def __del__(self):
86
+ if self.gui:
87
+ dpg.destroy_context()
88
+
89
+ def seed_everything(self):
90
+ try:
91
+ seed = int(self.seed)
92
+ except:
93
+ seed = np.random.randint(0, 1000000)
94
+
95
+ os.environ["PYTHONHASHSEED"] = str(seed)
96
+ np.random.seed(seed)
97
+ torch.manual_seed(seed)
98
+ torch.cuda.manual_seed(seed)
99
+ torch.backends.cudnn.deterministic = True
100
+ torch.backends.cudnn.benchmark = True
101
+
102
+ self.last_seed = seed
103
+
104
+ def prepare_train(self):
105
+
106
+ self.step = 0
107
+
108
+ # setup training
109
+ self.renderer.gaussians.training_setup(self.opt)
110
+ # do not do progressive sh-level
111
+ self.renderer.gaussians.active_sh_degree = self.renderer.gaussians.max_sh_degree
112
+ self.optimizer = self.renderer.gaussians.optimizer
113
+
114
+ # default camera
115
+ pose = orbit_camera(self.opt.elevation, 0, self.opt.radius)
116
+ self.fixed_cam = MiniCam(
117
+ pose,
118
+ self.opt.ref_size,
119
+ self.opt.ref_size,
120
+ self.cam.fovy,
121
+ self.cam.fovx,
122
+ self.cam.near,
123
+ self.cam.far,
124
+ )
125
+
126
+ self.enable_sd = self.opt.lambda_sd > 0 and self.prompt != ""
127
+ self.enable_zero123 = self.opt.lambda_zero123 > 0 and self.input_img is not None
128
+
129
+ # lazy load guidance model
130
+ if self.guidance_sd is None and self.enable_sd:
131
+ print(f"[INFO] loading SD...")
132
+ from guidance.sd_utils import StableDiffusion
133
+ self.guidance_sd = StableDiffusion(self.device)
134
+ print(f"[INFO] loaded SD!")
135
+
136
+ if self.guidance_zero123 is None and self.enable_zero123:
137
+ print(f"[INFO] loading zero123...")
138
+ from guidance.zero123_utils import Zero123
139
+ self.guidance_zero123 = Zero123(self.device)
140
+ print(f"[INFO] loaded zero123!")
141
+
142
+ # input image
143
+ if self.input_img is not None:
144
+ self.input_img_torch = torch.from_numpy(self.input_img).permute(2, 0, 1).unsqueeze(0).to(self.device)
145
+ self.input_img_torch = F.interpolate(self.input_img_torch, (self.opt.ref_size, self.opt.ref_size), mode="bilinear", align_corners=False)
146
+
147
+ self.input_mask_torch = torch.from_numpy(self.input_mask).permute(2, 0, 1).unsqueeze(0).to(self.device)
148
+ self.input_mask_torch = F.interpolate(self.input_mask_torch, (self.opt.ref_size, self.opt.ref_size), mode="bilinear", align_corners=False)
149
+
150
+ # prepare embeddings
151
+ with torch.no_grad():
152
+
153
+ if self.enable_sd:
154
+ self.guidance_sd.get_text_embeds([self.prompt], [self.negative_prompt])
155
+
156
+ if self.enable_zero123:
157
+ self.guidance_zero123.get_img_embeds(self.input_img_torch)
158
+
159
+ def train_step(self):
160
+ starter = torch.cuda.Event(enable_timing=True)
161
+ ender = torch.cuda.Event(enable_timing=True)
162
+ starter.record()
163
+
164
+ for _ in range(self.train_steps):
165
+
166
+ self.step += 1
167
+ step_ratio = min(1, self.step / self.opt.iters)
168
+
169
+ # update lr
170
+ self.renderer.gaussians.update_learning_rate(self.step)
171
+
172
+ loss = 0
173
+
174
+ ### known view
175
+ if self.input_img_torch is not None:
176
+ cur_cam = self.fixed_cam
177
+ out = self.renderer.render(cur_cam)
178
+
179
+ # rgb loss
180
+ image = out["image"].unsqueeze(0) # [1, 3, H, W] in [0, 1]
181
+ loss = loss + 10000 * step_ratio * F.mse_loss(image, self.input_img_torch)
182
+
183
+ # mask loss
184
+ mask = out["alpha"].unsqueeze(0) # [1, 1, H, W] in [0, 1]
185
+ loss = loss + 1000 * step_ratio * F.mse_loss(mask, self.input_mask_torch)
186
+
187
+ ### novel view (manual batch)
188
+ render_resolution = 128 if step_ratio < 0.3 else (256 if step_ratio < 0.6 else 512)
189
+ images = []
190
+ vers, hors, radii = [], [], []
191
+ # avoid too large elevation (> 80 or < -80), and make sure it always covers [-30, 30]
192
+ min_ver = max(min(-30, -30 - self.opt.elevation), -80 - self.opt.elevation)
193
+ max_ver = min(max(30, 30 - self.opt.elevation), 80 - self.opt.elevation)
194
+ for _ in range(self.opt.batch_size):
195
+
196
+ # render random view
197
+ ver = np.random.randint(min_ver, max_ver)
198
+ hor = np.random.randint(-180, 180)
199
+ radius = 0
200
+
201
+ vers.append(ver)
202
+ hors.append(hor)
203
+ radii.append(radius)
204
+
205
+ pose = orbit_camera(self.opt.elevation + ver, hor, self.opt.radius + radius)
206
+
207
+ cur_cam = MiniCam(
208
+ pose,
209
+ render_resolution,
210
+ render_resolution,
211
+ self.cam.fovy,
212
+ self.cam.fovx,
213
+ self.cam.near,
214
+ self.cam.far,
215
+ )
216
+
217
+ invert_bg_color = np.random.rand() > self.opt.invert_bg_prob
218
+ out = self.renderer.render(cur_cam, invert_bg_color=invert_bg_color)
219
+
220
+ image = out["image"].unsqueeze(0) # [1, 3, H, W] in [0, 1]
221
+ images.append(image)
222
+
223
+ images = torch.cat(images, dim=0)
224
+
225
+ # import kiui
226
+ # kiui.lo(hor, ver)
227
+ # kiui.vis.plot_image(image)
228
+
229
+ # guidance loss
230
+ if self.enable_sd:
231
+ loss = loss + self.opt.lambda_sd * self.guidance_sd.train_step(images, step_ratio)
232
+
233
+ if self.enable_zero123:
234
+ loss = loss + self.opt.lambda_zero123 * self.guidance_zero123.train_step(images, vers, hors, radii, step_ratio)
235
+
236
+ # optimize step
237
+ loss.backward()
238
+ self.optimizer.step()
239
+ self.optimizer.zero_grad()
240
+
241
+ # densify and prune
242
+ if self.step >= self.opt.density_start_iter and self.step <= self.opt.density_end_iter:
243
+ viewspace_point_tensor, visibility_filter, radii = out["viewspace_points"], out["visibility_filter"], out["radii"]
244
+ self.renderer.gaussians.max_radii2D[visibility_filter] = torch.max(self.renderer.gaussians.max_radii2D[visibility_filter], radii[visibility_filter])
245
+ self.renderer.gaussians.add_densification_stats(viewspace_point_tensor, visibility_filter)
246
+
247
+ if self.step % self.opt.densification_interval == 0:
248
+ # size_threshold = 20 if self.step > self.opt.opacity_reset_interval else None
249
+ self.renderer.gaussians.densify_and_prune(self.opt.densify_grad_threshold, min_opacity=0.01, extent=0.5, max_screen_size=1)
250
+
251
+ if self.step % self.opt.opacity_reset_interval == 0:
252
+ self.renderer.gaussians.reset_opacity()
253
+
254
+ ender.record()
255
+ torch.cuda.synchronize()
256
+ t = starter.elapsed_time(ender)
257
+
258
+ self.need_update = True
259
+
260
+ if self.gui:
261
+ dpg.set_value("_log_train_time", f"{t:.4f}ms")
262
+ dpg.set_value(
263
+ "_log_train_log",
264
+ f"step = {self.step: 5d} (+{self.train_steps: 2d}) loss = {loss.item():.4f}",
265
+ )
266
+
267
+ # dynamic train steps (no need for now)
268
+ # max allowed train time per-frame is 500 ms
269
+ # full_t = t / self.train_steps * 16
270
+ # train_steps = min(16, max(4, int(16 * 500 / full_t)))
271
+ # if train_steps > self.train_steps * 1.2 or train_steps < self.train_steps * 0.8:
272
+ # self.train_steps = train_steps
273
+
274
+ @torch.no_grad()
275
+ def test_step(self):
276
+ # ignore if no need to update
277
+ if not self.need_update:
278
+ return
279
+
280
+ starter = torch.cuda.Event(enable_timing=True)
281
+ ender = torch.cuda.Event(enable_timing=True)
282
+ starter.record()
283
+
284
+ # should update image
285
+ if self.need_update:
286
+ # render image
287
+
288
+ cur_cam = MiniCam(
289
+ self.cam.pose,
290
+ self.W,
291
+ self.H,
292
+ self.cam.fovy,
293
+ self.cam.fovx,
294
+ self.cam.near,
295
+ self.cam.far,
296
+ )
297
+
298
+ out = self.renderer.render(cur_cam, self.gaussain_scale_factor)
299
+
300
+ buffer_image = out[self.mode] # [3, H, W]
301
+
302
+ if self.mode in ['depth', 'alpha']:
303
+ buffer_image = buffer_image.repeat(3, 1, 1)
304
+ if self.mode == 'depth':
305
+ buffer_image = (buffer_image - buffer_image.min()) / (buffer_image.max() - buffer_image.min() + 1e-20)
306
+
307
+ buffer_image = F.interpolate(
308
+ buffer_image.unsqueeze(0),
309
+ size=(self.H, self.W),
310
+ mode="bilinear",
311
+ align_corners=False,
312
+ ).squeeze(0)
313
+
314
+ self.buffer_image = (
315
+ buffer_image.permute(1, 2, 0)
316
+ .contiguous()
317
+ .clamp(0, 1)
318
+ .contiguous()
319
+ .detach()
320
+ .cpu()
321
+ .numpy()
322
+ )
323
+
324
+ # display input_image
325
+ if self.overlay_input_img and self.input_img is not None:
326
+ self.buffer_image = (
327
+ self.buffer_image * (1 - self.overlay_input_img_ratio)
328
+ + self.input_img * self.overlay_input_img_ratio
329
+ )
330
+
331
+ self.need_update = False
332
+
333
+ ender.record()
334
+ torch.cuda.synchronize()
335
+ t = starter.elapsed_time(ender)
336
+
337
+ if self.gui:
338
+ dpg.set_value("_log_infer_time", f"{t:.4f}ms ({int(1000/t)} FPS)")
339
+ dpg.set_value(
340
+ "_texture", self.buffer_image
341
+ ) # buffer must be contiguous, else seg fault!
342
+
343
+
344
+ def load_input(self, file):
345
+ # load image
346
+ print(f'[INFO] load image from {file}...')
347
+ img = cv2.imread(file, cv2.IMREAD_UNCHANGED)
348
+ if img.shape[-1] == 3:
349
+ if self.bg_remover is None:
350
+ self.bg_remover = rembg.new_session()
351
+ img = rembg.remove(img, session=self.bg_remover)
352
+
353
+ img = cv2.resize(img, (self.W, self.H), interpolation=cv2.INTER_AREA)
354
+ img = img.astype(np.float32) / 255.0
355
+
356
+ self.input_mask = img[..., 3:]
357
+ # white bg
358
+ self.input_img = img[..., :3] * self.input_mask + (1 - self.input_mask)
359
+ # bgr to rgb
360
+ self.input_img = self.input_img[..., ::-1].copy()
361
+
362
+ # load prompt
363
+ file_prompt = file.replace("_rgba.png", "_caption.txt")
364
+ if os.path.exists(file_prompt):
365
+ print(f'[INFO] load prompt from {file_prompt}...')
366
+ with open(file_prompt, "r") as f:
367
+ self.prompt = f.read().strip()
368
+
369
+ @torch.no_grad()
370
+ def save_model(self, mode='geo', texture_size=1024):
371
+ os.makedirs(self.opt.outdir, exist_ok=True)
372
+ if mode == 'geo':
373
+ path = os.path.join(self.opt.outdir, self.opt.save_path + '_mesh.ply')
374
+ mesh = self.renderer.gaussians.extract_mesh(path, self.opt.density_thresh)
375
+ mesh.write_ply(path)
376
+
377
+ elif mode == 'geo+tex':
378
+ path = os.path.join(self.opt.outdir, self.opt.save_path + '_mesh.' + self.opt.mesh_format)
379
+ mesh = self.renderer.gaussians.extract_mesh(path, self.opt.density_thresh)
380
+
381
+ # perform texture extraction
382
+ print(f"[INFO] unwrap uv...")
383
+ h = w = texture_size
384
+ mesh.auto_uv()
385
+ mesh.auto_normal()
386
+
387
+ albedo = torch.zeros((h, w, 3), device=self.device, dtype=torch.float32)
388
+ cnt = torch.zeros((h, w, 1), device=self.device, dtype=torch.float32)
389
+
390
+ # self.prepare_train() # tmp fix for not loading 0123
391
+ # vers = [0]
392
+ # hors = [0]
393
+ vers = [0] * 8 + [-45] * 8 + [45] * 8 + [-89.9, 89.9]
394
+ hors = [0, 45, -45, 90, -90, 135, -135, 180] * 3 + [0, 0]
395
+
396
+ render_resolution = 512
397
+
398
+ import nvdiffrast.torch as dr
399
+
400
+ if not self.opt.force_cuda_rast and (not self.opt.gui or os.name == 'nt'):
401
+ glctx = dr.RasterizeGLContext()
402
+ else:
403
+ glctx = dr.RasterizeCudaContext()
404
+
405
+ for ver, hor in zip(vers, hors):
406
+ # render image
407
+ pose = orbit_camera(ver, hor, self.cam.radius)
408
+
409
+ cur_cam = MiniCam(
410
+ pose,
411
+ render_resolution,
412
+ render_resolution,
413
+ self.cam.fovy,
414
+ self.cam.fovx,
415
+ self.cam.near,
416
+ self.cam.far,
417
+ )
418
+
419
+ cur_out = self.renderer.render(cur_cam)
420
+
421
+ rgbs = cur_out["image"].unsqueeze(0) # [1, 3, H, W] in [0, 1]
422
+
423
+ # enhance texture quality with zero123 [not working well]
424
+ # if self.opt.guidance_model == 'zero123':
425
+ # rgbs = self.guidance.refine(rgbs, [ver], [hor], [0])
426
+ # import kiui
427
+ # kiui.vis.plot_image(rgbs)
428
+
429
+ # get coordinate in texture image
430
+ pose = torch.from_numpy(pose.astype(np.float32)).to(self.device)
431
+ proj = torch.from_numpy(self.cam.perspective.astype(np.float32)).to(self.device)
432
+
433
+ v_cam = torch.matmul(F.pad(mesh.v, pad=(0, 1), mode='constant', value=1.0), torch.inverse(pose).T).float().unsqueeze(0)
434
+ v_clip = v_cam @ proj.T
435
+ rast, rast_db = dr.rasterize(glctx, v_clip, mesh.f, (render_resolution, render_resolution))
436
+
437
+ depth, _ = dr.interpolate(-v_cam[..., [2]], rast, mesh.f) # [1, H, W, 1]
438
+ depth = depth.squeeze(0) # [H, W, 1]
439
+
440
+ alpha = (rast[0, ..., 3:] > 0).float()
441
+
442
+ uvs, _ = dr.interpolate(mesh.vt.unsqueeze(0), rast, mesh.ft) # [1, 512, 512, 2] in [0, 1]
443
+
444
+ # use normal to produce a back-project mask
445
+ normal, _ = dr.interpolate(mesh.vn.unsqueeze(0).contiguous(), rast, mesh.fn)
446
+ normal = safe_normalize(normal[0])
447
+
448
+ # rotated normal (where [0, 0, 1] always faces camera)
449
+ rot_normal = normal @ pose[:3, :3]
450
+ viewcos = rot_normal[..., [2]]
451
+
452
+ mask = (alpha > 0) & (viewcos > 0.5) # [H, W, 1]
453
+ mask = mask.view(-1)
454
+
455
+ uvs = uvs.view(-1, 2).clamp(0, 1)[mask]
456
+ rgbs = rgbs.view(3, -1).permute(1, 0)[mask].contiguous()
457
+
458
+ # update texture image
459
+ cur_albedo, cur_cnt = mipmap_linear_grid_put_2d(
460
+ h, w,
461
+ uvs[..., [1, 0]] * 2 - 1,
462
+ rgbs,
463
+ min_resolution=256,
464
+ return_count=True,
465
+ )
466
+
467
+ # albedo += cur_albedo
468
+ # cnt += cur_cnt
469
+ mask = cnt.squeeze(-1) < 0.1
470
+ albedo[mask] += cur_albedo[mask]
471
+ cnt[mask] += cur_cnt[mask]
472
+
473
+ mask = cnt.squeeze(-1) > 0
474
+ albedo[mask] = albedo[mask] / cnt[mask].repeat(1, 3)
475
+
476
+ mask = mask.view(h, w)
477
+
478
+ albedo = albedo.detach().cpu().numpy()
479
+ mask = mask.detach().cpu().numpy()
480
+
481
+ # dilate texture
482
+ from sklearn.neighbors import NearestNeighbors
483
+ from scipy.ndimage import binary_dilation, binary_erosion
484
+
485
+ inpaint_region = binary_dilation(mask, iterations=32)
486
+ inpaint_region[mask] = 0
487
+
488
+ search_region = mask.copy()
489
+ not_search_region = binary_erosion(search_region, iterations=3)
490
+ search_region[not_search_region] = 0
491
+
492
+ search_coords = np.stack(np.nonzero(search_region), axis=-1)
493
+ inpaint_coords = np.stack(np.nonzero(inpaint_region), axis=-1)
494
+
495
+ knn = NearestNeighbors(n_neighbors=1, algorithm="kd_tree").fit(
496
+ search_coords
497
+ )
498
+ _, indices = knn.kneighbors(inpaint_coords)
499
+
500
+ albedo[tuple(inpaint_coords.T)] = albedo[tuple(search_coords[indices[:, 0]].T)]
501
+
502
+ mesh.albedo = torch.from_numpy(albedo).to(self.device)
503
+ mesh.write(path)
504
+
505
+ else:
506
+ path = os.path.join(self.opt.outdir, self.opt.save_path + '_model.ply')
507
+ self.renderer.gaussians.save_ply(path)
508
+
509
+ print(f"[INFO] save model to {path}.")
510
+
511
+ def register_dpg(self):
512
+ ### register texture
513
+
514
+ with dpg.texture_registry(show=False):
515
+ dpg.add_raw_texture(
516
+ self.W,
517
+ self.H,
518
+ self.buffer_image,
519
+ format=dpg.mvFormat_Float_rgb,
520
+ tag="_texture",
521
+ )
522
+
523
+ ### register window
524
+
525
+ # the rendered image, as the primary window
526
+ with dpg.window(
527
+ tag="_primary_window",
528
+ width=self.W,
529
+ height=self.H,
530
+ pos=[0, 0],
531
+ no_move=True,
532
+ no_title_bar=True,
533
+ no_scrollbar=True,
534
+ ):
535
+ # add the texture
536
+ dpg.add_image("_texture")
537
+
538
+ # dpg.set_primary_window("_primary_window", True)
539
+
540
+ # control window
541
+ with dpg.window(
542
+ label="Control",
543
+ tag="_control_window",
544
+ width=600,
545
+ height=self.H,
546
+ pos=[self.W, 0],
547
+ no_move=True,
548
+ no_title_bar=True,
549
+ ):
550
+ # button theme
551
+ with dpg.theme() as theme_button:
552
+ with dpg.theme_component(dpg.mvButton):
553
+ dpg.add_theme_color(dpg.mvThemeCol_Button, (23, 3, 18))
554
+ dpg.add_theme_color(dpg.mvThemeCol_ButtonHovered, (51, 3, 47))
555
+ dpg.add_theme_color(dpg.mvThemeCol_ButtonActive, (83, 18, 83))
556
+ dpg.add_theme_style(dpg.mvStyleVar_FrameRounding, 5)
557
+ dpg.add_theme_style(dpg.mvStyleVar_FramePadding, 3, 3)
558
+
559
+ # timer stuff
560
+ with dpg.group(horizontal=True):
561
+ dpg.add_text("Infer time: ")
562
+ dpg.add_text("no data", tag="_log_infer_time")
563
+
564
+ def callback_setattr(sender, app_data, user_data):
565
+ setattr(self, user_data, app_data)
566
+
567
+ # init stuff
568
+ with dpg.collapsing_header(label="Initialize", default_open=True):
569
+
570
+ # seed stuff
571
+ def callback_set_seed(sender, app_data):
572
+ self.seed = app_data
573
+ self.seed_everything()
574
+
575
+ dpg.add_input_text(
576
+ label="seed",
577
+ default_value=self.seed,
578
+ on_enter=True,
579
+ callback=callback_set_seed,
580
+ )
581
+
582
+ # input stuff
583
+ def callback_select_input(sender, app_data):
584
+ # only one item
585
+ for k, v in app_data["selections"].items():
586
+ dpg.set_value("_log_input", k)
587
+ self.load_input(v)
588
+
589
+ self.need_update = True
590
+
591
+ with dpg.file_dialog(
592
+ directory_selector=False,
593
+ show=False,
594
+ callback=callback_select_input,
595
+ file_count=1,
596
+ tag="file_dialog_tag",
597
+ width=700,
598
+ height=400,
599
+ ):
600
+ dpg.add_file_extension("Images{.jpg,.jpeg,.png}")
601
+
602
+ with dpg.group(horizontal=True):
603
+ dpg.add_button(
604
+ label="input",
605
+ callback=lambda: dpg.show_item("file_dialog_tag"),
606
+ )
607
+ dpg.add_text("", tag="_log_input")
608
+
609
+ # overlay stuff
610
+ with dpg.group(horizontal=True):
611
+
612
+ def callback_toggle_overlay_input_img(sender, app_data):
613
+ self.overlay_input_img = not self.overlay_input_img
614
+ self.need_update = True
615
+
616
+ dpg.add_checkbox(
617
+ label="overlay image",
618
+ default_value=self.overlay_input_img,
619
+ callback=callback_toggle_overlay_input_img,
620
+ )
621
+
622
+ def callback_set_overlay_input_img_ratio(sender, app_data):
623
+ self.overlay_input_img_ratio = app_data
624
+ self.need_update = True
625
+
626
+ dpg.add_slider_float(
627
+ label="ratio",
628
+ min_value=0,
629
+ max_value=1,
630
+ format="%.1f",
631
+ default_value=self.overlay_input_img_ratio,
632
+ callback=callback_set_overlay_input_img_ratio,
633
+ )
634
+
635
+ # prompt stuff
636
+
637
+ dpg.add_input_text(
638
+ label="prompt",
639
+ default_value=self.prompt,
640
+ callback=callback_setattr,
641
+ user_data="prompt",
642
+ )
643
+
644
+ dpg.add_input_text(
645
+ label="negative",
646
+ default_value=self.negative_prompt,
647
+ callback=callback_setattr,
648
+ user_data="negative_prompt",
649
+ )
650
+
651
+ # save current model
652
+ with dpg.group(horizontal=True):
653
+ dpg.add_text("Save: ")
654
+
655
+ def callback_save(sender, app_data, user_data):
656
+ self.save_model(mode=user_data)
657
+
658
+ dpg.add_button(
659
+ label="model",
660
+ tag="_button_save_model",
661
+ callback=callback_save,
662
+ user_data='model',
663
+ )
664
+ dpg.bind_item_theme("_button_save_model", theme_button)
665
+
666
+ dpg.add_button(
667
+ label="geo",
668
+ tag="_button_save_mesh",
669
+ callback=callback_save,
670
+ user_data='geo',
671
+ )
672
+ dpg.bind_item_theme("_button_save_mesh", theme_button)
673
+
674
+ dpg.add_button(
675
+ label="geo+tex",
676
+ tag="_button_save_mesh_with_tex",
677
+ callback=callback_save,
678
+ user_data='geo+tex',
679
+ )
680
+ dpg.bind_item_theme("_button_save_mesh_with_tex", theme_button)
681
+
682
+ dpg.add_input_text(
683
+ label="",
684
+ default_value=self.opt.save_path,
685
+ callback=callback_setattr,
686
+ user_data="save_path",
687
+ )
688
+
689
+ # training stuff
690
+ with dpg.collapsing_header(label="Train", default_open=True):
691
+ # lr and train button
692
+ with dpg.group(horizontal=True):
693
+ dpg.add_text("Train: ")
694
+
695
+ def callback_train(sender, app_data):
696
+ if self.training:
697
+ self.training = False
698
+ dpg.configure_item("_button_train", label="start")
699
+ else:
700
+ self.prepare_train()
701
+ self.training = True
702
+ dpg.configure_item("_button_train", label="stop")
703
+
704
+ # dpg.add_button(
705
+ # label="init", tag="_button_init", callback=self.prepare_train
706
+ # )
707
+ # dpg.bind_item_theme("_button_init", theme_button)
708
+
709
+ dpg.add_button(
710
+ label="start", tag="_button_train", callback=callback_train
711
+ )
712
+ dpg.bind_item_theme("_button_train", theme_button)
713
+
714
+ with dpg.group(horizontal=True):
715
+ dpg.add_text("", tag="_log_train_time")
716
+ dpg.add_text("", tag="_log_train_log")
717
+
718
+ # rendering options
719
+ with dpg.collapsing_header(label="Rendering", default_open=True):
720
+ # mode combo
721
+ def callback_change_mode(sender, app_data):
722
+ self.mode = app_data
723
+ self.need_update = True
724
+
725
+ dpg.add_combo(
726
+ ("image", "depth", "alpha"),
727
+ label="mode",
728
+ default_value=self.mode,
729
+ callback=callback_change_mode,
730
+ )
731
+
732
+ # fov slider
733
+ def callback_set_fovy(sender, app_data):
734
+ self.cam.fovy = np.deg2rad(app_data)
735
+ self.need_update = True
736
+
737
+ dpg.add_slider_int(
738
+ label="FoV (vertical)",
739
+ min_value=1,
740
+ max_value=120,
741
+ format="%d deg",
742
+ default_value=np.rad2deg(self.cam.fovy),
743
+ callback=callback_set_fovy,
744
+ )
745
+
746
+ def callback_set_gaussain_scale(sender, app_data):
747
+ self.gaussain_scale_factor = app_data
748
+ self.need_update = True
749
+
750
+ dpg.add_slider_float(
751
+ label="gaussian scale",
752
+ min_value=0,
753
+ max_value=1,
754
+ format="%.2f",
755
+ default_value=self.gaussain_scale_factor,
756
+ callback=callback_set_gaussain_scale,
757
+ )
758
+
759
+ ### register camera handler
760
+
761
+ def callback_camera_drag_rotate_or_draw_mask(sender, app_data):
762
+ if not dpg.is_item_focused("_primary_window"):
763
+ return
764
+
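+ # app_data from a dearpygui drag handler is (button, drag_dx, drag_dy); only the deltas are used here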
765
+ dx = app_data[1]
766
+ dy = app_data[2]
767
+
768
+ self.cam.orbit(dx, dy)
769
+ self.need_update = True
770
+
771
+ def callback_camera_wheel_scale(sender, app_data):
772
+ if not dpg.is_item_focused("_primary_window"):
773
+ return
774
+
775
+ delta = app_data
776
+
777
+ self.cam.scale(delta)
778
+ self.need_update = True
779
+
780
+ def callback_camera_drag_pan(sender, app_data):
781
+ if not dpg.is_item_focused("_primary_window"):
782
+ return
783
+
784
+ dx = app_data[1]
785
+ dy = app_data[2]
786
+
787
+ self.cam.pan(dx, dy)
788
+ self.need_update = True
789
+
790
+ def callback_set_mouse_loc(sender, app_data):
791
+ if not dpg.is_item_focused("_primary_window"):
792
+ return
793
+
794
+ # just the pixel coordinates in the image
795
+ self.mouse_loc = np.array(app_data)
796
+
797
+ with dpg.handler_registry():
798
+ # for camera moving
799
+ dpg.add_mouse_drag_handler(
800
+ button=dpg.mvMouseButton_Left,
801
+ callback=callback_camera_drag_rotate_or_draw_mask,
802
+ )
803
+ dpg.add_mouse_wheel_handler(callback=callback_camera_wheel_scale)
804
+ dpg.add_mouse_drag_handler(
805
+ button=dpg.mvMouseButton_Middle, callback=callback_camera_drag_pan
806
+ )
807
+
808
+ dpg.create_viewport(
809
+ title="Gaussian3D",
810
+ width=self.W + 600,
811
+ height=self.H + (45 if os.name == "nt" else 0),
812
+ resizable=False,
813
+ )
814
+
815
+ ### global theme
816
+ with dpg.theme() as theme_no_padding:
817
+ with dpg.theme_component(dpg.mvAll):
818
+ # set all padding to 0 to avoid scroll bar
819
+ dpg.add_theme_style(
820
+ dpg.mvStyleVar_WindowPadding, 0, 0, category=dpg.mvThemeCat_Core
821
+ )
822
+ dpg.add_theme_style(
823
+ dpg.mvStyleVar_FramePadding, 0, 0, category=dpg.mvThemeCat_Core
824
+ )
825
+ dpg.add_theme_style(
826
+ dpg.mvStyleVar_CellPadding, 0, 0, category=dpg.mvThemeCat_Core
827
+ )
828
+
829
+ dpg.bind_item_theme("_primary_window", theme_no_padding)
830
+
831
+ dpg.setup_dearpygui()
832
+
833
+ ### register a larger font
834
+ # get it from: https://github.com/lxgw/LxgwWenKai/releases/download/v1.300/LXGWWenKai-Regular.ttf
835
+ if os.path.exists("LXGWWenKai-Regular.ttf"):
836
+ with dpg.font_registry():
837
+ with dpg.font("LXGWWenKai-Regular.ttf", 18) as default_font:
838
+ dpg.bind_font(default_font)
839
+
840
+ # dpg.show_metrics()
841
+
842
+ dpg.show_viewport()
843
+
844
+ def render(self):
845
+ assert self.gui
846
+ while dpg.is_dearpygui_running():
847
+ # update texture every frame
848
+ if self.training:
849
+ self.train_step()
850
+ self.test_step()
851
+ dpg.render_dearpygui_frame()
852
+
853
+ # no gui mode
854
+ def train(self, iters=500):
855
+ if iters > 0:
856
+ self.prepare_train()
857
+ for i in tqdm.trange(iters):
858
+ self.train_step()
859
+ # do a last prune
860
+ self.renderer.gaussians.prune(min_opacity=0.01, extent=1, max_screen_size=1)
861
+ # save
862
+ self.save_model(mode='model')
863
+ self.save_model(mode='geo+tex')
864
+
865
+
866
+ if __name__ == "__main__":
867
+ import argparse
868
+ from omegaconf import OmegaConf
869
+
870
+ parser = argparse.ArgumentParser()
871
+ parser.add_argument("--config", required=True, help="path to the yaml config file")
872
+ args, extras = parser.parse_known_args()
873
+
874
+ # override default config from cli
875
+ opt = OmegaConf.merge(OmegaConf.load(args.config), OmegaConf.from_cli(extras))
876
+
877
+ gui = GUI(opt)
878
+
879
+ if opt.gui:
880
+ gui.render()
881
+ else:
882
+ gui.train(opt.iters)
main2.py ADDED
@@ -0,0 +1,671 @@
1
+ import os
2
+ import cv2
3
+ import time
4
+ import tqdm
5
+ import numpy as np
6
+ import dearpygui.dearpygui as dpg
7
+
8
+ import torch
9
+ import torch.nn.functional as F
10
+
11
+ import trimesh
12
+ import rembg
13
+
14
+ from cam_utils import orbit_camera, OrbitCamera
15
+ from mesh_renderer import Renderer
16
+
17
+ # from kiui.lpips import LPIPS
18
+
19
+ class GUI:
20
+ def __init__(self, opt):
21
+ self.opt = opt # shared with the trainer's opt to support in-place modification of rendering parameters.
22
+ self.gui = opt.gui # enable gui
23
+ self.W = opt.W
24
+ self.H = opt.H
25
+ self.cam = OrbitCamera(opt.W, opt.H, r=opt.radius, fovy=opt.fovy)
26
+
27
+ self.mode = "image"
28
+ self.seed = "random"
29
+
30
+ self.buffer_image = np.ones((self.W, self.H, 3), dtype=np.float32)
31
+ self.need_update = True # update buffer_image
32
+
33
+ # models
34
+ self.device = torch.device("cuda")
35
+ self.bg_remover = None
36
+
37
+ self.guidance_sd = None
38
+ self.guidance_zero123 = None
39
+
40
+ self.enable_sd = False
41
+ self.enable_zero123 = False
42
+
43
+ # renderer
44
+ self.renderer = Renderer(opt).to(self.device)
45
+
46
+ # input image
47
+ self.input_img = None
48
+ self.input_mask = None
49
+ self.input_img_torch = None
50
+ self.input_mask_torch = None
51
+ self.overlay_input_img = False
52
+ self.overlay_input_img_ratio = 0.5
53
+
54
+ # input text
55
+ self.prompt = ""
56
+ self.negative_prompt = ""
57
+
58
+ # training stuff
59
+ self.training = False
60
+ self.optimizer = None
61
+ self.step = 0
62
+ self.train_steps = 1 # steps per rendering loop
63
+ # self.lpips_loss = LPIPS(net='vgg').to(self.device)
64
+
65
+ # load input data from cmdline
66
+ if self.opt.input is not None:
67
+ self.load_input(self.opt.input)
68
+
69
+ # override prompt from cmdline
70
+ if self.opt.prompt is not None:
71
+ self.prompt = self.opt.prompt
72
+
73
+ if self.gui:
74
+ dpg.create_context()
75
+ self.register_dpg()
76
+ self.test_step()
77
+
78
+ def __del__(self):
79
+ if self.gui:
80
+ dpg.destroy_context()
81
+
82
+ def seed_everything(self):
83
+ try:
84
+ seed = int(self.seed)
85
+ except:
86
+ seed = np.random.randint(0, 1000000)
87
+
88
+ os.environ["PYTHONHASHSEED"] = str(seed)
89
+ np.random.seed(seed)
90
+ torch.manual_seed(seed)
91
+ torch.cuda.manual_seed(seed)
92
+ torch.backends.cudnn.deterministic = True
93
+ torch.backends.cudnn.benchmark = True
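+ # note: cudnn.benchmark=True lets cuDNN re-time and re-select kernels, so results may still vary slightly across runs despite deterministic=True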
94
+
95
+ self.last_seed = seed
96
+
97
+ def prepare_train(self):
98
+
99
+ self.step = 0
100
+
101
+ # setup training
102
+ self.optimizer = torch.optim.Adam(self.renderer.get_params())
103
+
104
+ # default camera
105
+ pose = orbit_camera(self.opt.elevation, 0, self.opt.radius)
106
+ self.fixed_cam = (pose, self.cam.perspective)
107
+
108
+
109
+ self.enable_sd = self.opt.lambda_sd > 0 and self.prompt != ""
110
+ self.enable_zero123 = self.opt.lambda_zero123 > 0 and self.input_img is not None
111
+
112
+ # lazy load guidance model
113
+ if self.guidance_sd is None and self.enable_sd:
114
+ print(f"[INFO] loading SD...")
115
+ from guidance.sd_utils import StableDiffusion
116
+ self.guidance_sd = StableDiffusion(self.device)
117
+ print(f"[INFO] loaded SD!")
118
+
119
+ if self.guidance_zero123 is None and self.enable_zero123:
120
+ print(f"[INFO] loading zero123...")
121
+ from guidance.zero123_utils import Zero123
122
+ self.guidance_zero123 = Zero123(self.device)
123
+ print(f"[INFO] loaded zero123!")
124
+
125
+ # input image
126
+ if self.input_img is not None:
127
+ self.input_img_torch = torch.from_numpy(self.input_img).permute(2, 0, 1).unsqueeze(0).to(self.device)
128
+ self.input_img_torch = F.interpolate(
129
+ self.input_img_torch, (self.opt.ref_size, self.opt.ref_size), mode="bilinear", align_corners=False
130
+ )
131
+
132
+ self.input_mask_torch = torch.from_numpy(self.input_mask).permute(2, 0, 1).unsqueeze(0).to(self.device)
133
+ self.input_mask_torch = F.interpolate(
134
+ self.input_mask_torch, (self.opt.ref_size, self.opt.ref_size), mode="bilinear", align_corners=False
135
+ )
136
+ self.input_img_torch_channel_last = self.input_img_torch[0].permute(1,2,0).contiguous()
137
+
138
+ # prepare embeddings
139
+ with torch.no_grad():
140
+
141
+ if self.enable_sd:
142
+ self.guidance_sd.get_text_embeds([self.prompt], [self.negative_prompt])
143
+
144
+ if self.enable_zero123:
145
+ self.guidance_zero123.get_img_embeds(self.input_img_torch)
146
+
147
+ def train_step(self):
148
+ starter = torch.cuda.Event(enable_timing=True)
149
+ ender = torch.cuda.Event(enable_timing=True)
150
+ starter.record()
151
+
152
+
153
+ for _ in range(self.train_steps):
154
+
155
+ self.step += 1
156
+ step_ratio = min(1, self.step / self.opt.iters_refine)
157
+
158
+ loss = 0
159
+
160
+ ### known view
161
+ if self.input_img_torch is not None:
162
+
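+ # randomly jitter the super-sampling factor within [0.125, 2]; the renderer scales its internal resolution by ssaa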
163
+ ssaa = min(2.0, max(0.125, 2 * np.random.random()))
164
+ out = self.renderer.render(*self.fixed_cam, self.opt.ref_size, self.opt.ref_size, ssaa=ssaa)
165
+
166
+ # rgb loss
167
+ image = out["image"] # [H, W, 3] in [0, 1]
168
+ valid_mask = ((out["alpha"] > 0) & (out["viewcos"] > 0.5)).detach()
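+ # viewcos > 0.5 keeps only pixels whose surface normal roughly faces the camera, so grazing regions receive no RGB supervision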
169
+ loss = loss + F.mse_loss(image * valid_mask, self.input_img_torch_channel_last * valid_mask)
170
+
171
+ ### novel view (manual batch)
172
+ render_resolution = 512
173
+ images = []
174
+ vers, hors, radii = [], [], []
175
+ # avoid too large elevation (> 80 or < -80), and make sure it always cover [-30, 30]
176
+ min_ver = max(min(-30, -30 - self.opt.elevation), -80 - self.opt.elevation)
177
+ max_ver = min(max(30, 30 - self.opt.elevation), 80 - self.opt.elevation)
178
+ for _ in range(self.opt.batch_size):
179
+
180
+ # render random view
181
+ ver = np.random.randint(min_ver, max_ver)
182
+ hor = np.random.randint(-180, 180)
183
+ radius = 0
184
+
185
+ vers.append(ver)
186
+ hors.append(hor)
187
+ radii.append(radius)
188
+
189
+ pose = orbit_camera(self.opt.elevation + ver, hor, self.opt.radius + radius)
190
+
191
+ # random render resolution
192
+ ssaa = min(2.0, max(0.125, 2 * np.random.random()))
193
+ out = self.renderer.render(pose, self.cam.perspective, render_resolution, render_resolution, ssaa=ssaa)
194
+
195
+ image = out["image"] # [H, W, 3] in [0, 1]
196
+ image = image.permute(2,0,1).contiguous().unsqueeze(0) # [1, 3, H, W] in [0, 1]
197
+
198
+ images.append(image)
199
+
200
+ images = torch.cat(images, dim=0)
201
+
202
+ # import kiui
203
+ # kiui.lo(hor, ver)
204
+ # kiui.vis.plot_image(image)
205
+
206
+ # guidance loss
207
+ if self.enable_sd:
208
+
209
+ # loss = loss + self.opt.lambda_sd * self.guidance_sd.train_step(images, step_ratio)
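+ # the img2img-refined renders below act as a pseudo ground truth for an MSE loss, replacing the SDS gradient above; the zero123 branch follows the same pattern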
210
+ refined_images = self.guidance_sd.refine(images, strength=0.6).float()
211
+ refined_images = F.interpolate(refined_images, (render_resolution, render_resolution), mode="bilinear", align_corners=False)
212
+ loss = loss + self.opt.lambda_sd * F.mse_loss(images, refined_images)
213
+
214
+ if self.enable_zero123:
215
+ # loss = loss + self.opt.lambda_zero123 * self.guidance_zero123.train_step(images, vers, hors, radii, step_ratio)
216
+ refined_images = self.guidance_zero123.refine(images, vers, hors, radii, strength=0.6).float()
217
+ refined_images = F.interpolate(refined_images, (render_resolution, render_resolution), mode="bilinear", align_corners=False)
218
+ loss = loss + self.opt.lambda_zero123 * F.mse_loss(images, refined_images)
219
+ # loss = loss + self.opt.lambda_zero123 * self.lpips_loss(images, refined_images)
220
+
221
+ # optimize step
222
+ loss.backward()
223
+ self.optimizer.step()
224
+ self.optimizer.zero_grad()
225
+
226
+ ender.record()
227
+ torch.cuda.synchronize()
228
+ t = starter.elapsed_time(ender)
229
+
230
+ self.need_update = True
231
+
232
+ if self.gui:
233
+ dpg.set_value("_log_train_time", f"{t:.4f}ms")
234
+ dpg.set_value(
235
+ "_log_train_log",
236
+ f"step = {self.step: 5d} (+{self.train_steps: 2d}) loss = {loss.item():.4f}",
237
+ )
238
+
239
+ # dynamic train steps (no need for now)
240
+ # max allowed train time per-frame is 500 ms
241
+ # full_t = t / self.train_steps * 16
242
+ # train_steps = min(16, max(4, int(16 * 500 / full_t)))
243
+ # if train_steps > self.train_steps * 1.2 or train_steps < self.train_steps * 0.8:
244
+ # self.train_steps = train_steps
245
+
246
+ @torch.no_grad()
247
+ def test_step(self):
248
+ # ignore if no need to update
249
+ if not self.need_update:
250
+ return
251
+
252
+ starter = torch.cuda.Event(enable_timing=True)
253
+ ender = torch.cuda.Event(enable_timing=True)
254
+ starter.record()
255
+
256
+ # should update image
257
+ if self.need_update:
258
+ # render image
259
+
260
+ out = self.renderer.render(self.cam.pose, self.cam.perspective, self.H, self.W)
261
+
262
+ buffer_image = out[self.mode] # [H, W, 3]
263
+
264
+ if self.mode in ['depth', 'alpha']:
265
+ buffer_image = buffer_image.repeat(1, 1, 3)
266
+ if self.mode == 'depth':
267
+ buffer_image = (buffer_image - buffer_image.min()) / (buffer_image.max() - buffer_image.min() + 1e-20)
268
+
269
+ self.buffer_image = buffer_image.contiguous().clamp(0, 1).detach().cpu().numpy()
270
+
271
+ # display input_image
272
+ if self.overlay_input_img and self.input_img is not None:
273
+ self.buffer_image = (
274
+ self.buffer_image * (1 - self.overlay_input_img_ratio)
275
+ + self.input_img * self.overlay_input_img_ratio
276
+ )
277
+
278
+ self.need_update = False
279
+
280
+ ender.record()
281
+ torch.cuda.synchronize()
282
+ t = starter.elapsed_time(ender)
283
+
284
+ if self.gui:
285
+ dpg.set_value("_log_infer_time", f"{t:.4f}ms ({int(1000/t)} FPS)")
286
+ dpg.set_value(
287
+ "_texture", self.buffer_image
288
+ ) # buffer must be contiguous, else seg fault!
289
+
290
+
291
+ def load_input(self, file):
292
+ # load image
293
+ print(f'[INFO] load image from {file}...')
294
+ img = cv2.imread(file, cv2.IMREAD_UNCHANGED)
295
+ if img.shape[-1] == 3:
296
+ if self.bg_remover is None:
297
+ self.bg_remover = rembg.new_session()
298
+ img = rembg.remove(img, session=self.bg_remover)
299
+
300
+ img = cv2.resize(
301
+ img, (self.W, self.H), interpolation=cv2.INTER_AREA
302
+ )
303
+ img = img.astype(np.float32) / 255.0
304
+
305
+ self.input_mask = img[..., 3:]
306
+ # white bg
307
+ self.input_img = img[..., :3] * self.input_mask + (
308
+ 1 - self.input_mask
309
+ )
310
+ # bgr to rgb
311
+ self.input_img = self.input_img[..., ::-1].copy()
312
+
313
+ # load prompt
314
+ file_prompt = file.replace("_rgba.png", "_caption.txt")
315
+ if os.path.exists(file_prompt):
316
+ print(f'[INFO] load prompt from {file_prompt}...')
317
+ with open(file_prompt, "r") as f:
318
+ self.prompt = f.read().strip()
319
+
320
+ def save_model(self):
321
+ os.makedirs(self.opt.outdir, exist_ok=True)
322
+
323
+ path = os.path.join(self.opt.outdir, self.opt.save_path + '.' + self.opt.mesh_format)
324
+ self.renderer.export_mesh(path)
325
+
326
+ print(f"[INFO] save model to {path}.")
327
+
328
+ def register_dpg(self):
329
+ ### register texture
330
+
331
+ with dpg.texture_registry(show=False):
332
+ dpg.add_raw_texture(
333
+ self.W,
334
+ self.H,
335
+ self.buffer_image,
336
+ format=dpg.mvFormat_Float_rgb,
337
+ tag="_texture",
338
+ )
339
+
340
+ ### register window
341
+
342
+ # the rendered image, as the primary window
343
+ with dpg.window(
344
+ tag="_primary_window",
345
+ width=self.W,
346
+ height=self.H,
347
+ pos=[0, 0],
348
+ no_move=True,
349
+ no_title_bar=True,
350
+ no_scrollbar=True,
351
+ ):
352
+ # add the texture
353
+ dpg.add_image("_texture")
354
+
355
+ # dpg.set_primary_window("_primary_window", True)
356
+
357
+ # control window
358
+ with dpg.window(
359
+ label="Control",
360
+ tag="_control_window",
361
+ width=600,
362
+ height=self.H,
363
+ pos=[self.W, 0],
364
+ no_move=True,
365
+ no_title_bar=True,
366
+ ):
367
+ # button theme
368
+ with dpg.theme() as theme_button:
369
+ with dpg.theme_component(dpg.mvButton):
370
+ dpg.add_theme_color(dpg.mvThemeCol_Button, (23, 3, 18))
371
+ dpg.add_theme_color(dpg.mvThemeCol_ButtonHovered, (51, 3, 47))
372
+ dpg.add_theme_color(dpg.mvThemeCol_ButtonActive, (83, 18, 83))
373
+ dpg.add_theme_style(dpg.mvStyleVar_FrameRounding, 5)
374
+ dpg.add_theme_style(dpg.mvStyleVar_FramePadding, 3, 3)
375
+
376
+ # timer stuff
377
+ with dpg.group(horizontal=True):
378
+ dpg.add_text("Infer time: ")
379
+ dpg.add_text("no data", tag="_log_infer_time")
380
+
381
+ def callback_setattr(sender, app_data, user_data):
382
+ setattr(self, user_data, app_data)
383
+
384
+ # init stuff
385
+ with dpg.collapsing_header(label="Initialize", default_open=True):
386
+
387
+ # seed stuff
388
+ def callback_set_seed(sender, app_data):
389
+ self.seed = app_data
390
+ self.seed_everything()
391
+
392
+ dpg.add_input_text(
393
+ label="seed",
394
+ default_value=self.seed,
395
+ on_enter=True,
396
+ callback=callback_set_seed,
397
+ )
398
+
399
+ # input stuff
400
+ def callback_select_input(sender, app_data):
401
+ # only one item
402
+ for k, v in app_data["selections"].items():
403
+ dpg.set_value("_log_input", k)
404
+ self.load_input(v)
405
+
406
+ self.need_update = True
407
+
408
+ with dpg.file_dialog(
409
+ directory_selector=False,
410
+ show=False,
411
+ callback=callback_select_input,
412
+ file_count=1,
413
+ tag="file_dialog_tag",
414
+ width=700,
415
+ height=400,
416
+ ):
417
+ dpg.add_file_extension("Images{.jpg,.jpeg,.png}")
418
+
419
+ with dpg.group(horizontal=True):
420
+ dpg.add_button(
421
+ label="input",
422
+ callback=lambda: dpg.show_item("file_dialog_tag"),
423
+ )
424
+ dpg.add_text("", tag="_log_input")
425
+
426
+ # overlay stuff
427
+ with dpg.group(horizontal=True):
428
+
429
+ def callback_toggle_overlay_input_img(sender, app_data):
430
+ self.overlay_input_img = not self.overlay_input_img
431
+ self.need_update = True
432
+
433
+ dpg.add_checkbox(
434
+ label="overlay image",
435
+ default_value=self.overlay_input_img,
436
+ callback=callback_toggle_overlay_input_img,
437
+ )
438
+
439
+ def callback_set_overlay_input_img_ratio(sender, app_data):
440
+ self.overlay_input_img_ratio = app_data
441
+ self.need_update = True
442
+
443
+ dpg.add_slider_float(
444
+ label="ratio",
445
+ min_value=0,
446
+ max_value=1,
447
+ format="%.1f",
448
+ default_value=self.overlay_input_img_ratio,
449
+ callback=callback_set_overlay_input_img_ratio,
450
+ )
451
+
452
+ # prompt stuff
453
+
454
+ dpg.add_input_text(
455
+ label="prompt",
456
+ default_value=self.prompt,
457
+ callback=callback_setattr,
458
+ user_data="prompt",
459
+ )
460
+
461
+ dpg.add_input_text(
462
+ label="negative",
463
+ default_value=self.negative_prompt,
464
+ callback=callback_setattr,
465
+ user_data="negative_prompt",
466
+ )
467
+
468
+ # save current model
469
+ with dpg.group(horizontal=True):
470
+ dpg.add_text("Save: ")
471
+
472
+ dpg.add_button(
473
+ label="model",
474
+ tag="_button_save_model",
475
+ callback=self.save_model,
476
+ )
477
+ dpg.bind_item_theme("_button_save_model", theme_button)
478
+
479
+ dpg.add_input_text(
480
+ label="",
481
+ default_value=self.opt.save_path,
482
+ callback=callback_setattr,
483
+ user_data="save_path",
484
+ )
485
+
486
+ # training stuff
487
+ with dpg.collapsing_header(label="Train", default_open=True):
488
+ # lr and train button
489
+ with dpg.group(horizontal=True):
490
+ dpg.add_text("Train: ")
491
+
492
+ def callback_train(sender, app_data):
493
+ if self.training:
494
+ self.training = False
495
+ dpg.configure_item("_button_train", label="start")
496
+ else:
497
+ self.prepare_train()
498
+ self.training = True
499
+ dpg.configure_item("_button_train", label="stop")
500
+
501
+ # dpg.add_button(
502
+ # label="init", tag="_button_init", callback=self.prepare_train
503
+ # )
504
+ # dpg.bind_item_theme("_button_init", theme_button)
505
+
506
+ dpg.add_button(
507
+ label="start", tag="_button_train", callback=callback_train
508
+ )
509
+ dpg.bind_item_theme("_button_train", theme_button)
510
+
511
+ with dpg.group(horizontal=True):
512
+ dpg.add_text("", tag="_log_train_time")
513
+ dpg.add_text("", tag="_log_train_log")
514
+
515
+ # rendering options
516
+ with dpg.collapsing_header(label="Rendering", default_open=True):
517
+ # mode combo
518
+ def callback_change_mode(sender, app_data):
519
+ self.mode = app_data
520
+ self.need_update = True
521
+
522
+ dpg.add_combo(
523
+ ("image", "depth", "alpha", "normal"),
524
+ label="mode",
525
+ default_value=self.mode,
526
+ callback=callback_change_mode,
527
+ )
528
+
529
+ # fov slider
530
+ def callback_set_fovy(sender, app_data):
531
+ self.cam.fovy = np.deg2rad(app_data)
532
+ self.need_update = True
533
+
534
+ dpg.add_slider_int(
535
+ label="FoV (vertical)",
536
+ min_value=1,
537
+ max_value=120,
538
+ format="%d deg",
539
+ default_value=np.rad2deg(self.cam.fovy),
540
+ callback=callback_set_fovy,
541
+ )
542
+
543
+ ### register camera handler
544
+
545
+ def callback_camera_drag_rotate_or_draw_mask(sender, app_data):
546
+ if not dpg.is_item_focused("_primary_window"):
547
+ return
548
+
549
+ dx = app_data[1]
550
+ dy = app_data[2]
551
+
552
+ self.cam.orbit(dx, dy)
553
+ self.need_update = True
554
+
555
+ def callback_camera_wheel_scale(sender, app_data):
556
+ if not dpg.is_item_focused("_primary_window"):
557
+ return
558
+
559
+ delta = app_data
560
+
561
+ self.cam.scale(delta)
562
+ self.need_update = True
563
+
564
+ def callback_camera_drag_pan(sender, app_data):
565
+ if not dpg.is_item_focused("_primary_window"):
566
+ return
567
+
568
+ dx = app_data[1]
569
+ dy = app_data[2]
570
+
571
+ self.cam.pan(dx, dy)
572
+ self.need_update = True
573
+
574
+ def callback_set_mouse_loc(sender, app_data):
575
+ if not dpg.is_item_focused("_primary_window"):
576
+ return
577
+
578
+ # just the pixel coordinates in the image
579
+ self.mouse_loc = np.array(app_data)
580
+
581
+ with dpg.handler_registry():
582
+ # for camera moving
583
+ dpg.add_mouse_drag_handler(
584
+ button=dpg.mvMouseButton_Left,
585
+ callback=callback_camera_drag_rotate_or_draw_mask,
586
+ )
587
+ dpg.add_mouse_wheel_handler(callback=callback_camera_wheel_scale)
588
+ dpg.add_mouse_drag_handler(
589
+ button=dpg.mvMouseButton_Middle, callback=callback_camera_drag_pan
590
+ )
591
+
592
+ dpg.create_viewport(
593
+ title="Gaussian3D",
594
+ width=self.W + 600,
595
+ height=self.H + (45 if os.name == "nt" else 0),
596
+ resizable=False,
597
+ )
598
+
599
+ ### global theme
600
+ with dpg.theme() as theme_no_padding:
601
+ with dpg.theme_component(dpg.mvAll):
602
+ # set all padding to 0 to avoid scroll bar
603
+ dpg.add_theme_style(
604
+ dpg.mvStyleVar_WindowPadding, 0, 0, category=dpg.mvThemeCat_Core
605
+ )
606
+ dpg.add_theme_style(
607
+ dpg.mvStyleVar_FramePadding, 0, 0, category=dpg.mvThemeCat_Core
608
+ )
609
+ dpg.add_theme_style(
610
+ dpg.mvStyleVar_CellPadding, 0, 0, category=dpg.mvThemeCat_Core
611
+ )
612
+
613
+ dpg.bind_item_theme("_primary_window", theme_no_padding)
614
+
615
+ dpg.setup_dearpygui()
616
+
617
+ ### register a larger font
618
+ # get it from: https://github.com/lxgw/LxgwWenKai/releases/download/v1.300/LXGWWenKai-Regular.ttf
619
+ if os.path.exists("LXGWWenKai-Regular.ttf"):
620
+ with dpg.font_registry():
621
+ with dpg.font("LXGWWenKai-Regular.ttf", 18) as default_font:
622
+ dpg.bind_font(default_font)
623
+
624
+ # dpg.show_metrics()
625
+
626
+ dpg.show_viewport()
627
+
628
+ def render(self):
629
+ assert self.gui
630
+ while dpg.is_dearpygui_running():
631
+ # update texture every frame
632
+ if self.training:
633
+ self.train_step()
634
+ self.test_step()
635
+ dpg.render_dearpygui_frame()
636
+
637
+ # no gui mode
638
+ def train(self, iters=500):
639
+ if iters > 0:
640
+ self.prepare_train()
641
+ for i in tqdm.trange(iters):
642
+ self.train_step()
643
+ # save
644
+ self.save_model()
645
+
646
+
647
+ if __name__ == "__main__":
648
+ import argparse
649
+ from omegaconf import OmegaConf
650
+
651
+ parser = argparse.ArgumentParser()
652
+ parser.add_argument("--config", required=True, help="path to the yaml config file")
653
+ args, extras = parser.parse_known_args()
654
+
655
+ # override default config from cli
656
+ opt = OmegaConf.merge(OmegaConf.load(args.config), OmegaConf.from_cli(extras))
657
+
658
+ # auto find mesh from stage 1
659
+ if opt.mesh is None:
660
+ default_path = os.path.join(opt.outdir, opt.save_path + '_mesh.' + opt.mesh_format)
661
+ if os.path.exists(default_path):
662
+ opt.mesh = default_path
663
+ else:
664
+ raise ValueError(f"Cannot find mesh from {default_path}, must specify --mesh explicitly!")
665
+
666
+ gui = GUI(opt)
667
+
668
+ if opt.gui:
669
+ gui.render()
670
+ else:
671
+ gui.train(opt.iters_refine)
mesh.py ADDED
@@ -0,0 +1,622 @@
1
+ import os
2
+ import cv2
3
+ import torch
4
+ import trimesh
5
+ import numpy as np
6
+
7
+ def dot(x, y):
8
+ return torch.sum(x * y, -1, keepdim=True)
9
+
10
+
11
+ def length(x, eps=1e-20):
12
+ return torch.sqrt(torch.clamp(dot(x, x), min=eps))
13
+
14
+
15
+ def safe_normalize(x, eps=1e-20):
16
+ return x / length(x, eps)
17
+
18
+ class Mesh:
19
+ def __init__(
20
+ self,
21
+ v=None,
22
+ f=None,
23
+ vn=None,
24
+ fn=None,
25
+ vt=None,
26
+ ft=None,
27
+ albedo=None,
28
+ vc=None, # vertex color
29
+ device=None,
30
+ ):
31
+ self.device = device
32
+ self.v = v
33
+ self.vn = vn
34
+ self.vt = vt
35
+ self.f = f
36
+ self.fn = fn
37
+ self.ft = ft
38
+ # only support a single albedo
39
+ self.albedo = albedo
40
+ # support vertex color if there is no albedo
41
+ self.vc = vc
42
+
43
+ self.ori_center = 0
44
+ self.ori_scale = 1
45
+
46
+ @classmethod
47
+ def load(cls, path=None, resize=True, renormal=True, retex=False, front_dir='+z', **kwargs):
48
+ # assume init with kwargs
49
+ if path is None:
50
+ mesh = cls(**kwargs)
51
+ # obj supports face uv
52
+ elif path.endswith(".obj"):
53
+ mesh = cls.load_obj(path, **kwargs)
54
+ # trimesh only supports vertex uv, but can load more formats
55
+ else:
56
+ mesh = cls.load_trimesh(path, **kwargs)
57
+
58
+ print(f"[Mesh loading] v: {mesh.v.shape}, f: {mesh.f.shape}")
59
+ # auto-normalize
60
+ if resize:
61
+ mesh.auto_size()
62
+ # auto-fix normal
63
+ if renormal or mesh.vn is None:
64
+ mesh.auto_normal()
65
+ print(f"[Mesh loading] vn: {mesh.vn.shape}, fn: {mesh.fn.shape}")
66
+ # auto-fix texcoords
67
+ if retex or (mesh.albedo is not None and mesh.vt is None):
68
+ mesh.auto_uv(cache_path=path)
69
+ print(f"[Mesh loading] vt: {mesh.vt.shape}, ft: {mesh.ft.shape}")
70
+
71
+ # rotate front dir to +z
72
+ if front_dir != "+z":
73
+ # axis switch
74
+ if "-z" in front_dir:
75
+ T = torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, -1]], device=mesh.device, dtype=torch.float32)
76
+ elif "+x" in front_dir:
77
+ T = torch.tensor([[0, 0, 1], [0, 1, 0], [1, 0, 0]], device=mesh.device, dtype=torch.float32)
78
+ elif "-x" in front_dir:
79
+ T = torch.tensor([[0, 0, -1], [0, 1, 0], [1, 0, 0]], device=mesh.device, dtype=torch.float32)
80
+ elif "+y" in front_dir:
81
+ T = torch.tensor([[1, 0, 0], [0, 0, 1], [0, 1, 0]], device=mesh.device, dtype=torch.float32)
82
+ elif "-y" in front_dir:
83
+ T = torch.tensor([[1, 0, 0], [0, 0, -1], [0, 1, 0]], device=mesh.device, dtype=torch.float32)
84
+ else:
85
+ T = torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]], device=mesh.device, dtype=torch.float32)
86
+ # rotation (how many 90 degrees)
87
+ if '1' in front_dir:
88
+ T @= torch.tensor([[0, -1, 0], [1, 0, 0], [0, 0, 1]], device=mesh.device, dtype=torch.float32)
89
+ elif '2' in front_dir:
90
+ T @= torch.tensor([[1, 0, 0], [0, -1, 0], [0, 0, 1]], device=mesh.device, dtype=torch.float32)
91
+ elif '3' in front_dir:
92
+ T @= torch.tensor([[0, 1, 0], [-1, 0, 0], [0, 0, 1]], device=mesh.device, dtype=torch.float32)
93
+ mesh.v @= T
94
+ mesh.vn @= T
95
+
96
+ return mesh
97
+
98
+ # load from obj file
99
+ @classmethod
100
+ def load_obj(cls, path, albedo_path=None, device=None):
101
+ assert os.path.splitext(path)[-1] == ".obj"
102
+
103
+ mesh = cls()
104
+
105
+ # device
106
+ if device is None:
107
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
108
+
109
+ mesh.device = device
110
+
111
+ # load obj
112
+ with open(path, "r") as f:
113
+ lines = f.readlines()
114
+
115
+ def parse_f_v(fv):
116
+ # pass in a vertex term of a face, return {v, vt, vn} (-1 if not provided)
117
+ # supported forms:
118
+ # f v1 v2 v3
119
+ # f v1/vt1 v2/vt2 v3/vt3
120
+ # f v1/vt1/vn1 v2/vt2/vn2 v3/vt3/vn3
121
+ # f v1//vn1 v2//vn2 v3//vn3
122
+ xs = [int(x) - 1 if x != "" else -1 for x in fv.split("/")]
123
+ xs.extend([-1] * (3 - len(xs)))
124
+ return xs[0], xs[1], xs[2]
125
+
126
+ # NOTE: we ignore usemtl, and assume the mesh ONLY uses one material (first in mtl)
127
+ vertices, texcoords, normals = [], [], []
128
+ faces, tfaces, nfaces = [], [], []
129
+ mtl_path = None
130
+
131
+ for line in lines:
132
+ split_line = line.split()
133
+ # empty line
134
+ if len(split_line) == 0:
135
+ continue
136
+ prefix = split_line[0].lower()
137
+ # mtllib
138
+ if prefix == "mtllib":
139
+ mtl_path = split_line[1]
140
+ # usemtl
141
+ elif prefix == "usemtl":
142
+ pass # ignored
143
+ # v/vn/vt
144
+ elif prefix == "v":
145
+ vertices.append([float(v) for v in split_line[1:]])
146
+ elif prefix == "vn":
147
+ normals.append([float(v) for v in split_line[1:]])
148
+ elif prefix == "vt":
149
+ val = [float(v) for v in split_line[1:]]
150
+ texcoords.append([val[0], 1.0 - val[1]])
151
+ elif prefix == "f":
152
+ vs = split_line[1:]
153
+ nv = len(vs)
154
+ v0, t0, n0 = parse_f_v(vs[0])
155
+ for i in range(nv - 2): # triangulate (assume vertices are ordered)
156
+ v1, t1, n1 = parse_f_v(vs[i + 1])
157
+ v2, t2, n2 = parse_f_v(vs[i + 2])
158
+ faces.append([v0, v1, v2])
159
+ tfaces.append([t0, t1, t2])
160
+ nfaces.append([n0, n1, n2])
161
+
162
+ mesh.v = torch.tensor(vertices, dtype=torch.float32, device=device)
163
+ mesh.vt = (
164
+ torch.tensor(texcoords, dtype=torch.float32, device=device)
165
+ if len(texcoords) > 0
166
+ else None
167
+ )
168
+ mesh.vn = (
169
+ torch.tensor(normals, dtype=torch.float32, device=device)
170
+ if len(normals) > 0
171
+ else None
172
+ )
173
+
174
+ mesh.f = torch.tensor(faces, dtype=torch.int32, device=device)
175
+ mesh.ft = (
176
+ torch.tensor(tfaces, dtype=torch.int32, device=device)
177
+ if len(texcoords) > 0
178
+ else None
179
+ )
180
+ mesh.fn = (
181
+ torch.tensor(nfaces, dtype=torch.int32, device=device)
182
+ if len(normals) > 0
183
+ else None
184
+ )
185
+
186
+ # see if there is vertex color
187
+ use_vertex_color = False
188
+ if mesh.v.shape[1] == 6:
189
+ use_vertex_color = True
190
+ mesh.vc = mesh.v[:, 3:]
191
+ mesh.v = mesh.v[:, :3]
192
+ print(f"[load_obj] use vertex color: {mesh.vc.shape}")
193
+
194
+ # try to load texture image
195
+ if not use_vertex_color:
196
+ # try to retrieve mtl file
197
+ mtl_path_candidates = []
198
+ if mtl_path is not None:
199
+ mtl_path_candidates.append(mtl_path)
200
+ mtl_path_candidates.append(os.path.join(os.path.dirname(path), mtl_path))
201
+ mtl_path_candidates.append(path.replace(".obj", ".mtl"))
202
+
203
+ mtl_path = None
204
+ for candidate in mtl_path_candidates:
205
+ if os.path.exists(candidate):
206
+ mtl_path = candidate
207
+ break
208
+
209
+ # if albedo_path is not provided, try retrieve it from mtl
210
+ if mtl_path is not None and albedo_path is None:
211
+ with open(mtl_path, "r") as f:
212
+ lines = f.readlines()
213
+ for line in lines:
214
+ split_line = line.split()
215
+ # empty line
216
+ if len(split_line) == 0:
217
+ continue
218
+ prefix = split_line[0]
219
+ # NOTE: simply use the first map_Kd as albedo!
220
+ if "map_Kd" in prefix:
221
+ albedo_path = os.path.join(os.path.dirname(path), split_line[1])
222
+ print(f"[load_obj] use texture from: {albedo_path}")
223
+ break
224
+
225
+ # albedo_path still not found, or the path doesn't exist
226
+ if albedo_path is None or not os.path.exists(albedo_path):
227
+ # init an empty texture
228
+ print(f"[load_obj] init empty albedo!")
229
+ # albedo = np.random.rand(1024, 1024, 3).astype(np.float32)
230
+ albedo = np.ones((1024, 1024, 3), dtype=np.float32) * np.array([0.5, 0.5, 0.5]) # default color
231
+ else:
232
+ albedo = cv2.imread(albedo_path, cv2.IMREAD_UNCHANGED)
233
+ albedo = cv2.cvtColor(albedo, cv2.COLOR_BGR2RGB)
234
+ albedo = albedo.astype(np.float32) / 255
235
+ print(f"[load_obj] load texture: {albedo.shape}")
236
+
237
+ # import matplotlib.pyplot as plt
238
+ # plt.imshow(albedo)
239
+ # plt.show()
240
+
241
+ mesh.albedo = torch.tensor(albedo, dtype=torch.float32, device=device)
242
+
243
+ return mesh
244
+
245
+ @classmethod
246
+ def load_trimesh(cls, path, device=None):
247
+ mesh = cls()
248
+
249
+ # device
250
+ if device is None:
251
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
252
+
253
+ mesh.device = device
254
+
255
+ # use trimesh to load ply/glb; scenes are assumed to reduce to a single mesh (multiple geometries get concatenated)...
256
+ _data = trimesh.load(path)
257
+ if isinstance(_data, trimesh.Scene):
258
+ if len(_data.geometry) == 1:
259
+ _mesh = list(_data.geometry.values())[0]
260
+ else:
261
+ # manual concat, will lose texture
262
+ _concat = []
263
+ for g in _data.geometry.values():
264
+ if isinstance(g, trimesh.Trimesh):
265
+ _concat.append(g)
266
+ _mesh = trimesh.util.concatenate(_concat)
267
+ else:
268
+ _mesh = _data
269
+
270
+ if _mesh.visual.kind == 'vertex':
271
+ vertex_colors = _mesh.visual.vertex_colors
272
+ vertex_colors = np.array(vertex_colors[..., :3]).astype(np.float32) / 255
273
+ mesh.vc = torch.tensor(vertex_colors, dtype=torch.float32, device=device)
274
+ print(f"[load_trimesh] use vertex color: {mesh.vc.shape}")
275
+ elif _mesh.visual.kind == 'texture':
276
+ _material = _mesh.visual.material
277
+ if isinstance(_material, trimesh.visual.material.PBRMaterial):
278
+ texture = np.array(_material.baseColorTexture).astype(np.float32) / 255
279
+ elif isinstance(_material, trimesh.visual.material.SimpleMaterial):
280
+ texture = np.array(_material.to_pbr().baseColorTexture).astype(np.float32) / 255
281
+ else:
282
+ raise NotImplementedError(f"material type {type(_material)} not supported!")
283
+ mesh.albedo = torch.tensor(texture, dtype=torch.float32, device=device)
284
+ print(f"[load_trimesh] load texture: {texture.shape}")
285
+ else:
286
+ texture = np.ones((1024, 1024, 3), dtype=np.float32) * np.array([0.5, 0.5, 0.5])
287
+ mesh.albedo = torch.tensor(texture, dtype=torch.float32, device=device)
288
+ print(f"[load_trimesh] failed to load texture.")
289
+
290
+ vertices = _mesh.vertices
291
+
292
+ try:
293
+ texcoords = _mesh.visual.uv
294
+ texcoords[:, 1] = 1 - texcoords[:, 1]
295
+ except Exception as e:
296
+ texcoords = None
297
+
298
+ try:
299
+ normals = _mesh.vertex_normals
300
+ except Exception as e:
301
+ normals = None
302
+
303
+ # trimesh only supports vertex uv...
304
+ faces = tfaces = nfaces = _mesh.faces
305
+
306
+ mesh.v = torch.tensor(vertices, dtype=torch.float32, device=device)
307
+ mesh.vt = (
308
+ torch.tensor(texcoords, dtype=torch.float32, device=device)
309
+ if texcoords is not None
310
+ else None
311
+ )
312
+ mesh.vn = (
313
+ torch.tensor(normals, dtype=torch.float32, device=device)
314
+ if normals is not None
315
+ else None
316
+ )
317
+
318
+ mesh.f = torch.tensor(faces, dtype=torch.int32, device=device)
319
+ mesh.ft = (
320
+ torch.tensor(tfaces, dtype=torch.int32, device=device)
321
+ if texcoords is not None
322
+ else None
323
+ )
324
+ mesh.fn = (
325
+ torch.tensor(nfaces, dtype=torch.int32, device=device)
326
+ if normals is not None
327
+ else None
328
+ )
329
+
330
+ return mesh
331
+
332
+ # aabb
333
+ def aabb(self):
334
+ return torch.min(self.v, dim=0).values, torch.max(self.v, dim=0).values
335
+
336
+ # unit size
337
+ @torch.no_grad()
338
+ def auto_size(self):
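+ # recenter the mesh and rescale so its longest bounding-box side becomes 1.2 (roughly [-0.6, 0.6] per axis)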
339
+ vmin, vmax = self.aabb()
340
+ self.ori_center = (vmax + vmin) / 2
341
+ self.ori_scale = 1.2 / torch.max(vmax - vmin).item()
342
+ self.v = (self.v - self.ori_center) * self.ori_scale
343
+
344
+ def auto_normal(self):
345
+ i0, i1, i2 = self.f[:, 0].long(), self.f[:, 1].long(), self.f[:, 2].long()
346
+ v0, v1, v2 = self.v[i0, :], self.v[i1, :], self.v[i2, :]
347
+
348
+ face_normals = torch.cross(v1 - v0, v2 - v0)
349
+
350
+ # Splat face normals to vertices
351
+ vn = torch.zeros_like(self.v)
352
+ vn.scatter_add_(0, i0[:, None].repeat(1, 3), face_normals)
353
+ vn.scatter_add_(0, i1[:, None].repeat(1, 3), face_normals)
354
+ vn.scatter_add_(0, i2[:, None].repeat(1, 3), face_normals)
355
+
356
+ # Normalize, replace zero (degenerated) normals with some default value
357
+ vn = torch.where(
358
+ dot(vn, vn) > 1e-20,
359
+ vn,
360
+ torch.tensor([0.0, 0.0, 1.0], dtype=torch.float32, device=vn.device),
361
+ )
362
+ vn = safe_normalize(vn)
363
+
364
+ self.vn = vn
365
+ self.fn = self.f
366
+
367
+ def auto_uv(self, cache_path=None, vmap=True):
368
+ # try to load cache
369
+ if cache_path is not None:
370
+ cache_path = os.path.splitext(cache_path)[0] + "_uv.npz"
371
+ if cache_path is not None and os.path.exists(cache_path):
372
+ data = np.load(cache_path)
373
+ vt_np, ft_np, vmapping = data["vt"], data["ft"], data["vmapping"]
374
+ else:
375
+ import xatlas
376
+
377
+ v_np = self.v.detach().cpu().numpy()
378
+ f_np = self.f.detach().int().cpu().numpy()
379
+ atlas = xatlas.Atlas()
380
+ atlas.add_mesh(v_np, f_np)
381
+ chart_options = xatlas.ChartOptions()
382
+ # chart_options.max_iterations = 4
383
+ atlas.generate(chart_options=chart_options)
384
+ vmapping, ft_np, vt_np = atlas[0] # [N], [M, 3], [N, 2]
385
+
386
+ # save to cache
387
+ if cache_path is not None:
388
+ np.savez(cache_path, vt=vt_np, ft=ft_np, vmapping=vmapping)
389
+
390
+ vt = torch.from_numpy(vt_np.astype(np.float32)).to(self.device)
391
+ ft = torch.from_numpy(ft_np.astype(np.int32)).to(self.device)
392
+ self.vt = vt
393
+ self.ft = ft
394
+
395
+ if vmap:
396
+ # remap v/f to vt/ft, so each v correspond to a unique vt. (necessary for gltf)
397
+ vmapping = torch.from_numpy(vmapping.astype(np.int64)).long().to(self.device)
398
+ self.align_v_to_vt(vmapping)
399
+
400
+ def align_v_to_vt(self, vmapping=None):
401
+ # remap v/f and vn/vn to vt/ft.
402
+ if vmapping is None:
403
+ ft = self.ft.view(-1).long()
404
+ f = self.f.view(-1).long()
405
+ vmapping = torch.zeros(self.vt.shape[0], dtype=torch.long, device=self.device)
406
+ vmapping[ft] = f # scatter, randomly choose one if index is not unique
407
+
408
+ self.v = self.v[vmapping]
409
+ self.f = self.ft
410
+ # assume fn == f
411
+ if self.vn is not None:
412
+ self.vn = self.vn[vmapping]
413
+ self.fn = self.ft
414
+
415
+ def to(self, device):
416
+ self.device = device
417
+ for name in ["v", "f", "vn", "fn", "vt", "ft", "albedo"]:
418
+ tensor = getattr(self, name)
419
+ if tensor is not None:
420
+ setattr(self, name, tensor.to(device))
421
+ return self
422
+
423
+ def write(self, path):
424
+ if path.endswith(".ply"):
425
+ self.write_ply(path)
426
+ elif path.endswith(".obj"):
427
+ self.write_obj(path)
428
+ elif path.endswith(".glb") or path.endswith(".gltf"):
429
+ self.write_glb(path)
430
+ else:
431
+ raise NotImplementedError(f"format {path} not supported!")
432
+
433
+ # write to ply file (only geom)
434
+ def write_ply(self, path):
435
+
436
+ v_np = self.v.detach().cpu().numpy()
437
+ f_np = self.f.detach().cpu().numpy()
438
+
439
+ _mesh = trimesh.Trimesh(vertices=v_np, faces=f_np)
440
+ _mesh.export(path)
441
+
442
+ # write to gltf/glb file (geom + texture)
443
+ def write_glb(self, path):
444
+
445
+ assert self.vn is not None and self.vt is not None # should be improved to support export without texture...
446
+
447
+ # assert self.v.shape[0] == self.vn.shape[0] and self.v.shape[0] == self.vt.shape[0]
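+ # glTF shares a single index buffer across all vertex attributes, so positions are remapped to follow the UV indexing before export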
448
+ if self.v.shape[0] != self.vt.shape[0]:
449
+ self.align_v_to_vt()
450
+
451
+ # assume f == fn == ft
452
+
453
+ import pygltflib
454
+
455
+ f_np = self.f.detach().cpu().numpy().astype(np.uint32)
456
+ v_np = self.v.detach().cpu().numpy().astype(np.float32)
457
+ # vn_np = self.vn.detach().cpu().numpy().astype(np.float32)
458
+ vt_np = self.vt.detach().cpu().numpy().astype(np.float32)
459
+
460
+ albedo = self.albedo.detach().cpu().numpy()
461
+ albedo = (albedo * 255).astype(np.uint8)
462
+ albedo = cv2.cvtColor(albedo, cv2.COLOR_RGB2BGR)
463
+
464
+ f_np_blob = f_np.flatten().tobytes()
465
+ v_np_blob = v_np.tobytes()
466
+ # vn_np_blob = vn_np.tobytes()
467
+ vt_np_blob = vt_np.tobytes()
468
+ albedo_blob = cv2.imencode('.png', albedo)[1].tobytes()
469
+
470
+ gltf = pygltflib.GLTF2(
471
+ scene=0,
472
+ scenes=[pygltflib.Scene(nodes=[0])],
473
+ nodes=[pygltflib.Node(mesh=0)],
474
+ meshes=[pygltflib.Mesh(primitives=[
475
+ pygltflib.Primitive(
476
+ # indices to accessors (0 is triangles)
477
+ attributes=pygltflib.Attributes(
478
+ POSITION=1, TEXCOORD_0=2,
479
+ ),
480
+ indices=0, material=0,
481
+ )
482
+ ])],
483
+ materials=[
484
+ pygltflib.Material(
485
+ pbrMetallicRoughness=pygltflib.PbrMetallicRoughness(
486
+ baseColorTexture=pygltflib.TextureInfo(index=0, texCoord=0),
487
+ metallicFactor=0.0,
488
+ roughnessFactor=1.0,
489
+ ),
490
+ alphaCutoff=0,
491
+ doubleSided=True,
492
+ )
493
+ ],
494
+ textures=[
495
+ pygltflib.Texture(sampler=0, source=0),
496
+ ],
497
+ samplers=[
498
+ pygltflib.Sampler(magFilter=pygltflib.LINEAR, minFilter=pygltflib.LINEAR_MIPMAP_LINEAR, wrapS=pygltflib.REPEAT, wrapT=pygltflib.REPEAT),
499
+ ],
500
+ images=[
501
+ # use embedded (buffer) image
502
+ pygltflib.Image(bufferView=3, mimeType="image/png"),
503
+ ],
504
+ buffers=[
505
+ pygltflib.Buffer(byteLength=len(f_np_blob) + len(v_np_blob) + len(vt_np_blob) + len(albedo_blob))
506
+ ],
507
+ # buffer view (based on dtype)
508
+ bufferViews=[
509
+ # triangles; as flatten (element) array
510
+ pygltflib.BufferView(
511
+ buffer=0,
512
+ byteLength=len(f_np_blob),
513
+ target=pygltflib.ELEMENT_ARRAY_BUFFER, # GL_ELEMENT_ARRAY_BUFFER (34963)
514
+ ),
515
+ # positions; as vec3 array
516
+ pygltflib.BufferView(
517
+ buffer=0,
518
+ byteOffset=len(f_np_blob),
519
+ byteLength=len(v_np_blob),
520
+ byteStride=12, # vec3
521
+ target=pygltflib.ARRAY_BUFFER, # GL_ARRAY_BUFFER (34962)
522
+ ),
523
+ # texcoords; as vec2 array
524
+ pygltflib.BufferView(
525
+ buffer=0,
526
+ byteOffset=len(f_np_blob) + len(v_np_blob),
527
+ byteLength=len(vt_np_blob),
528
+ byteStride=8, # vec2
529
+ target=pygltflib.ARRAY_BUFFER,
530
+ ),
531
+ # texture; as none target
532
+ pygltflib.BufferView(
533
+ buffer=0,
534
+ byteOffset=len(f_np_blob) + len(v_np_blob) + len(vt_np_blob),
535
+ byteLength=len(albedo_blob),
536
+ ),
537
+ ],
538
+ accessors=[
539
+ # 0 = triangles
540
+ pygltflib.Accessor(
541
+ bufferView=0,
542
+ componentType=pygltflib.UNSIGNED_INT, # GL_UNSIGNED_INT (5125)
543
+ count=f_np.size,
544
+ type=pygltflib.SCALAR,
545
+ max=[int(f_np.max())],
546
+ min=[int(f_np.min())],
547
+ ),
548
+ # 1 = positions
549
+ pygltflib.Accessor(
550
+ bufferView=1,
551
+ componentType=pygltflib.FLOAT, # GL_FLOAT (5126)
552
+ count=len(v_np),
553
+ type=pygltflib.VEC3,
554
+ max=v_np.max(axis=0).tolist(),
555
+ min=v_np.min(axis=0).tolist(),
556
+ ),
557
+ # 2 = texcoords
558
+ pygltflib.Accessor(
559
+ bufferView=2,
560
+ componentType=pygltflib.FLOAT,
561
+ count=len(vt_np),
562
+ type=pygltflib.VEC2,
563
+ max=vt_np.max(axis=0).tolist(),
564
+ min=vt_np.min(axis=0).tolist(),
565
+ ),
566
+ ],
567
+ )
568
+
569
+ # set actual data
570
+ gltf.set_binary_blob(f_np_blob + v_np_blob + vt_np_blob + albedo_blob)
571
+
572
+ # glb = b"".join(gltf.save_to_bytes())
573
+ gltf.save(path)
574
+
575
+ # write to obj file (geom + texture)
576
+ def write_obj(self, path):
577
+
578
+ mtl_path = path.replace(".obj", ".mtl")
579
+ albedo_path = path.replace(".obj", "_albedo.png")
580
+
581
+ v_np = self.v.detach().cpu().numpy()
582
+ vt_np = self.vt.detach().cpu().numpy() if self.vt is not None else None
583
+ vn_np = self.vn.detach().cpu().numpy() if self.vn is not None else None
584
+ f_np = self.f.detach().cpu().numpy()
585
+ ft_np = self.ft.detach().cpu().numpy() if self.ft is not None else None
586
+ fn_np = self.fn.detach().cpu().numpy() if self.fn is not None else None
587
+
588
+ with open(path, "w") as fp:
589
+ fp.write(f"mtllib {os.path.basename(mtl_path)} \n")
590
+
591
+ for v in v_np:
592
+ fp.write(f"v {v[0]} {v[1]} {v[2]} \n")
593
+
594
+ if vt_np is not None:
595
+ for v in vt_np:
596
+ fp.write(f"vt {v[0]} {1 - v[1]} \n")
597
+
598
+ if vn_np is not None:
599
+ for v in vn_np:
600
+ fp.write(f"vn {v[0]} {v[1]} {v[2]} \n")
601
+
602
+ fp.write(f"usemtl defaultMat \n")
603
+ for i in range(len(f_np)):
604
+ fp.write(
605
+ f'f {f_np[i, 0] + 1}/{ft_np[i, 0] + 1 if ft_np is not None else ""}/{fn_np[i, 0] + 1 if fn_np is not None else ""} \
606
+ {f_np[i, 1] + 1}/{ft_np[i, 1] + 1 if ft_np is not None else ""}/{fn_np[i, 1] + 1 if fn_np is not None else ""} \
607
+ {f_np[i, 2] + 1}/{ft_np[i, 2] + 1 if ft_np is not None else ""}/{fn_np[i, 2] + 1 if fn_np is not None else ""} \n'
608
+ )
609
+
610
+ with open(mtl_path, "w") as fp:
611
+ fp.write(f"newmtl defaultMat \n")
612
+ fp.write(f"Ka 1 1 1 \n")
613
+ fp.write(f"Kd 1 1 1 \n")
614
+ fp.write(f"Ks 0 0 0 \n")
615
+ fp.write(f"Tr 1 \n")
616
+ fp.write(f"illum 1 \n")
617
+ fp.write(f"Ns 0 \n")
618
+ fp.write(f"map_Kd {os.path.basename(albedo_path)} \n")
619
+
620
+ albedo = self.albedo.detach().cpu().numpy()
621
+ albedo = (albedo * 255).astype(np.uint8)
622
+ cv2.imwrite(albedo_path, cv2.cvtColor(albedo, cv2.COLOR_RGB2BGR))
mesh_renderer.py ADDED
@@ -0,0 +1,154 @@
1
+ import os
2
+ import math
3
+ import cv2
4
+ import trimesh
5
+ import numpy as np
6
+
7
+ import torch
8
+ import torch.nn as nn
9
+ import torch.nn.functional as F
10
+
11
+ import nvdiffrast.torch as dr
12
+ from mesh import Mesh, safe_normalize
13
+
14
+ def scale_img_nhwc(x, size, mag='bilinear', min='bilinear'):
15
+ assert (x.shape[1] >= size[0] and x.shape[2] >= size[1]) or (x.shape[1] < size[0] and x.shape[2] < size[1]), "Trying to magnify image in one dimension and minify in the other"
16
+ y = x.permute(0, 3, 1, 2) # NHWC -> NCHW
17
+ if x.shape[1] > size[0] and x.shape[2] > size[1]: # Minification, previous size was bigger
18
+ y = torch.nn.functional.interpolate(y, size, mode=min)
19
+ else: # Magnification
20
+ if mag == 'bilinear' or mag == 'bicubic':
21
+ y = torch.nn.functional.interpolate(y, size, mode=mag, align_corners=True)
22
+ else:
23
+ y = torch.nn.functional.interpolate(y, size, mode=mag)
24
+ return y.permute(0, 2, 3, 1).contiguous() # NCHW -> NHWC
25
+
26
+ def scale_img_hwc(x, size, mag='bilinear', min='bilinear'):
27
+ return scale_img_nhwc(x[None, ...], size, mag, min)[0]
28
+
29
+ def scale_img_nhw(x, size, mag='bilinear', min='bilinear'):
30
+ return scale_img_nhwc(x[..., None], size, mag, min)[..., 0]
31
+
32
+ def scale_img_hw(x, size, mag='bilinear', min='bilinear'):
33
+ return scale_img_nhwc(x[None, ..., None], size, mag, min)[0, ..., 0]
34
+
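+ # inverse sigmoid (logit) with clamping; the albedo texture is optimized in this unconstrained space and mapped back with torch.sigmoid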
35
+ def trunc_rev_sigmoid(x, eps=1e-6):
36
+ x = x.clamp(eps, 1 - eps)
37
+ return torch.log(x / (1 - x))
38
+
39
+ def make_divisible(x, m=8):
40
+ return int(math.ceil(x / m) * m)
41
+
42
+ class Renderer(nn.Module):
43
+ def __init__(self, opt):
44
+
45
+ super().__init__()
46
+
47
+ self.opt = opt
48
+
49
+ self.mesh = Mesh.load(self.opt.mesh, resize=False)
50
+
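+ # prefer the OpenGL rasterizer; fall back to the CUDA rasterizer when force_cuda_rast is set or when the GUI runs on a non-Windows platform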
51
+ if not self.opt.force_cuda_rast and (not self.opt.gui or os.name == 'nt'):
52
+ self.glctx = dr.RasterizeGLContext()
53
+ else:
54
+ self.glctx = dr.RasterizeCudaContext()
55
+
56
+ # extract trainable parameters
57
+ self.v_offsets = nn.Parameter(torch.zeros_like(self.mesh.v))
58
+ self.raw_albedo = nn.Parameter(trunc_rev_sigmoid(self.mesh.albedo))
59
+
60
+
61
+ def get_params(self):
62
+
63
+ params = [
64
+ {'params': self.raw_albedo, 'lr': self.opt.texture_lr},
65
+ ]
66
+
67
+ if self.opt.train_geo:
68
+ params.append({'params': self.v_offsets, 'lr': self.opt.geom_lr})
69
+
70
+ return params
71
+
72
+ @torch.no_grad()
73
+ def export_mesh(self, save_path):
74
+ self.mesh.v = (self.mesh.v + self.v_offsets).detach()
75
+ self.mesh.albedo = torch.sigmoid(self.raw_albedo.detach())
76
+ self.mesh.write(save_path)
77
+
78
+
79
+ def render(self, pose, proj, h0, w0, ssaa=1, bg_color=1, texture_filter='linear-mipmap-linear'):
80
+
81
+ # do super-sampling
82
+ if ssaa != 1:
83
+ h = make_divisible(h0 * ssaa, 8)
84
+ w = make_divisible(w0 * ssaa, 8)
85
+ else:
86
+ h, w = h0, w0
87
+
88
+ results = {}
89
+
90
+ # get v
91
+ if self.opt.train_geo:
92
+ v = self.mesh.v + self.v_offsets # [N, 3]
93
+ else:
94
+ v = self.mesh.v
95
+
96
+ pose = torch.from_numpy(pose.astype(np.float32)).to(v.device)
97
+ proj = torch.from_numpy(proj.astype(np.float32)).to(v.device)
98
+
99
+ # get v_clip and render rgb
100
+ v_cam = torch.matmul(F.pad(v, pad=(0, 1), mode='constant', value=1.0), torch.inverse(pose).T).float().unsqueeze(0)
101
+ v_clip = v_cam @ proj.T
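+ # transform vertices world -> camera (inverse pose) -> clip space; nvdiffrast rasterizes the homogeneous clip-space coordinates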
102
+
103
+ rast, rast_db = dr.rasterize(self.glctx, v_clip, self.mesh.f, (h, w))
104
+
105
+ alpha = (rast[0, ..., 3:] > 0).float()
106
+ depth, _ = dr.interpolate(-v_cam[..., [2]], rast, self.mesh.f) # [1, H, W, 1]
107
+ depth = depth.squeeze(0) # [H, W, 1]
108
+
109
+ texc, texc_db = dr.interpolate(self.mesh.vt.unsqueeze(0).contiguous(), rast, self.mesh.ft, rast_db=rast_db, diff_attrs='all')
110
+ albedo = dr.texture(self.raw_albedo.unsqueeze(0), texc, uv_da=texc_db, filter_mode=texture_filter) # [1, H, W, 3]
111
+ albedo = torch.sigmoid(albedo)
112
+ # get vn and render normal
113
+ if self.opt.train_geo:
114
+ i0, i1, i2 = self.mesh.f[:, 0].long(), self.mesh.f[:, 1].long(), self.mesh.f[:, 2].long()
115
+ v0, v1, v2 = v[i0, :], v[i1, :], v[i2, :]
116
+
117
+ face_normals = torch.cross(v1 - v0, v2 - v0)
118
+ face_normals = safe_normalize(face_normals)
119
+
120
+ vn = torch.zeros_like(v)
121
+ vn.scatter_add_(0, i0[:, None].repeat(1,3), face_normals)
122
+ vn.scatter_add_(0, i1[:, None].repeat(1,3), face_normals)
123
+ vn.scatter_add_(0, i2[:, None].repeat(1,3), face_normals)
124
+
125
+ vn = torch.where(torch.sum(vn * vn, -1, keepdim=True) > 1e-20, vn, torch.tensor([0.0, 0.0, 1.0], dtype=torch.float32, device=vn.device))
126
+ else:
127
+ vn = self.mesh.vn
128
+
129
+ normal, _ = dr.interpolate(vn.unsqueeze(0).contiguous(), rast, self.mesh.fn)
130
+ normal = safe_normalize(normal[0])
131
+
132
+ # rotated normal (where [0, 0, 1] always faces camera)
133
+ rot_normal = normal @ pose[:3, :3]
134
+ viewcos = rot_normal[..., [2]]
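+ # viewcos is the z component of the camera-space normal, i.e. the cosine between the surface normal and the view direction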
135
+
136
+ # antialias
137
+ albedo = dr.antialias(albedo, rast, v_clip, self.mesh.f).squeeze(0) # [H, W, 3]
138
+ albedo = alpha * albedo + (1 - alpha) * bg_color
139
+
140
+ # ssaa
141
+ if ssaa != 1:
142
+ albedo = scale_img_hwc(albedo, (h0, w0))
143
+ alpha = scale_img_hwc(alpha, (h0, w0))
144
+ depth = scale_img_hwc(depth, (h0, w0))
145
+ normal = scale_img_hwc(normal, (h0, w0))
146
+ viewcos = scale_img_hwc(viewcos, (h0, w0))
147
+
148
+ results['image'] = albedo.clamp(0, 1)
149
+ results['alpha'] = alpha
150
+ results['depth'] = depth
151
+ results['normal'] = (normal + 1) / 2
152
+ results['viewcos'] = viewcos
153
+
154
+ return results
mesh_utils.py ADDED
@@ -0,0 +1,147 @@
1
+ import numpy as np
2
+ import pymeshlab as pml
3
+
4
+
5
+ def poisson_mesh_reconstruction(points, normals=None):
6
+ # points/normals: [N, 3] np.ndarray
7
+
8
+ import open3d as o3d
9
+
10
+ pcd = o3d.geometry.PointCloud()
11
+ pcd.points = o3d.utility.Vector3dVector(points)
12
+
13
+ # outlier removal
14
+ pcd, ind = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=10)
15
+
16
+ # normals
17
+ if normals is None:
18
+ pcd.estimate_normals()
19
+ else:
20
+ pcd.normals = o3d.utility.Vector3dVector(normals[ind])
21
+
22
+ # visualize
23
+ o3d.visualization.draw_geometries([pcd], point_show_normal=False)
24
+
25
+ mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
26
+ pcd, depth=9
27
+ )
28
+ vertices_to_remove = densities < np.quantile(densities, 0.1)
29
+ mesh.remove_vertices_by_mask(vertices_to_remove)
30
+
31
+ # visualize
32
+ o3d.visualization.draw_geometries([mesh])
33
+
34
+ vertices = np.asarray(mesh.vertices)
35
+ triangles = np.asarray(mesh.triangles)
36
+
37
+ print(
38
+ f"[INFO] poisson mesh reconstruction: {points.shape} --> {vertices.shape} / {triangles.shape}"
39
+ )
40
+
41
+ return vertices, triangles
42
+
43
+
44
+ def decimate_mesh(
45
+ verts, faces, target, backend="pymeshlab", remesh=False, optimalplacement=True
46
+ ):
47
+ # optimalplacement: default is True, but for flat meshes it must be set to False to prevent spike artifacts.
48
+
49
+ _ori_vert_shape = verts.shape
50
+ _ori_face_shape = faces.shape
51
+
52
+ if backend == "pyfqmr":
53
+ import pyfqmr
54
+
55
+ solver = pyfqmr.Simplify()
56
+ solver.setMesh(verts, faces)
57
+ solver.simplify_mesh(target_count=target, preserve_border=False, verbose=False)
58
+ verts, faces, normals = solver.getMesh()
59
+ else:
60
+ m = pml.Mesh(verts, faces)
61
+ ms = pml.MeshSet()
62
+ ms.add_mesh(m, "mesh") # will copy!
63
+
64
+ # filters
65
+ # ms.meshing_decimation_clustering(threshold=pml.Percentage(1))
66
+ ms.meshing_decimation_quadric_edge_collapse(
67
+ targetfacenum=int(target), optimalplacement=optimalplacement
68
+ )
69
+
70
+ if remesh:
71
+ # ms.apply_coord_taubin_smoothing()
72
+ ms.meshing_isotropic_explicit_remeshing(
73
+ iterations=3, targetlen=pml.Percentage(1)
74
+ )
75
+
76
+ # extract mesh
77
+ m = ms.current_mesh()
78
+ verts = m.vertex_matrix()
79
+ faces = m.face_matrix()
80
+
81
+ print(
82
+ f"[INFO] mesh decimation: {_ori_vert_shape} --> {verts.shape}, {_ori_face_shape} --> {faces.shape}"
83
+ )
84
+
85
+ return verts, faces
86
+
87
+
88
+ def clean_mesh(
89
+ verts,
90
+ faces,
91
+ v_pct=1,
92
+ min_f=64,
93
+ min_d=20,
94
+ repair=True,
95
+ remesh=True,
96
+ remesh_size=0.01,
97
+ ):
98
+ # verts: [N, 3]
99
+ # faces: [M, 3]
100
+
101
+ _ori_vert_shape = verts.shape
102
+ _ori_face_shape = faces.shape
103
+
104
+ m = pml.Mesh(verts, faces)
105
+ ms = pml.MeshSet()
106
+ ms.add_mesh(m, "mesh") # will copy!
107
+
108
+ # filters
109
+ ms.meshing_remove_unreferenced_vertices() # verts not referenced by any faces
110
+
111
+ if v_pct > 0:
112
+ ms.meshing_merge_close_vertices(
113
+ threshold=pml.Percentage(v_pct)
114
+ ) # merge vertices closer than v_pct percent of the bounding box diagonal
115
+
116
+ ms.meshing_remove_duplicate_faces() # faces defined by the same verts
117
+ ms.meshing_remove_null_faces() # faces with area == 0
118
+
119
+ if min_d > 0:
120
+ ms.meshing_remove_connected_component_by_diameter(
121
+ mincomponentdiag=pml.Percentage(min_d)
122
+ )
123
+
124
+ if min_f > 0:
125
+ ms.meshing_remove_connected_component_by_face_number(mincomponentsize=min_f)
126
+
127
+ if repair:
128
+ # ms.meshing_remove_t_vertices(method=0, threshold=40, repeat=True)
129
+ ms.meshing_repair_non_manifold_edges(method=0)
130
+ ms.meshing_repair_non_manifold_vertices(vertdispratio=0)
131
+
132
+ if remesh:
133
+ # ms.apply_coord_taubin_smoothing()
134
+ ms.meshing_isotropic_explicit_remeshing(
135
+ iterations=3, targetlen=pml.AbsoluteValue(remesh_size)
136
+ )
137
+
138
+ # extract mesh
139
+ m = ms.current_mesh()
140
+ verts = m.vertex_matrix()
141
+ faces = m.face_matrix()
142
+
143
+ print(
144
+ f"[INFO] mesh cleaning: {_ori_vert_shape} --> {verts.shape}, {_ori_face_shape} --> {faces.shape}"
145
+ )
146
+
147
+ return verts, faces
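
For context, a minimal usage sketch of the two helpers above, chaining `clean_mesh` and `decimate_mesh` on a mesh loaded with trimesh (already listed in `requirements.txt`); the file paths and the 50k-face target are illustrative only.

```python
import numpy as np
import trimesh
from mesh_utils import clean_mesh, decimate_mesh

mesh = trimesh.load('logs/name_mesh.obj', force='mesh')  # hypothetical path
verts = np.asarray(mesh.vertices, dtype=np.float32)
faces = np.asarray(mesh.faces, dtype=np.int32)

# remove duplicates / tiny components and remesh, then reduce to ~50k faces
verts, faces = clean_mesh(verts, faces, remesh=True, remesh_size=0.01)
verts, faces = decimate_mesh(verts, faces, target=5e4)

trimesh.Trimesh(verts, faces, process=False).export('logs/name_clean.obj')
```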
process.py ADDED
@@ -0,0 +1,92 @@
1
+ import os
2
+ import glob
3
+ import sys
4
+ import cv2
5
+ import argparse
6
+ import numpy as np
7
+ import matplotlib.pyplot as plt
8
+
9
+ import torch
10
+ import torch.nn as nn
11
+ import torch.nn.functional as F
12
+ from torchvision import transforms
13
+ from PIL import Image
14
+ import rembg
15
+
16
+ class BLIP2():
17
+ def __init__(self, device='cuda'):
18
+ self.device = device
19
+ from transformers import AutoProcessor, Blip2ForConditionalGeneration
20
+ self.processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
21
+ self.model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16).to(device)
22
+
23
+ @torch.no_grad()
24
+ def __call__(self, image):
25
+ image = Image.fromarray(image)
26
+ inputs = self.processor(image, return_tensors="pt").to(self.device, torch.float16)
27
+
28
+ generated_ids = self.model.generate(**inputs, max_new_tokens=20)
29
+ generated_text = self.processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
30
+
31
+ return generated_text
32
+
33
+
34
+ if __name__ == '__main__':
35
+
36
+ parser = argparse.ArgumentParser()
37
+ parser.add_argument('path', type=str, help="path to image (png, jpeg, etc.)")
38
+ parser.add_argument('--model', default='u2net', type=str, help="rembg model, see https://github.com/danielgatis/rembg#models")
39
+ parser.add_argument('--size', default=256, type=int, help="output resolution")
40
+ parser.add_argument('--border_ratio', default=0.2, type=float, help="output border ratio")
41
+ parser.add_argument('--recenter', type=bool, default=True, help="recenter, potentially not helpful for multiview zero123")
42
+ opt = parser.parse_args()
43
+
44
+ session = rembg.new_session(model_name=opt.model)
45
+
46
+ if os.path.isdir(opt.path):
47
+ print(f'[INFO] processing directory {opt.path}...')
48
+ files = glob.glob(f'{opt.path}/*')
49
+ out_dir = opt.path
50
+ else: # isfile
51
+ files = [opt.path]
52
+ out_dir = os.path.dirname(opt.path)
53
+
54
+ for file in files:
55
+
56
+ out_base = os.path.basename(file).split('.')[0]
57
+ out_rgba = os.path.join(out_dir, out_base + '_rgba.png')
58
+
59
+ # load image
60
+ print(f'[INFO] loading image {file}...')
61
+ image = cv2.imread(file, cv2.IMREAD_UNCHANGED)
62
+
63
+ # carve background
64
+ print(f'[INFO] background removal...')
65
+ carved_image = rembg.remove(image, session=session) # [H, W, 4]
66
+ mask = carved_image[..., -1] > 0
67
+
68
+ # recenter
69
+ if opt.recenter:
70
+ print(f'[INFO] recenter...')
71
+ final_rgba = np.zeros((opt.size, opt.size, 4), dtype=np.uint8)
72
+
73
+ coords = np.nonzero(mask)
74
+ x_min, x_max = coords[0].min(), coords[0].max()
75
+ y_min, y_max = coords[1].min(), coords[1].max()
76
+ h = x_max - x_min
77
+ w = y_max - y_min
78
+ desired_size = int(opt.size * (1 - opt.border_ratio))
79
+ scale = desired_size / max(h, w)
80
+ h2 = int(h * scale)
81
+ w2 = int(w * scale)
82
+ x2_min = (opt.size - h2) // 2
83
+ x2_max = x2_min + h2
84
+ y2_min = (opt.size - w2) // 2
85
+ y2_max = y2_min + w2
86
+ final_rgba[x2_min:x2_max, y2_min:y2_max] = cv2.resize(carved_image[x_min:x_max, y_min:y_max], (w2, h2), interpolation=cv2.INTER_AREA)
87
+
88
+ else:
89
+ final_rgba = carved_image
90
+
91
+ # write image
92
+ cv2.imwrite(out_rgba, final_rgba)
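
The recenter branch above scales the object's bounding box so its longer side fills `size * (1 - border_ratio)` pixels and centres it on the output canvas. Below is the same arithmetic extracted into a standalone sketch (the function name and toy mask are illustrative only) so it can be checked in isolation.

```python
import numpy as np

def recenter_bbox(mask, size=256, border_ratio=0.2):
    """Return the source bbox and its destination bbox inside a size x size canvas."""
    coords = np.nonzero(mask)
    x_min, x_max = coords[0].min(), coords[0].max()
    y_min, y_max = coords[1].min(), coords[1].max()
    h, w = x_max - x_min, y_max - y_min
    scale = int(size * (1 - border_ratio)) / max(h, w)  # longer side -> desired size
    h2, w2 = int(h * scale), int(w * scale)
    x2_min, y2_min = (size - h2) // 2, (size - w2) // 2
    return (x_min, x_max, y_min, y_max), (x2_min, x2_min + h2, y2_min, y2_min + w2)

# toy example: a 100x50 blob inside a 300x300 mask
mask = np.zeros((300, 300), dtype=bool)
mask[40:140, 100:150] = True
src, dst = recenter_bbox(mask)
print(src, dst)  # the destination box spans 204 px on its longer side
```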
readme.md ADDED
@@ -0,0 +1,139 @@
1
+ # DreamGaussian
2
+
3
+ This repository contains the official implementation for [DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation](https://arxiv.org/abs/2309.16653).
4
+
5
+ ### [Project Page](https://dreamgaussian.github.io) | [Arxiv](https://arxiv.org/abs/2309.16653)
6
+
7
+
8
+ https://github.com/dreamgaussian/dreamgaussian/assets/25863658/db860801-7b9c-4b30-9eb9-87330175f5c8
9
+
10
+ ### [Colab demo](https://github.com/camenduru/dreamgaussian-colab)
11
+ * Image-to-3D: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sLpYmmLS209-e5eHgcuqdryFRRO6ZhFS?usp=sharing)
12
+ * Text-to-3D: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/camenduru/dreamgaussian-colab/blob/main/dreamgaussian_colab.ipynb)
13
+
14
+ ### [Gradio demo](https://huggingface.co/spaces/jiawei011/dreamgaussian)
15
+ * Image-to-3D: <a href="https://huggingface.co/spaces/jiawei011/dreamgaussian"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Gradio%20Demo-Huggingface-orange"></a>
16
+
17
+ ## Install
18
+ ```bash
19
+ pip install -r requirements.txt
20
+
21
+ # a modified gaussian splatting (+ depth, alpha rendering)
22
+ git clone --recursive https://github.com/ashawkey/diff-gaussian-rasterization
23
+ pip install ./diff-gaussian-rasterization
24
+
25
+ # simple-knn
26
+ pip install ./simple-knn
27
+
28
+ # nvdiffrast
29
+ pip install git+https://github.com/NVlabs/nvdiffrast/
30
+
31
+ # kiuikit
32
+ pip install git+https://github.com/ashawkey/kiuikit
33
+ ```
34
+
35
+ Tested on:
36
+ * Ubuntu 22 with torch 1.12 & CUDA 11.6 on a V100.
37
+ * Windows 10 with torch 2.1 & CUDA 12.1 on a 3070.
38
+
39
+ ## Usage
40
+
41
+ Image-to-3D:
42
+ ```bash
43
+ ### preprocess
44
+ # background removal and recentering, save rgba at 256x256
45
+ python process.py data/name.jpg
46
+
47
+ # save at a larger resolution
48
+ python process.py data/name.jpg --size 512
49
+
50
+ # process all jpg images under a dir
51
+ python process.py data
52
+
53
+ ### training gaussian stage
54
+ # train 500 iters (~1min) and export ckpt & coarse_mesh to logs
55
+ python main.py --config configs/image.yaml input=data/name_rgba.png save_path=name
56
+
57
+ # gui mode (supports visualizing training)
58
+ python main.py --config configs/image.yaml input=data/name_rgba.png save_path=name gui=True
59
+
60
+ # load and visualize a saved ckpt
61
+ python main.py --config configs/image.yaml load=logs/name_model.ply gui=True
62
+
63
+ # use an estimated elevation angle if the image is not front-view (e.g., a typical looking-down image can use -30)
64
+ python main.py --config configs/image.yaml input=data/name_rgba.png save_path=name elevation=-30
65
+
66
+ ### training mesh stage
67
+ # auto load coarse_mesh and refine 50 iters (~1min), export fine_mesh to logs
68
+ python main2.py --config configs/image.yaml input=data/name_rgba.png save_path=name
69
+
70
+ # specify the coarse mesh path explicitly
71
+ python main2.py --config configs/image.yaml input=data/name_rgba.png save_path=name mesh=logs/name_mesh.obj
72
+
73
+ # gui mode
74
+ python main2.py --config configs/image.yaml input=data/name_rgba.png save_path=name gui=True
75
+
76
+ # export glb instead of obj
77
+ python main2.py --config configs/image.yaml input=data/name_rgba.png save_path=name mesh_format=glb
78
+
79
+ ### visualization
80
+ # gui for visualizing mesh
81
+ python -m kiui.render logs/name.obj
82
+
83
+ # save 360 degree video of mesh (can run without gui)
84
+ python -m kiui.render logs/name.obj --save_video name.mp4 --wogui
85
+
86
+ # save 8 view images of mesh (can run without gui)
87
+ python -m kiui.render logs/name.obj --save images/name/ --wogui
88
+
89
+ ### evaluation of CLIP-similarity
90
+ python -m kiui.cli.clip_sim data/name_rgba.png logs/name.obj
91
+ ```
92
+ Please check `./configs/image.yaml` for more options.
93
+
94
+ Text-to-3D:
95
+ ```bash
96
+ ### training gaussian stage
97
+ python main.py --config configs/text.yaml prompt="a photo of an icecream" save_path=icecream
98
+
99
+ ### training mesh stage
100
+ python main2.py --config configs/text.yaml prompt="a photo of an icecream" save_path=icecream
101
+ ```
102
+ Please check `./configs/text.yaml` for more options.
103
+
104
+ Helper scripts:
105
+ ```bash
106
+ # run all image samples (*_rgba.png) in ./data
107
+ python scripts/runall.py --dir ./data --gpu 0
108
+
109
+ # run all text samples (hardcoded in runall_sd.py)
110
+ python scripts/runall_sd.py --gpu 0
111
+
112
+ # export all ./logs/*.obj to mp4 in ./videos
113
+ python scripts/convert_obj_to_video.py --dir ./logs
114
+ ```
115
+
116
+ ### Gradio Demo
117
+ ```bash
118
+ python gradio_app.py
119
+ ```
120
+
121
+ ## Acknowledgement
122
+
123
+ This work is built on many amazing research works and open-source projects, thanks a lot to all the authors for sharing!
124
+
125
+ * [gaussian-splatting](https://github.com/graphdeco-inria/gaussian-splatting) and [diff-gaussian-rasterization](https://github.com/graphdeco-inria/diff-gaussian-rasterization)
126
+ * [threestudio](https://github.com/threestudio-project/threestudio)
127
+ * [nvdiffrast](https://github.com/NVlabs/nvdiffrast)
128
+ * [dearpygui](https://github.com/hoffstadt/DearPyGui)
129
+
130
+ ## Citation
131
+
132
+ ```
133
+ @article{tang2023dreamgaussian,
134
+ title={DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation},
135
+ author={Tang, Jiaxiang and Ren, Jiawei and Zhou, Hang and Liu, Ziwei and Zeng, Gang},
136
+ journal={arXiv preprint arXiv:2309.16653},
137
+ year={2023}
138
+ }
139
+ ```
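
Putting the documented image-to-3D commands from the Usage section together, the following is a minimal Python sketch of the single-input workflow (preprocess, gaussian stage, mesh stage, video export), essentially what `scripts/runall.py` does per file; the sample name and paths are illustrative.

```python
import os

name = 'name'  # hypothetical sample name
os.makedirs('videos', exist_ok=True)

os.system(f'python process.py data/{name}.jpg')  # writes data/name_rgba.png
os.system(f'python main.py --config configs/image.yaml input=data/{name}_rgba.png save_path={name}')
os.system(f'python main2.py --config configs/image.yaml input=data/{name}_rgba.png save_path={name}')
os.system(f'python -m kiui.render logs/{name}.obj --save_video videos/{name}.mp4 --wogui')
```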
requirements.txt ADDED
@@ -0,0 +1,37 @@
1
+ tqdm
2
+ rich
3
+ ninja
4
+ numpy
5
+ pandas
6
+ scipy
7
+ scikit-learn
8
+ matplotlib
9
+ opencv-python
10
+ imageio
11
+ imageio-ffmpeg
12
+ omegaconf
13
+
14
+ torch
15
+ einops
16
+ plyfile
17
+ pygltflib
18
+
19
+ # for gui
20
+ dearpygui
21
+
22
+ # for stable-diffusion
23
+ huggingface_hub
24
+ diffusers >= 0.9.0
25
+ accelerate
26
+ transformers
27
+
28
+ # for dmtet and mesh export
29
+ xatlas
30
+ trimesh
31
+ PyMCubes
32
+ pymeshlab
33
+
34
+ rembg[gpu,cli]
35
+
36
+ # gradio demo
37
+ gradio
scripts/convert_obj_to_video.py ADDED
@@ -0,0 +1,20 @@
1
+ import os
2
+ import glob
3
+ import argparse
4
+
5
+ parser = argparse.ArgumentParser()
6
+ parser.add_argument('--dir', default='logs', type=str, help='Directory where obj files are stored')
7
+ parser.add_argument('--out', default='videos', type=str, help='Directory where videos will be saved')
8
+ args = parser.parse_args()
9
+
10
+ out = args.out
11
+ os.makedirs(out, exist_ok=True)
12
+
13
+ files = glob.glob(f'{args.dir}/*.obj')
14
+ for f in files:
15
+ name = os.path.basename(f)
16
+ # first stage model, ignore
17
+ if name.endswith('_mesh.obj'):
18
+ continue
19
+ print(f'[INFO] process {name}')
20
+ os.system(f"python -m kiui.render {f} --save_video {os.path.join(out, name.replace('.obj', '.mp4'))} ")
scripts/run.sh ADDED
@@ -0,0 +1,5 @@
1
+ export CUDA_VISIBLE_DEVICES=5
2
+
3
+ python main.py --config configs/image.yaml input=data/anya_rgba.png save_path=anya
4
+ python main2.py --config configs/image.yaml input=data/anya_rgba.png save_path=anya
5
+ python -m kiui.render logs/anya.obj --save_video videos/anya.mp4 --wogui
scripts/run_sd.sh ADDED
@@ -0,0 +1,31 @@
1
+ export CUDA_VISIBLE_DEVICES=6
2
+
3
+ # easy samples
4
+ python main.py --config configs/text.yaml prompt="a photo of an icecream" save_path=icecream
5
+ python main2.py --config configs/text.yaml prompt="a photo of an icecream" save_path=icecream
6
+ python main.py --config configs/text.yaml prompt="a ripe strawberry" save_path=strawberry
7
+ python main2.py --config configs/text.yaml prompt="a ripe strawberry" save_path=strawberry
8
+ python main.py --config configs/text.yaml prompt="a blue tulip" save_path=tulip
9
+ python main2.py --config configs/text.yaml prompt="a blue tulip" save_path=tulip
10
+
11
+ python main.py --config configs/text.yaml prompt="a golden goblet" save_path=goblet
12
+ python main2.py --config configs/text.yaml prompt="a golden goblet" save_path=goblet
13
+ python main.py --config configs/text.yaml prompt="a photo of a hamburger" save_path=hamburger
14
+ python main2.py --config configs/text.yaml prompt="a photo of a hamburger" save_path=hamburger
15
+ python main.py --config configs/text.yaml prompt="a delicious croissant" save_path=croissant
16
+ python main2.py --config configs/text.yaml prompt="a delicious croissant" save_path=croissant
17
+
18
+ # hard samples
19
+ python main.py --config configs/text.yaml prompt="a baby bunny sitting on top of a stack of pancake" save_path=bunny_pancake
20
+ python main2.py --config configs/text.yaml prompt="a baby bunny sitting on top of a stack of pancake" save_path=bunny_pancake
21
+ python main.py --config configs/text.yaml prompt="a typewriter" save_path=typewriter
22
+ python main2.py --config configs/text.yaml prompt="a typewriter" save_path=typewriter
23
+ python main.py --config configs/text.yaml prompt="a pineapple" save_path=pineapple
24
+ python main2.py --config configs/text.yaml prompt="a pineapple" save_path=pineapple
25
+
26
+ python main.py --config configs/text.yaml prompt="a model of a house in Tudor style" save_path=tudor_house
27
+ python main2.py --config configs/text.yaml prompt="a model of a house in Tudor style" save_path=tudor_house
28
+ python main.py --config configs/text.yaml prompt="a lionfish" save_path=lionfish
29
+ python main2.py --config configs/text.yaml prompt="a lionfish" save_path=lionfish
30
+ python main.py --config configs/text.yaml prompt="a bunch of yellow rose, highly detailed" save_path=rose
31
+ python main2.py --config configs/text.yaml prompt="a bunch of yellow rose, highly detailed" save_path=rose
scripts/runall.py ADDED
@@ -0,0 +1,48 @@
1
+ import os
2
+ import glob
3
+ import argparse
4
+
5
+ parser = argparse.ArgumentParser()
6
+ parser.add_argument('--dir', default='data', type=str, help='Directory where processed images are stored')
7
+ parser.add_argument('--out', default='logs', type=str, help='Directory where obj files will be saved')
8
+ parser.add_argument('--video-out', default='videos', type=str, help='Directory where videos will be saved')
9
+ parser.add_argument('--gpu', default=0, type=int, help='ID of GPU to use')
10
+ parser.add_argument('--elevation', default=0, type=int, help='Elevation angle of view in degrees')
11
+ parser.add_argument('--config', default='configs', type=str, help='Path to config directory, which contains image.yaml')
12
+ args = parser.parse_args()
13
+
14
+ files = glob.glob(f'{args.dir}/*_rgba.png')
15
+ configs_dir = args.config
16
+
17
+ # check if image.yaml exists
18
+ if not os.path.exists(os.path.join(configs_dir, 'image.yaml')):
19
+ raise FileNotFoundError(
20
+ f'image.yaml not found in {configs_dir} directory. Please check if the directory is correct.'
21
+ )
22
+
23
+ # create output directories if not exists
24
+ out_dir = args.out
25
+ os.makedirs(out_dir, exist_ok=True)
26
+ video_dir = args.video_out
27
+ os.makedirs(video_dir, exist_ok=True)
28
+
29
+
30
+ for file in files:
31
+ name = os.path.basename(file).replace("_rgba.png", "")
32
+ print(f'======== processing {name} ========')
33
+ # first stage
34
+ os.system(f'CUDA_VISIBLE_DEVICES={args.gpu} python main.py '
35
+ f'--config {configs_dir}/image.yaml '
36
+ f'input={file} '
37
+ f'save_path={name} elevation={args.elevation}')
38
+ # second stage
39
+ os.system(f'CUDA_VISIBLE_DEVICES={args.gpu} python main2.py '
40
+ f'--config {configs_dir}/image.yaml '
41
+ f'input={file} '
42
+ f'save_path={name} elevation={args.elevation}')
43
+ # export video
44
+ mesh_path = os.path.join(out_dir, f'{name}.obj')
45
+ os.system(f'python -m kiui.render {mesh_path} '
46
+ f'--save_video {video_dir}/{name}.mp4 '
47
+ f'--wogui '
48
+ f'--elevation {args.elevation}')
scripts/runall_sd.py ADDED
@@ -0,0 +1,45 @@
1
+ import os
2
+ import glob
3
+ import argparse
4
+
5
+ parser = argparse.ArgumentParser()
6
+ parser.add_argument('--gpu', default=0, type=int)
7
+ args = parser.parse_args()
8
+
9
+ prompts = [
10
+ ('strawberry', 'a ripe strawberry'),
11
+ ('cactus_pot', 'a small saguaro cactus planted in a clay pot'),
12
+ ('hamburger', 'a delicious hamburger'),
13
+ ('icecream', 'an icecream'),
14
+ ('tulip', 'a blue tulip'),
15
+ ('pineapple', 'a ripe pineapple'),
16
+ ('goblet', 'a golden goblet'),
17
+ # ('squitopus', 'a squirrel-octopus hybrid'),
18
+ # ('astronaut', 'Michelangelo style statue of an astronaut'),
19
+ # ('teddy_bear', 'a teddy bear'),
20
+ # ('corgi_nurse', 'a plush toy of a corgi nurse'),
21
+ # ('teapot', 'a blue and white porcelain teapot'),
22
+ # ('skull', "a human skull"),
23
+ # ('penguin', 'a penguin'),
24
+ # ('campfire', 'a campfire'),
25
+ # ('donut', 'a donut with pink icing'),
26
+ # ('cupcake', 'a birthday cupcake'),
27
+ # ('pie', 'shepherds pie'),
28
+ # ('cone', 'a traffic cone'),
29
+ # ('schoolbus', 'a schoolbus'),
30
+ # ('avocado_chair', 'a chair that looks like an avocado'),
31
+ # ('glasses', 'a pair of sunglasses')
32
+ # ('potion', 'a bottle of green potion'),
33
+ # ('chalice', 'a delicate chalice'),
34
+ ]
35
+
36
+ for name, prompt in prompts:
37
+ print(f'======== processing {name} ========')
38
+ # first stage
39
+ os.system(f'CUDA_VISIBLE_DEVICES={args.gpu} python main.py --config configs/text.yaml prompt="{prompt}" save_path={name}')
40
+ # second stage
41
+ os.system(f'CUDA_VISIBLE_DEVICES={args.gpu} python main2.py --config configs/text.yaml prompt="{prompt}" save_path={name}')
42
+ # export video
43
+ mesh_path = os.path.join('logs', f'{name}.obj')
44
+ os.makedirs('videos', exist_ok=True)
45
+ os.system(f'python -m kiui.render {mesh_path} --save_video videos/{name}.mp4 --wogui')
sh_utils.py ADDED
@@ -0,0 +1,118 @@
1
+ # Copyright 2021 The PlenOctree Authors.
2
+ # Redistribution and use in source and binary forms, with or without
3
+ # modification, are permitted provided that the following conditions are met:
4
+ #
5
+ # 1. Redistributions of source code must retain the above copyright notice,
6
+ # this list of conditions and the following disclaimer.
7
+ #
8
+ # 2. Redistributions in binary form must reproduce the above copyright notice,
9
+ # this list of conditions and the following disclaimer in the documentation
10
+ # and/or other materials provided with the distribution.
11
+ #
12
+ # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
13
+ # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
14
+ # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
15
+ # ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
16
+ # LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
17
+ # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
18
+ # SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
19
+ # INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
20
+ # CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
21
+ # ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
22
+ # POSSIBILITY OF SUCH DAMAGE.
23
+
24
+ import torch
25
+
26
+ C0 = 0.28209479177387814
27
+ C1 = 0.4886025119029199
28
+ C2 = [
29
+ 1.0925484305920792,
30
+ -1.0925484305920792,
31
+ 0.31539156525252005,
32
+ -1.0925484305920792,
33
+ 0.5462742152960396
34
+ ]
35
+ C3 = [
36
+ -0.5900435899266435,
37
+ 2.890611442640554,
38
+ -0.4570457994644658,
39
+ 0.3731763325901154,
40
+ -0.4570457994644658,
41
+ 1.445305721320277,
42
+ -0.5900435899266435
43
+ ]
44
+ C4 = [
45
+ 2.5033429417967046,
46
+ -1.7701307697799304,
47
+ 0.9461746957575601,
48
+ -0.6690465435572892,
49
+ 0.10578554691520431,
50
+ -0.6690465435572892,
51
+ 0.47308734787878004,
52
+ -1.7701307697799304,
53
+ 0.6258357354491761,
54
+ ]
55
+
56
+
57
+ def eval_sh(deg, sh, dirs):
58
+ """
59
+ Evaluate spherical harmonics at unit directions
60
+ using hardcoded SH polynomials.
61
+ Works with torch/np/jnp.
62
+ ... denotes 0 or more batch dimensions.
63
+ Args:
64
+ deg: int SH degree. Currently, 0-4 supported
65
+ sh: jnp.ndarray SH coeffs [..., C, (deg + 1) ** 2]
66
+ dirs: jnp.ndarray unit directions [..., 3]
67
+ Returns:
68
+ [..., C]
69
+ """
70
+ assert deg <= 4 and deg >= 0
71
+ coeff = (deg + 1) ** 2
72
+ assert sh.shape[-1] >= coeff
73
+
74
+ result = C0 * sh[..., 0]
75
+ if deg > 0:
76
+ x, y, z = dirs[..., 0:1], dirs[..., 1:2], dirs[..., 2:3]
77
+ result = (result -
78
+ C1 * y * sh[..., 1] +
79
+ C1 * z * sh[..., 2] -
80
+ C1 * x * sh[..., 3])
81
+
82
+ if deg > 1:
83
+ xx, yy, zz = x * x, y * y, z * z
84
+ xy, yz, xz = x * y, y * z, x * z
85
+ result = (result +
86
+ C2[0] * xy * sh[..., 4] +
87
+ C2[1] * yz * sh[..., 5] +
88
+ C2[2] * (2.0 * zz - xx - yy) * sh[..., 6] +
89
+ C2[3] * xz * sh[..., 7] +
90
+ C2[4] * (xx - yy) * sh[..., 8])
91
+
92
+ if deg > 2:
93
+ result = (result +
94
+ C3[0] * y * (3 * xx - yy) * sh[..., 9] +
95
+ C3[1] * xy * z * sh[..., 10] +
96
+ C3[2] * y * (4 * zz - xx - yy)* sh[..., 11] +
97
+ C3[3] * z * (2 * zz - 3 * xx - 3 * yy) * sh[..., 12] +
98
+ C3[4] * x * (4 * zz - xx - yy) * sh[..., 13] +
99
+ C3[5] * z * (xx - yy) * sh[..., 14] +
100
+ C3[6] * x * (xx - 3 * yy) * sh[..., 15])
101
+
102
+ if deg > 3:
103
+ result = (result + C4[0] * xy * (xx - yy) * sh[..., 16] +
104
+ C4[1] * yz * (3 * xx - yy) * sh[..., 17] +
105
+ C4[2] * xy * (7 * zz - 1) * sh[..., 18] +
106
+ C4[3] * yz * (7 * zz - 3) * sh[..., 19] +
107
+ C4[4] * (zz * (35 * zz - 30) + 3) * sh[..., 20] +
108
+ C4[5] * xz * (7 * zz - 3) * sh[..., 21] +
109
+ C4[6] * (xx - yy) * (7 * zz - 1) * sh[..., 22] +
110
+ C4[7] * xz * (xx - 3 * yy) * sh[..., 23] +
111
+ C4[8] * (xx * (xx - 3 * yy) - yy * (3 * xx - yy)) * sh[..., 24])
112
+ return result
113
+
114
+ def RGB2SH(rgb):
115
+ return (rgb - 0.5) / C0
116
+
117
+ def SH2RGB(sh):
118
+ return sh * C0 + 0.5
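
A small sanity check of the conventions above: `RGB2SH`/`SH2RGB` are exact inverses, and for degree 0 `eval_sh` reduces to `C0 * sh`, i.e. `SH2RGB(sh) - 0.5`, independent of the view direction. The snippet below is only a sketch of that relationship, not part of the repository.

```python
import torch
from sh_utils import eval_sh, RGB2SH, SH2RGB

rgb = torch.rand(5, 3)                       # 5 random colours
sh = RGB2SH(rgb)                             # [5, 3] DC coefficients
assert torch.allclose(SH2RGB(sh), rgb)       # exact round trip

dirs = torch.nn.functional.normalize(torch.randn(5, 3), dim=-1)
shaded = eval_sh(0, sh.unsqueeze(-1), dirs)  # sh reshaped to [..., C, 1] for deg 0
assert torch.allclose(shaded + 0.5, rgb)     # view-independent for deg 0
```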
simple-knn/ext.cpp ADDED
@@ -0,0 +1,17 @@
1
+ /*
2
+ * Copyright (C) 2023, Inria
3
+ * GRAPHDECO research group, https://team.inria.fr/graphdeco
4
+ * All rights reserved.
5
+ *
6
+ * This software is free for non-commercial, research and evaluation use
7
+ * under the terms of the LICENSE.md file.
8
+ *
9
+ * For inquiries contact george.drettakis@inria.fr
10
+ */
11
+
12
+ #include <torch/extension.h>
13
+ #include "spatial.h"
14
+
15
+ PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
16
+ m.def("distCUDA2", &distCUDA2);
17
+ }
simple-knn/setup.py ADDED
@@ -0,0 +1,35 @@
1
+ #
2
+ # Copyright (C) 2023, Inria
3
+ # GRAPHDECO research group, https://team.inria.fr/graphdeco
4
+ # All rights reserved.
5
+ #
6
+ # This software is free for non-commercial, research and evaluation use
7
+ # under the terms of the LICENSE.md file.
8
+ #
9
+ # For inquiries contact george.drettakis@inria.fr
10
+ #
11
+
12
+ from setuptools import setup
13
+ from torch.utils.cpp_extension import CUDAExtension, BuildExtension
14
+ import os
15
+
16
+ cxx_compiler_flags = []
17
+
18
+ if os.name == 'nt':
19
+ cxx_compiler_flags.append("/wd4624")
20
+
21
+ setup(
22
+ name="simple_knn",
23
+ ext_modules=[
24
+ CUDAExtension(
25
+ name="simple_knn._C",
26
+ sources=[
27
+ "spatial.cu",
28
+ "simple_knn.cu",
29
+ "ext.cpp"],
30
+ extra_compile_args={"nvcc": [], "cxx": cxx_compiler_flags})
31
+ ],
32
+ cmdclass={
33
+ 'build_ext': BuildExtension
34
+ }
35
+ )
simple-knn/simple_knn.cu ADDED
@@ -0,0 +1,221 @@
1
+ /*
2
+ * Copyright (C) 2023, Inria
3
+ * GRAPHDECO research group, https://team.inria.fr/graphdeco
4
+ * All rights reserved.
5
+ *
6
+ * This software is free for non-commercial, research and evaluation use
7
+ * under the terms of the LICENSE.md file.
8
+ *
9
+ * For inquiries contact george.drettakis@inria.fr
10
+ */
11
+
12
+ #define BOX_SIZE 1024
13
+
14
+ #include "cuda_runtime.h"
15
+ #include "device_launch_parameters.h"
16
+ #include "simple_knn.h"
17
+ #include <cub/cub.cuh>
18
+ #include <cub/device/device_radix_sort.cuh>
19
+ #include <vector>
20
+ #include <cuda_runtime_api.h>
21
+ #include <thrust/device_vector.h>
22
+ #include <thrust/sequence.h>
23
+ #define __CUDACC__
24
+ #include <cooperative_groups.h>
25
+ #include <cooperative_groups/reduce.h>
26
+
27
+ namespace cg = cooperative_groups;
28
+
29
+ struct CustomMin
30
+ {
31
+ __device__ __forceinline__
32
+ float3 operator()(const float3& a, const float3& b) const {
33
+ return { min(a.x, b.x), min(a.y, b.y), min(a.z, b.z) };
34
+ }
35
+ };
36
+
37
+ struct CustomMax
38
+ {
39
+ __device__ __forceinline__
40
+ float3 operator()(const float3& a, const float3& b) const {
41
+ return { max(a.x, b.x), max(a.y, b.y), max(a.z, b.z) };
42
+ }
43
+ };
44
+
45
+ __host__ __device__ uint32_t prepMorton(uint32_t x)
46
+ {
47
+ x = (x | (x << 16)) & 0x030000FF;
48
+ x = (x | (x << 8)) & 0x0300F00F;
49
+ x = (x | (x << 4)) & 0x030C30C3;
50
+ x = (x | (x << 2)) & 0x09249249;
51
+ return x;
52
+ }
53
+
54
+ __host__ __device__ uint32_t coord2Morton(float3 coord, float3 minn, float3 maxx)
55
+ {
56
+ uint32_t x = prepMorton(((coord.x - minn.x) / (maxx.x - minn.x)) * ((1 << 10) - 1));
57
+ uint32_t y = prepMorton(((coord.y - minn.y) / (maxx.y - minn.y)) * ((1 << 10) - 1));
58
+ uint32_t z = prepMorton(((coord.z - minn.z) / (maxx.z - minn.z)) * ((1 << 10) - 1));
59
+
60
+ return x | (y << 1) | (z << 2);
61
+ }
62
+
63
+ __global__ void coord2Morton(int P, const float3* points, float3 minn, float3 maxx, uint32_t* codes)
64
+ {
65
+ auto idx = cg::this_grid().thread_rank();
66
+ if (idx >= P)
67
+ return;
68
+
69
+ codes[idx] = coord2Morton(points[idx], minn, maxx);
70
+ }
71
+
72
+ struct MinMax
73
+ {
74
+ float3 minn;
75
+ float3 maxx;
76
+ };
77
+
78
+ __global__ void boxMinMax(uint32_t P, float3* points, uint32_t* indices, MinMax* boxes)
79
+ {
80
+ auto idx = cg::this_grid().thread_rank();
81
+
82
+ MinMax me;
83
+ if (idx < P)
84
+ {
85
+ me.minn = points[indices[idx]];
86
+ me.maxx = points[indices[idx]];
87
+ }
88
+ else
89
+ {
90
+ me.minn = { FLT_MAX, FLT_MAX, FLT_MAX };
91
+ me.maxx = { -FLT_MAX,-FLT_MAX,-FLT_MAX };
92
+ }
93
+
94
+ __shared__ MinMax redResult[BOX_SIZE];
95
+
96
+ for (int off = BOX_SIZE / 2; off >= 1; off /= 2)
97
+ {
98
+ if (threadIdx.x < 2 * off)
99
+ redResult[threadIdx.x] = me;
100
+ __syncthreads();
101
+
102
+ if (threadIdx.x < off)
103
+ {
104
+ MinMax other = redResult[threadIdx.x + off];
105
+ me.minn.x = min(me.minn.x, other.minn.x);
106
+ me.minn.y = min(me.minn.y, other.minn.y);
107
+ me.minn.z = min(me.minn.z, other.minn.z);
108
+ me.maxx.x = max(me.maxx.x, other.maxx.x);
109
+ me.maxx.y = max(me.maxx.y, other.maxx.y);
110
+ me.maxx.z = max(me.maxx.z, other.maxx.z);
111
+ }
112
+ __syncthreads();
113
+ }
114
+
115
+ if (threadIdx.x == 0)
116
+ boxes[blockIdx.x] = me;
117
+ }
118
+
119
+ __device__ __host__ float distBoxPoint(const MinMax& box, const float3& p)
120
+ {
121
+ float3 diff = { 0, 0, 0 };
122
+ if (p.x < box.minn.x || p.x > box.maxx.x)
123
+ diff.x = min(abs(p.x - box.minn.x), abs(p.x - box.maxx.x));
124
+ if (p.y < box.minn.y || p.y > box.maxx.y)
125
+ diff.y = min(abs(p.y - box.minn.y), abs(p.y - box.maxx.y));
126
+ if (p.z < box.minn.z || p.z > box.maxx.z)
127
+ diff.z = min(abs(p.z - box.minn.z), abs(p.z - box.maxx.z));
128
+ return diff.x * diff.x + diff.y * diff.y + diff.z * diff.z;
129
+ }
130
+
131
+ template<int K>
132
+ __device__ void updateKBest(const float3& ref, const float3& point, float* knn)
133
+ {
134
+ float3 d = { point.x - ref.x, point.y - ref.y, point.z - ref.z };
135
+ float dist = d.x * d.x + d.y * d.y + d.z * d.z;
136
+ for (int j = 0; j < K; j++)
137
+ {
138
+ if (knn[j] > dist)
139
+ {
140
+ float t = knn[j];
141
+ knn[j] = dist;
142
+ dist = t;
143
+ }
144
+ }
145
+ }
146
+
147
+ __global__ void boxMeanDist(uint32_t P, float3* points, uint32_t* indices, MinMax* boxes, float* dists)
148
+ {
149
+ int idx = cg::this_grid().thread_rank();
150
+ if (idx >= P)
151
+ return;
152
+
153
+ float3 point = points[indices[idx]];
154
+ float best[3] = { FLT_MAX, FLT_MAX, FLT_MAX };
155
+
156
+ for (int i = max(0, idx - 3); i <= min(P - 1, idx + 3); i++)
157
+ {
158
+ if (i == idx)
159
+ continue;
160
+ updateKBest<3>(point, points[indices[i]], best);
161
+ }
162
+
163
+ float reject = best[2];
164
+ best[0] = FLT_MAX;
165
+ best[1] = FLT_MAX;
166
+ best[2] = FLT_MAX;
167
+
168
+ for (int b = 0; b < (P + BOX_SIZE - 1) / BOX_SIZE; b++)
169
+ {
170
+ MinMax box = boxes[b];
171
+ float dist = distBoxPoint(box, point);
172
+ if (dist > reject || dist > best[2])
173
+ continue;
174
+
175
+ for (int i = b * BOX_SIZE; i < min(P, (b + 1) * BOX_SIZE); i++)
176
+ {
177
+ if (i == idx)
178
+ continue;
179
+ updateKBest<3>(point, points[indices[i]], best);
180
+ }
181
+ }
182
+ dists[indices[idx]] = (best[0] + best[1] + best[2]) / 3.0f;
183
+ }
184
+
185
+ void SimpleKNN::knn(int P, float3* points, float* meanDists)
186
+ {
187
+ float3* result;
188
+ cudaMalloc(&result, sizeof(float3));
189
+ size_t temp_storage_bytes;
190
+
191
+ float3 init = { 0, 0, 0 }, minn, maxx;
192
+
193
+ cub::DeviceReduce::Reduce(nullptr, temp_storage_bytes, points, result, P, CustomMin(), init);
194
+ thrust::device_vector<char> temp_storage(temp_storage_bytes);
195
+
196
+ cub::DeviceReduce::Reduce(temp_storage.data().get(), temp_storage_bytes, points, result, P, CustomMin(), init);
197
+ cudaMemcpy(&minn, result, sizeof(float3), cudaMemcpyDeviceToHost);
198
+
199
+ cub::DeviceReduce::Reduce(temp_storage.data().get(), temp_storage_bytes, points, result, P, CustomMax(), init);
200
+ cudaMemcpy(&maxx, result, sizeof(float3), cudaMemcpyDeviceToHost);
201
+
202
+ thrust::device_vector<uint32_t> morton(P);
203
+ thrust::device_vector<uint32_t> morton_sorted(P);
204
+ coord2Morton << <(P + 255) / 256, 256 >> > (P, points, minn, maxx, morton.data().get());
205
+
206
+ thrust::device_vector<uint32_t> indices(P);
207
+ thrust::sequence(indices.begin(), indices.end());
208
+ thrust::device_vector<uint32_t> indices_sorted(P);
209
+
210
+ cub::DeviceRadixSort::SortPairs(nullptr, temp_storage_bytes, morton.data().get(), morton_sorted.data().get(), indices.data().get(), indices_sorted.data().get(), P);
211
+ temp_storage.resize(temp_storage_bytes);
212
+
213
+ cub::DeviceRadixSort::SortPairs(temp_storage.data().get(), temp_storage_bytes, morton.data().get(), morton_sorted.data().get(), indices.data().get(), indices_sorted.data().get(), P);
214
+
215
+ uint32_t num_boxes = (P + BOX_SIZE - 1) / BOX_SIZE;
216
+ thrust::device_vector<MinMax> boxes(num_boxes);
217
+ boxMinMax << <num_boxes, BOX_SIZE >> > (P, points, indices_sorted.data().get(), boxes.data().get());
218
+ boxMeanDist << <num_boxes, BOX_SIZE >> > (P, points, indices_sorted.data().get(), boxes.data().get(), meanDists);
219
+
220
+ cudaFree(result);
221
+ }
simple-knn/simple_knn.h ADDED
@@ -0,0 +1,21 @@
1
+ /*
2
+ * Copyright (C) 2023, Inria
3
+ * GRAPHDECO research group, https://team.inria.fr/graphdeco
4
+ * All rights reserved.
5
+ *
6
+ * This software is free for non-commercial, research and evaluation use
7
+ * under the terms of the LICENSE.md file.
8
+ *
9
+ * For inquiries contact george.drettakis@inria.fr
10
+ */
11
+
12
+ #ifndef SIMPLEKNN_H_INCLUDED
13
+ #define SIMPLEKNN_H_INCLUDED
14
+
15
+ class SimpleKNN
16
+ {
17
+ public:
18
+ static void knn(int P, float3* points, float* meanDists);
19
+ };
20
+
21
+ #endif
simple-knn/simple_knn/.gitkeep ADDED
File without changes
simple-knn/spatial.cu ADDED
@@ -0,0 +1,26 @@
1
+ /*
2
+ * Copyright (C) 2023, Inria
3
+ * GRAPHDECO research group, https://team.inria.fr/graphdeco
4
+ * All rights reserved.
5
+ *
6
+ * This software is free for non-commercial, research and evaluation use
7
+ * under the terms of the LICENSE.md file.
8
+ *
9
+ * For inquiries contact george.drettakis@inria.fr
10
+ */
11
+
12
+ #include "spatial.h"
13
+ #include "simple_knn.h"
14
+
15
+ torch::Tensor
16
+ distCUDA2(const torch::Tensor& points)
17
+ {
18
+ const int P = points.size(0);
19
+
20
+ auto float_opts = points.options().dtype(torch::kFloat32);
21
+ torch::Tensor means = torch::full({P}, 0.0, float_opts);
22
+
23
+ SimpleKNN::knn(P, (float3*)points.contiguous().data<float>(), means.contiguous().data<float>());
24
+
25
+ return means;
26
+ }
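
Once the extension is built (`pip install ./simple-knn`, as in the readme), `distCUDA2` returns, per point, the mean of the squared distances to its three nearest neighbours, found through the Morton-ordered boxes in `simple_knn.cu`. The sketch below is a hedged usage example with a brute-force torch cross-check on a small cloud; it is not part of the repository.

```python
import torch
from simple_knn._C import distCUDA2  # built by simple-knn/setup.py

pts = torch.rand(4096, 3, device='cuda')  # small synthetic point cloud
mean_sq = distCUDA2(pts)                  # [4096] float32, mean 3-NN squared distance

# brute-force reference, feasible at this size
d2 = torch.cdist(pts, pts).pow(2)
d2.fill_diagonal_(float('inf'))
ref = d2.topk(3, largest=False).values.mean(dim=1)
print((mean_sq - ref).abs().max())        # should closely agree, up to float error
```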
simple-knn/spatial.h ADDED
@@ -0,0 +1,14 @@
1
+ /*
2
+ * Copyright (C) 2023, Inria
3
+ * GRAPHDECO research group, https://team.inria.fr/graphdeco
4
+ * All rights reserved.
5
+ *
6
+ * This software is free for non-commercial, research and evaluation use
7
+ * under the terms of the LICENSE.md file.
8
+ *
9
+ * For inquiries contact george.drettakis@inria.fr
10
+ */
11
+
12
+ #include <torch/extension.h>
13
+
14
+ torch::Tensor distCUDA2(const torch::Tensor& points);
zero123.py ADDED
@@ -0,0 +1,666 @@
1
+ # Copyright 2023 The HuggingFace Team. All rights reserved.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ import inspect
16
+ import math
17
+ import warnings
18
+ from typing import Any, Callable, Dict, List, Optional, Union
19
+
20
+ import PIL
21
+ import torch
22
+ import torchvision.transforms.functional as TF
23
+ from diffusers.configuration_utils import ConfigMixin, FrozenDict, register_to_config
24
+ from diffusers.image_processor import VaeImageProcessor
25
+ from diffusers.models import AutoencoderKL, UNet2DConditionModel
26
+ from diffusers.models.modeling_utils import ModelMixin
27
+ from diffusers.pipelines.pipeline_utils import DiffusionPipeline
28
+ from diffusers.pipelines.stable_diffusion import StableDiffusionPipelineOutput
29
+ from diffusers.pipelines.stable_diffusion.safety_checker import (
30
+ StableDiffusionSafetyChecker,
31
+ )
32
+ from diffusers.schedulers import KarrasDiffusionSchedulers
33
+ from diffusers.utils import deprecate, is_accelerate_available, logging
34
+ from diffusers.utils.torch_utils import randn_tensor
35
+ from packaging import version
36
+ from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection
37
+
38
+ logger = logging.get_logger(__name__) # pylint: disable=invalid-name
39
+
40
+
41
+ class CLIPCameraProjection(ModelMixin, ConfigMixin):
42
+ """
43
+ A Projection layer for CLIP embedding and camera embedding.
44
+
45
+ Parameters:
46
+ embedding_dim (`int`, *optional*, defaults to 768): The dimension of the model input `clip_embed`
47
+ additional_embeddings (`int`, *optional*, defaults to 4): The number of additional tokens appended to the
48
+ projected `hidden_states`. The actual length of the used `hidden_states` is `num_embeddings +
49
+ additional_embeddings`.
50
+ """
51
+
52
+ @register_to_config
53
+ def __init__(self, embedding_dim: int = 768, additional_embeddings: int = 4):
54
+ super().__init__()
55
+ self.embedding_dim = embedding_dim
56
+ self.additional_embeddings = additional_embeddings
57
+
58
+ self.input_dim = self.embedding_dim + self.additional_embeddings
59
+ self.output_dim = self.embedding_dim
60
+
61
+ self.proj = torch.nn.Linear(self.input_dim, self.output_dim)
62
+
63
+ def forward(
64
+ self,
65
+ embedding: torch.FloatTensor,
66
+ ):
67
+ """
68
+ The [`CLIPCameraProjection`] forward method.
69
+
70
+ Args:
71
+ embedding (`torch.FloatTensor` of shape `(batch_size, input_dim)`):
72
+ The concatenated CLIP image embedding and camera embedding to project.
73
+
74
+ Returns:
75
+ The output embedding projection (`torch.FloatTensor` of shape `(batch_size, output_dim)`).
76
+ """
77
+ proj_embedding = self.proj(embedding)
78
+ return proj_embedding
79
+
80
+
81
+ class Zero123Pipeline(DiffusionPipeline):
82
+ r"""
83
+ Pipeline to generate variations from an input image using Stable Diffusion.
84
+
85
+ This model inherits from [`DiffusionPipeline`]. Check the superclass documentation for the generic methods the
86
+ library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)
87
+
88
+ Args:
89
+ vae ([`AutoencoderKL`]):
90
+ Variational Auto-Encoder (VAE) Model to encode and decode images to and from latent representations.
91
+ image_encoder ([`CLIPVisionModelWithProjection`]):
92
+ Frozen CLIP image-encoder. Stable Diffusion Image Variation uses the vision portion of
93
+ [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPVisionModelWithProjection),
94
+ specifically the [clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) variant.
95
+ unet ([`UNet2DConditionModel`]): Conditional U-Net architecture to denoise the encoded image latents.
96
+ scheduler ([`SchedulerMixin`]):
97
+ A scheduler to be used in combination with `unet` to denoise the encoded image latents. Can be one of
98
+ [`DDIMScheduler`], [`LMSDiscreteScheduler`], or [`PNDMScheduler`].
99
+ safety_checker ([`StableDiffusionSafetyChecker`]):
100
+ Classification module that estimates whether generated images could be considered offensive or harmful.
101
+ Please, refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for details.
102
+ feature_extractor ([`CLIPImageProcessor`]):
103
+ Model that extracts features from generated images to be used as inputs for the `safety_checker`.
104
+ """
105
+ # TODO: feature_extractor is required to encode images (if they are in PIL format),
106
+ # we should give a descriptive message if the pipeline doesn't have one.
107
+ _optional_components = ["safety_checker"]
108
+
109
+ def __init__(
110
+ self,
111
+ vae: AutoencoderKL,
112
+ image_encoder: CLIPVisionModelWithProjection,
113
+ unet: UNet2DConditionModel,
114
+ scheduler: KarrasDiffusionSchedulers,
115
+ safety_checker: StableDiffusionSafetyChecker,
116
+ feature_extractor: CLIPImageProcessor,
117
+ clip_camera_projection: CLIPCameraProjection,
118
+ requires_safety_checker: bool = True,
119
+ ):
120
+ super().__init__()
121
+
122
+ if safety_checker is None and requires_safety_checker:
123
+ logger.warn(
124
+ f"You have disabled the safety checker for {self.__class__} by passing `safety_checker=None`. Ensure"
125
+ " that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered"
126
+ " results in services or applications open to the public. Both the diffusers team and Hugging Face"
127
+ " strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling"
128
+ " it only for use-cases that involve analyzing network behavior or auditing its results. For more"
129
+ " information, please have a look at https://github.com/huggingface/diffusers/pull/254 ."
130
+ )
131
+
132
+ if safety_checker is not None and feature_extractor is None:
133
+ raise ValueError(
134
+ "Make sure to define a feature extractor when loading {self.__class__} if you want to use the safety"
135
+ " checker. If you do not want to use the safety checker, you can pass `'safety_checker=None'` instead."
136
+ )
137
+
138
+ is_unet_version_less_0_9_0 = hasattr(
139
+ unet.config, "_diffusers_version"
140
+ ) and version.parse(
141
+ version.parse(unet.config._diffusers_version).base_version
142
+ ) < version.parse(
143
+ "0.9.0.dev0"
144
+ )
145
+ is_unet_sample_size_less_64 = (
146
+ hasattr(unet.config, "sample_size") and unet.config.sample_size < 64
147
+ )
148
+ if is_unet_version_less_0_9_0 and is_unet_sample_size_less_64:
149
+ deprecation_message = (
150
+ "The configuration file of the unet has set the default `sample_size` to smaller than"
151
+ " 64 which seems highly unlikely .If you're checkpoint is a fine-tuned version of any of the"
152
+ " following: \n- CompVis/stable-diffusion-v1-4 \n- CompVis/stable-diffusion-v1-3 \n-"
153
+ " CompVis/stable-diffusion-v1-2 \n- CompVis/stable-diffusion-v1-1 \n- runwayml/stable-diffusion-v1-5"
154
+ " \n- runwayml/stable-diffusion-inpainting \n you should change 'sample_size' to 64 in the"
155
+ " configuration file. Please make sure to update the config accordingly as leaving `sample_size=32`"
156
+ " in the config might lead to incorrect results in future versions. If you have downloaded this"
157
+ " checkpoint from the Hugging Face Hub, it would be very nice if you could open a Pull request for"
158
+ " the `unet/config.json` file"
159
+ )
160
+ deprecate(
161
+ "sample_size<64", "1.0.0", deprecation_message, standard_warn=False
162
+ )
163
+ new_config = dict(unet.config)
164
+ new_config["sample_size"] = 64
165
+ unet._internal_dict = FrozenDict(new_config)
166
+
167
+ self.register_modules(
168
+ vae=vae,
169
+ image_encoder=image_encoder,
170
+ unet=unet,
171
+ scheduler=scheduler,
172
+ safety_checker=safety_checker,
173
+ feature_extractor=feature_extractor,
174
+ clip_camera_projection=clip_camera_projection,
175
+ )
176
+ self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1)
177
+ self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
178
+ self.register_to_config(requires_safety_checker=requires_safety_checker)
179
+
180
+ def enable_sequential_cpu_offload(self, gpu_id=0):
181
+ r"""
182
+ Offloads all models to CPU using accelerate, significantly reducing memory usage. When called, unet,
183
+ text_encoder, vae and safety checker have their state dicts saved to CPU and then are moved to a
184
+ `torch.device('meta') and loaded to GPU only when their specific submodule has its `forward` method called.
185
+ """
186
+ if is_accelerate_available():
187
+ from accelerate import cpu_offload
188
+ else:
189
+ raise ImportError("Please install accelerate via `pip install accelerate`")
190
+
191
+ device = torch.device(f"cuda:{gpu_id}")
192
+
193
+ for cpu_offloaded_model in [
194
+ self.unet,
195
+ self.image_encoder,
196
+ self.vae,
197
+ self.safety_checker,
198
+ ]:
199
+ if cpu_offloaded_model is not None:
200
+ cpu_offload(cpu_offloaded_model, device)
201
+
202
+ @property
203
+ # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline._execution_device
204
+ def _execution_device(self):
205
+ r"""
206
+ Returns the device on which the pipeline's models will be executed. After calling
207
+ `pipeline.enable_sequential_cpu_offload()` the execution device can only be inferred from Accelerate's module
208
+ hooks.
209
+ """
210
+ if not hasattr(self.unet, "_hf_hook"):
211
+ return self.device
212
+ for module in self.unet.modules():
213
+ if (
214
+ hasattr(module, "_hf_hook")
215
+ and hasattr(module._hf_hook, "execution_device")
216
+ and module._hf_hook.execution_device is not None
217
+ ):
218
+ return torch.device(module._hf_hook.execution_device)
219
+ return self.device
220
+
221
+ def _encode_image(
222
+ self,
223
+ image,
224
+ elevation,
225
+ azimuth,
226
+ distance,
227
+ device,
228
+ num_images_per_prompt,
229
+ do_classifier_free_guidance,
230
+ clip_image_embeddings=None,
231
+ image_camera_embeddings=None,
232
+ ):
233
+ dtype = next(self.image_encoder.parameters()).dtype
234
+
235
+ if image_camera_embeddings is None:
236
+ if image is None:
237
+ assert clip_image_embeddings is not None
238
+ image_embeddings = clip_image_embeddings.to(device=device, dtype=dtype)
239
+ else:
240
+ if not isinstance(image, torch.Tensor):
241
+ image = self.feature_extractor(
242
+ images=image, return_tensors="pt"
243
+ ).pixel_values
244
+
245
+ image = image.to(device=device, dtype=dtype)
246
+ image_embeddings = self.image_encoder(image).image_embeds
247
+ image_embeddings = image_embeddings.unsqueeze(1)
248
+
249
+ bs_embed, seq_len, _ = image_embeddings.shape
250
+
251
+ if isinstance(elevation, float):
252
+ elevation = torch.as_tensor(
253
+ [elevation] * bs_embed, dtype=dtype, device=device
254
+ )
255
+ if isinstance(azimuth, float):
256
+ azimuth = torch.as_tensor(
257
+ [azimuth] * bs_embed, dtype=dtype, device=device
258
+ )
259
+ if isinstance(distance, float):
260
+ distance = torch.as_tensor(
261
+ [distance] * bs_embed, dtype=dtype, device=device
262
+ )
263
+
264
+ camera_embeddings = torch.stack(
265
+ [
266
+ torch.deg2rad(elevation),
267
+ torch.sin(torch.deg2rad(azimuth)),
268
+ torch.cos(torch.deg2rad(azimuth)),
269
+ distance,
270
+ ],
271
+ dim=-1,
272
+ )[:, None, :]
273
+
274
+ image_embeddings = torch.cat([image_embeddings, camera_embeddings], dim=-1)
275
+
276
+ # project (image, camera) embeddings to the same dimension as clip embeddings
277
+ image_embeddings = self.clip_camera_projection(image_embeddings)
278
+ else:
279
+ image_embeddings = image_camera_embeddings.to(device=device, dtype=dtype)
280
+ bs_embed, seq_len, _ = image_embeddings.shape
281
+
282
+ # duplicate image embeddings for each generation per prompt, using mps friendly method
283
+ image_embeddings = image_embeddings.repeat(1, num_images_per_prompt, 1)
284
+ image_embeddings = image_embeddings.view(
285
+ bs_embed * num_images_per_prompt, seq_len, -1
286
+ )
287
+
288
+ if do_classifier_free_guidance:
289
+ negative_prompt_embeds = torch.zeros_like(image_embeddings)
290
+
291
+ # For classifier free guidance, we need to do two forward passes.
292
+ # Here we concatenate the unconditional and text embeddings into a single batch
293
+ # to avoid doing two forward passes
294
+ image_embeddings = torch.cat([negative_prompt_embeds, image_embeddings])
295
+
296
+ return image_embeddings
297
+
298
+ # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.run_safety_checker
299
+ def run_safety_checker(self, image, device, dtype):
300
+ if self.safety_checker is None:
301
+ has_nsfw_concept = None
302
+ else:
303
+ if torch.is_tensor(image):
304
+ feature_extractor_input = self.image_processor.postprocess(
305
+ image, output_type="pil"
306
+ )
307
+ else:
308
+ feature_extractor_input = self.image_processor.numpy_to_pil(image)
309
+ safety_checker_input = self.feature_extractor(
310
+ feature_extractor_input, return_tensors="pt"
311
+ ).to(device)
312
+ image, has_nsfw_concept = self.safety_checker(
313
+ images=image, clip_input=safety_checker_input.pixel_values.to(dtype)
314
+ )
315
+ return image, has_nsfw_concept
316
+
317
+ # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.decode_latents
318
+ def decode_latents(self, latents):
319
+ warnings.warn(
320
+ "The decode_latents method is deprecated and will be removed in a future version. Please"
321
+ " use VaeImageProcessor instead",
322
+ FutureWarning,
323
+ )
324
+ latents = 1 / self.vae.config.scaling_factor * latents
325
+ image = self.vae.decode(latents, return_dict=False)[0]
326
+ image = (image / 2 + 0.5).clamp(0, 1)
327
+ # we always cast to float32 as this does not cause significant overhead and is compatible with bfloat16
328
+ image = image.cpu().permute(0, 2, 3, 1).float().numpy()
329
+ return image
330
+
331
+ # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_extra_step_kwargs
332
+ def prepare_extra_step_kwargs(self, generator, eta):
333
+ # prepare extra kwargs for the scheduler step, since not all schedulers have the same signature
334
+ # eta (η) is only used with the DDIMScheduler, it will be ignored for other schedulers.
335
+ # eta corresponds to η in DDIM paper: https://arxiv.org/abs/2010.02502
336
+ # and should be between [0, 1]
337
+
338
+ accepts_eta = "eta" in set(
339
+ inspect.signature(self.scheduler.step).parameters.keys()
340
+ )
341
+ extra_step_kwargs = {}
342
+ if accepts_eta:
343
+ extra_step_kwargs["eta"] = eta
344
+
345
+ # check if the scheduler accepts generator
346
+ accepts_generator = "generator" in set(
347
+ inspect.signature(self.scheduler.step).parameters.keys()
348
+ )
349
+ if accepts_generator:
350
+ extra_step_kwargs["generator"] = generator
351
+ return extra_step_kwargs
352
+
353
+ def check_inputs(self, image, height, width, callback_steps):
354
+ # TODO: check image size or adjust image size to (height, width)
355
+
356
+ if height % 8 != 0 or width % 8 != 0:
357
+ raise ValueError(
358
+ f"`height` and `width` have to be divisible by 8 but are {height} and {width}."
359
+ )
360
+
361
+ if (callback_steps is None) or (
362
+ callback_steps is not None
363
+ and (not isinstance(callback_steps, int) or callback_steps <= 0)
364
+ ):
365
+ raise ValueError(
366
+ f"`callback_steps` has to be a positive integer but is {callback_steps} of type"
367
+ f" {type(callback_steps)}."
368
+ )
369
+
370
+ # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_latents
371
+ def prepare_latents(
372
+ self,
373
+ batch_size,
374
+ num_channels_latents,
375
+ height,
376
+ width,
377
+ dtype,
378
+ device,
379
+ generator,
380
+ latents=None,
381
+ ):
382
+ shape = (
383
+ batch_size,
384
+ num_channels_latents,
385
+ height // self.vae_scale_factor,
386
+ width // self.vae_scale_factor,
387
+ )
388
+ if isinstance(generator, list) and len(generator) != batch_size:
389
+ raise ValueError(
390
+ f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
391
+ f" size of {batch_size}. Make sure the batch size matches the length of the generators."
392
+ )
393
+
394
+ if latents is None:
395
+ latents = randn_tensor(
396
+ shape, generator=generator, device=device, dtype=dtype
397
+ )
398
+ else:
399
+ latents = latents.to(device)
400
+
401
+ # scale the initial noise by the standard deviation required by the scheduler
402
+ latents = latents * self.scheduler.init_noise_sigma
403
+ return latents
404
+
405
+ def _get_latent_model_input(
406
+ self,
407
+ latents: torch.FloatTensor,
408
+ image: Optional[
409
+ Union[PIL.Image.Image, List[PIL.Image.Image], torch.FloatTensor]
410
+ ],
411
+ num_images_per_prompt: int,
412
+ do_classifier_free_guidance: bool,
413
+ image_latents: Optional[torch.FloatTensor] = None,
414
+ ):
415
+ if isinstance(image, PIL.Image.Image):
416
+ image_pt = TF.to_tensor(image).unsqueeze(0).to(latents)
417
+ elif isinstance(image, list):
418
+ image_pt = torch.stack([TF.to_tensor(img) for img in image], dim=0).to(
419
+ latents
420
+ )
421
+ elif isinstance(image, torch.Tensor):
422
+ image_pt = image
423
+ else:
424
+ image_pt = None
425
+
426
+ if image_pt is None:
427
+ assert image_latents is not None
428
+ image_pt = image_latents.repeat_interleave(num_images_per_prompt, dim=0)
429
+ else:
430
+ image_pt = image_pt * 2.0 - 1.0 # scale to [-1, 1]
431
+ # FIXME: encoded latents should be multiplied with self.vae.config.scaling_factor
432
+ # but zero123 was not trained this way
433
+ image_pt = self.vae.encode(image_pt).latent_dist.mode()
434
+ image_pt = image_pt.repeat_interleave(num_images_per_prompt, dim=0)
435
+ if do_classifier_free_guidance:
436
+ latent_model_input = torch.cat(
437
+ [
438
+ torch.cat([latents, latents], dim=0),
439
+ torch.cat([torch.zeros_like(image_pt), image_pt], dim=0),
440
+ ],
441
+ dim=1,
442
+ )
443
+ else:
444
+ latent_model_input = torch.cat([latents, image_pt], dim=1)
445
+
446
+ return latent_model_input
447
+
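+ # Illustrative note: under classifier-free guidance the UNet input assembled above
+ # has 8 channels (4 noise-latent channels concatenated with 4 encoded image-latent
+ # channels) and a doubled batch whose first half uses zeroed image latents as the
+ # unconditional branch. For example, with `latents` of shape (1, 4, 32, 32) and a
+ # hypothetical 256x256 conditioning image `cond_image`:
+ #
+ #   x = self._get_latent_model_input(latents, cond_image, 1, True)
+ #   # x.shape == (2, 8, 32, 32); x[0, 4:] is all zeros (unconditional half)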
448
+ @torch.no_grad()
449
+ def __call__(
450
+ self,
451
+ image: Optional[
452
+ Union[PIL.Image.Image, List[PIL.Image.Image], torch.FloatTensor]
453
+ ] = None,
454
+ elevation: Optional[Union[float, torch.FloatTensor]] = None,
455
+ azimuth: Optional[Union[float, torch.FloatTensor]] = None,
456
+ distance: Optional[Union[float, torch.FloatTensor]] = None,
457
+ height: Optional[int] = None,
458
+ width: Optional[int] = None,
459
+ num_inference_steps: int = 50,
460
+ guidance_scale: float = 3.0,
461
+ num_images_per_prompt: int = 1,
462
+ eta: float = 0.0,
463
+ generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
464
+ latents: Optional[torch.FloatTensor] = None,
465
+ clip_image_embeddings: Optional[torch.FloatTensor] = None,
466
+ image_camera_embeddings: Optional[torch.FloatTensor] = None,
467
+ image_latents: Optional[torch.FloatTensor] = None,
468
+ output_type: Optional[str] = "pil",
469
+ return_dict: bool = True,
470
+ callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None,
471
+ callback_steps: int = 1,
472
+ cross_attention_kwargs: Optional[Dict[str, Any]] = None,
473
+ ):
474
+ r"""
475
+ Function invoked when calling the pipeline for generation.
476
+
477
+ Args:
478
+ image (`PIL.Image.Image` or `List[PIL.Image.Image]` or `torch.FloatTensor`):
479
+ The image or images to guide the image generation. If you provide a tensor, it needs to comply with the
480
+ configuration of
481
+ [this](https://huggingface.co/lambdalabs/sd-image-variations-diffusers/blob/main/feature_extractor/preprocessor_config.json)
482
+ `CLIPImageProcessor`.
483
+ height (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
484
+ The height in pixels of the generated image.
485
+ width (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
486
+ The width in pixels of the generated image.
487
+ num_inference_steps (`int`, *optional*, defaults to 50):
488
+ The number of denoising steps. More denoising steps usually lead to a higher quality image at the
489
+ expense of slower inference.
490
+ guidance_scale (`float`, *optional*, defaults to 3.0):
491
+ Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
492
+ `guidance_scale` is defined as `w` in equation 2 of the [Imagen
493
+ Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >
494
+ 1`. A higher guidance scale encourages the model to generate images that are closely linked to the conditioning `image`,
495
+ usually at the expense of lower image quality.
496
+ num_images_per_prompt (`int`, *optional*, defaults to 1):
497
+ The number of images to generate per prompt.
498
+ eta (`float`, *optional*, defaults to 0.0):
499
+ Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to
500
+ [`schedulers.DDIMScheduler`] and is ignored for other schedulers.
501
+ generator (`torch.Generator`, *optional*):
502
+ One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
503
+ to make generation deterministic.
504
+ latents (`torch.FloatTensor`, *optional*):
505
+ Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
506
+ generation. Can be used to tweak the same generation with different conditioning. If not provided, a latents
507
+ tensor will be generated by sampling using the supplied random `generator`.
508
+ output_type (`str`, *optional*, defaults to `"pil"`):
509
+ The output format of the generated image. Choose between
510
+ [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
511
+ return_dict (`bool`, *optional*, defaults to `True`):
512
+ Whether or not to return a [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] instead of a
513
+ plain tuple.
514
+ callback (`Callable`, *optional*):
515
+ A function that will be called every `callback_steps` steps during inference. The function will be
516
+ called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
517
+ callback_steps (`int`, *optional*, defaults to 1):
518
+ The frequency at which the `callback` function will be called. If not specified, the callback will be
519
+ called at every step.
520
+
521
+ Returns:
522
+ [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] or `tuple`:
523
+ [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] if `return_dict` is `True`, otherwise a `tuple`.
524
+ When returning a tuple, the first element is a list with the generated images, and the second element is a
525
+ list of `bool`s denoting whether the corresponding generated image likely represents "not-safe-for-work"
526
+ (nsfw) content, according to the `safety_checker`.
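+
+ Example (an illustrative sketch, not a tested recipe; it assumes the enclosing
+ class is a `DiffusionPipeline` subclass exposed as, e.g., `Zero123Pipeline`,
+ that a diffusers-format Zero123 checkpoint exists at the given path, and that
+ `cond_image` is a 256x256 RGB `PIL.Image` of the object):
+
+ pipe = Zero123Pipeline.from_pretrained("path/to/zero123-diffusers", torch_dtype=torch.float16).to("cuda")
+ views = pipe(
+ image=cond_image,
+ elevation=30.0, azimuth=45.0, distance=0.0,
+ num_inference_steps=50, guidance_scale=3.0,
+ ).images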
527
+ """
528
+ # 0. Default height and width to unet
529
+ height = height or self.unet.config.sample_size * self.vae_scale_factor
530
+ width = width or self.unet.config.sample_size * self.vae_scale_factor
531
+
532
+ # 1. Check inputs. Raise error if not correct
533
+ # TODO: check input elevation, azimuth, and distance
534
+ # TODO: check image, clip_image_embeddings, image_latents
535
+ self.check_inputs(image, height, width, callback_steps)
536
+
537
+ # 2. Define call parameters
538
+ if isinstance(image, PIL.Image.Image):
539
+ batch_size = 1
540
+ elif isinstance(image, list):
541
+ batch_size = len(image)
542
+ elif isinstance(image, torch.Tensor):
543
+ batch_size = image.shape[0]
544
+ else:
545
+ assert image_latents is not None
546
+ assert (
547
+ clip_image_embeddings is not None or image_camera_embeddings is not None
548
+ )
549
+ batch_size = image_latents.shape[0]
550
+
551
+ device = self._execution_device
552
+ # here `guidance_scale` is defined analogously to the guidance weight `w` of equation (2)
553
+ # of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
554
+ # corresponds to doing no classifier free guidance.
555
+ do_classifier_free_guidance = guidance_scale > 1.0
556
+
557
+ # 3. Encode input image
558
+ if isinstance(image, PIL.Image.Image) or isinstance(image, list):
559
+ pil_image = image
560
+ elif isinstance(image, torch.Tensor):
561
+ pil_image = [TF.to_pil_image(image[i]) for i in range(image.shape[0])]
562
+ else:
563
+ pil_image = None
564
+ image_embeddings = self._encode_image(
565
+ pil_image,
566
+ elevation,
567
+ azimuth,
568
+ distance,
569
+ device,
570
+ num_images_per_prompt,
571
+ do_classifier_free_guidance,
572
+ clip_image_embeddings,
573
+ image_camera_embeddings,
574
+ )
575
+
576
+ # 4. Prepare timesteps
577
+ self.scheduler.set_timesteps(num_inference_steps, device=device)
578
+ timesteps = self.scheduler.timesteps
579
+
580
+ # 5. Prepare latent variables
581
+ # num_channels_latents = self.unet.config.in_channels
582
+ num_channels_latents = 4 # FIXME: hard-coded
583
+ latents = self.prepare_latents(
584
+ batch_size * num_images_per_prompt,
585
+ num_channels_latents,
586
+ height,
587
+ width,
588
+ image_embeddings.dtype,
589
+ device,
590
+ generator,
591
+ latents,
592
+ )
593
+
594
+ # 6. Prepare extra step kwargs. TODO: Logic should ideally just be moved out of the pipeline
595
+ extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)
596
+
597
+ # 7. Denoising loop
598
+ num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order
599
+ with self.progress_bar(total=num_inference_steps) as progress_bar:
600
+ for i, t in enumerate(timesteps):
601
+ # expand the latents if we are doing classifier free guidance
602
+ latent_model_input = self._get_latent_model_input(
603
+ latents,
604
+ image,
605
+ num_images_per_prompt,
606
+ do_classifier_free_guidance,
607
+ image_latents,
608
+ )
609
+ latent_model_input = self.scheduler.scale_model_input(
610
+ latent_model_input, t
611
+ )
612
+
613
+ # predict the noise residual
614
+ noise_pred = self.unet(
615
+ latent_model_input,
616
+ t,
617
+ encoder_hidden_states=image_embeddings,
618
+ cross_attention_kwargs=cross_attention_kwargs,
619
+ ).sample
620
+
621
+ # perform guidance
622
+ if do_classifier_free_guidance:
623
+ noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
624
+ noise_pred = noise_pred_uncond + guidance_scale * (
625
+ noise_pred_text - noise_pred_uncond
626
+ )
627
+
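+ # Descriptive note: this is the standard classifier-free-guidance combination,
+ # noise_pred = eps_uncond + w * (eps_cond - eps_uncond)
+ #            = (1 - w) * eps_uncond + w * eps_cond, with w = guidance_scale.
+ # For w > 1 (the default here is 3.0) the prediction extrapolates beyond the
+ # image-conditioned branch, trading sample diversity for fidelity to the input view.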
628
+ # compute the previous noisy sample x_t -> x_t-1
629
+ latents = self.scheduler.step(
630
+ noise_pred, t, latents, **extra_step_kwargs
631
+ ).prev_sample
632
+
633
+ # call the callback, if provided
634
+ if i == len(timesteps) - 1 or (
635
+ (i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0
636
+ ):
637
+ progress_bar.update()
638
+ if callback is not None and i % callback_steps == 0:
639
+ callback(i, t, latents)
640
+
641
+ if output_type != "latent":
642
+ image = self.vae.decode(
643
+ latents / self.vae.config.scaling_factor, return_dict=False
644
+ )[0]
645
+ image, has_nsfw_concept = self.run_safety_checker(
646
+ image, device, image_embeddings.dtype
647
+ )
648
+ else:
649
+ image = latents
650
+ has_nsfw_concept = None
651
+
652
+ if has_nsfw_concept is None:
653
+ do_denormalize = [True] * image.shape[0]
654
+ else:
655
+ do_denormalize = [not has_nsfw for has_nsfw in has_nsfw_concept]
656
+
657
+ image = self.image_processor.postprocess(
658
+ image, output_type=output_type, do_denormalize=do_denormalize
659
+ )
660
+
661
+ if not return_dict:
662
+ return (image, has_nsfw_concept)
663
+
664
+ return StableDiffusionPipelineOutput(
665
+ images=image, nsfw_content_detected=has_nsfw_concept
666
+ )