Wuvin committed on
Commit
37aeb5b
1 Parent(s): 69a849f
This view is limited to 50 files because the commit contains too many changes.
Files changed (50)
  1. .ipynb_checkpoints/README-checkpoint.md +68 -0
  2. README_zh.md +56 -0
  3. app/__init__.py +0 -0
  4. app/all_models.py +22 -0
  5. app/custom_models/image2image-objaverseF-rgb2normal.yaml +61 -0
  6. app/custom_models/image2mvimage-objaverseFrot-wonder3d.yaml +63 -0
  7. app/custom_models/mvimg_prediction.py +57 -0
  8. app/custom_models/normal_prediction.py +26 -0
  9. app/custom_models/utils.py +75 -0
  10. app/examples/Groot.png +3 -0
  11. app/examples/aaa.png +3 -0
  12. app/examples/abma.png +3 -0
  13. app/examples/akun.png +3 -0
  14. app/examples/anya.png +3 -0
  15. app/examples/bag.png +3 -0
  16. app/examples/generated_1715761545_frame0.png +3 -0
  17. app/examples/generated_1715762357_frame0.png +3 -0
  18. app/examples/generated_1715763329_frame0.png +3 -0
  19. app/examples/hatsune_miku.png +3 -0
  20. app/examples/princess-large.png +3 -0
  21. app/examples/shoe.png +3 -0
  22. app/gradio_3dgen.py +71 -0
  23. app/gradio_3dgen_steps.py +87 -0
  24. app/gradio_local.py +76 -0
  25. app/utils.py +112 -0
  26. assets/teaser.jpg +0 -0
  27. ckpt/controlnet-tile/config.json +52 -0
  28. ckpt/controlnet-tile/diffusion_pytorch_model.safetensors +3 -0
  29. ckpt/image2normal/feature_extractor/preprocessor_config.json +44 -0
  30. ckpt/image2normal/image_encoder/config.json +23 -0
  31. ckpt/image2normal/image_encoder/model.safetensors +3 -0
  32. ckpt/image2normal/model_index.json +31 -0
  33. ckpt/image2normal/scheduler/scheduler_config.json +16 -0
  34. ckpt/image2normal/unet/config.json +68 -0
  35. ckpt/image2normal/unet/diffusion_pytorch_model.safetensors +3 -0
  36. ckpt/image2normal/unet_state_dict.pth +3 -0
  37. ckpt/image2normal/vae/config.json +34 -0
  38. ckpt/image2normal/vae/diffusion_pytorch_model.safetensors +3 -0
  39. ckpt/img2mvimg/feature_extractor/preprocessor_config.json +44 -0
  40. ckpt/img2mvimg/image_encoder/config.json +23 -0
  41. ckpt/img2mvimg/image_encoder/model.safetensors +3 -0
  42. ckpt/img2mvimg/model_index.json +31 -0
  43. ckpt/img2mvimg/scheduler/scheduler_config.json +20 -0
  44. ckpt/img2mvimg/unet/config.json +68 -0
  45. ckpt/img2mvimg/unet/diffusion_pytorch_model.safetensors +3 -0
  46. ckpt/img2mvimg/unet_state_dict.pth +3 -0
  47. ckpt/img2mvimg/vae/config.json +34 -0
  48. ckpt/img2mvimg/vae/diffusion_pytorch_model.safetensors +3 -0
  49. ckpt/realesrgan-x4.onnx +3 -0
  50. ckpt/v1-inference.yaml +70 -0
.ipynb_checkpoints/README-checkpoint.md ADDED
@@ -0,0 +1,68 @@
1
+ **Chinese version: [中文](README_zh.md)**
2
+
3
+ # Unique3D
4
+ High-Quality and Efficient 3D Mesh Generation from a Single Image
5
+
6
+ ## [Paper]() | [Project page](https://wukailu.github.io/Unique3D/) | [Huggingface Demo]() | [Online Demo](https://www.aiuni.ai/)
7
+
8
+ ![](assets/fig_teaser.png)
9
+
10
+ High-fidelity and diverse textured meshes generated by Unique3D from single-view wild images in about 30 seconds on an RTX 4090.
11
+
12
+ ## More features
13
+
14
+ The repo is still under construction; thanks for your patience.
15
+ - [x] Local gradio demo.
16
+ - [ ] Detailed tutorial.
17
+ - [ ] Huggingface demo.
18
+ - [ ] Detailed local demo.
19
+ - [ ] Comfyui support.
20
+ - [ ] Windows support.
21
+ - [ ] Docker support.
22
+ - [ ] More stable reconstruction with normal.
23
+ - [ ] Training code release.
24
+
25
+ ## Preparation for inference
26
+
27
+ ### Linux System Setup.
28
+ ```bash
29
+ conda create -n unique3d
30
+ conda activate unique3d
31
+ pip install -r requirements.txt
32
+ ```
33
+
34
+ ### Interactive inference: run your local gradio demo.
35
+
36
+ 1. Download the [ckpt.zip](), and extract it to `ckpt/*`.
37
+ ```
38
+ Unique3D
39
+ ├──ckpt
40
+ ├── controlnet-tile/
41
+ ├── image2normal/
42
+ ├── img2mvimg/
43
+ ├── realesrgan-x4.onnx
44
+ └── v1-inference.yaml
45
+ ```
46
+
47
+ 2. Run the interactive inference locally.
48
+ ```bash
49
+ python app/gradio_local.py --port 7860
50
+ ```
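+
+ If you prefer to script the pipeline instead of using the UI, the same steps can be driven from Python. The sketch below mirrors `generate3dv2` in `app/gradio_3dgen.py` (with model initialization as in `app/gradio_local.py`); the example image and output prefix are placeholders, and the vertex rescaling done in the demo is omitted.
+ ```python
+ from PIL import Image
+ from app.all_models import model_zoo
+ from app.custom_models.mvimg_prediction import run_mvprediction
+ from scripts.multiview_inference import geo_reconstruct
+ from scripts.utils import save_glb_and_video
+
+ model_zoo.init_models()  # loads SD1.5 + IP-Adapter + ./ckpt/controlnet-tile
+ image = Image.open("app/examples/aaa.png")  # placeholder input
+ rgb_pils, front_pil = run_mvprediction(image, remove_bg=True, seed=-1)
+ mesh = geo_reconstruct(rgb_pils, None, front_pil, do_refine=True,
+                        predict_normal=True, expansion_weight=0.1, init_type="std")
+ glb_path, video = save_glb_and_video("outputs/generated", mesh, with_timestamp=True,
+                                      dist=3.5, fov_in_degrees=2 / 1.35,
+                                      cam_type="ortho", export_video=False)
+ ```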
51
+
52
+ ## Tips to get better results
53
+
54
+ 1. Unique3D is sensitive to the facing direction of the input image. Due to the distribution of the training data, orthographic front-facing images in a rest pose generally lead to good reconstructions.
55
+ 2. Images with occlusions will cause worse reconstructions, since four views cannot cover the complete object. Images with fewer occlusions lead to better results.
56
+ 3. Use the highest-resolution input image you can; input quality has a direct effect on the final mesh and texture (the demo upscales inputs no wider than 512 px automatically; see the sketch below).
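+
+ For the resolution tip, the demo itself upscales small inputs before multiview prediction; the snippet below is a minimal sketch of that check, using the same 512 px threshold and `run_sr_fast` helper as `app/gradio_3dgen.py`.
+ ```python
+ from PIL import Image
+ from scripts.refine_lr_to_sr import run_sr_fast  # super-resolution helper used by the demo
+
+ img = Image.open("app/examples/shoe.png")  # placeholder input
+ if img.size[0] <= 512:           # same threshold as generate3dv2
+     img = run_sr_fast([img])[0]  # upscale before running multiview prediction
+ ```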
57
+
58
+ ## Acknowledgement
59
+
60
+ We have borrowed extensively from the following repositories. Many thanks to the authors for sharing their code.
61
+ - [Stable Diffusion](https://github.com/CompVis/stable-diffusion)
62
+ - [Wonder3d](https://github.com/xxlong0/Wonder3D)
63
+ - [Zero123Plus](https://github.com/SUDO-AI-3D/zero123plus)
64
+ - [Continuous Remeshing](https://github.com/Profactor/continuous-remeshing)
65
+ - [Depth from Normals](https://github.com/YertleTurtleGit/depth-from-normals)
66
+
67
+ ## Collaborations
68
+ Our mission is to create a 4D generative model with 3D concepts. This is just our first step, and the road ahead is still long, but we are confident. We warmly invite you to join the discussion and explore potential collaborations in any capacity. <span style="color:red">**If you're interested in connecting or partnering with us, please don't hesitate to reach out via email (wkl22@mails.tsinghua.edu.cn)**</span>.
README_zh.md ADDED
@@ -0,0 +1,56 @@
1
+ **其他语言版本 [English](README.md)**
2
+
3
+ # Unique3D
4
+ High-Quality and Efficient 3D Mesh Generation from a Single Image
5
+
6
+ ## [论文]() | [项目页面](https://wukailu.github.io/Unique3D/) | [Huggingface Demo]() | [在线演示](https://www.aiuni.ai/)
7
+
8
+ ![](assets/fig_teaser.png)
9
+
10
+ Unique3D从单视图图像生成高保真度和多样化纹理的网格,在4090上大约需要30秒。
11
+
12
+ ### 推理准备
13
+
14
+ #### Linux系统设置
15
+ ```bash
16
+ conda create -n unique3d
17
+ conda activate unique3d
18
+ pip install -r requirements.txt
19
+ ```
20
+
21
+ #### 交互式推理:运行您的本地gradio演示
22
+
23
+ 1. 下载[ckpt.zip](),并将其解压到`ckpt/*`。
24
+ ```
25
+ Unique3D
26
+ ├──ckpt
27
+ ├── controlnet-tile/
28
+ ├── image2normal/
29
+ ├── img2mvimg/
30
+ ├── realesrgan-x4.onnx
31
+ └── v1-inference.yaml
32
+ ```
33
+
34
+ 2. 在本地运行交互式推理。
35
+ ```bash
36
+ python app/gradio_local.py --port 7860
37
+ ```
38
+
39
+ ## 获取更好结果的提示
40
+
41
+ 1. Unique3D对输入图像的朝向非常敏感。由于训练数据的分布,**正交正视图像**通常能带来良好的重建。对于人物而言,最好是 A-pose 或者 T-pose,因为目前训练数据很少含有其他类型的姿态。
42
+ 2. 有遮挡的图像会导致更差的重建,因为4个视图无法覆盖完整的对象。遮挡较少的图像会带来更好的结果。
43
+ 3. 尽可能将高分辨率的图像用作输入。
44
+
45
+ ## 致谢
46
+
47
+ 我们借用了以下代码库的代码。非常感谢作者们分享他们的代码。
48
+ - [Stable Diffusion](https://github.com/CompVis/stable-diffusion)
49
+ - [Wonder3d](https://github.com/xxlong0/Wonder3D)
50
+ - [Zero123Plus](https://github.com/SUDO-AI-3D/zero123plus)
51
+ - [Continuous Remeshing](https://github.com/Profactor/continuous-remeshing)
52
+ - [Depth from Normals](https://github.com/YertleTurtleGit/depth-from-normals)
53
+
54
+ ## 合作
55
+
56
+ 我们的使命是创建一个具有3D概念的4D生成模型。这只是我们的第一步,前方的道路仍然很长,但我们有信心。我们热情邀请您加入讨论,并探索任何形式的潜在合作。<span style="color:red">**如果您有兴趣联系或与我们合作,欢迎通过电子邮件(wkl22@mails.tsinghua.edu.cn)与我们联系**</span>。
app/__init__.py ADDED
File without changes
app/all_models.py ADDED
@@ -0,0 +1,22 @@
1
+ import torch
2
+ from scripts.sd_model_zoo import load_common_sd15_pipe
3
+ from diffusers import StableDiffusionControlNetImg2ImgPipeline, StableDiffusionPipeline
4
+
5
+
6
+ class MyModelZoo:
7
+ _pipe_disney_controlnet_lineart_ipadapter_i2i: StableDiffusionControlNetImg2ImgPipeline = None
8
+
9
+ base_model = "runwayml/stable-diffusion-v1-5"
10
+
11
+ def __init__(self, base_model=None) -> None:
12
+ if base_model is not None:
13
+ self.base_model = base_model
14
+
15
+ @property
16
+ def pipe_disney_controlnet_tile_ipadapter_i2i(self):
17
+ return self._pipe_disney_controlnet_lineart_ipadapter_i2i
18
+
19
+ def init_models(self):
20
+ self._pipe_disney_controlnet_lineart_ipadapter_i2i = load_common_sd15_pipe(base_model=self.base_model, ip_adapter=True, plus_model=False, controlnet="./ckpt/controlnet-tile", pipeline_class=StableDiffusionControlNetImg2ImgPipeline)
21
+
22
+ model_zoo = MyModelZoo()
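A short usage note (a sketch, not part of the commit): the zoo is instantiated at import time but loads nothing until `init_models()` is called, and the property `pipe_disney_controlnet_tile_ipadapter_i2i` simply exposes the `_pipe_disney_controlnet_lineart_ipadapter_i2i` attribute, so both names refer to the same tile-ControlNet img2img pipeline.

```python
from app.all_models import model_zoo

model_zoo.init_models()  # loads SD1.5 + IP-Adapter + ./ckpt/controlnet-tile once
pipe = model_zoo.pipe_disney_controlnet_tile_ipadapter_i2i  # StableDiffusionControlNetImg2ImgPipeline
```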
app/custom_models/image2image-objaverseF-rgb2normal.yaml ADDED
@@ -0,0 +1,61 @@
1
+ pretrained_model_name_or_path: "lambdalabs/sd-image-variations-diffusers"
2
+ mixed_precision: "bf16"
3
+
4
+ init_config:
5
+ # enable controls
6
+ enable_cross_attn_lora: False
7
+ enable_cross_attn_ip: False
8
+ enable_self_attn_lora: False
9
+ enable_self_attn_ref: True
10
+ enable_multiview_attn: False
11
+
12
+ # for cross attention
13
+ init_cross_attn_lora: False
14
+ init_cross_attn_ip: False
15
+ cross_attn_lora_rank: 512 # 0 for not enabled
16
+ cross_attn_lora_only_kv: False
17
+ ipadapter_pretrained_name: "h94/IP-Adapter"
18
+ ipadapter_subfolder_name: "models"
19
+ ipadapter_weight_name: "ip-adapter_sd15.safetensors"
20
+ ipadapter_effect_on: "all" # all, first
21
+
22
+ # for self attention
23
+ init_self_attn_lora: False
24
+ self_attn_lora_rank: 512
25
+ self_attn_lora_only_kv: False
26
+
27
+ # for self attention ref
28
+ init_self_attn_ref: True
29
+ self_attn_ref_position: "attn1"
30
+ self_attn_ref_other_model_name: "lambdalabs/sd-image-variations-diffusers"
31
+ self_attn_ref_pixel_wise_crosspond: True
32
+ self_attn_ref_effect_on: "all"
33
+
34
+ # for multiview attention
35
+ init_multiview_attn: False
36
+ multiview_attn_position: "attn1"
37
+ num_modalities: 1
38
+
39
+ # for unet
40
+ init_unet_path: "${pretrained_model_name_or_path}"
41
+ init_num_cls_label: 0 # for initialize
42
+ cls_labels: [] # for current task
43
+
44
+ trainers:
45
+ - trainer_type: "image2image_trainer"
46
+ trainer:
47
+ pretrained_model_name_or_path: "${pretrained_model_name_or_path}"
48
+ attn_config:
49
+ cls_labels: [] # for current task
50
+ enable_cross_attn_lora: False
51
+ enable_cross_attn_ip: False
52
+ enable_self_attn_lora: False
53
+ enable_self_attn_ref: True
54
+ enable_multiview_attn: False
55
+ resolution: "512"
56
+ condition_image_resolution: "512"
57
+ condition_image_column_name: "conditioning_image"
58
+ image_column_name: "image"
59
+
60
+
61
+
app/custom_models/image2mvimage-objaverseFrot-wonder3d.yaml ADDED
@@ -0,0 +1,63 @@
1
+ pretrained_model_name_or_path: "./ckpt/img2mvimg"
2
+ mixed_precision: "bf16"
3
+
4
+ init_config:
5
+ # enable controls
6
+ enable_cross_attn_lora: False
7
+ enable_cross_attn_ip: False
8
+ enable_self_attn_lora: False
9
+ enable_self_attn_ref: False
10
+ enable_multiview_attn: True
11
+
12
+ # for cross attention
13
+ init_cross_attn_lora: False
14
+ init_cross_attn_ip: False
15
+ cross_attn_lora_rank: 256 # 0 for not enabled
16
+ cross_attn_lora_only_kv: False
17
+ ipadapter_pretrained_name: "h94/IP-Adapter"
18
+ ipadapter_subfolder_name: "models"
19
+ ipadapter_weight_name: "ip-adapter_sd15.safetensors"
20
+ ipadapter_effect_on: "all" # all, first
21
+
22
+ # for self attention
23
+ init_self_attn_lora: False
24
+ self_attn_lora_rank: 256
25
+ self_attn_lora_only_kv: False
26
+
27
+ # for self attention ref
28
+ init_self_attn_ref: False
29
+ self_attn_ref_position: "attn1"
30
+ self_attn_ref_other_model_name: "lambdalabs/sd-image-variations-diffusers"
31
+ self_attn_ref_pixel_wise_crosspond: False
32
+ self_attn_ref_effect_on: "all"
33
+
34
+ # for multiview attention
35
+ init_multiview_attn: True
36
+ multiview_attn_position: "attn1"
37
+ use_mv_joint_attn: True
38
+ num_modalities: 1
39
+
40
+ # for unet
41
+ init_unet_path: "${pretrained_model_name_or_path}"
42
+ cat_condition: True # cat condition to input
43
+
44
+ # for cls embedding
45
+ init_num_cls_label: 8 # for initialize
46
+ cls_labels: [0, 1, 2, 3] # for current task
47
+
48
+ trainers:
49
+ - trainer_type: "image2mvimage_trainer"
50
+ trainer:
51
+ pretrained_model_name_or_path: "${pretrained_model_name_or_path}"
52
+ attn_config:
53
+ cls_labels: [0, 1, 2, 3] # for current task
54
+ enable_cross_attn_lora: False
55
+ enable_cross_attn_ip: False
56
+ enable_self_attn_lora: False
57
+ enable_self_attn_ref: False
58
+ enable_multiview_attn: True
59
+ resolution: "256"
60
+ condition_image_resolution: "256"
61
+ normal_cls_offset: 4
62
+ condition_image_column_name: "conditioning_image"
63
+ image_column_name: "image"
app/custom_models/mvimg_prediction.py ADDED
@@ -0,0 +1,57 @@
1
+ import sys
2
+ import torch
3
+ import gradio as gr
4
+ from PIL import Image
5
+ import numpy as np
6
+ from rembg import remove
7
+ from app.utils import change_rgba_bg, rgba_to_rgb
8
+ from app.custom_models.utils import load_pipeline
9
+ from scripts.all_typing import *
10
+ from scripts.utils import session, simple_preprocess
11
+
12
+ training_config = "app/custom_models/image2mvimage-objaverseFrot-wonder3d.yaml"
13
+ checkpoint_path = "ckpt/img2mvimg/unet_state_dict.pth"
14
+ trainer, pipeline = load_pipeline(training_config, checkpoint_path)
15
+ pipeline.enable_model_cpu_offload()
16
+
17
+ def predict(img_list: List[Image.Image], guidance_scale=2., **kwargs):
18
+ if isinstance(img_list, Image.Image):
19
+ img_list = [img_list]
20
+ img_list = [rgba_to_rgb(i) if i.mode == 'RGBA' else i for i in img_list]
21
+ ret = []
22
+ for img in img_list:
23
+ images = trainer.pipeline_forward(
24
+ pipeline=pipeline,
25
+ image=img,
26
+ guidance_scale=guidance_scale,
27
+ **kwargs
28
+ ).images
29
+ ret.extend(images)
30
+ return ret
31
+
32
+
33
+ def run_mvprediction(input_image: Image.Image, remove_bg=True, guidance_scale=1.5, seed=1145):
34
+ if input_image.mode == 'RGB' or np.array(input_image)[..., -1].mean() == 255.:
35
+ # still do remove using rembg, since simple_preprocess requires RGBA image
36
+ print("RGB image not RGBA! still remove bg!")
37
+ remove_bg = True
38
+
39
+ if remove_bg:
40
+ input_image = remove(input_image, session=session)
41
+
42
+ # make front_pil RGBA with white bg
43
+ input_image = change_rgba_bg(input_image, "white")
44
+ single_image = simple_preprocess(input_image)
45
+
46
+ generator = torch.Generator(device="cuda").manual_seed(int(seed)) if seed >= 0 else None
47
+
48
+ rgb_pils = predict(
49
+ single_image,
50
+ generator=generator,
51
+ guidance_scale=guidance_scale,
52
+ width=256,
53
+ height=256,
54
+ num_inference_steps=30,
55
+ )
56
+
57
+ return rgb_pils, single_image
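Usage sketch (not part of the commit): `run_mvprediction` takes a single front-view PIL image and returns the predicted multiview RGB images together with the preprocessed RGBA front view; the example file is a placeholder.

```python
from PIL import Image
from app.custom_models.mvimg_prediction import run_mvprediction

front = Image.open("app/examples/anya.png")
views, front_rgba = run_mvprediction(front, remove_bg=True, guidance_scale=1.5, seed=1145)
# `views` holds the 256x256 generated views; `front_rgba` is the cleaned, white-background input
```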
app/custom_models/normal_prediction.py ADDED
@@ -0,0 +1,26 @@
1
+ import sys
2
+ from PIL import Image
3
+ from app.utils import rgba_to_rgb, simple_remove
4
+ from app.custom_models.utils import load_pipeline
5
+ from scripts.utils import rotate_normals_torch
6
+ from scripts.all_typing import *
7
+
8
+ training_config = "app/custom_models/image2image-objaverseF-rgb2normal.yaml"
9
+ checkpoint_path = "ckpt/image2normal/unet_state_dict.pth"
10
+ trainer, pipeline = load_pipeline(training_config, checkpoint_path)
11
+ pipeline.enable_model_cpu_offload()
12
+
13
+ def predict_normals(image: List[Image.Image], guidance_scale=2., do_rotate=True, num_inference_steps=30, **kwargs):
14
+ img_list = image if isinstance(image, list) else [image]
15
+ img_list = [rgba_to_rgb(i) if i.mode == 'RGBA' else i for i in img_list]
16
+ images = trainer.pipeline_forward(
17
+ pipeline=pipeline,
18
+ image=img_list,
19
+ num_inference_steps=num_inference_steps,
20
+ guidance_scale=guidance_scale,
21
+ **kwargs
22
+ ).images
23
+ images = simple_remove(images)
24
+ if do_rotate and len(images) > 1:
25
+ images = rotate_normals_torch(images, return_types='pil')
26
+ return images
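Usage sketch (not part of the commit): `predict_normals` maps RGB views to normal maps and, when more than one view is passed with `do_rotate=True`, rotates the normals into a consistent frame via `rotate_normals_torch`.

```python
from app.custom_models.normal_prediction import predict_normals

# `views` is a list of RGB PIL images, e.g. the output of run_mvprediction above
normals = predict_normals(views, guidance_scale=2., do_rotate=True, num_inference_steps=30)
```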
app/custom_models/utils.py ADDED
@@ -0,0 +1,75 @@
1
+ import torch
2
+ from typing import List
3
+ from dataclasses import dataclass
4
+ from app.utils import rgba_to_rgb
5
+ from custum_3d_diffusion.trainings.config_classes import ExprimentConfig, TrainerSubConfig
6
+ from custum_3d_diffusion import modules
7
+ from custum_3d_diffusion.custum_modules.unifield_processor import AttnConfig, ConfigurableUNet2DConditionModel
8
+ from custum_3d_diffusion.trainings.base import BasicTrainer
9
+ from custum_3d_diffusion.trainings.utils import load_config
10
+
11
+
12
+ @dataclass
13
+ class FakeAccelerator:
14
+ device: torch.device = torch.device("cuda")
15
+
16
+
17
+ def init_trainers(cfg_path: str, weight_dtype: torch.dtype, extras: dict):
18
+ accelerator = FakeAccelerator()
19
+ cfg: ExprimentConfig = load_config(ExprimentConfig, cfg_path, extras)
20
+ init_config: AttnConfig = load_config(AttnConfig, cfg.init_config)
21
+ configurable_unet = ConfigurableUNet2DConditionModel(init_config, weight_dtype)
22
+ configurable_unet.enable_xformers_memory_efficient_attention()
23
+ trainer_cfgs: List[TrainerSubConfig] = [load_config(TrainerSubConfig, trainer) for trainer in cfg.trainers]
24
+ trainers: List[BasicTrainer] = [modules.find(trainer.trainer_type)(accelerator, None, configurable_unet, trainer.trainer, weight_dtype, i) for i, trainer in enumerate(trainer_cfgs)]
25
+ return trainers, configurable_unet
26
+
27
+ from app.utils import make_image_grid, split_image
28
+ def process_image(function, img, guidance_scale=2., merged_image=False, remove_bg=True):
29
+ from rembg import remove
30
+ if remove_bg:
31
+ img = remove(img)
32
+ img = rgba_to_rgb(img)
33
+ if merged_image:
34
+ img = split_image(img, rows=2)
35
+ images = function(
36
+ image=img,
37
+ guidance_scale=guidance_scale,
38
+ )
39
+ if len(images) > 1:
40
+ return make_image_grid(images, rows=2)
41
+ else:
42
+ return images[0]
43
+
44
+
45
+ def process_text(trainer, pipeline, img, guidance_scale=2.):
46
+ pipeline.cfg.validation_prompts = [img]
47
+ titles, images = trainer.batched_validation_forward(pipeline, guidance_scale=[guidance_scale])
48
+ return images[0]
49
+
50
+
51
+ def load_pipeline(config_path, ckpt_path, pipeline_filter=lambda x: True, weight_dtype = torch.bfloat16):
52
+ training_config = config_path
53
+ load_from_checkpoint = ckpt_path
54
+ extras = []
55
+ device = "cuda"
56
+ trainers, configurable_unet = init_trainers(training_config, weight_dtype, extras)
57
+ shared_modules = dict()
58
+ for trainer in trainers:
59
+ shared_modules = trainer.init_shared_modules(shared_modules)
60
+
61
+ if load_from_checkpoint is not None:
62
+ state_dict = torch.load(load_from_checkpoint)
63
+ configurable_unet.unet.load_state_dict(state_dict, strict=False)
64
+ # Move unet, vae and text_encoder to device and cast to weight_dtype
65
+ configurable_unet.unet.to(device, dtype=weight_dtype)
66
+
67
+ pipeline = None
68
+ trainer_out = None
69
+ for trainer in trainers:
70
+ if pipeline_filter(trainer.cfg.trainer_name):
71
+ pipeline = trainer.construct_pipeline(shared_modules, configurable_unet.unet)
72
+ pipeline.set_progress_bar_config(disable=False)
73
+ trainer_out = trainer
74
+ pipeline = pipeline.to(device)
75
+ return trainer_out, pipeline
app/examples/Groot.png ADDED

Git LFS Details

  • SHA256: e9096d048ec8deb3673765c577c7030118a75fc87d3da08cec657f66dfd22479
  • Pointer size: 131 Bytes
  • Size of remote file: 778 kB
app/examples/aaa.png ADDED

Git LFS Details

  • SHA256: 0733f0c5ed507e3fc0a9f921c1b078e7a66526335ee8efee61e919233a05a1c1
  • Pointer size: 131 Bytes
  • Size of remote file: 903 kB
app/examples/abma.png ADDED

Git LFS Details

  • SHA256: 24640851ccf40f2e61313c81e702abffe2361f1c5a1ab6e5b46f328daba103b3
  • Pointer size: 130 Bytes
  • Size of remote file: 93.5 kB
app/examples/akun.png ADDED

Git LFS Details

  • SHA256: b60404d448f09a3c11147f5d9e0e0544f0c2d4473425f110ded783cebf9c1f76
  • Pointer size: 131 Bytes
  • Size of remote file: 181 kB
app/examples/anya.png ADDED

Git LFS Details

  • SHA256: eb2ae59e3bb9c028f12c6c587cae7219c389df4593379c74211a6c643cf0ffa7
  • Pointer size: 131 Bytes
  • Size of remote file: 612 kB
app/examples/bag.png ADDED

Git LFS Details

  • SHA256: ac798ea1f112091c04f5bdfa47c490806fb433a02fe17758aa1f8c55cd64b66e
  • Pointer size: 132 Bytes
  • Size of remote file: 1.54 MB
app/examples/generated_1715761545_frame0.png ADDED

Git LFS Details

  • SHA256: ff813fe203a97a916bc73fa2bb61229c6c81884484cee1da53ff131093780636
  • Pointer size: 131 Bytes
  • Size of remote file: 208 kB
app/examples/generated_1715762357_frame0.png ADDED

Git LFS Details

  • SHA256: 4f211e298d5e6ffc2fc7d8ad5133e81b471d13ab6398931e8386ea9698021b4b
  • Pointer size: 131 Bytes
  • Size of remote file: 235 kB
app/examples/generated_1715763329_frame0.png ADDED

Git LFS Details

  • SHA256: e86aee7707d9870e1f56a24be9c52ff42048d4f45ed39d52e86f293336189580
  • Pointer size: 131 Bytes
  • Size of remote file: 182 kB
app/examples/hatsune_miku.png ADDED

Git LFS Details

  • SHA256: fbb6285c5a9a670bdee0992c6db2e43b51c584f3adb052d89136000b52eedc97
  • Pointer size: 130 Bytes
  • Size of remote file: 96.2 kB
app/examples/princess-large.png ADDED

Git LFS Details

  • SHA256: 203fd1fef34720656e51d27b1bfdc8c0a082a9fbbf48f3100039a63dcc59fd57
  • Pointer size: 130 Bytes
  • Size of remote file: 65.5 kB
app/examples/shoe.png ADDED

Git LFS Details

  • SHA256: 2b3798b58377246626b0ff7d38fd0a5ff028399b3e5b9b53b92785707a3ca081
  • Pointer size: 131 Bytes
  • Size of remote file: 249 kB
app/gradio_3dgen.py ADDED
@@ -0,0 +1,71 @@
1
+ import os
2
+ import gradio as gr
3
+ from PIL import Image
4
+ from pytorch3d.structures import Meshes
5
+ from app.utils import clean_up
6
+ from app.custom_models.mvimg_prediction import run_mvprediction
7
+ from app.custom_models.normal_prediction import predict_normals
8
+ from scripts.refine_lr_to_sr import run_sr_fast
9
+ from scripts.utils import save_glb_and_video
10
+ from scripts.multiview_inference import geo_reconstruct
11
+
12
+ def generate3dv2(preview_img, input_processing, seed, render_video=True, do_refine=True, expansion_weight=0.1, init_type="std"):
13
+ if preview_img is None:
14
+ raise gr.Error("preview_img is none")
15
+ if isinstance(preview_img, str):
16
+ preview_img = Image.open(preview_img)
17
+
18
+ if preview_img.size[0] <= 512:
19
+ preview_img = run_sr_fast([preview_img])[0]
20
+ rgb_pils, front_pil = run_mvprediction(preview_img, remove_bg=input_processing, seed=int(seed)) # 6s
21
+ new_meshes = geo_reconstruct(rgb_pils, None, front_pil, do_refine=do_refine, predict_normal=True, expansion_weight=expansion_weight, init_type=init_type)
22
+ vertices = new_meshes.verts_packed()
23
+ vertices = vertices / 2 * 1.35
24
+ vertices[..., [0, 2]] = - vertices[..., [0, 2]]
25
+ new_meshes = Meshes(verts=[vertices], faces=new_meshes.faces_list(), textures=new_meshes.textures)
26
+
27
+ ret_mesh, video = save_glb_and_video("/tmp/gradio/generated", new_meshes, with_timestamp=True, dist=3.5, fov_in_degrees=2 / 1.35, cam_type="ortho", export_video=render_video)
28
+ return ret_mesh, video
29
+
30
+ #######################################
31
+ def create_ui(concurrency_id="wkl"):
32
+ with gr.Row():
33
+ with gr.Column(scale=2):
34
+ input_image = gr.Image(type='pil', image_mode='RGBA', label='Frontview')
35
+
36
+ example_folder = os.path.join(os.path.dirname(__file__), "./examples")
37
+ example_fns = sorted([os.path.join(example_folder, example) for example in os.listdir(example_folder)])
38
+ gr.Examples(
39
+ examples=example_fns,
40
+ inputs=[input_image],
41
+ cache_examples=False,
42
+ label='Examples (click one of the images below to start)',
43
+ examples_per_page=12
44
+ )
45
+
46
+
47
+ with gr.Column(scale=3):
48
+ # export mesh display
49
+ output_mesh = gr.Model3D(value=None, label="Mesh Model", show_label=True, height=320)
50
+ output_video = gr.Video(label="Preview", show_label=True, show_share_button=True, height=320, visible=False)
51
+
52
+ input_processing = gr.Checkbox(
53
+ value=True,
54
+ label='Remove Background',
55
+ visible=True,
56
+ )
57
+ do_refine = gr.Checkbox(value=True, label="Refine Multiview Details", visible=False)
58
+ expansion_weight = gr.Slider(minimum=-1., maximum=1.0, value=0.1, step=0.1, label="Expansion Weight", visible=False)
59
+ init_type = gr.Dropdown(choices=["std", "thin"], label="Mesh Initialization", value="std", visible=False)
60
+ setable_seed = gr.Slider(-1, 1000000000, -1, step=1, visible=True, label="Seed")
61
+ render_video = gr.Checkbox(value=False, visible=False, label="generate video")
62
+ fullrunv2_btn = gr.Button('Generate 3D', interactive=True)
63
+
64
+ fullrunv2_btn.click(
65
+ fn = generate3dv2,
66
+ inputs=[input_image, input_processing, setable_seed, render_video, do_refine, expansion_weight, init_type],
67
+ outputs=[output_mesh, output_video],
68
+ concurrency_id=concurrency_id,
69
+ api_name="generate3dv2",
70
+ ).success(clean_up, api_name=False)
71
+ return input_image
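The callback can also be invoked directly once the checkpoints in `ckpt/` are in place (a sketch, not part of the commit): it accepts a PIL image or a file path and returns the exported GLB path plus an optional preview video.

```python
from app.all_models import model_zoo
from app.gradio_3dgen import generate3dv2

model_zoo.init_models()
glb_path, video = generate3dv2("app/examples/bag.png", input_processing=True,
                               seed=-1, render_video=False)
print(glb_path)  # path of the exported .glb
```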
app/gradio_3dgen_steps.py ADDED
@@ -0,0 +1,87 @@
1
+ import gradio as gr
2
+ from PIL import Image
3
+
4
+ from app.custom_models.mvimg_prediction import run_mvprediction
5
+ from app.utils import make_image_grid, split_image
6
+ from scripts.utils import save_glb_and_video
7
+
8
+ def concept_to_multiview(preview_img, input_processing, seed, guidance=1.):
9
+ seed = int(seed)
10
+ if preview_img is None:
11
+ raise gr.Error("preview_img is none.")
12
+ if isinstance(preview_img, str):
13
+ preview_img = Image.open(preview_img)
14
+
15
+ rgb_pils, front_pil = run_mvprediction(preview_img, remove_bg=input_processing, seed=seed, guidance_scale=guidance)
16
+ rgb_pil = make_image_grid(rgb_pils, rows=2)
17
+ return rgb_pil, front_pil
18
+
19
+ def concept_to_multiview_ui(concurrency_id="wkl"):
20
+ with gr.Row():
21
+ with gr.Column(scale=2):
22
+ preview_img = gr.Image(type='pil', image_mode='RGBA', label='Frontview')
23
+ input_processing = gr.Checkbox(
24
+ value=True,
25
+ label='Remove Background',
26
+ )
27
+ seed = gr.Slider(minimum=-1, maximum=1000000000, value=-1, step=1.0, label="seed")
28
+ guidance = gr.Slider(minimum=1.0, maximum=5.0, value=1.0, label="Guidance Scale", step=0.5)
29
+ run_btn = gr.Button('Generate Multiview', interactive=True)
30
+ with gr.Column(scale=3):
31
+ # export mesh display
32
+ output_rgb = gr.Image(type='pil', label="RGB", show_label=True)
33
+ output_front = gr.Image(type='pil', image_mode='RGBA', label="Frontview", show_label=True)
34
+ run_btn.click(
35
+ fn = concept_to_multiview,
36
+ inputs=[preview_img, input_processing, seed, guidance],
37
+ outputs=[output_rgb, output_front],
38
+ concurrency_id=concurrency_id,
39
+ api_name=False,
40
+ )
41
+ return output_rgb, output_front
42
+
43
+ from app.custom_models.normal_prediction import predict_normals
44
+ from scripts.multiview_inference import geo_reconstruct
45
+ def multiview_to_mesh_v2(rgb_pil, normal_pil, front_pil, do_refine=False, expansion_weight=0.1, init_type="std"):
46
+ rgb_pils = split_image(rgb_pil, rows=2)
47
+ if normal_pil is not None:
48
+ normal_pil = split_image(normal_pil, rows=2)
49
+ if front_pil is None:
50
+ front_pil = rgb_pils[0]
51
+ new_meshes = geo_reconstruct(rgb_pils, normal_pil, front_pil, do_refine=do_refine, predict_normal=normal_pil is None, expansion_weight=expansion_weight, init_type=init_type)
52
+ ret_mesh, video = save_glb_and_video("/tmp/gradio/generated", new_meshes, with_timestamp=True, dist=3.5, fov_in_degrees=2 / 1.35, cam_type="ortho", export_video=False)
53
+ return ret_mesh
54
+
55
+ def new_multiview_to_mesh_ui(concurrency_id="wkl"):
56
+ with gr.Row():
57
+ with gr.Column(scale=2):
58
+ rgb_pil = gr.Image(type='pil', image_mode='RGB', label='RGB')
59
+ front_pil = gr.Image(type='pil', image_mode='RGBA', label='Frontview (Optional)')
60
+ normal_pil = gr.Image(type='pil', image_mode='RGBA', label='Normal (Optional)')
61
+ do_refine = gr.Checkbox(
62
+ value=False,
63
+ label='Refine rgb',
64
+ visible=False,
65
+ )
66
+ expansion_weight = gr.Slider(minimum=-1.0, maximum=1.0, value=0.1, step=0.1, label="Expansion Weight", visible=False)
67
+ init_type = gr.Dropdown(choices=["std", "thin"], label="Mesh initialization", value="std", visible=False)
68
+ run_btn = gr.Button('Generate 3D', interactive=True)
69
+ with gr.Column(scale=3):
70
+ # export mesh display
71
+ output_mesh = gr.Model3D(value=None, label="mesh model", show_label=True)
72
+ run_btn.click(
73
+ fn = multiview_to_mesh_v2,
74
+ inputs=[rgb_pil, normal_pil, front_pil, do_refine, expansion_weight, init_type],
75
+ outputs=[output_mesh],
76
+ concurrency_id=concurrency_id,
77
+ api_name="multiview_to_mesh",
78
+ )
79
+ return rgb_pil, front_pil, output_mesh
80
+
81
+
82
+ #######################################
83
+ def create_step_ui(concurrency_id="wkl"):
84
+ with gr.Tab(label="3D:concept_to_multiview"):
85
+ concept_to_multiview_ui(concurrency_id)
86
+ with gr.Tab(label="3D:new_multiview_to_mesh"):
87
+ new_multiview_to_mesh_ui(concurrency_id)
app/gradio_local.py ADDED
@@ -0,0 +1,76 @@
1
+ if __name__ == "__main__":
2
+ import os
3
+ import sys
4
+ sys.path.append(os.curdir)
5
+ if 'CUDA_VISIBLE_DEVICES' not in os.environ:
6
+ os.environ['CUDA_VISIBLE_DEVICES'] = '0'
7
+ os.environ['TRANSFORMERS_OFFLINE']='0'
8
+ os.environ['DIFFUSERS_OFFLINE']='0'
9
+ os.environ['HF_HUB_OFFLINE']='0'
10
+ os.environ['GRADIO_ANALYTICS_ENABLED']='False'
11
+ os.environ['HF_ENDPOINT']='https://hf-mirror.com'
12
+ import torch
13
+ torch.set_float32_matmul_precision('medium')
14
+ torch.backends.cuda.matmul.allow_tf32 = True
15
+ torch.set_grad_enabled(False)
16
+
17
+ import gradio as gr
18
+ import argparse
19
+
20
+ from app.gradio_3dgen import create_ui as create_3d_ui
21
+ # from app.gradio_3dgen_steps import create_step_ui
22
+ from app.all_models import model_zoo
23
+
24
+
25
+ _TITLE = '''Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image'''
26
+ _DESCRIPTION = '''
27
+ [Project page](https://wukailu.github.io/Unique3D/)
28
+
29
+ * High-fidelity and diverse textured meshes generated by Unique3D from single-view images.
30
+
31
+ * The demo is still under construction, and more features are expected to be implemented soon.
32
+ '''
33
+
34
+ def launch(
35
+ port,
36
+ listen=False,
37
+ share=False,
38
+ gradio_root="",
39
+ ):
40
+ model_zoo.init_models()
41
+
42
+ with gr.Blocks(
43
+ title=_TITLE,
44
+ theme=gr.themes.Monochrome(),
45
+ ) as demo:
46
+ with gr.Row():
47
+ with gr.Column(scale=1):
48
+ gr.Markdown('# ' + _TITLE)
49
+ gr.Markdown(_DESCRIPTION)
50
+ create_3d_ui("wkl")
51
+
52
+ launch_args = {}
53
+ if listen:
54
+ launch_args["server_name"] = "0.0.0.0"
55
+
56
+ demo.queue(default_concurrency_limit=1).launch(
57
+ server_port=None if port == 0 else port,
58
+ share=share,
59
+ root_path=gradio_root if gradio_root != "" else None, # "/myapp"
60
+ **launch_args,
61
+ )
62
+
63
+ if __name__ == "__main__":
64
+ parser = argparse.ArgumentParser()
65
+ args, extra = parser.parse_known_args()
66
+ parser.add_argument("--listen", action="store_true")
67
+ parser.add_argument("--port", type=int, default=0)
68
+ parser.add_argument("--share", action="store_true")
69
+ parser.add_argument("--gradio_root", default="")
70
+ args = parser.parse_args()
71
+ launch(
72
+ args.port,
73
+ listen=args.listen,
74
+ share=args.share,
75
+ gradio_root=args.gradio_root,
76
+ )
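For reference (not part of the diff), the launcher accepts the flags defined in the argparse block above:

```bash
# bind to all interfaces on a fixed port, with a public Gradio share link
python app/gradio_local.py --listen --port 7860 --share
# optionally serve under a path prefix behind a reverse proxy
python app/gradio_local.py --port 7860 --gradio_root /myapp
```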
app/utils.py ADDED
@@ -0,0 +1,112 @@
1
+ import torch
2
+ import numpy as np
3
+ from PIL import Image
4
+ import gc
5
+ import numpy as np
6
+ import numpy as np
7
+ from PIL import Image
8
+ from scripts.refine_lr_to_sr import run_sr_fast
9
+
10
+ GRADIO_CACHE = "/tmp/gradio/"
11
+
12
+ def clean_up():
13
+ torch.cuda.empty_cache()
14
+ gc.collect()
15
+
16
+ def remove_color(arr):
17
+ if arr.shape[-1] == 4:
18
+ arr = arr[..., :3]
19
+ # calc diffs
20
+ base = arr[0, 0]
21
+ diffs = np.abs(arr.astype(np.int32) - base.astype(np.int32)).sum(axis=-1)
22
+ alpha = (diffs <= 80)
23
+
24
+ arr[alpha] = 255
25
+ alpha = ~alpha
26
+ arr = np.concatenate([arr, alpha[..., None].astype(np.int32) * 255], axis=-1)
27
+ return arr
28
+
29
+ def simple_remove(imgs, run_sr=True):
30
+ """Only works for normal"""
31
+ if not isinstance(imgs, list):
32
+ imgs = [imgs]
33
+ single_input = True
34
+ else:
35
+ single_input = False
36
+ if run_sr:
37
+ imgs = run_sr_fast(imgs)
38
+ rets = []
39
+ for img in imgs:
40
+ arr = np.array(img)
41
+ arr = remove_color(arr)
42
+ rets.append(Image.fromarray(arr.astype(np.uint8)))
43
+ if single_input:
44
+ return rets[0]
45
+ return rets
46
+
47
+ def rgba_to_rgb(rgba: Image.Image, bkgd="WHITE"):
48
+ new_image = Image.new("RGBA", rgba.size, bkgd)
49
+ new_image.paste(rgba, (0, 0), rgba)
50
+ new_image = new_image.convert('RGB')
51
+ return new_image
52
+
53
+ def change_rgba_bg(rgba: Image.Image, bkgd="WHITE"):
54
+ rgb_white = rgba_to_rgb(rgba, bkgd)
55
+ new_rgba = Image.fromarray(np.concatenate([np.array(rgb_white), np.array(rgba)[:, :, 3:4]], axis=-1))
56
+ return new_rgba
57
+
58
+ def split_image(image, rows=None, cols=None):
59
+ """
60
+ inverse function of make_image_grid
61
+ """
62
+ # image is in square
63
+ if rows is None and cols is None:
64
+ # image.size [W, H]
65
+ rows = 1
66
+ cols = image.size[0] // image.size[1]
67
+ assert cols * image.size[1] == image.size[0]
68
+ subimg_size = image.size[1]
69
+ elif rows is None:
70
+ subimg_size = image.size[0] // cols
71
+ rows = image.size[1] // subimg_size
72
+ assert rows * subimg_size == image.size[1]
73
+ elif cols is None:
74
+ subimg_size = image.size[1] // rows
75
+ cols = image.size[0] // subimg_size
76
+ assert cols * subimg_size == image.size[0]
77
+ else:
78
+ subimg_size = image.size[1] // rows
79
+ assert cols * subimg_size == image.size[0]
80
+ subimgs = []
81
+ for i in range(rows):
82
+ for j in range(cols):
83
+ subimg = image.crop((j*subimg_size, i*subimg_size, (j+1)*subimg_size, (i+1)*subimg_size))
84
+ subimgs.append(subimg)
85
+ return subimgs
86
+
87
+ def make_image_grid(images, rows=None, cols=None, resize=None):
88
+ if rows is None and cols is None:
89
+ rows = 1
90
+ cols = len(images)
91
+ if rows is None:
92
+ rows = len(images) // cols
93
+ if len(images) % cols != 0:
94
+ rows += 1
95
+ if cols is None:
96
+ cols = len(images) // rows
97
+ if len(images) % rows != 0:
98
+ cols += 1
99
+ total_imgs = rows * cols
100
+ if total_imgs > len(images):
101
+ images += [Image.new(images[0].mode, images[0].size) for _ in range(total_imgs - len(images))]
102
+
103
+ if resize is not None:
104
+ images = [img.resize((resize, resize)) for img in images]
105
+
106
+ w, h = images[0].size
107
+ grid = Image.new(images[0].mode, size=(cols * w, rows * h))
108
+
109
+ for i, img in enumerate(images):
110
+ grid.paste(img, box=(i % cols * w, i // cols * h))
111
+ return grid
112
+
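A quick round-trip sketch (not part of the commit): `split_image` is the inverse of `make_image_grid` for the 2x2 layouts used throughout the app.

```python
from PIL import Image
from app.utils import make_image_grid, split_image

views = [Image.new("RGB", (256, 256), c) for c in ("red", "green", "blue", "white")]
grid = make_image_grid(views, rows=2)  # 512x512 grid, 2 rows x 2 cols
tiles = split_image(grid, rows=2)      # back to four 256x256 tiles, original order
assert [t.size for t in tiles] == [(256, 256)] * 4
```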
assets/teaser.jpg ADDED
ckpt/controlnet-tile/config.json ADDED
@@ -0,0 +1,52 @@
1
+ {
2
+ "_class_name": "ControlNetModel",
3
+ "_diffusers_version": "0.27.2",
4
+ "_name_or_path": "lllyasviel/control_v11f1e_sd15_tile",
5
+ "act_fn": "silu",
6
+ "addition_embed_type": null,
7
+ "addition_embed_type_num_heads": 64,
8
+ "addition_time_embed_dim": null,
9
+ "attention_head_dim": 8,
10
+ "block_out_channels": [
11
+ 320,
12
+ 640,
13
+ 1280,
14
+ 1280
15
+ ],
16
+ "class_embed_type": null,
17
+ "conditioning_channels": 3,
18
+ "conditioning_embedding_out_channels": [
19
+ 16,
20
+ 32,
21
+ 96,
22
+ 256
23
+ ],
24
+ "controlnet_conditioning_channel_order": "rgb",
25
+ "cross_attention_dim": 768,
26
+ "down_block_types": [
27
+ "CrossAttnDownBlock2D",
28
+ "CrossAttnDownBlock2D",
29
+ "CrossAttnDownBlock2D",
30
+ "DownBlock2D"
31
+ ],
32
+ "downsample_padding": 1,
33
+ "encoder_hid_dim": null,
34
+ "encoder_hid_dim_type": null,
35
+ "flip_sin_to_cos": true,
36
+ "freq_shift": 0,
37
+ "global_pool_conditions": false,
38
+ "in_channels": 4,
39
+ "layers_per_block": 2,
40
+ "mid_block_scale_factor": 1,
41
+ "mid_block_type": "UNetMidBlock2DCrossAttn",
42
+ "norm_eps": 1e-05,
43
+ "norm_num_groups": 32,
44
+ "num_attention_heads": null,
45
+ "num_class_embeds": null,
46
+ "only_cross_attention": false,
47
+ "projection_class_embeddings_input_dim": null,
48
+ "resnet_time_scale_shift": "default",
49
+ "transformer_layers_per_block": 1,
50
+ "upcast_attention": false,
51
+ "use_linear_projection": false
52
+ }
ckpt/controlnet-tile/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:845d3845053912728cd1453029a0ef87d3c0a3082a083ba393f36eaa5fb0e218
3
+ size 1445157120
ckpt/image2normal/feature_extractor/preprocessor_config.json ADDED
@@ -0,0 +1,44 @@
1
+ {
2
+ "_valid_processor_keys": [
3
+ "images",
4
+ "do_resize",
5
+ "size",
6
+ "resample",
7
+ "do_center_crop",
8
+ "crop_size",
9
+ "do_rescale",
10
+ "rescale_factor",
11
+ "do_normalize",
12
+ "image_mean",
13
+ "image_std",
14
+ "do_convert_rgb",
15
+ "return_tensors",
16
+ "data_format",
17
+ "input_data_format"
18
+ ],
19
+ "crop_size": {
20
+ "height": 224,
21
+ "width": 224
22
+ },
23
+ "do_center_crop": true,
24
+ "do_convert_rgb": true,
25
+ "do_normalize": true,
26
+ "do_rescale": true,
27
+ "do_resize": true,
28
+ "image_mean": [
29
+ 0.48145466,
30
+ 0.4578275,
31
+ 0.40821073
32
+ ],
33
+ "image_processor_type": "CLIPImageProcessor",
34
+ "image_std": [
35
+ 0.26862954,
36
+ 0.26130258,
37
+ 0.27577711
38
+ ],
39
+ "resample": 3,
40
+ "rescale_factor": 0.00392156862745098,
41
+ "size": {
42
+ "shortest_edge": 224
43
+ }
44
+ }
ckpt/image2normal/image_encoder/config.json ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
3
+ "architectures": [
4
+ "CLIPVisionModelWithProjection"
5
+ ],
6
+ "attention_dropout": 0.0,
7
+ "dropout": 0.0,
8
+ "hidden_act": "quick_gelu",
9
+ "hidden_size": 1024,
10
+ "image_size": 224,
11
+ "initializer_factor": 1.0,
12
+ "initializer_range": 0.02,
13
+ "intermediate_size": 4096,
14
+ "layer_norm_eps": 1e-05,
15
+ "model_type": "clip_vision_model",
16
+ "num_attention_heads": 16,
17
+ "num_channels": 3,
18
+ "num_hidden_layers": 24,
19
+ "patch_size": 14,
20
+ "projection_dim": 768,
21
+ "torch_dtype": "bfloat16",
22
+ "transformers_version": "4.39.3"
23
+ }
ckpt/image2normal/image_encoder/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e4b33d864f89a793357a768cb07d0dc18d6a14e6664f4110a0d535ca9ba78da8
3
+ size 607980488
ckpt/image2normal/model_index.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "_class_name": "StableDiffusionImageCustomPipeline",
3
+ "_diffusers_version": "0.27.2",
4
+ "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
5
+ "feature_extractor": [
6
+ "transformers",
7
+ "CLIPImageProcessor"
8
+ ],
9
+ "image_encoder": [
10
+ "transformers",
11
+ "CLIPVisionModelWithProjection"
12
+ ],
13
+ "noisy_cond_latents": false,
14
+ "requires_safety_checker": true,
15
+ "safety_checker": [
16
+ null,
17
+ null
18
+ ],
19
+ "scheduler": [
20
+ "diffusers",
21
+ "EulerAncestralDiscreteScheduler"
22
+ ],
23
+ "unet": [
24
+ "diffusers",
25
+ "UNet2DConditionModel"
26
+ ],
27
+ "vae": [
28
+ "diffusers",
29
+ "AutoencoderKL"
30
+ ]
31
+ }
ckpt/image2normal/scheduler/scheduler_config.json ADDED
@@ -0,0 +1,16 @@
1
+ {
2
+ "_class_name": "EulerAncestralDiscreteScheduler",
3
+ "_diffusers_version": "0.27.2",
4
+ "beta_end": 0.012,
5
+ "beta_schedule": "scaled_linear",
6
+ "beta_start": 0.00085,
7
+ "clip_sample": false,
8
+ "num_train_timesteps": 1000,
9
+ "prediction_type": "epsilon",
10
+ "rescale_betas_zero_snr": false,
11
+ "set_alpha_to_one": false,
12
+ "skip_prk_steps": true,
13
+ "steps_offset": 1,
14
+ "timestep_spacing": "linspace",
15
+ "trained_betas": null
16
+ }
ckpt/image2normal/unet/config.json ADDED
@@ -0,0 +1,68 @@
1
+ {
2
+ "_class_name": "UnifieldWrappedUNet",
3
+ "_diffusers_version": "0.27.2",
4
+ "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
5
+ "act_fn": "silu",
6
+ "addition_embed_type": null,
7
+ "addition_embed_type_num_heads": 64,
8
+ "addition_time_embed_dim": null,
9
+ "attention_head_dim": 8,
10
+ "attention_type": "default",
11
+ "block_out_channels": [
12
+ 320,
13
+ 640,
14
+ 1280,
15
+ 1280
16
+ ],
17
+ "center_input_sample": false,
18
+ "class_embed_type": null,
19
+ "class_embeddings_concat": false,
20
+ "conv_in_kernel": 3,
21
+ "conv_out_kernel": 3,
22
+ "cross_attention_dim": 768,
23
+ "cross_attention_norm": null,
24
+ "down_block_types": [
25
+ "CrossAttnDownBlock2D",
26
+ "CrossAttnDownBlock2D",
27
+ "CrossAttnDownBlock2D",
28
+ "DownBlock2D"
29
+ ],
30
+ "downsample_padding": 1,
31
+ "dropout": 0.0,
32
+ "dual_cross_attention": false,
33
+ "encoder_hid_dim": null,
34
+ "encoder_hid_dim_type": null,
35
+ "flip_sin_to_cos": true,
36
+ "freq_shift": 0,
37
+ "in_channels": 4,
38
+ "layers_per_block": 2,
39
+ "mid_block_only_cross_attention": null,
40
+ "mid_block_scale_factor": 1,
41
+ "mid_block_type": "UNetMidBlock2DCrossAttn",
42
+ "norm_eps": 1e-05,
43
+ "norm_num_groups": 32,
44
+ "num_attention_heads": null,
45
+ "num_class_embeds": null,
46
+ "only_cross_attention": false,
47
+ "out_channels": 4,
48
+ "projection_class_embeddings_input_dim": null,
49
+ "resnet_out_scale_factor": 1.0,
50
+ "resnet_skip_time_act": false,
51
+ "resnet_time_scale_shift": "default",
52
+ "reverse_transformer_layers_per_block": null,
53
+ "sample_size": 64,
54
+ "time_cond_proj_dim": null,
55
+ "time_embedding_act_fn": null,
56
+ "time_embedding_dim": null,
57
+ "time_embedding_type": "positional",
58
+ "timestep_post_act": null,
59
+ "transformer_layers_per_block": 1,
60
+ "up_block_types": [
61
+ "UpBlock2D",
62
+ "CrossAttnUpBlock2D",
63
+ "CrossAttnUpBlock2D",
64
+ "CrossAttnUpBlock2D"
65
+ ],
66
+ "upcast_attention": false,
67
+ "use_linear_projection": false
68
+ }
ckpt/image2normal/unet/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f5cbaf1d56619345ce78de8cfbb20d94923b3305a364bf6a5b2a2cc422d4b701
3
+ size 3537503456
ckpt/image2normal/unet_state_dict.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8df80d09e953d338aa6d8decd0351c5045f52ec6e2645eee2027ccb8792c8ed8
3
+ size 3537964654
ckpt/image2normal/vae/config.json ADDED
@@ -0,0 +1,34 @@
1
+ {
2
+ "_class_name": "AutoencoderKL",
3
+ "_diffusers_version": "0.27.2",
4
+ "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
5
+ "act_fn": "silu",
6
+ "block_out_channels": [
7
+ 128,
8
+ 256,
9
+ 512,
10
+ 512
11
+ ],
12
+ "down_block_types": [
13
+ "DownEncoderBlock2D",
14
+ "DownEncoderBlock2D",
15
+ "DownEncoderBlock2D",
16
+ "DownEncoderBlock2D"
17
+ ],
18
+ "force_upcast": true,
19
+ "in_channels": 3,
20
+ "latent_channels": 4,
21
+ "latents_mean": null,
22
+ "latents_std": null,
23
+ "layers_per_block": 2,
24
+ "norm_num_groups": 32,
25
+ "out_channels": 3,
26
+ "sample_size": 256,
27
+ "scaling_factor": 0.18215,
28
+ "up_block_types": [
29
+ "UpDecoderBlock2D",
30
+ "UpDecoderBlock2D",
31
+ "UpDecoderBlock2D",
32
+ "UpDecoderBlock2D"
33
+ ]
34
+ }
ckpt/image2normal/vae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8d0c34f57abe50f323040f2366c8e22b941068dcdf53c8eb1d6fafb838afecb7
3
+ size 167335590
ckpt/img2mvimg/feature_extractor/preprocessor_config.json ADDED
@@ -0,0 +1,44 @@
1
+ {
2
+ "_valid_processor_keys": [
3
+ "images",
4
+ "do_resize",
5
+ "size",
6
+ "resample",
7
+ "do_center_crop",
8
+ "crop_size",
9
+ "do_rescale",
10
+ "rescale_factor",
11
+ "do_normalize",
12
+ "image_mean",
13
+ "image_std",
14
+ "do_convert_rgb",
15
+ "return_tensors",
16
+ "data_format",
17
+ "input_data_format"
18
+ ],
19
+ "crop_size": {
20
+ "height": 224,
21
+ "width": 224
22
+ },
23
+ "do_center_crop": true,
24
+ "do_convert_rgb": true,
25
+ "do_normalize": true,
26
+ "do_rescale": true,
27
+ "do_resize": true,
28
+ "image_mean": [
29
+ 0.48145466,
30
+ 0.4578275,
31
+ 0.40821073
32
+ ],
33
+ "image_processor_type": "CLIPImageProcessor",
34
+ "image_std": [
35
+ 0.26862954,
36
+ 0.26130258,
37
+ 0.27577711
38
+ ],
39
+ "resample": 3,
40
+ "rescale_factor": 0.00392156862745098,
41
+ "size": {
42
+ "shortest_edge": 224
43
+ }
44
+ }
ckpt/img2mvimg/image_encoder/config.json ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
3
+ "architectures": [
4
+ "CLIPVisionModelWithProjection"
5
+ ],
6
+ "attention_dropout": 0.0,
7
+ "dropout": 0.0,
8
+ "hidden_act": "quick_gelu",
9
+ "hidden_size": 1024,
10
+ "image_size": 224,
11
+ "initializer_factor": 1.0,
12
+ "initializer_range": 0.02,
13
+ "intermediate_size": 4096,
14
+ "layer_norm_eps": 1e-05,
15
+ "model_type": "clip_vision_model",
16
+ "num_attention_heads": 16,
17
+ "num_channels": 3,
18
+ "num_hidden_layers": 24,
19
+ "patch_size": 14,
20
+ "projection_dim": 768,
21
+ "torch_dtype": "float32",
22
+ "transformers_version": "4.39.3"
23
+ }
ckpt/img2mvimg/image_encoder/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:77b33d2a3a643650857672e880ccf73adbaf114fbbadec36d142ee9d48af7e20
3
+ size 1215912728
ckpt/img2mvimg/model_index.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "_class_name": "StableDiffusionImage2MVCustomPipeline",
3
+ "_diffusers_version": "0.27.2",
4
+ "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
5
+ "condition_offset": true,
6
+ "feature_extractor": [
7
+ "transformers",
8
+ "CLIPImageProcessor"
9
+ ],
10
+ "image_encoder": [
11
+ "transformers",
12
+ "CLIPVisionModelWithProjection"
13
+ ],
14
+ "requires_safety_checker": true,
15
+ "safety_checker": [
16
+ null,
17
+ null
18
+ ],
19
+ "scheduler": [
20
+ "diffusers",
21
+ "DDIMScheduler"
22
+ ],
23
+ "unet": [
24
+ "diffusers",
25
+ "UNet2DConditionModel"
26
+ ],
27
+ "vae": [
28
+ "diffusers",
29
+ "AutoencoderKL"
30
+ ]
31
+ }
ckpt/img2mvimg/scheduler/scheduler_config.json ADDED
@@ -0,0 +1,20 @@
1
+ {
2
+ "_class_name": "DDIMScheduler",
3
+ "_diffusers_version": "0.27.2",
4
+ "beta_end": 0.012,
5
+ "beta_schedule": "scaled_linear",
6
+ "beta_start": 0.00085,
7
+ "clip_sample": false,
8
+ "clip_sample_range": 1.0,
9
+ "dynamic_thresholding_ratio": 0.995,
10
+ "num_train_timesteps": 1000,
11
+ "prediction_type": "epsilon",
12
+ "rescale_betas_zero_snr": false,
13
+ "sample_max_value": 1.0,
14
+ "set_alpha_to_one": false,
15
+ "skip_prk_steps": true,
16
+ "steps_offset": 1,
17
+ "thresholding": false,
18
+ "timestep_spacing": "leading",
19
+ "trained_betas": null
20
+ }
ckpt/img2mvimg/unet/config.json ADDED
@@ -0,0 +1,68 @@
1
+ {
2
+ "_class_name": "UnifieldWrappedUNet",
3
+ "_diffusers_version": "0.27.2",
4
+ "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
5
+ "act_fn": "silu",
6
+ "addition_embed_type": null,
7
+ "addition_embed_type_num_heads": 64,
8
+ "addition_time_embed_dim": null,
9
+ "attention_head_dim": 8,
10
+ "attention_type": "default",
11
+ "block_out_channels": [
12
+ 320,
13
+ 640,
14
+ 1280,
15
+ 1280
16
+ ],
17
+ "center_input_sample": false,
18
+ "class_embed_type": null,
19
+ "class_embeddings_concat": false,
20
+ "conv_in_kernel": 3,
21
+ "conv_out_kernel": 3,
22
+ "cross_attention_dim": 768,
23
+ "cross_attention_norm": null,
24
+ "down_block_types": [
25
+ "CrossAttnDownBlock2D",
26
+ "CrossAttnDownBlock2D",
27
+ "CrossAttnDownBlock2D",
28
+ "DownBlock2D"
29
+ ],
30
+ "downsample_padding": 1,
31
+ "dropout": 0.0,
32
+ "dual_cross_attention": false,
33
+ "encoder_hid_dim": null,
34
+ "encoder_hid_dim_type": null,
35
+ "flip_sin_to_cos": true,
36
+ "freq_shift": 0,
37
+ "in_channels": 8,
38
+ "layers_per_block": 2,
39
+ "mid_block_only_cross_attention": null,
40
+ "mid_block_scale_factor": 1,
41
+ "mid_block_type": "UNetMidBlock2DCrossAttn",
42
+ "norm_eps": 1e-05,
43
+ "norm_num_groups": 32,
44
+ "num_attention_heads": null,
45
+ "num_class_embeds": 8,
46
+ "only_cross_attention": false,
47
+ "out_channels": 4,
48
+ "projection_class_embeddings_input_dim": null,
49
+ "resnet_out_scale_factor": 1.0,
50
+ "resnet_skip_time_act": false,
51
+ "resnet_time_scale_shift": "default",
52
+ "reverse_transformer_layers_per_block": null,
53
+ "sample_size": 64,
54
+ "time_cond_proj_dim": null,
55
+ "time_embedding_act_fn": null,
56
+ "time_embedding_dim": null,
57
+ "time_embedding_type": "positional",
58
+ "timestep_post_act": null,
59
+ "transformer_layers_per_block": 1,
60
+ "up_block_types": [
61
+ "UpBlock2D",
62
+ "CrossAttnUpBlock2D",
63
+ "CrossAttnUpBlock2D",
64
+ "CrossAttnUpBlock2D"
65
+ ],
66
+ "upcast_attention": false,
67
+ "use_linear_projection": false
68
+ }
ckpt/img2mvimg/unet/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:93a3b4e678efac0c997e76df465df13136a4b0f1732e534a1200fad9e04cd0f9
3
+ size 3438254688
ckpt/img2mvimg/unet_state_dict.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0dff2fdba450af0e10c3a847ba66a530170be2e9b9c9f4c834483515e82738b5
3
+ size 3438460972
ckpt/img2mvimg/vae/config.json ADDED
@@ -0,0 +1,34 @@
1
+ {
2
+ "_class_name": "AutoencoderKL",
3
+ "_diffusers_version": "0.27.2",
4
+ "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
5
+ "act_fn": "silu",
6
+ "block_out_channels": [
7
+ 128,
8
+ 256,
9
+ 512,
10
+ 512
11
+ ],
12
+ "down_block_types": [
13
+ "DownEncoderBlock2D",
14
+ "DownEncoderBlock2D",
15
+ "DownEncoderBlock2D",
16
+ "DownEncoderBlock2D"
17
+ ],
18
+ "force_upcast": true,
19
+ "in_channels": 3,
20
+ "latent_channels": 4,
21
+ "latents_mean": null,
22
+ "latents_std": null,
23
+ "layers_per_block": 2,
24
+ "norm_num_groups": 32,
25
+ "out_channels": 3,
26
+ "sample_size": 256,
27
+ "scaling_factor": 0.18215,
28
+ "up_block_types": [
29
+ "UpDecoderBlock2D",
30
+ "UpDecoderBlock2D",
31
+ "UpDecoderBlock2D",
32
+ "UpDecoderBlock2D"
33
+ ]
34
+ }
ckpt/img2mvimg/vae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2aa1f43011b553a4cba7f37456465cdbd48aab7b54b9348b890e8058ea7683ec
3
+ size 334643268
ckpt/realesrgan-x4.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9bc5d0c85207adad8bca26286f0c0007f266f85e7aa7c454c565da9b5f3c940a
3
+ size 67051617
ckpt/v1-inference.yaml ADDED
@@ -0,0 +1,70 @@
1
+ model:
2
+ base_learning_rate: 1.0e-04
3
+ target: ldm.models.diffusion.ddpm.LatentDiffusion
4
+ params:
5
+ linear_start: 0.00085
6
+ linear_end: 0.0120
7
+ num_timesteps_cond: 1
8
+ log_every_t: 200
9
+ timesteps: 1000
10
+ first_stage_key: "jpg"
11
+ cond_stage_key: "txt"
12
+ image_size: 64
13
+ channels: 4
14
+ cond_stage_trainable: false # Note: different from the one we trained before
15
+ conditioning_key: crossattn
16
+ monitor: val/loss_simple_ema
17
+ scale_factor: 0.18215
18
+ use_ema: False
19
+
20
+ scheduler_config: # 10000 warmup steps
21
+ target: ldm.lr_scheduler.LambdaLinearScheduler
22
+ params:
23
+ warm_up_steps: [ 10000 ]
24
+ cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
25
+ f_start: [ 1.e-6 ]
26
+ f_max: [ 1. ]
27
+ f_min: [ 1. ]
28
+
29
+ unet_config:
30
+ target: ldm.modules.diffusionmodules.openaimodel.UNetModel
31
+ params:
32
+ image_size: 32 # unused
33
+ in_channels: 4
34
+ out_channels: 4
35
+ model_channels: 320
36
+ attention_resolutions: [ 4, 2, 1 ]
37
+ num_res_blocks: 2
38
+ channel_mult: [ 1, 2, 4, 4 ]
39
+ num_heads: 8
40
+ use_spatial_transformer: True
41
+ transformer_depth: 1
42
+ context_dim: 768
43
+ use_checkpoint: True
44
+ legacy: False
45
+
46
+ first_stage_config:
47
+ target: ldm.models.autoencoder.AutoencoderKL
48
+ params:
49
+ embed_dim: 4
50
+ monitor: val/rec_loss
51
+ ddconfig:
52
+ double_z: true
53
+ z_channels: 4
54
+ resolution: 256
55
+ in_channels: 3
56
+ out_ch: 3
57
+ ch: 128
58
+ ch_mult:
59
+ - 1
60
+ - 2
61
+ - 4
62
+ - 4
63
+ num_res_blocks: 2
64
+ attn_resolutions: []
65
+ dropout: 0.0
66
+ lossconfig:
67
+ target: torch.nn.Identity
68
+
69
+ cond_stage_config:
70
+ target: ldm.modules.encoders.modules.FrozenCLIPEmbedder