Spaces: Running on Zero

Commit: init

This view is limited to 50 files because it contains too many changes.
- .ipynb_checkpoints/README-checkpoint.md +68 -0
- README_zh.md +56 -0
- app/__init__.py +0 -0
- app/all_models.py +22 -0
- app/custom_models/image2image-objaverseF-rgb2normal.yaml +61 -0
- app/custom_models/image2mvimage-objaverseFrot-wonder3d.yaml +63 -0
- app/custom_models/mvimg_prediction.py +57 -0
- app/custom_models/normal_prediction.py +26 -0
- app/custom_models/utils.py +75 -0
- app/examples/Groot.png +3 -0
- app/examples/aaa.png +3 -0
- app/examples/abma.png +3 -0
- app/examples/akun.png +3 -0
- app/examples/anya.png +3 -0
- app/examples/bag.png +3 -0
- app/examples/generated_1715761545_frame0.png +3 -0
- app/examples/generated_1715762357_frame0.png +3 -0
- app/examples/generated_1715763329_frame0.png +3 -0
- app/examples/hatsune_miku.png +3 -0
- app/examples/princess-large.png +3 -0
- app/examples/shoe.png +3 -0
- app/gradio_3dgen.py +71 -0
- app/gradio_3dgen_steps.py +87 -0
- app/gradio_local.py +76 -0
- app/utils.py +112 -0
- assets/teaser.jpg +0 -0
- ckpt/controlnet-tile/config.json +52 -0
- ckpt/controlnet-tile/diffusion_pytorch_model.safetensors +3 -0
- ckpt/image2normal/feature_extractor/preprocessor_config.json +44 -0
- ckpt/image2normal/image_encoder/config.json +23 -0
- ckpt/image2normal/image_encoder/model.safetensors +3 -0
- ckpt/image2normal/model_index.json +31 -0
- ckpt/image2normal/scheduler/scheduler_config.json +16 -0
- ckpt/image2normal/unet/config.json +68 -0
- ckpt/image2normal/unet/diffusion_pytorch_model.safetensors +3 -0
- ckpt/image2normal/unet_state_dict.pth +3 -0
- ckpt/image2normal/vae/config.json +34 -0
- ckpt/image2normal/vae/diffusion_pytorch_model.safetensors +3 -0
- ckpt/img2mvimg/feature_extractor/preprocessor_config.json +44 -0
- ckpt/img2mvimg/image_encoder/config.json +23 -0
- ckpt/img2mvimg/image_encoder/model.safetensors +3 -0
- ckpt/img2mvimg/model_index.json +31 -0
- ckpt/img2mvimg/scheduler/scheduler_config.json +20 -0
- ckpt/img2mvimg/unet/config.json +68 -0
- ckpt/img2mvimg/unet/diffusion_pytorch_model.safetensors +3 -0
- ckpt/img2mvimg/unet_state_dict.pth +3 -0
- ckpt/img2mvimg/vae/config.json +34 -0
- ckpt/img2mvimg/vae/diffusion_pytorch_model.safetensors +3 -0
- ckpt/realesrgan-x4.onnx +3 -0
- ckpt/v1-inference.yaml +70 -0
.ipynb_checkpoints/README-checkpoint.md
ADDED
@@ -0,0 +1,68 @@
**Chinese version: [中文](README_zh.md)**

# Unique3D
High-Quality and Efficient 3D Mesh Generation from a Single Image

## [Paper]() | [Project page](https://wukailu.github.io/Unique3D/) | [Huggingface Demo]() | [Online Demo](https://www.aiuni.ai/)

![](assets/fig_teaser.png)

High-fidelity and diverse textured meshes generated by Unique3D from single-view wild images in 30 seconds.

## More features

The repo is still under construction; thanks for your patience.
- [x] Local gradio demo.
- [ ] Detailed tutorial.
- [ ] Huggingface demo.
- [ ] Detailed local demo.
- [ ] Comfyui support.
- [ ] Windows support.
- [ ] Docker support.
- [ ] More stable reconstruction with normals.
- [ ] Training code release.

## Preparation for inference

### Linux System Setup
```bash
conda create -n unique3d
conda activate unique3d
pip install -r requirements.txt
```

### Interactive inference: run your local gradio demo

1. Download [ckpt.zip]() and extract it to `ckpt/*`.
```
Unique3D
├── ckpt
    ├── controlnet-tile/
    ├── image2normal/
    ├── img2mvimg/
    ├── realesrgan-x4.onnx
    └── v1-inference.yaml
```

2. Run the interactive inference locally.
```bash
python app/gradio_local.py --port 7860
```

## Tips to get better results

1. Unique3D is sensitive to the facing direction of the input image. Due to the distribution of the training data, orthographic front-facing images in a rest pose always lead to good reconstructions.
2. Images with occlusions produce worse reconstructions, since four views cannot cover the complete object; images with fewer occlusions lead to better results.
3. Pass an input image with as high a resolution as possible when resolution matters.

## Acknowledgement

We have intensively borrowed code from the following repositories. Many thanks to the authors for sharing their code.
- [Stable Diffusion](https://github.com/CompVis/stable-diffusion)
- [Wonder3d](https://github.com/xxlong0/Wonder3D)
- [Zero123Plus](https://github.com/SUDO-AI-3D/zero123plus)
- [Continuous Remeshing](https://github.com/Profactor/continuous-remeshing)
- [Depth from Normals](https://github.com/YertleTurtleGit/depth-from-normals)

## Collaborations
Our mission is to create a 4D generative model with 3D concepts. This is just our first step, and the road ahead is still long, but we are confident. We warmly invite you to join the discussion and explore potential collaborations in any capacity. <span style="color:red">**If you're interested in connecting or partnering with us, please don't hesitate to reach out via email (wkl22@mails.tsinghua.edu.cn)**</span>.
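Beyond the UI, the local demo also exposes the full pipeline as a named Gradio API endpoint (`api_name="generate3dv2"` in `app/gradio_3dgen.py`, added below). The following is a hedged, untested sketch of calling it with `gradio_client` once the demo from the README is running; the positional arguments mirror the inputs wired to the "Generate 3D" button, and whether image inputs are wrapped with `handle_file` or `file` depends on the installed `gradio_client` version.

```python
# Hypothetical client-side sketch: call the running local demo via its Gradio API.
# Assumes `python app/gradio_local.py --port 7860` is up and gradio_client >= 1.0
# is installed (older versions use `file(...)` instead of `handle_file`).
from gradio_client import Client, handle_file

client = Client("http://127.0.0.1:7860/")
mesh_path, video_path = client.predict(
    handle_file("app/examples/aaa.png"),  # front-view input image (one of the bundled examples)
    True,                                 # remove background
    -1,                                   # seed (-1 = random)
    False,                                # render preview video
    True,                                 # refine multiview details
    0.1,                                  # expansion weight
    "std",                                # mesh initialization
    api_name="/generate3dv2",
)
print("GLB written to:", mesh_path)
```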
README_zh.md
ADDED
@@ -0,0 +1,56 @@
**Other language versions: [English](README.md)**

# Unique3D
High-Quality and Efficient 3D Mesh Generation from a Single Image

## [Paper]() | [Project page](https://wukailu.github.io/Unique3D/) | [Huggingface Demo]() | [Online Demo](https://www.aiuni.ai/)

![](assets/fig_teaser.png)

Unique3D generates high-fidelity and diversely textured meshes from single-view images, taking about 30 seconds on a 4090.

### Preparation for inference

#### Linux System Setup
```bash
conda create -n unique3d
conda activate unique3d
pip install -r requirements.txt
```

#### Interactive inference: run your local gradio demo

1. Download [ckpt.zip]() and extract it to `ckpt/*`.
```
Unique3D
├── ckpt
    ├── controlnet-tile/
    ├── image2normal/
    ├── img2mvimg/
    ├── realesrgan-x4.onnx
    └── v1-inference.yaml
```

2. Run the interactive inference locally.
```bash
python app/gradio_local.py --port 7860
```

## Tips to get better results

1. Unique3D is very sensitive to the facing direction of the input image. Due to the distribution of the training data, **orthographic front-facing images** almost always lead to good reconstructions. For characters, an A-pose or T-pose works best, since the training data rarely contains other kinds of poses.
2. Images with occlusions produce worse reconstructions, since four views cannot cover the complete object; images with fewer occlusions lead to better results.
3. Use as high-resolution an image as possible for the input.

## Acknowledgement

We have borrowed code from the following repositories. Many thanks to the authors for sharing their code.
- [Stable Diffusion](https://github.com/CompVis/stable-diffusion)
- [Wonder3d](https://github.com/xxlong0/Wonder3D)
- [Zero123Plus](https://github.com/SUDO-AI-3D/zero123plus)
- [Continuous Remeshing](https://github.com/Profactor/continuous-remeshing)
- [Depth from Normals](https://github.com/YertleTurtleGit/depth-from-normals)

## Collaborations
Our mission is to create a 4D generative model with 3D concepts. This is just our first step, and the road ahead is still long, but we are confident. We warmly invite you to join the discussion and explore potential collaborations in any capacity. <span style="color:red">**If you're interested in connecting or partnering with us, please feel free to reach out via email (wkl22@mails.tsinghua.edu.cn)**</span>.
app/__init__.py
ADDED
File without changes
app/all_models.py
ADDED
@@ -0,0 +1,22 @@
import torch
from scripts.sd_model_zoo import load_common_sd15_pipe
from diffusers import StableDiffusionControlNetImg2ImgPipeline, StableDiffusionPipeline


class MyModelZoo:
    _pipe_disney_controlnet_lineart_ipadapter_i2i: StableDiffusionControlNetImg2ImgPipeline = None

    base_model = "runwayml/stable-diffusion-v1-5"

    def __init__(self, base_model=None) -> None:
        if base_model is not None:
            self.base_model = base_model

    @property
    def pipe_disney_controlnet_tile_ipadapter_i2i(self):
        return self._pipe_disney_controlnet_lineart_ipadapter_i2i

    def init_models(self):
        self._pipe_disney_controlnet_lineart_ipadapter_i2i = load_common_sd15_pipe(base_model=self.base_model, ip_adapter=True, plus_model=False, controlnet="./ckpt/controlnet-tile", pipeline_class=StableDiffusionControlNetImg2ImgPipeline)

model_zoo = MyModelZoo()
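`model_zoo` above is the module-level singleton that `app/gradio_local.py` initializes at startup. A minimal usage sketch, assuming the `scripts.sd_model_zoo.load_common_sd15_pipe` helper and the `./ckpt/controlnet-tile` weights from this commit are in place:

```python
# Minimal usage sketch (assumes the ckpt/ weights and scripts/ package from this repo).
from app.all_models import model_zoo

model_zoo.init_models()  # builds the SD1.5 + ControlNet-tile + IP-Adapter img2img pipeline
pipe = model_zoo.pipe_disney_controlnet_tile_ipadapter_i2i
# `pipe` is a StableDiffusionControlNetImg2ImgPipeline, presumably used by the
# multiview refinement stage elsewhere in the repo.
```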
app/custom_models/image2image-objaverseF-rgb2normal.yaml
ADDED
@@ -0,0 +1,61 @@
pretrained_model_name_or_path: "lambdalabs/sd-image-variations-diffusers"
mixed_precision: "bf16"

init_config:
  # enable controls
  enable_cross_attn_lora: False
  enable_cross_attn_ip: False
  enable_self_attn_lora: False
  enable_self_attn_ref: True
  enable_multiview_attn: False

  # for cross attention
  init_cross_attn_lora: False
  init_cross_attn_ip: False
  cross_attn_lora_rank: 512 # 0 for not enabled
  cross_attn_lora_only_kv: False
  ipadapter_pretrained_name: "h94/IP-Adapter"
  ipadapter_subfolder_name: "models"
  ipadapter_weight_name: "ip-adapter_sd15.safetensors"
  ipadapter_effect_on: "all" # all, first

  # for self attention
  init_self_attn_lora: False
  self_attn_lora_rank: 512
  self_attn_lora_only_kv: False

  # for self attention ref
  init_self_attn_ref: True
  self_attn_ref_position: "attn1"
  self_attn_ref_other_model_name: "lambdalabs/sd-image-variations-diffusers"
  self_attn_ref_pixel_wise_crosspond: True
  self_attn_ref_effect_on: "all"

  # for multiview attention
  init_multiview_attn: False
  multiview_attn_position: "attn1"
  num_modalities: 1

  # for unet
  init_unet_path: "${pretrained_model_name_or_path}"
  init_num_cls_label: 0 # for initialize
  cls_labels: [] # for current task

trainers:
  - trainer_type: "image2image_trainer"
    trainer:
      pretrained_model_name_or_path: "${pretrained_model_name_or_path}"
      attn_config:
        cls_labels: [] # for current task
        enable_cross_attn_lora: False
        enable_cross_attn_ip: False
        enable_self_attn_lora: False
        enable_self_attn_ref: True
        enable_multiview_attn: False
      resolution: "512"
      condition_image_resolution: "512"
      condition_image_column_name: "conditioning_image"
      image_column_name: "image"
app/custom_models/image2mvimage-objaverseFrot-wonder3d.yaml
ADDED
@@ -0,0 +1,63 @@
pretrained_model_name_or_path: "./ckpt/img2mvimg"
mixed_precision: "bf16"

init_config:
  # enable controls
  enable_cross_attn_lora: False
  enable_cross_attn_ip: False
  enable_self_attn_lora: False
  enable_self_attn_ref: False
  enable_multiview_attn: True

  # for cross attention
  init_cross_attn_lora: False
  init_cross_attn_ip: False
  cross_attn_lora_rank: 256 # 0 for not enabled
  cross_attn_lora_only_kv: False
  ipadapter_pretrained_name: "h94/IP-Adapter"
  ipadapter_subfolder_name: "models"
  ipadapter_weight_name: "ip-adapter_sd15.safetensors"
  ipadapter_effect_on: "all" # all, first

  # for self attention
  init_self_attn_lora: False
  self_attn_lora_rank: 256
  self_attn_lora_only_kv: False

  # for self attention ref
  init_self_attn_ref: False
  self_attn_ref_position: "attn1"
  self_attn_ref_other_model_name: "lambdalabs/sd-image-variations-diffusers"
  self_attn_ref_pixel_wise_crosspond: False
  self_attn_ref_effect_on: "all"

  # for multiview attention
  init_multiview_attn: True
  multiview_attn_position: "attn1"
  use_mv_joint_attn: True
  num_modalities: 1

  # for unet
  init_unet_path: "${pretrained_model_name_or_path}"
  cat_condition: True # cat condition to input

  # for cls embedding
  init_num_cls_label: 8 # for initialize
  cls_labels: [0, 1, 2, 3] # for current task

trainers:
  - trainer_type: "image2mvimage_trainer"
    trainer:
      pretrained_model_name_or_path: "${pretrained_model_name_or_path}"
      attn_config:
        cls_labels: [0, 1, 2, 3] # for current task
        enable_cross_attn_lora: False
        enable_cross_attn_ip: False
        enable_self_attn_lora: False
        enable_self_attn_ref: False
        enable_multiview_attn: True
      resolution: "256"
      condition_image_resolution: "256"
      normal_cls_offset: 4
      condition_image_column_name: "conditioning_image"
      image_column_name: "image"
app/custom_models/mvimg_prediction.py
ADDED
@@ -0,0 +1,57 @@
import sys
import torch
import gradio as gr
from PIL import Image
import numpy as np
from rembg import remove
from app.utils import change_rgba_bg, rgba_to_rgb
from app.custom_models.utils import load_pipeline
from scripts.all_typing import *
from scripts.utils import session, simple_preprocess

training_config = "app/custom_models/image2mvimage-objaverseFrot-wonder3d.yaml"
checkpoint_path = "ckpt/img2mvimg/unet_state_dict.pth"
trainer, pipeline = load_pipeline(training_config, checkpoint_path)
pipeline.enable_model_cpu_offload()

def predict(img_list: List[Image.Image], guidance_scale=2., **kwargs):
    if isinstance(img_list, Image.Image):
        img_list = [img_list]
    img_list = [rgba_to_rgb(i) if i.mode == 'RGBA' else i for i in img_list]
    ret = []
    for img in img_list:
        images = trainer.pipeline_forward(
            pipeline=pipeline,
            image=img,
            guidance_scale=guidance_scale,
            **kwargs
        ).images
        ret.extend(images)
    return ret


def run_mvprediction(input_image: Image.Image, remove_bg=True, guidance_scale=1.5, seed=1145):
    if input_image.mode == 'RGB' or np.array(input_image)[..., -1].mean() == 255.:
        # still do remove using rembg, since simple_preprocess requires RGBA image
        print("RGB image not RGBA! still remove bg!")
        remove_bg = True

    if remove_bg:
        input_image = remove(input_image, session=session)

    # make front_pil RGBA with white bg
    input_image = change_rgba_bg(input_image, "white")
    single_image = simple_preprocess(input_image)

    generator = torch.Generator(device="cuda").manual_seed(int(seed)) if seed >= 0 else None

    rgb_pils = predict(
        single_image,
        generator=generator,
        guidance_scale=guidance_scale,
        width=256,
        height=256,
        num_inference_steps=30,
    )

    return rgb_pils, single_image
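A minimal sketch of driving `run_mvprediction` outside the Gradio UI, assuming a CUDA device and the `ckpt/img2mvimg` checkpoint referenced above; the example image path is one of the files added under `app/examples/`:

```python
# Usage sketch: generate the multiview 256x256 RGBs from a single front-view image.
from PIL import Image
from app.custom_models.mvimg_prediction import run_mvprediction

input_img = Image.open("app/examples/aaa.png")
rgb_views, front_rgba = run_mvprediction(input_img, remove_bg=True, guidance_scale=1.5, seed=1145)
for i, view in enumerate(rgb_views):
    view.save(f"view_{i}.png")        # one image per cls label (four views per the YAML above)
front_rgba.save("front_preprocessed.png")  # background-removed, white-backed front view
```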
app/custom_models/normal_prediction.py
ADDED
@@ -0,0 +1,26 @@
import sys
from PIL import Image
from app.utils import rgba_to_rgb, simple_remove
from app.custom_models.utils import load_pipeline
from scripts.utils import rotate_normals_torch
from scripts.all_typing import *

training_config = "app/custom_models/image2image-objaverseF-rgb2normal.yaml"
checkpoint_path = "ckpt/image2normal/unet_state_dict.pth"
trainer, pipeline = load_pipeline(training_config, checkpoint_path)
pipeline.enable_model_cpu_offload()

def predict_normals(image: List[Image.Image], guidance_scale=2., do_rotate=True, num_inference_steps=30, **kwargs):
    img_list = image if isinstance(image, list) else [image]
    img_list = [rgba_to_rgb(i) if i.mode == 'RGBA' else i for i in img_list]
    images = trainer.pipeline_forward(
        pipeline=pipeline,
        image=img_list,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        **kwargs
    ).images
    images = simple_remove(images)
    if do_rotate and len(images) > 1:
        images = rotate_normals_torch(images, return_types='pil')
    return images
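As with the multiview predictor, `predict_normals` can be called directly; a hedged sketch, assuming the `ckpt/image2normal` checkpoint is in place and the input views are any list of PIL images (for example, the outputs of `run_mvprediction` saved in the previous sketch):

```python
# Usage sketch: predict per-view normal maps (background removed) for a list of view images.
from PIL import Image
from app.custom_models.normal_prediction import predict_normals

views = [Image.open(f"view_{i}.png") for i in range(4)]  # hypothetical files from the sketch above
normals = predict_normals(views, guidance_scale=2., do_rotate=True)
for i, n in enumerate(normals):
    n.save(f"normal_{i}.png")
```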
app/custom_models/utils.py
ADDED
@@ -0,0 +1,75 @@
import torch
from typing import List
from dataclasses import dataclass
from app.utils import rgba_to_rgb
from custum_3d_diffusion.trainings.config_classes import ExprimentConfig, TrainerSubConfig
from custum_3d_diffusion import modules
from custum_3d_diffusion.custum_modules.unifield_processor import AttnConfig, ConfigurableUNet2DConditionModel
from custum_3d_diffusion.trainings.base import BasicTrainer
from custum_3d_diffusion.trainings.utils import load_config


@dataclass
class FakeAccelerator:
    device: torch.device = torch.device("cuda")


def init_trainers(cfg_path: str, weight_dtype: torch.dtype, extras: dict):
    accelerator = FakeAccelerator()
    cfg: ExprimentConfig = load_config(ExprimentConfig, cfg_path, extras)
    init_config: AttnConfig = load_config(AttnConfig, cfg.init_config)
    configurable_unet = ConfigurableUNet2DConditionModel(init_config, weight_dtype)
    configurable_unet.enable_xformers_memory_efficient_attention()
    trainer_cfgs: List[TrainerSubConfig] = [load_config(TrainerSubConfig, trainer) for trainer in cfg.trainers]
    trainers: List[BasicTrainer] = [modules.find(trainer.trainer_type)(accelerator, None, configurable_unet, trainer.trainer, weight_dtype, i) for i, trainer in enumerate(trainer_cfgs)]
    return trainers, configurable_unet

from app.utils import make_image_grid, split_image
def process_image(function, img, guidance_scale=2., merged_image=False, remove_bg=True):
    from rembg import remove
    if remove_bg:
        img = remove(img)
    img = rgba_to_rgb(img)
    if merged_image:
        img = split_image(img, rows=2)
    images = function(
        image=img,
        guidance_scale=guidance_scale,
    )
    if len(images) > 1:
        return make_image_grid(images, rows=2)
    else:
        return images[0]


def process_text(trainer, pipeline, img, guidance_scale=2.):
    pipeline.cfg.validation_prompts = [img]
    titles, images = trainer.batched_validation_forward(pipeline, guidance_scale=[guidance_scale])
    return images[0]


def load_pipeline(config_path, ckpt_path, pipeline_filter=lambda x: True, weight_dtype = torch.bfloat16):
    training_config = config_path
    load_from_checkpoint = ckpt_path
    extras = []
    device = "cuda"
    trainers, configurable_unet = init_trainers(training_config, weight_dtype, extras)
    shared_modules = dict()
    for trainer in trainers:
        shared_modules = trainer.init_shared_modules(shared_modules)

    if load_from_checkpoint is not None:
        state_dict = torch.load(load_from_checkpoint)
        configurable_unet.unet.load_state_dict(state_dict, strict=False)
    # Move unet, vae and text_encoder to device and cast to weight_dtype
    configurable_unet.unet.to(device, dtype=weight_dtype)

    pipeline = None
    trainer_out = None
    for trainer in trainers:
        if pipeline_filter(trainer.cfg.trainer_name):
            pipeline = trainer.construct_pipeline(shared_modules, configurable_unet.unet)
            pipeline.set_progress_bar_config(disable=False)
            trainer_out = trainer
    pipeline = pipeline.to(device)
    return trainer_out, pipeline
app/examples/Groot.png ADDED (Git LFS Details)
app/examples/aaa.png ADDED (Git LFS Details)
app/examples/abma.png ADDED (Git LFS Details)
app/examples/akun.png ADDED (Git LFS Details)
app/examples/anya.png ADDED (Git LFS Details)
app/examples/bag.png ADDED (Git LFS Details)
app/examples/generated_1715761545_frame0.png ADDED (Git LFS Details)
app/examples/generated_1715762357_frame0.png ADDED (Git LFS Details)
app/examples/generated_1715763329_frame0.png ADDED (Git LFS Details)
app/examples/hatsune_miku.png ADDED (Git LFS Details)
app/examples/princess-large.png ADDED (Git LFS Details)
app/examples/shoe.png ADDED (Git LFS Details)
app/gradio_3dgen.py
ADDED
@@ -0,0 +1,71 @@
import os
import gradio as gr
from PIL import Image
from pytorch3d.structures import Meshes
from app.utils import clean_up
from app.custom_models.mvimg_prediction import run_mvprediction
from app.custom_models.normal_prediction import predict_normals
from scripts.refine_lr_to_sr import run_sr_fast
from scripts.utils import save_glb_and_video
from scripts.multiview_inference import geo_reconstruct

def generate3dv2(preview_img, input_processing, seed, render_video=True, do_refine=True, expansion_weight=0.1, init_type="std"):
    if preview_img is None:
        raise gr.Error("preview_img is none")
    if isinstance(preview_img, str):
        preview_img = Image.open(preview_img)

    if preview_img.size[0] <= 512:
        preview_img = run_sr_fast([preview_img])[0]
    rgb_pils, front_pil = run_mvprediction(preview_img, remove_bg=input_processing, seed=int(seed)) # 6s
    new_meshes = geo_reconstruct(rgb_pils, None, front_pil, do_refine=do_refine, predict_normal=True, expansion_weight=expansion_weight, init_type=init_type)
    vertices = new_meshes.verts_packed()
    vertices = vertices / 2 * 1.35
    vertices[..., [0, 2]] = - vertices[..., [0, 2]]
    new_meshes = Meshes(verts=[vertices], faces=new_meshes.faces_list(), textures=new_meshes.textures)

    ret_mesh, video = save_glb_and_video("/tmp/gradio/generated", new_meshes, with_timestamp=True, dist=3.5, fov_in_degrees=2 / 1.35, cam_type="ortho", export_video=render_video)
    return ret_mesh, video

#######################################
def create_ui(concurrency_id="wkl"):
    with gr.Row():
        with gr.Column(scale=2):
            input_image = gr.Image(type='pil', image_mode='RGBA', label='Frontview')

            example_folder = os.path.join(os.path.dirname(__file__), "./examples")
            example_fns = sorted([os.path.join(example_folder, example) for example in os.listdir(example_folder)])
            gr.Examples(
                examples=example_fns,
                inputs=[input_image],
                cache_examples=False,
                label='Examples (click one of the images below to start)',
                examples_per_page=12
            )

        with gr.Column(scale=3):
            # export mesh display
            output_mesh = gr.Model3D(value=None, label="Mesh Model", show_label=True, height=320)
            output_video = gr.Video(label="Preview", show_label=True, show_share_button=True, height=320, visible=False)

            input_processing = gr.Checkbox(
                value=True,
                label='Remove Background',
                visible=True,
            )
            do_refine = gr.Checkbox(value=True, label="Refine Multiview Details", visible=False)
            expansion_weight = gr.Slider(minimum=-1., maximum=1.0, value=0.1, step=0.1, label="Expansion Weight", visible=False)
            init_type = gr.Dropdown(choices=["std", "thin"], label="Mesh Initialization", value="std", visible=False)
            setable_seed = gr.Slider(-1, 1000000000, -1, step=1, visible=True, label="Seed")
            render_video = gr.Checkbox(value=False, visible=False, label="generate video")
            fullrunv2_btn = gr.Button('Generate 3D', interactive=True)

    fullrunv2_btn.click(
        fn = generate3dv2,
        inputs=[input_image, input_processing, setable_seed, render_video, do_refine, expansion_weight, init_type],
        outputs=[output_mesh, output_video],
        concurrency_id=concurrency_id,
        api_name="generate3dv2",
    ).success(clean_up, api_name=False)
    return input_image
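For completeness, a hedged sketch of running `generate3dv2` headlessly; note that importing `app.gradio_3dgen` pulls in `mvimg_prediction` and `normal_prediction`, which load their diffusion pipelines at import time, and the refinement path additionally expects `model_zoo.init_models()` to have been called:

```python
# Headless usage sketch: run the full image-to-mesh path without the UI.
# Assumes a CUDA device, the ckpt/ weights from this commit, and that the heavy
# imports (pipeline loading) are acceptable in the calling process.
from app.all_models import model_zoo
from app.gradio_3dgen import generate3dv2

model_zoo.init_models()
mesh_glb_path, video_path = generate3dv2(
    "app/examples/aaa.png",  # str inputs are opened with PIL inside the function
    input_processing=True,   # remove background with rembg
    seed=-1,
    render_video=False,
)
print(mesh_glb_path)  # GLB exported under /tmp/gradio/generated*
```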
app/gradio_3dgen_steps.py
ADDED
@@ -0,0 +1,87 @@
import gradio as gr
from PIL import Image

from app.custom_models.mvimg_prediction import run_mvprediction
from app.utils import make_image_grid, split_image
from scripts.utils import save_glb_and_video

def concept_to_multiview(preview_img, input_processing, seed, guidance=1.):
    seed = int(seed)
    if preview_img is None:
        raise gr.Error("preview_img is none.")
    if isinstance(preview_img, str):
        preview_img = Image.open(preview_img)

    rgb_pils, front_pil = run_mvprediction(preview_img, remove_bg=input_processing, seed=seed, guidance_scale=guidance)
    rgb_pil = make_image_grid(rgb_pils, rows=2)
    return rgb_pil, front_pil

def concept_to_multiview_ui(concurrency_id="wkl"):
    with gr.Row():
        with gr.Column(scale=2):
            preview_img = gr.Image(type='pil', image_mode='RGBA', label='Frontview')
            input_processing = gr.Checkbox(
                value=True,
                label='Remove Background',
            )
            seed = gr.Slider(minimum=-1, maximum=1000000000, value=-1, step=1.0, label="seed")
            guidance = gr.Slider(minimum=1.0, maximum=5.0, value=1.0, label="Guidance Scale", step=0.5)
            run_btn = gr.Button('Generate Multiview', interactive=True)
        with gr.Column(scale=3):
            # export mesh display
            output_rgb = gr.Image(type='pil', label="RGB", show_label=True)
            output_front = gr.Image(type='pil', image_mode='RGBA', label="Frontview", show_label=True)
    run_btn.click(
        fn = concept_to_multiview,
        inputs=[preview_img, input_processing, seed, guidance],
        outputs=[output_rgb, output_front],
        concurrency_id=concurrency_id,
        api_name=False,
    )
    return output_rgb, output_front

from app.custom_models.normal_prediction import predict_normals
from scripts.multiview_inference import geo_reconstruct
def multiview_to_mesh_v2(rgb_pil, normal_pil, front_pil, do_refine=False, expansion_weight=0.1, init_type="std"):
    rgb_pils = split_image(rgb_pil, rows=2)
    if normal_pil is not None:
        normal_pil = split_image(normal_pil, rows=2)
    if front_pil is None:
        front_pil = rgb_pils[0]
    new_meshes = geo_reconstruct(rgb_pils, normal_pil, front_pil, do_refine=do_refine, predict_normal=normal_pil is None, expansion_weight=expansion_weight, init_type=init_type)
    ret_mesh, video = save_glb_and_video("/tmp/gradio/generated", new_meshes, with_timestamp=True, dist=3.5, fov_in_degrees=2 / 1.35, cam_type="ortho", export_video=False)
    return ret_mesh

def new_multiview_to_mesh_ui(concurrency_id="wkl"):
    with gr.Row():
        with gr.Column(scale=2):
            rgb_pil = gr.Image(type='pil', image_mode='RGB', label='RGB')
            front_pil = gr.Image(type='pil', image_mode='RGBA', label='Frontview (Optional)')
            normal_pil = gr.Image(type='pil', image_mode='RGBA', label='Normal (Optional)')
            do_refine = gr.Checkbox(
                value=False,
                label='Refine rgb',
                visible=False,
            )
            expansion_weight = gr.Slider(minimum=-1.0, maximum=1.0, value=0.1, step=0.1, label="Expansion Weight", visible=False)
            init_type = gr.Dropdown(choices=["std", "thin"], label="Mesh initialization", value="std", visible=False)
            run_btn = gr.Button('Generate 3D', interactive=True)
        with gr.Column(scale=3):
            # export mesh display
            output_mesh = gr.Model3D(value=None, label="mesh model", show_label=True)
    run_btn.click(
        fn = multiview_to_mesh_v2,
        inputs=[rgb_pil, normal_pil, front_pil, do_refine, expansion_weight, init_type],
        outputs=[output_mesh],
        concurrency_id=concurrency_id,
        api_name="multiview_to_mesh",
    )
    return rgb_pil, front_pil, output_mesh


#######################################
def create_step_ui(concurrency_id="wkl"):
    with gr.Tab(label="3D:concept_to_multiview"):
        concept_to_multiview_ui(concurrency_id)
    with gr.Tab(label="3D:new_multiview_to_mesh"):
        new_multiview_to_mesh_ui(concurrency_id)
app/gradio_local.py
ADDED
@@ -0,0 +1,76 @@
if __name__ == "__main__":
    import os
    import sys
    sys.path.append(os.curdir)
    if 'CUDA_VISIBLE_DEVICES' not in os.environ:
        os.environ['CUDA_VISIBLE_DEVICES'] = '0'
    os.environ['TRANSFORMERS_OFFLINE']='0'
    os.environ['DIFFUSERS_OFFLINE']='0'
    os.environ['HF_HUB_OFFLINE']='0'
    os.environ['GRADIO_ANALYTICS_ENABLED']='False'
    os.environ['HF_ENDPOINT']='https://hf-mirror.com'
    import torch
    torch.set_float32_matmul_precision('medium')
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.set_grad_enabled(False)

import gradio as gr
import argparse

from app.gradio_3dgen import create_ui as create_3d_ui
# from app.gradio_3dgen_steps import create_step_ui
from app.all_models import model_zoo


_TITLE = '''Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image'''
_DESCRIPTION = '''
[Project page](https://wukailu.github.io/Unique3D/)

* High-fidelity and diverse textured meshes generated by Unique3D from single-view images.

* The demo is still under construction, and more features are expected to be implemented soon.
'''

def launch(
    port,
    listen=False,
    share=False,
    gradio_root="",
):
    model_zoo.init_models()

    with gr.Blocks(
        title=_TITLE,
        theme=gr.themes.Monochrome(),
    ) as demo:
        with gr.Row():
            with gr.Column(scale=1):
                gr.Markdown('# ' + _TITLE)
                gr.Markdown(_DESCRIPTION)
        create_3d_ui("wkl")

    launch_args = {}
    if listen:
        launch_args["server_name"] = "0.0.0.0"

    demo.queue(default_concurrency_limit=1).launch(
        server_port=None if port == 0 else port,
        share=share,
        root_path=gradio_root if gradio_root != "" else None,  # "/myapp"
        **launch_args,
    )

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    args, extra = parser.parse_known_args()
    parser.add_argument("--listen", action="store_true")
    parser.add_argument("--port", type=int, default=0)
    parser.add_argument("--share", action="store_true")
    parser.add_argument("--gradio_root", default="")
    args = parser.parse_args()
    launch(
        args.port,
        listen=args.listen,
        share=args.share,
        gradio_root=args.gradio_root,
    )
app/utils.py
ADDED
@@ -0,0 +1,112 @@
import torch
import numpy as np
from PIL import Image
import gc
from scripts.refine_lr_to_sr import run_sr_fast

GRADIO_CACHE = "/tmp/gradio/"

def clean_up():
    torch.cuda.empty_cache()
    gc.collect()

def remove_color(arr):
    if arr.shape[-1] == 4:
        arr = arr[..., :3]
    # calc diffs
    base = arr[0, 0]
    diffs = np.abs(arr.astype(np.int32) - base.astype(np.int32)).sum(axis=-1)
    alpha = (diffs <= 80)

    arr[alpha] = 255
    alpha = ~alpha
    arr = np.concatenate([arr, alpha[..., None].astype(np.int32) * 255], axis=-1)
    return arr

def simple_remove(imgs, run_sr=True):
    """Only works for normal"""
    if not isinstance(imgs, list):
        imgs = [imgs]
        single_input = True
    else:
        single_input = False
    if run_sr:
        imgs = run_sr_fast(imgs)
    rets = []
    for img in imgs:
        arr = np.array(img)
        arr = remove_color(arr)
        rets.append(Image.fromarray(arr.astype(np.uint8)))
    if single_input:
        return rets[0]
    return rets

def rgba_to_rgb(rgba: Image.Image, bkgd="WHITE"):
    new_image = Image.new("RGBA", rgba.size, bkgd)
    new_image.paste(rgba, (0, 0), rgba)
    new_image = new_image.convert('RGB')
    return new_image

def change_rgba_bg(rgba: Image.Image, bkgd="WHITE"):
    rgb_white = rgba_to_rgb(rgba, bkgd)
    new_rgba = Image.fromarray(np.concatenate([np.array(rgb_white), np.array(rgba)[:, :, 3:4]], axis=-1))
    return new_rgba

def split_image(image, rows=None, cols=None):
    """
        inverse function of make_image_grid
    """
    # image is in square
    if rows is None and cols is None:
        # image.size [W, H]
        rows = 1
        cols = image.size[0] // image.size[1]
        assert cols * image.size[1] == image.size[0]
        subimg_size = image.size[1]
    elif rows is None:
        subimg_size = image.size[0] // cols
        rows = image.size[1] // subimg_size
        assert rows * subimg_size == image.size[1]
    elif cols is None:
        subimg_size = image.size[1] // rows
        cols = image.size[0] // subimg_size
        assert cols * subimg_size == image.size[0]
    else:
        subimg_size = image.size[1] // rows
        assert cols * subimg_size == image.size[0]
    subimgs = []
    for i in range(rows):
        for j in range(cols):
            subimg = image.crop((j*subimg_size, i*subimg_size, (j+1)*subimg_size, (i+1)*subimg_size))
            subimgs.append(subimg)
    return subimgs

def make_image_grid(images, rows=None, cols=None, resize=None):
    if rows is None and cols is None:
        rows = 1
        cols = len(images)
    if rows is None:
        rows = len(images) // cols
        if len(images) % cols != 0:
            rows += 1
    if cols is None:
        cols = len(images) // rows
        if len(images) % rows != 0:
            cols += 1
    total_imgs = rows * cols
    if total_imgs > len(images):
        images += [Image.new(images[0].mode, images[0].size) for _ in range(total_imgs - len(images))]

    if resize is not None:
        images = [img.resize((resize, resize)) for img in images]

    w, h = images[0].size
    grid = Image.new(images[0].mode, size=(cols * w, rows * h))

    for i, img in enumerate(images):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
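`split_image` and `make_image_grid` above are inverses for equal-sized tiles; a small self-contained check of that round trip:

```python
# Round-trip sketch: tile four 256x256 images into a 2x2 grid and split it back.
from PIL import Image
from app.utils import make_image_grid, split_image

tiles = [Image.new("RGB", (256, 256), color) for color in ("red", "green", "blue", "white")]
grid = make_image_grid(tiles, rows=2)   # 2x2 grid, 512x512
recovered = split_image(grid, rows=2)   # back to four 256x256 tiles, row-major order
assert len(recovered) == 4 and recovered[0].size == (256, 256)
```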
assets/teaser.jpg
ADDED
ckpt/controlnet-tile/config.json
ADDED
@@ -0,0 +1,52 @@
{
  "_class_name": "ControlNetModel",
  "_diffusers_version": "0.27.2",
  "_name_or_path": "lllyasviel/control_v11f1e_sd15_tile",
  "act_fn": "silu",
  "addition_embed_type": null,
  "addition_embed_type_num_heads": 64,
  "addition_time_embed_dim": null,
  "attention_head_dim": 8,
  "block_out_channels": [
    320,
    640,
    1280,
    1280
  ],
  "class_embed_type": null,
  "conditioning_channels": 3,
  "conditioning_embedding_out_channels": [
    16,
    32,
    96,
    256
  ],
  "controlnet_conditioning_channel_order": "rgb",
  "cross_attention_dim": 768,
  "down_block_types": [
    "CrossAttnDownBlock2D",
    "CrossAttnDownBlock2D",
    "CrossAttnDownBlock2D",
    "DownBlock2D"
  ],
  "downsample_padding": 1,
  "encoder_hid_dim": null,
  "encoder_hid_dim_type": null,
  "flip_sin_to_cos": true,
  "freq_shift": 0,
  "global_pool_conditions": false,
  "in_channels": 4,
  "layers_per_block": 2,
  "mid_block_scale_factor": 1,
  "mid_block_type": "UNetMidBlock2DCrossAttn",
  "norm_eps": 1e-05,
  "norm_num_groups": 32,
  "num_attention_heads": null,
  "num_class_embeds": null,
  "only_cross_attention": false,
  "projection_class_embeddings_input_dim": null,
  "resnet_time_scale_shift": "default",
  "transformer_layers_per_block": 1,
  "upcast_attention": false,
  "use_linear_projection": false
}
ckpt/controlnet-tile/diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:845d3845053912728cd1453029a0ef87d3c0a3082a083ba393f36eaa5fb0e218
size 1445157120
ckpt/image2normal/feature_extractor/preprocessor_config.json
ADDED
@@ -0,0 +1,44 @@
{
  "_valid_processor_keys": [
    "images",
    "do_resize",
    "size",
    "resample",
    "do_center_crop",
    "crop_size",
    "do_rescale",
    "rescale_factor",
    "do_normalize",
    "image_mean",
    "image_std",
    "do_convert_rgb",
    "return_tensors",
    "data_format",
    "input_data_format"
  ],
  "crop_size": {
    "height": 224,
    "width": 224
  },
  "do_center_crop": true,
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.48145466,
    0.4578275,
    0.40821073
  ],
  "image_processor_type": "CLIPImageProcessor",
  "image_std": [
    0.26862954,
    0.26130258,
    0.27577711
  ],
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "shortest_edge": 224
  }
}
ckpt/image2normal/image_encoder/config.json
ADDED
@@ -0,0 +1,23 @@
{
  "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
  "architectures": [
    "CLIPVisionModelWithProjection"
  ],
  "attention_dropout": 0.0,
  "dropout": 0.0,
  "hidden_act": "quick_gelu",
  "hidden_size": 1024,
  "image_size": 224,
  "initializer_factor": 1.0,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "model_type": "clip_vision_model",
  "num_attention_heads": 16,
  "num_channels": 3,
  "num_hidden_layers": 24,
  "patch_size": 14,
  "projection_dim": 768,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.39.3"
}
ckpt/image2normal/image_encoder/model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e4b33d864f89a793357a768cb07d0dc18d6a14e6664f4110a0d535ca9ba78da8
size 607980488
ckpt/image2normal/model_index.json
ADDED
@@ -0,0 +1,31 @@
{
  "_class_name": "StableDiffusionImageCustomPipeline",
  "_diffusers_version": "0.27.2",
  "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "image_encoder": [
    "transformers",
    "CLIPVisionModelWithProjection"
  ],
  "noisy_cond_latents": false,
  "requires_safety_checker": true,
  "safety_checker": [
    null,
    null
  ],
  "scheduler": [
    "diffusers",
    "EulerAncestralDiscreteScheduler"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
ckpt/image2normal/scheduler/scheduler_config.json
ADDED
@@ -0,0 +1,16 @@
{
  "_class_name": "EulerAncestralDiscreteScheduler",
  "_diffusers_version": "0.27.2",
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "beta_start": 0.00085,
  "clip_sample": false,
  "num_train_timesteps": 1000,
  "prediction_type": "epsilon",
  "rescale_betas_zero_snr": false,
  "set_alpha_to_one": false,
  "skip_prk_steps": true,
  "steps_offset": 1,
  "timestep_spacing": "linspace",
  "trained_betas": null
}
ckpt/image2normal/unet/config.json
ADDED
@@ -0,0 +1,68 @@
{
  "_class_name": "UnifieldWrappedUNet",
  "_diffusers_version": "0.27.2",
  "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
  "act_fn": "silu",
  "addition_embed_type": null,
  "addition_embed_type_num_heads": 64,
  "addition_time_embed_dim": null,
  "attention_head_dim": 8,
  "attention_type": "default",
  "block_out_channels": [
    320,
    640,
    1280,
    1280
  ],
  "center_input_sample": false,
  "class_embed_type": null,
  "class_embeddings_concat": false,
  "conv_in_kernel": 3,
  "conv_out_kernel": 3,
  "cross_attention_dim": 768,
  "cross_attention_norm": null,
  "down_block_types": [
    "CrossAttnDownBlock2D",
    "CrossAttnDownBlock2D",
    "CrossAttnDownBlock2D",
    "DownBlock2D"
  ],
  "downsample_padding": 1,
  "dropout": 0.0,
  "dual_cross_attention": false,
  "encoder_hid_dim": null,
  "encoder_hid_dim_type": null,
  "flip_sin_to_cos": true,
  "freq_shift": 0,
  "in_channels": 4,
  "layers_per_block": 2,
  "mid_block_only_cross_attention": null,
  "mid_block_scale_factor": 1,
  "mid_block_type": "UNetMidBlock2DCrossAttn",
  "norm_eps": 1e-05,
  "norm_num_groups": 32,
  "num_attention_heads": null,
  "num_class_embeds": null,
  "only_cross_attention": false,
  "out_channels": 4,
  "projection_class_embeddings_input_dim": null,
  "resnet_out_scale_factor": 1.0,
  "resnet_skip_time_act": false,
  "resnet_time_scale_shift": "default",
  "reverse_transformer_layers_per_block": null,
  "sample_size": 64,
  "time_cond_proj_dim": null,
  "time_embedding_act_fn": null,
  "time_embedding_dim": null,
  "time_embedding_type": "positional",
  "timestep_post_act": null,
  "transformer_layers_per_block": 1,
  "up_block_types": [
    "UpBlock2D",
    "CrossAttnUpBlock2D",
    "CrossAttnUpBlock2D",
    "CrossAttnUpBlock2D"
  ],
  "upcast_attention": false,
  "use_linear_projection": false
}
ckpt/image2normal/unet/diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f5cbaf1d56619345ce78de8cfbb20d94923b3305a364bf6a5b2a2cc422d4b701
size 3537503456
ckpt/image2normal/unet_state_dict.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8df80d09e953d338aa6d8decd0351c5045f52ec6e2645eee2027ccb8792c8ed8
size 3537964654
ckpt/image2normal/vae/config.json
ADDED
@@ -0,0 +1,34 @@
{
  "_class_name": "AutoencoderKL",
  "_diffusers_version": "0.27.2",
  "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
  "act_fn": "silu",
  "block_out_channels": [
    128,
    256,
    512,
    512
  ],
  "down_block_types": [
    "DownEncoderBlock2D",
    "DownEncoderBlock2D",
    "DownEncoderBlock2D",
    "DownEncoderBlock2D"
  ],
  "force_upcast": true,
  "in_channels": 3,
  "latent_channels": 4,
  "latents_mean": null,
  "latents_std": null,
  "layers_per_block": 2,
  "norm_num_groups": 32,
  "out_channels": 3,
  "sample_size": 256,
  "scaling_factor": 0.18215,
  "up_block_types": [
    "UpDecoderBlock2D",
    "UpDecoderBlock2D",
    "UpDecoderBlock2D",
    "UpDecoderBlock2D"
  ]
}
ckpt/image2normal/vae/diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8d0c34f57abe50f323040f2366c8e22b941068dcdf53c8eb1d6fafb838afecb7
size 167335590
ckpt/img2mvimg/feature_extractor/preprocessor_config.json
ADDED
@@ -0,0 +1,44 @@
{
  "_valid_processor_keys": [
    "images",
    "do_resize",
    "size",
    "resample",
    "do_center_crop",
    "crop_size",
    "do_rescale",
    "rescale_factor",
    "do_normalize",
    "image_mean",
    "image_std",
    "do_convert_rgb",
    "return_tensors",
    "data_format",
    "input_data_format"
  ],
  "crop_size": {
    "height": 224,
    "width": 224
  },
  "do_center_crop": true,
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.48145466,
    0.4578275,
    0.40821073
  ],
  "image_processor_type": "CLIPImageProcessor",
  "image_std": [
    0.26862954,
    0.26130258,
    0.27577711
  ],
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "shortest_edge": 224
  }
}
ckpt/img2mvimg/image_encoder/config.json
ADDED
@@ -0,0 +1,23 @@
{
  "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
  "architectures": [
    "CLIPVisionModelWithProjection"
  ],
  "attention_dropout": 0.0,
  "dropout": 0.0,
  "hidden_act": "quick_gelu",
  "hidden_size": 1024,
  "image_size": 224,
  "initializer_factor": 1.0,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "model_type": "clip_vision_model",
  "num_attention_heads": 16,
  "num_channels": 3,
  "num_hidden_layers": 24,
  "patch_size": 14,
  "projection_dim": 768,
  "torch_dtype": "float32",
  "transformers_version": "4.39.3"
}
ckpt/img2mvimg/image_encoder/model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:77b33d2a3a643650857672e880ccf73adbaf114fbbadec36d142ee9d48af7e20
size 1215912728
ckpt/img2mvimg/model_index.json
ADDED
@@ -0,0 +1,31 @@
{
  "_class_name": "StableDiffusionImage2MVCustomPipeline",
  "_diffusers_version": "0.27.2",
  "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
  "condition_offset": true,
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "image_encoder": [
    "transformers",
    "CLIPVisionModelWithProjection"
  ],
  "requires_safety_checker": true,
  "safety_checker": [
    null,
    null
  ],
  "scheduler": [
    "diffusers",
    "DDIMScheduler"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
ckpt/img2mvimg/scheduler/scheduler_config.json
ADDED
@@ -0,0 +1,20 @@
{
  "_class_name": "DDIMScheduler",
  "_diffusers_version": "0.27.2",
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "beta_start": 0.00085,
  "clip_sample": false,
  "clip_sample_range": 1.0,
  "dynamic_thresholding_ratio": 0.995,
  "num_train_timesteps": 1000,
  "prediction_type": "epsilon",
  "rescale_betas_zero_snr": false,
  "sample_max_value": 1.0,
  "set_alpha_to_one": false,
  "skip_prk_steps": true,
  "steps_offset": 1,
  "thresholding": false,
  "timestep_spacing": "leading",
  "trained_betas": null
}
ckpt/img2mvimg/unet/config.json
ADDED
@@ -0,0 +1,68 @@
{
  "_class_name": "UnifieldWrappedUNet",
  "_diffusers_version": "0.27.2",
  "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
  "act_fn": "silu",
  "addition_embed_type": null,
  "addition_embed_type_num_heads": 64,
  "addition_time_embed_dim": null,
  "attention_head_dim": 8,
  "attention_type": "default",
  "block_out_channels": [
    320,
    640,
    1280,
    1280
  ],
  "center_input_sample": false,
  "class_embed_type": null,
  "class_embeddings_concat": false,
  "conv_in_kernel": 3,
  "conv_out_kernel": 3,
  "cross_attention_dim": 768,
  "cross_attention_norm": null,
  "down_block_types": [
    "CrossAttnDownBlock2D",
    "CrossAttnDownBlock2D",
    "CrossAttnDownBlock2D",
    "DownBlock2D"
  ],
  "downsample_padding": 1,
  "dropout": 0.0,
  "dual_cross_attention": false,
  "encoder_hid_dim": null,
  "encoder_hid_dim_type": null,
  "flip_sin_to_cos": true,
  "freq_shift": 0,
  "in_channels": 8,
  "layers_per_block": 2,
  "mid_block_only_cross_attention": null,
  "mid_block_scale_factor": 1,
  "mid_block_type": "UNetMidBlock2DCrossAttn",
  "norm_eps": 1e-05,
  "norm_num_groups": 32,
  "num_attention_heads": null,
  "num_class_embeds": 8,
  "only_cross_attention": false,
  "out_channels": 4,
  "projection_class_embeddings_input_dim": null,
  "resnet_out_scale_factor": 1.0,
  "resnet_skip_time_act": false,
  "resnet_time_scale_shift": "default",
  "reverse_transformer_layers_per_block": null,
  "sample_size": 64,
  "time_cond_proj_dim": null,
  "time_embedding_act_fn": null,
  "time_embedding_dim": null,
  "time_embedding_type": "positional",
  "timestep_post_act": null,
  "transformer_layers_per_block": 1,
  "up_block_types": [
    "UpBlock2D",
    "CrossAttnUpBlock2D",
    "CrossAttnUpBlock2D",
    "CrossAttnUpBlock2D"
  ],
  "upcast_attention": false,
  "use_linear_projection": false
}
ckpt/img2mvimg/unet/diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:93a3b4e678efac0c997e76df465df13136a4b0f1732e534a1200fad9e04cd0f9
size 3438254688
ckpt/img2mvimg/unet_state_dict.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0dff2fdba450af0e10c3a847ba66a530170be2e9b9c9f4c834483515e82738b5
size 3438460972
ckpt/img2mvimg/vae/config.json
ADDED
@@ -0,0 +1,34 @@
{
  "_class_name": "AutoencoderKL",
  "_diffusers_version": "0.27.2",
  "_name_or_path": "lambdalabs/sd-image-variations-diffusers",
  "act_fn": "silu",
  "block_out_channels": [
    128,
    256,
    512,
    512
  ],
  "down_block_types": [
    "DownEncoderBlock2D",
    "DownEncoderBlock2D",
    "DownEncoderBlock2D",
    "DownEncoderBlock2D"
  ],
  "force_upcast": true,
  "in_channels": 3,
  "latent_channels": 4,
  "latents_mean": null,
  "latents_std": null,
  "layers_per_block": 2,
  "norm_num_groups": 32,
  "out_channels": 3,
  "sample_size": 256,
  "scaling_factor": 0.18215,
  "up_block_types": [
    "UpDecoderBlock2D",
    "UpDecoderBlock2D",
    "UpDecoderBlock2D",
    "UpDecoderBlock2D"
  ]
}
ckpt/img2mvimg/vae/diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2aa1f43011b553a4cba7f37456465cdbd48aab7b54b9348b890e8058ea7683ec
size 334643268
ckpt/realesrgan-x4.onnx
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9bc5d0c85207adad8bca26286f0c0007f266f85e7aa7c454c565da9b5f3c940a
size 67051617
ckpt/v1-inference.yaml
ADDED
@@ -0,0 +1,70 @@
model:
  base_learning_rate: 1.0e-04
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: "jpg"
    cond_stage_key: "txt"
    image_size: 64
    channels: 4
    cond_stage_trainable: false   # Note: different from the one we trained before
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False

    scheduler_config: # 10000 warmup steps
      target: ldm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps: [ 10000 ]
        cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
        f_start: [ 1.e-6 ]
        f_max: [ 1. ]
        f_min: [ 1. ]

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32 # unused
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_heads: 8
        use_spatial_transformer: True
        transformer_depth: 1
        context_dim: 768
        use_checkpoint: True
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenCLIPEmbedder