i

Files changed (3)

README.md +13 -1
image-to-shape-diffusion/clip-mvrgb-modln-l256-e64-ne8-nd16-nl6/config.yaml +143 -0
image-to-shape-diffusion/clip-mvrgb-modln-l256-e64-ne8-nd16-nl6/model.ckpt +3 -0

README.md CHANGED

@@ -1,3 +1,15 @@
 ---
-license: agpl-3.0
+license: other
+license_name: agpl-3.0
+license_link: https://www.gnu.org/licenses/agpl-3.0.en.html
 ---
+# CraftsMan
+Model card for *CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner*.
+Code: https://github.com/wyysf-98/CraftsMan
+Arxiv: https://arxiv.org/abs/xxxxx
+We present a novel generative 3D modeling system, coined CraftsMan, which can generate high-fidelity 3D geometries with highly varied shapes, regular mesh topologies, and detailed surfaces, and, notably, allows for refining the geometry in an interactive manner. Despite the significant advancements in 3D generation, existing methods still struggle with lengthy optimization processes, irregular mesh topologies, noisy surfaces, and difficulties in accommodating user edits, consequently impeding their widespread adoption and implementation in 3D modeling software. Our work is inspired by the craftsman, who usually roughs out the holistic figure of the work first and elaborates the surface details subsequently. Specifically, we employ a 3D native diffusion model, which operates on a latent space learned from latent set-based 3D representations, to generate coarse geometries with regular mesh topology in seconds. In particular, this process takes as input a text prompt or a reference image and leverages a powerful multi-view (MV) diffusion model to generate multiple views of the coarse geometry, which are fed into our MV-conditioned 3D diffusion model for generating the 3D geometry, significantly improving robustness and generalizability. Following that, a normal-based geometry refiner is used to significantly enhance the surface details. This refinement can be performed automatically, or interactively with user-supplied edits. Extensive experiments demonstrate that our method achieves high efficacy in producing superior quality 3D assets compared to existing methods.

image-to-shape-diffusion/clip-mvrgb-modln-l256-e64-ne8-nd16-nl6/config.yaml ADDED

	@@ -0,0 +1,143 @@

+name: image-to-shape-diffusion/clip-mvrgb-modln-l256-e64-ne8-nd16-nl6-big
+description: ''
+tag: michelangelo-autoencoder+n4096+noise0.0+pfeat3+normembFalse+lr5e-05+qkvbiasFalse+nfreq8+ln_postTrue
+seed: 0
+use_timestamp: true
+timestamp: ''
+exp_root_dir: outputs
+exp_dir: outputs/image-to-shape-diffusion/clip-mvrgb-modln-l256-e64-ne8-nd16-nl6-big
+trial_name: michelangelo-autoencoder+n4096+noise0.0+pfeat3+normembFalse+lr5e-05+qkvbiasFalse+nfreq8+ln_postTrue
+trial_dir: outputs/image-to-shape-diffusion/clip-mvrgb-modln-l256-e64-ne8-nd16-nl6-big/michelangelo-autoencoder+n4096+noise0.0+pfeat3+normembFalse+lr5e-05+qkvbiasFalse+nfreq8+ln_postTrue
+n_gpus: 8
+resume: null
+data_type: objaverse-datamodule
+data:
+  root_dir: data/objaverse_clean/cap3d_high_quality_170k_images
+  data_type: occupancy
+  n_samples: 4096
+  noise_sigma: 0.0
+  load_supervision: false
+  supervision_type: occupancy
+  n_supervision: 4096
+  load_image: true
+  image_data_path: data/objaverse_clean/raw_data/images/cap3d_high_quality_170k
+  image_type: mvrgb
+  idx:
+  - 0
+  - 4
+  - 8
+  - 12
+  - 16
+  n_views: 4
+  load_caption: false
+  rotate_points: false
+  batch_size: 32
+  num_workers: 16
+system_type: shape-diffusion-system
+system:
+  val_samples_json: val_data/mv_images/val_samples_rgb_mvimage.json
+  z_scale_factor: 1.0
+  guidance_scale: 7.5
+  num_inference_steps: 50
+  eta: 0.0
+  shape_model_type: michelangelo-autoencoder
+  shape_model:
+    num_latents: 256
+    embed_dim: 64
+    point_feats: 3
+    out_dim: 1
+    num_freqs: 8
+    include_pi: false
+    heads: 12
+    width: 768
+    num_encoder_layers: 8
+    num_decoder_layers: 16
+    use_ln_post: true
+    init_scale: 0.25
+    qkv_bias: false
+    use_flash: true
+    use_checkpoint: true
+  condition_model_type: clip-embedder
+  condition_model:
+    pretrained_model_name_or_path: openai/clip-vit-large-patch14
+    encode_camera: true
+    camera_embeds_dim: 32
+    n_views: 4
+    empty_embeds_ratio: 0.1
+    normalize_embeds: false
+    zero_uncond_embeds: true
+  denoiser_model_type: simple-denoiser
+  denoiser_model:
+    input_channels: 64
+    output_channels: 64
+    n_ctx: 256
+    width: 768
+    layers: 6
+    heads: 12
+    context_dim: 1024
+    init_scale: 1.0
+    skip_ln: true
+    use_checkpoint: true
+  noise_scheduler_type: diffusers.schedulers.DDPMScheduler
+  noise_scheduler:
+    num_train_timesteps: 1000
+    beta_start: 0.00085
+    beta_end: 0.012
+    beta_schedule: scaled_linear
+    variance_type: fixed_small
+    clip_sample: false
+  denoise_scheduler_type: diffusers.schedulers.DDIMScheduler
+  denoise_scheduler:
+    num_train_timesteps: 1000
+    beta_start: 0.00085
+    beta_end: 0.012
+    beta_schedule: scaled_linear
+    clip_sample: false
+    set_alpha_to_one: false
+    steps_offset: 1
+  loggers:
+    wandb:
+      enable: false
+      project: CraftsMan
+      name: image-to-shape-diffusion+image-to-shape-diffusion/clip-mvrgb-modln-l256-e64-ne8-nd16-nl6-big+michelangelo-autoencoder+n4096+noise0.0+pfeat3+normembFalse+lr5e-05+qkvbiasFalse+nfreq8+ln_postTrue
+  loss:
+    loss_type: mse
+    lambda_diffusion: 1.0
+  optimizer:
+    name: AdamW
+    args:
+      lr: 5.0e-05
+      betas:
+      - 0.9
+      - 0.99
+      eps: 1.0e-06
+  scheduler:
+    name: SequentialLR
+    interval: step
+    schedulers:
+    - name: LinearLR
+      interval: step
+      args:
+        start_factor: 1.0e-06
+        end_factor: 1.0
+        total_iters: 5000
+    - name: CosineAnnealingLR
+      interval: step
+      args:
+        T_max: 5000
+        eta_min: 0.0
+    milestones:
+    - 5000
+trainer:
+  num_nodes: 4
+  max_epochs: 100000
+  log_every_n_steps: 5
+  num_sanity_val_steps: 1
+  check_val_every_n_epoch: 3
+  enable_progress_bar: true
+  precision: 16-mixed
+  strategy: ddp_find_unused_parameters_true
+checkpoint:
+  save_last: true
+  save_top_k: -1
+  every_n_train_steps: 5000
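
Note on the `scheduler` block in this config: it chains a 5000-step linear warmup into a cosine decay, switching at the `milestones` value. The sketch below evaluates that learning-rate curve in closed form using only the numbers in the config. It assumes the names `SequentialLR`, `LinearLR`, and `CosineAnnealingLR` map onto the PyTorch schedulers of the same names and that the second scheduler's step counter restarts at the milestone; `lr_at_step` is a hypothetical helper for illustration, not code from the CraftsMan repository.

```python
import math

# Values copied from the optimizer and scheduler sections of config.yaml.
BASE_LR = 5.0e-05       # optimizer.args.lr
START_FACTOR = 1.0e-06  # LinearLR start_factor
END_FACTOR = 1.0        # LinearLR end_factor
WARMUP_STEPS = 5000     # LinearLR total_iters, also the SequentialLR milestone
T_MAX = 5000            # CosineAnnealingLR T_max
ETA_MIN = 0.0           # CosineAnnealingLR eta_min


def lr_at_step(step: int) -> float:
    """Closed-form learning rate for warmup followed by cosine decay.

    Only steps 0..WARMUP_STEPS + T_MAX are modelled here; what happens
    after the cosine phase ends is outside the scope of this sketch.
    """
    if not 0 <= step <= WARMUP_STEPS + T_MAX:
        raise ValueError("step is outside the range modelled by this sketch")
    if step < WARMUP_STEPS:
        # Linear interpolation of the multiplicative factor during warmup.
        factor = START_FACTOR + (END_FACTOR - START_FACTOR) * step / WARMUP_STEPS
        return BASE_LR * factor
    # Cosine decay, counted from the milestone.
    t = step - WARMUP_STEPS
    return ETA_MIN + (BASE_LR - ETA_MIN) * (1 + math.cos(math.pi * t / T_MAX)) / 2


if __name__ == "__main__":
    for s in (0, 2500, 5000, 7500, 10000):
        print(s, lr_at_step(s))
```

Under these assumptions the rate starts near zero, peaks at `5.0e-05` at step 5000, and decays back to `eta_min` by step 10000. With `max_epochs: 100000` the run may well continue past that point, which this sketch deliberately does not model.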

image-to-shape-diffusion/clip-mvrgb-modln-l256-e64-ne8-nd16-nl6/model.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:74f9385c0de1112c418b010820a59f0932a9dad9e71cdd507b460874f063d466
+size 3746582323
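
Note on `model.ckpt`: the three added lines are a Git LFS pointer, not the checkpoint itself. The pointer records the SHA-256 digest and byte size of the real file. A minimal sketch of checking a downloaded checkpoint against this pointer follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the sketch assumes the pointer text has had the diff's leading `+` markers stripped.

```python
import hashlib
from pathlib import Path

# Pointer contents from this commit, without the diff's "+" prefixes.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:74f9385c0de1112c418b010820a59f0932a9dad9e71cdd507b460874f063d466
size 3746582323
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest the pointer records."""
    path = Path(path)
    # Cheap check first: a size mismatch avoids hashing several gigabytes.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so a multi-gigabyte checkpoint is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

For example, `matches_pointer("model.ckpt", parse_lfs_pointer(POINTER_TEXT))` should return `True` only for a complete download of roughly 3.7 GB; a small text file in its place usually means the LFS object was not fetched.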