estellea committed
Commit 80cf1d0
1 Parent(s): a65b03b

Update README.md

Files changed (1)
  1. README.md +13 -13
README.md CHANGED
@@ -16,9 +16,7 @@ The LDM3D model was proposed in ["LDM3D: Latent Diffusion Model for 3D"](https:/

 LDM3D got accepted to [CVPRW'23](https://cvpr2023.thecvf.com/).

-This checkpoint finetunes the previous [ldm3d-4c](https://huggingface.co/Intel/ldm3d-4c) on 2 panoramic-image datasets:
-- [polyhaven](https://polyhaven.com/): 585 images for the training set, 66 images for the validation set
-- [ihdri](https://www.ihdri.com/hdri-skies-outdoor/): 57 outdoor images for the training set, 7 outdoor images for the validation set.
+

 These datasets were augmented using [Text2Light](https://frozenburning.github.io/projects/text2light/) to create a dataset containing 13852 training samples and 1606 validation samples.

@@ -47,14 +45,14 @@ Here is how to use this model to get the features of a given text in PyTorch:

 from diffusers import StableDiffusionLDM3DPipeline

-pipe = StableDiffusionLDM3DPipeline.from_pretrained("Intel/ldm3d-4c")
+pipe = StableDiffusionLDM3DPipeline.from_pretrained("Intel/ldm3d-pano")
 pipe.to("cuda")


-prompt = "A picture of some lemons on a table"
-name = "lemons"
+prompt = "360 view of a large bedroom"
+name = "bedroom_pano"

-output = pipe(prompt)
+output = pipe(prompt, width=1024, height=512)
 rgb_image, depth_image = output.rgb, output.depth
 rgb_image[0].save(name+"_ldm3d_rgb.jpg")
 depth_image[0].save(name+"_ldm3d_depth.png")
@@ -62,7 +60,7 @@ depth_image[0].save(name+"_ldm3d_depth.png")

 This is the result:

-![ldm3d_results](ldm3d_4c_results.png)
+![ldm3d_results](ldm3d_pano_results.png)


 ### Limitations and bias
@@ -77,13 +75,15 @@ The LDM3D model was finetuned on a dataset constructed from a subset of the LAIO

 ### Finetuning

-The fine-tuning process comprises two stages. In the first stage, we train an autoencoder to generate a lower-dimensional, perceptually equivalent data representation. Subsequently, we fine-tune the diffusion model using the frozen autoencoder.
+This checkpoint finetunes the previous [ldm3d-4c](https://huggingface.co/Intel/ldm3d-4c) on 2 panoramic-image datasets:
+- [polyhaven](https://polyhaven.com/): 585 images for the training set, 66 images for the validation set
+- [ihdri](https://www.ihdri.com/hdri-skies-outdoor/): 57 outdoor images for the training set, 7 outdoor images for the validation set.

-## Evaluation results
+
+These datasets were augmented using [Text2Light](https://frozenburning.github.io/projects/text2light/) to create a dataset containing 13852 training samples and 1606 validation samples.

-Please refer to Table 1 and Table 2 from the [paper](https://arxiv.org/abs/2305.10853) for quantitative results.
-The figure below shows some qualitative results comparing our method with [Stable Diffusion v1.4](https://arxiv.org/pdf/2112.10752.pdf) and with [DPT-Large](https://arxiv.org/pdf/2103.13413.pdf) for the depth maps.
-![qualitative results](qualitative_results.png)
+In order to generate the depth maps of those samples, we used [DPT-large](https://github.com/isl-org/MiDaS), and to generate the captions we used [BLIP-2](https://huggingface.co/docs/transformers/main/model_doc/blip-2).


 ### BibTeX entry and citation info
 ```bibtex
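
The new Finetuning paragraph in the diff above points to DPT-large for depth maps and BLIP-2 for captions but includes no code. Below is a minimal sketch of what that preprocessing could look like using the `transformers` pipelines; the exact checkpoints (`Intel/dpt-large`, `Salesforce/blip2-opt-2.7b`) and file names are illustrative assumptions, not taken from this commit.

```python
# Sketch only: one way to derive a depth map and a caption for an augmented panorama,
# mirroring the DPT-large + BLIP-2 preprocessing mentioned in the updated README.
# Checkpoint names and file paths are illustrative assumptions.
from PIL import Image
from transformers import pipeline

image = Image.open("panorama.jpg")

# Monocular depth estimation with a DPT model; the pipeline returns a PIL image under "depth".
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")
depth_map = depth_estimator(image)["depth"]
depth_map.save("panorama_depth.png")

# Caption generation with BLIP-2 via the image-to-text pipeline.
captioner = pipeline("image-to-text", model="Salesforce/blip2-opt-2.7b")
caption = captioner(image)[0]["generated_text"]
print(caption)
```

Run over each augmented panorama, this yields the (RGB, depth, caption) triplets on which the pano checkpoint is finetuned.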