seungminh committed on
Commit
adcc4b9
1 Parent(s): 7f9bd4c

Update README.md

Files changed (1)
  1. README.md +50 -0
README.md CHANGED
@@ -1,3 +1,53 @@
  ---
+ datasets:
+ - allenai/objaverse-xl
+ tags:
+ - 3d
+ extra_gated_fields:
+   Name: text
+   Email: text
+   Country: text
+   Organization or Affiliation: text
+   I ALLOW Stability AI to email me about new model releases: checkbox
  license: mit
+ license_name: sai-nc-community
+ pipeline_tag: text-to-3d
  ---
+ # Zero123-pro_v1
+
+ ## Model Description
+
+ Zero123-pro is a fine-tuned model for *high-resolution* view-conditioned image generation based on [Zero123](https://github.com/cvlab-columbia/zero123).
+
+ The model currently targets 512x512 output. High-resolution training does not converge easily, and we are still looking for the best way to train at this resolution.
+
+ The model is currently fine-tuned only on a *chair* dataset; a foundation model suitable for e-commerce will be released later.
+
+
+ ## Usage
+
+ Use a config file modified from the one in the original zero123 code base.
+
+ The model outputs 512x512 images, and the corresponding latent size is 64. Therefore, set the `first_stage_config` resolution to 512 and `image_size` to 64.
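The two values are linked: the latent size is the output resolution divided by the autoencoder's downsampling factor. The sketch below assumes a factor of 8, which matches the 512 to 64 relationship stated above. The nested key paths (`model.params.image_size` and `model.params.first_stage_config.params.ddconfig.resolution`) and the 256/32 starting values are assumptions based on latent-diffusion-style configs, so check them against your own config file.

```python
import copy

# Assumption: the autoencoder downsamples by 8x per side, which matches the
# 512 -> 64 relationship stated in the model card.
DOWNSAMPLE_FACTOR = 8


def latent_size(resolution, factor=DOWNSAMPLE_FACTOR):
    """Return the latent side length for a given output resolution."""
    if resolution % factor != 0:
        raise ValueError(f"resolution {resolution} is not divisible by {factor}")
    return resolution // factor


def set_resolution(config, resolution):
    """Return a copy of `config` with both resolution fields updated.

    The key paths are assumed from latent-diffusion-style configs and may
    differ in your copy of the config file.
    """
    updated = copy.deepcopy(config)
    params = updated["model"]["params"]
    params["image_size"] = latent_size(resolution)
    params["first_stage_config"]["params"]["ddconfig"]["resolution"] = resolution
    return updated


# Hypothetical stand-in for the parsed YAML config (a 256-pixel setup).
base_config = {
    "model": {
        "params": {
            "image_size": 32,
            "first_stage_config": {"params": {"ddconfig": {"resolution": 256}}},
        }
    }
}

config_512 = set_resolution(base_config, 512)
print(config_512["model"]["params"]["image_size"])  # 64
```

Deriving `image_size` from the resolution, rather than setting both by hand, keeps the two fields from drifting apart if the resolution changes again.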
+
+ For best quality, use an input image with a 1:1 aspect ratio.
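The model card does not say how to make a non-square image square. One option is to pad it onto a square canvas instead of stretching it, which preserves the object's proportions. The helper below is a hypothetical sketch that only computes the canvas side and the paste offset; the actual pasting and resizing is left to whatever image library you use.

```python
def square_canvas(width, height):
    """Return (side, left, top) for centering a width x height image on a square canvas."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    side = max(width, height)
    left = (side - width) // 2   # horizontal offset of the image on the canvas
    top = (side - height) // 2   # vertical offset of the image on the canvas
    return side, left, top


side, left, top = square_canvas(640, 480)
print(side, left, top)  # 640 0 80
```

With these values, you would create a `side` x `side` background, paste the image at `(left, top)`, and then resize the result to the model's input resolution.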
+
+ ## Model Details
+
+ * **Developed by**: Seungmin Ha, Yeonju Kim
+ * **Model type**: Latent diffusion model
+ * **Finetuned from model**: [lambdalabs/sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers)
+ * **License**: This is the first released version of Zero123-pro.
+   * Some of the data used in **Zero123-pro** cannot be used for commercial purposes, but it can be used for research purposes.
+     According to our internal tests, both models perform similarly in terms of the visual quality of their predictions.
+
+
+ ### Training Infrastructure
+
+ * **Hardware**: `Zero123-pro` was trained on a single cluster node with 8 A100 80GB GPUs.
+ * **Code Base**: We use our modified version of [the original zero123 repository](https://github.com/cvlab-columbia/zero123).
+
+
+ ### Misuse, Malicious Use, and Out-of-Scope Use
+
+ The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.