Zero123-pro_v1

Model Description

Zero123-pro is a fine-tuned model for high-resolution view-conditioned image generation based on Zero123.
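Zero123 conditions generation on the relative camera transform between the input view and the target view. The sketch below follows the 4-D encoding used in the original Zero123 demo code: polar offset in radians, sine and cosine of the azimuth offset, and radius offset. It assumes that Zero123-pro keeps this encoding unchanged, which this card does not state. `relative_pose_embedding` is a hypothetical helper name.

```python
import math


def relative_pose_embedding(d_polar_deg: float, d_azimuth_deg: float, d_radius: float) -> list[float]:
    """Encode a relative camera transform as a Zero123-style 4-D view condition.

    Hypothetical helper; assumes Zero123-pro reuses the original Zero123
    encoding [d_polar (rad), sin(d_azimuth), cos(d_azimuth), d_radius].
    """
    d_polar = math.radians(d_polar_deg)
    d_azimuth = math.radians(d_azimuth_deg)
    # Azimuth is periodic, so sin/cos avoid a discontinuity at 360 degrees.
    return [d_polar, math.sin(d_azimuth), math.cos(d_azimuth), d_radius]


# Same viewpoint as the input: no rotation and no change in camera distance.
print(relative_pose_embedding(0.0, 0.0, 0.0))  # [0.0, 0.0, 1.0, 0.0]
```

In the original Zero123 code, this vector is concatenated with the CLIP image embedding of the input view before a linear projection.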

The model currently targets 512x512 resolution. We are still looking for the best way to train at high resolution, because training does not converge easily.

This model is currently fine-tuned on a chair dataset only. A foundation model suited to e-commerce will be released later.

Usage

Use a config file adapted from the original Zero123 code base.

Our model has an output resolution of 512, and the corresponding latent dimension is 64 (the autoencoder downsamples each side by a factor of 8). Therefore, set the first_stage_config resolution to 512 and image_size to 64.
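The config change can be sketched as below. The nested key layout is an assumption modeled on Zero123 / Stable Diffusion-style configs, and `patch_config` and `latent_size` are hypothetical helpers. The card does not say which `image_size` key is meant, so this sketch sets both the model-level and the UNet-level entries; check the key paths against your own config file.

```python
from copy import deepcopy

VAE_DOWNSAMPLE = 8  # Stable Diffusion's KL autoencoder shrinks each side by 8


def latent_size(resolution: int) -> int:
    """Side length of the latent for a given pixel resolution."""
    if resolution % VAE_DOWNSAMPLE != 0:
        raise ValueError(f"resolution {resolution} is not divisible by {VAE_DOWNSAMPLE}")
    return resolution // VAE_DOWNSAMPLE


def patch_config(config: dict, resolution: int = 512) -> dict:
    """Return a copy of a Zero123-style config set up for `resolution` output.

    Hypothetical helper; the key paths below are assumptions.
    """
    cfg = deepcopy(config)  # leave the caller's config untouched
    params = cfg["model"]["params"]
    size = latent_size(resolution)
    params["image_size"] = size
    params["unet_config"]["params"]["image_size"] = size
    params["first_stage_config"]["params"]["ddconfig"]["resolution"] = resolution
    return cfg


# Minimal stand-in for an original 256 px config (assumed layout).
base = {
    "model": {
        "params": {
            "image_size": 32,
            "unet_config": {"params": {"image_size": 32}},
            "first_stage_config": {"params": {"ddconfig": {"resolution": 256}}},
        }
    }
}

cfg = patch_config(base)
print(cfg["model"]["params"]["image_size"])  # 64
```

To apply this to a real YAML file, you could first load it into a plain dict, for example with `OmegaConf.to_container(OmegaConf.load(path))`.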

For best quality, use an input image with a 1:1 aspect ratio.
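One simple way to get a 1:1 input is a center crop. The sketch below computes the largest centered square as a `(left, top, right, bottom)` box, the format Pillow's `Image.crop` expects. Center cropping is one choice among several and is not prescribed by this card; `center_square_box` is a hypothetical helper.

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Largest centered square (left, top, right, bottom) inside a width x height image."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


print(center_square_box(800, 600))  # (100, 0, 700, 600)
```

With Pillow, `img.crop(center_square_box(*img.size)).resize((512, 512))` yields a square 512x512 image, assuming the input is also fed at 512x512. A center crop can cut off parts of a wide object; padding the shorter side to a square is an alternative that keeps the whole object.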

Model Details

Developed by: Seungmin Ha, Yeonju Kim
Model type: latent diffusion model.
Finetuned from model: lambdalabs/sd-image-variations-diffusers
License: This is the first release (v1) of Zero123-pro.
- Some of the data used to train Zero123-pro cannot be used for commercial purposes, but it can be used for research purposes.

Training Infrastructure

Hardware: Zero123-pro was trained on a single cluster node with 8 A100 80GB GPUs.
Code Base: We use our modified version of the original zero123 repository.

Misuse, Malicious Use, and Out-of-Scope Use

The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.
