|
# KOALA-700M Model Card |
|
|
|
## Model Description
|
KOALA, short for **KnOwledge-distillAtion in LAtent diffusion model**, is a text-to-image (T2I) synthesis model engineered to balance speed and generation quality, making it well suited to resource-limited environments. By emphasizing self-attention features during knowledge distillation, KOALA makes high-quality text-to-image synthesis significantly more accessible and efficient in settings with constrained compute and memory.
|
|
|
## Key Features |
|
- **Efficient U-Net Architecture**: KOALA models use a simplified U-Net architecture that reduces the model size by up to 54% and 69%, respectively, compared to their predecessor, Stable Diffusion XL (SDXL).
|
- **Self-Attention-Based Knowledge Distillation**: The core technique in KOALA focuses on the distillation of self-attention features, which proves crucial for maintaining image generation quality. |
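The distillation objective above can be sketched as a feature-matching loss between teacher and student self-attention maps. The following is a minimal, self-contained illustration in plain Python; the toy shapes, function names, and uniform loss weighting are hypothetical and do not reflect KOALA's actual training code:

```python
# Minimal sketch of self-attention-based feature distillation.
# In practice the "attention maps" would be self-attention features
# extracted from the teacher (SDXL) and student (KOALA) U-Nets at
# matching layers; here they are tiny toy matrices for illustration.

def mse(a, b):
    """Mean squared error between two equally-shaped 2-D feature maps."""
    assert len(a) == len(b) and len(a[0]) == len(b[0])
    total = sum((x - y) ** 2
                for row_a, row_b in zip(a, b)
                for x, y in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))

def self_attention_distillation_loss(teacher_feats, student_feats, weight=1.0):
    """Sum of per-layer MSE between teacher and student attention maps."""
    return weight * sum(mse(t, s) for t, s in zip(teacher_feats, student_feats))

# Toy example: two layers of 2x2 "attention maps".
teacher = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
student = [[[0.9, 0.1], [0.1, 0.9]], [[0.5, 0.5], [0.5, 0.5]]]

loss = self_attention_distillation_loss(teacher, student)
print(round(loss, 4))  # small, since the student closely matches the teacher
```

Minimizing such a loss during training pushes the compact student U-Net to reproduce where the teacher attends, which the paper identifies as the key to preserving image quality after compression.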
|
|
|
## Model Architecture |
|
|
|
## Usage with 🤗[Diffusers library](https://github.com/huggingface/diffusers) |
|
Inference code using 25 denoising steps:
|
```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the KOALA-700M pipeline in half precision and move it to the GPU.
pipe = StableDiffusionXLPipeline.from_pretrained("etri-vilab/koala-700m", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "A portrait painting of a Golden Retriever like Leonardo da Vinci"
negative = "worst quality, low quality, illustration, low resolution"

# Generate an image with 25 denoising steps.
image = pipe(prompt=prompt, negative_prompt=negative, num_inference_steps=25).images[0]
```
|
|
|
## Limitations and Bias |
|
- Text Rendering: The models face challenges in rendering long, legible text within images. |
|
- Complex Prompts: KOALA sometimes struggles with complex prompts involving multiple attributes. |
|
- Dataset Dependencies: The current limitations are partially attributed to the characteristics of the training dataset (LAION-aesthetics-V2 6+). |
|
|
|
## Citation |
|
```bibtex
@misc{lee2023koala,
  title={KOALA: Self-Attention Matters in Knowledge Distillation of Latent Diffusion Models for Memory-Efficient and Fast Image Synthesis},
  author={Youngwan Lee and Kwanyong Park and Yoorhim Cho and Yong-Ju Lee and Sung Ju Hwang},
  year={2023},
  eprint={2312.04005},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
|
|