|
# KOALA-700M Model Card |
|
|
|
## Model Description
|
KOALA, short for **KnOwledge-distillAtion in LAtent diffusion model**, is a text-to-image (T2I) synthesis model engineered to balance speed and generation quality, making it well suited to resource-limited environments. By emphasizing self-attention features during knowledge distillation, KOALA makes high-quality text-to-image synthesis significantly more accessible and efficient in settings with constrained compute and memory.
|
|
|
## Key Features |
|
- **Efficient U-Net Architecture**: KOALA models use a simplified U-Net architecture that reduces the model size by up to 54% and 69%, respectively, compared to their predecessor, Stable Diffusion XL (SDXL).
|
- **Self-Attention-Based Knowledge Distillation**: The core technique in KOALA focuses on the distillation of self-attention features, which proves crucial for maintaining image generation quality. |
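The distillation objective above can be sketched as a feature-matching loss between teacher and student self-attention maps. The following is a minimal, self-contained illustration in plain Python; the toy shapes, function names, and uniform loss weighting are hypothetical and do not reflect KOALA's actual training code:

```python
# Minimal sketch of self-attention-based feature distillation.
# In practice the "attention maps" would be self-attention features
# extracted from the teacher (SDXL) and student (KOALA) U-Nets at
# matching layers; here they are tiny toy matrices for illustration.

def mse(a, b):
    """Mean squared error between two equally-shaped 2-D feature maps."""
    assert len(a) == len(b) and len(a[0]) == len(b[0])
    total = sum((x - y) ** 2
                for row_a, row_b in zip(a, b)
                for x, y in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))

def self_attention_distillation_loss(teacher_feats, student_feats, weight=1.0):
    """Sum of per-layer MSE between teacher and student attention maps."""
    return weight * sum(mse(t, s) for t, s in zip(teacher_feats, student_feats))

# Toy example: two layers of 2x2 "attention maps".
teacher = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
student = [[[0.9, 0.1], [0.1, 0.9]], [[0.5, 0.5], [0.5, 0.5]]]

loss = self_attention_distillation_loss(teacher, student)
print(round(loss, 4))  # small, since the student closely matches the teacher
```

Minimizing such a loss during training pushes the compact student U-Net to reproduce where the teacher attends, which the paper identifies as the key to preserving image quality after compression.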
|
|
|
## Model Architecture |
|
|
|
## Usage with 🤗[Diffusers library](https://github.com/huggingface/diffusers) |
|
Inference code using 25 denoising steps:
|
```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the KOALA-700M pipeline in half precision and move it to the GPU.
pipe = StableDiffusionXLPipeline.from_pretrained("etri-vilab/koala-700m", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "A portrait painting of a Golden Retriever like Leonardo da Vinci"
negative = "worst quality, low quality, illustration, low resolution"

# Generate an image with 25 denoising steps.
image = pipe(prompt=prompt, negative_prompt=negative, num_inference_steps=25).images[0]
```
|
|
|
## Limitations and Bias |
|
- Text Rendering: The models face challenges in rendering long, legible text within images. |
|
- Complex Prompts: KOALA sometimes struggles with complex prompts involving multiple attributes. |
|
- Dataset Dependencies: The current limitations are partially attributed to the characteristics of the training dataset (LAION-aesthetics-V2 6+). |
|
|
|
## Citation |
|
```bibtex
@misc{lee2023koala,
  title={KOALA: Self-Attention Matters in Knowledge Distillation of Latent Diffusion Models for Memory-Efficient and Fast Image Synthesis},
  author={Youngwan Lee and Kwanyong Park and Yoorhim Cho and Yong-Ju Lee and Sung Ju Hwang},
  year={2023},
  eprint={2312.04005},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
|
|