# KOALA-700M Model Card

## Model Description
KOALA, which stands for **KnOwledge-distillAtion in LAtent diffusion model**, is a compact and fast text-to-image (T2I) synthesis model. It is designed to balance speed and generation quality, making it well suited to resource-limited environments. By emphasizing self-attention features during knowledge distillation, KOALA preserves high-quality image synthesis while substantially reducing model size and inference cost.

## Key Features
- **Efficient U-Net Architecture**: KOALA uses a simplified U-Net architecture; the two variants, KOALA-1B and KOALA-700M, reduce the model size by 54% and 69%, respectively, compared to their predecessor, Stable Diffusion XL (SDXL).
- **Self-Attention-Based Knowledge Distillation**: The core technique in KOALA is the distillation of self-attention features from the teacher U-Net, which proves crucial for maintaining image generation quality (a rough sketch follows this list).
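
For intuition only: the second bullet can be read as matching the student U-Net's self-attention feature maps to the teacher's during distillation. The sketch below is not the authors' training code; the function name, the hook-collected feature lists, and the plain MSE objective are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def self_attention_distillation_loss(teacher_feats, student_feats):
    """MSE between corresponding teacher/student self-attention feature maps.

    Both arguments are lists of tensors captured from matching
    self-attention layers (e.g. via forward hooks); shapes are assumed
    to already align. Names and shapes are illustrative only.
    """
    loss = torch.zeros((), device=student_feats[0].device)
    for t, s in zip(teacher_feats, student_feats):
        loss = loss + F.mse_loss(s, t.detach())  # teacher is frozen
    return loss / len(student_feats)
```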

## Model Architecture

## Usage with 🤗[Diffusers library](https://github.com/huggingface/diffusers)
Inference code with 25 denoising steps:
```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the KOALA-700M checkpoint in half precision and move it to the GPU.
pipe = StableDiffusionXLPipeline.from_pretrained("etri-vilab/koala-700m", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "A portrait painting of a Golden Retriever like Leonardo da Vinci"
negative = "worst quality, low quality, illustration, low resolution"
# Use the 25 denoising steps stated above (the pipeline default is higher).
image = pipe(prompt=prompt, negative_prompt=negative, num_inference_steps=25).images[0]
```
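
The call returns a standard PIL image. As with any `diffusers` SDXL pipeline, you can save it and adjust the usual generation options; the values below are illustrative, not recommendations from the model authors.

```python
# Save the generated image to disk.
image.save("koala_golden_retriever.png")

# Optional: standard SDXL pipeline options also apply here.
image = pipe(
    prompt=prompt,
    negative_prompt=negative,
    num_inference_steps=25,
    guidance_scale=7.5,       # classifier-free guidance strength
    height=1024, width=1024,  # SDXL-style output resolution
).images[0]
```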

## Limitations and Bias
- Text Rendering: The models face challenges in rendering long, legible text within images.
- Complex Prompts: KOALA sometimes struggles with complex prompts involving multiple attributes.
- Dataset Dependencies: The current limitations are partially attributed to the characteristics of the training dataset (LAION-Aesthetics-V2 6+).

## Citation
```bibtex
@misc{lee2023koala,
      title={KOALA: Self-Attention Matters in Knowledge Distillation of Latent Diffusion Models for Memory-Efficient and Fast Image Synthesis},
      author={Youngwan Lee and Kwanyong Park and Yoorhim Cho and Yong-Ju Lee and Sung Ju Hwang},
      year={2023},
      eprint={2312.04005},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```