---
license: other
license_name: kohaku-license-1.0
datasets:
- laion/conceptual-captions-12m-webdataset
- CaptionEmporium/coyo-hd-11m-llavanext
- KBlueLeaf/danbooru2023-metadata-database
- graph-based-captions/GBC10M
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# TIPO: Text to Image with text presampling for Prompt Optimization
A 500M-parameter LLaMA-architecture model trained for TIPO.<br>
Tech Report: https://hackmd.io/@KBlueLeaf/BJULOQBR0
![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/fc9ovmARapQmgq9DZ7ApJ.png)
## Introduction
In this project, we introduce "TIPO" (**T**ext to **I**mage with text presampling for **P**rompt **O**ptimization), a framework designed to significantly enhance the quality and usability of Text-to-Image (T2I) generative models. TIPO uses Large Language Models (LLMs) to perform "Text Presampling" within the inference pipeline of text-to-image generation. By refining and extending user input prompts, TIPO enables generative models to produce better results with minimal user effort, making T2I systems more accessible and effective for a wider range of users.
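Conceptually, text presampling adds one sampling step before image generation: the LLM samples a refined prompt, and the T2I model is then conditioned on that prompt instead of the raw user input. The minimal sketch below illustrates only this data flow. It assumes two hypothetical callables, `extend_prompt` (standing in for an LLM such as this model) and `text_to_image` (standing in for any T2I backend), with toy lambdas in place of real models; it is not the z-tipo-extension implementation.

```python
from typing import Callable, Tuple, TypeVar

ImageT = TypeVar("ImageT")


def generate_with_presampling(
    user_prompt: str,
    extend_prompt: Callable[[str], str],
    text_to_image: Callable[[str], ImageT],
) -> Tuple[str, ImageT]:
    """Sketch of text presampling: refine the prompt first, then sample the image.

    extend_prompt and text_to_image are hypothetical placeholders for an LLM
    call and a T2I backend; neither corresponds to a real library API here.
    """
    # Step 1: sample a refined/extended prompt from the language model.
    refined_prompt = extend_prompt(user_prompt)
    # Step 2: condition the T2I model on the refined prompt, not the raw input.
    image = text_to_image(refined_prompt)
    return refined_prompt, image


# Toy stand-ins so the sketch runs without any model weights.
refined, image = generate_with_presampling(
    "1girl, scenery",
    extend_prompt=lambda p: p + ", sunset, detailed sky",
    text_to_image=lambda p: f"<image conditioned on: {p}>",
)
print(refined)  # 1girl, scenery, sunset, detailed sky
```

In a real setup, `extend_prompt` would wrap a generation call to the language model and `text_to_image` a diffusion pipeline; keeping them as plain callables makes the ordering of the two sampling steps explicit.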
## Usage
Use the updated version of the DTG extension (renamed to z-tipo-extension). The current version of z-tipo-extension supports stable-diffusion-webui, stable-diffusion-webui-forge, and ComfyUI. SD-Next has not been tested.
https://github.com/KohakuBlueleaf/z-tipo-extension
## Model arch and Training
This model uses the LLaMA architecture with 500M parameters. The training data is a combination of Danbooru2023, GBC10M, and Coyo-HD-11M.<br>
In total, around 30B tokens were seen during training.<br>
For more information, please refer to the tech report.
### Evaluation
We evaluated TIPO on several metrics:
#### 1. Aesthetic Score (Higher is Better)
We compute the Aesthetic Score using **Aesthetic Predictor V2.5**. This metric is calculated on the short and truncated-long tests.
![Aesthetic Score Distribution](https://hackmd.io/_uploads/HkJphkSCA.png)
*Figure 1: Aesthetic Score distribution.*
#### 2. AI Corrupt Score (Higher is Better)
The AI Corrupt Score is obtained from **AICorruptMetrics** in **sdeval**.
This metric is calculated on the short and truncated-long tests.
![AI Corrupt Score Distribution](https://hackmd.io/_uploads/SJlktvE0R.png)
*Figure 2: AI Corrupt Score distribution.*
#### 3. Fréchet Dino Distance (FDD) on Scenery Tag Test (Lower is Better)
We use FDD on the Scenery Tag Test to show that when input prompts cover a narrower distribution, the model struggles to generate images that reflect the true distribution. With **TIPO**, this issue is mitigated.
| FDD feature extractor | `<meta> scenery` only | `<meta> scenery` + TIPO |
|-----------------------|-----------------------|-------------------------|
| DINOv2 ViT-S          | 0.1917                | **0.1786**              |
| DINOv2 ViT-B          | 0.2002                | **0.1755**              |
| DINOv2 ViT-L          | 0.2017                | **0.1863**              |
| DINOv2 ViT-G          | 0.2359                | **0.2096**              |
*Table 1: Fréchet Dino Distance (FDD) on Scenery Tag Test.*
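FDD follows the same recipe as FID but uses DINOv2 image features. A Gaussian is fitted to each feature set, and the Fréchet distance between them is d² = ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½). The minimal NumPy sketch below computes this quantity, assuming features have already been extracted with a DINOv2 backbone (not shown). Any feature preprocessing or scaling behind the values in Table 1 is not specified here, so this is an illustration of the metric, not the evaluation code used for TIPO.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, feature_dim), e.g. image
    embeddings from a DINOv2 backbone (feature extraction is assumed).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    # rowvar=False: rows are samples, columns are feature dimensions.
    # atleast_2d keeps the 1-feature case as a 1x1 matrix.
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # Tr((S_a S_b)^{1/2}) equals the sum of square roots of the eigenvalues
    # of S_a S_b, which are real and non-negative for PSD covariances.
    # Clip tiny negative/imaginary parts caused by floating-point error.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_covmean = float(np.sqrt(np.clip(eigvals.real, 0.0, None)).sum())
    mean_term = float(np.sum((mu_a - mu_b) ** 2))
    cov_term = float(np.trace(cov_a) + np.trace(cov_b)) - 2.0 * tr_covmean
    return mean_term + cov_term


rng = np.random.default_rng(0)
a = rng.normal(size=(2000, 8))  # synthetic stand-in for real image features
print(frechet_distance(a, a))        # approximately 0
print(frechet_distance(a, a + 0.5))  # approximately 2.0
```

Shifting every feature by a constant moves only the means, so on 8-dimensional features `frechet_distance(a, a + 0.5)` gives about 8 × 0.25 = 2.0, while comparing a set with itself gives about 0. Lower values mean the generated feature distribution is closer to the reference one.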
## LICENSE
This model is released under [Kohaku License 1.0](https://kblueleaf.net/documents/kohaku-license/?[Your%20Organization/Name]=KohakuBlueLeaf&[Year]=2024)<br>
You can check the URL above or the LICENSE file in this repo.