---
license: openrail++
datasets:
- friedrichor/PhotoChat_120_square_HQ
language:
- en
tags:
- stable-diffusion
- text-to-image
---

Stable Diffusion v2.1 fine-tuned on the text-image dataset `friedrichor/PhotoChat_120_square_HQ`.

# Model Details

- **Model type:** Diffusion-based text-to-image generation model
- **Language(s):** English
- **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL)
- **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H). Fine-tuning dataset: [friedrichor/PhotoChat_120_square_HQ](https://huggingface.co/datasets/friedrichor/PhotoChat_120_square_HQ)

## Dataset
[friedrichor/PhotoChat_120_square_HQ](https://huggingface.co/datasets/friedrichor/PhotoChat_120_square_HQ) was used to fine-tune Stable Diffusion v2.1.

The dataset contains 120 image-text pairs.

Images were manually screened from the [PhotoChat](https://aclanthology.org/2021.acl-long.479/) dataset, cropped to square, and upscaled with `Gigapixel` to improve their quality.  
Image captions were generated with [BLIP-2](https://arxiv.org/abs/2301.12597).
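
As a reference, here is a minimal sketch of how such captions could be produced with BLIP-2 via the 🤗 Transformers library. The `Salesforce/blip2-opt-2.7b` checkpoint and decoding settings below are assumptions for illustration; the exact configuration used to caption this dataset is not specified here.

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda:0"
# Checkpoint name is an assumption; any BLIP-2 captioning checkpoint works the same way.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to(device)

image = Image.open("photo.jpg").convert("RGB")  # hypothetical input image
inputs = processor(images=image, return_tensors="pt").to(device, torch.float16)
generated_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(caption)
```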

## How to fine-tune

See [friedrichor/Text-to-Image-Summary/fine-tune/text2image](https://github.com/friedrichor/Text-to-Image-Summary/tree/main/fine-tune/text2image) or the [Hugging Face diffusers text-to-image example](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image).
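
Before launching either script, you can inspect the fine-tuning data with 🤗 Datasets. This is a minimal sketch; split and column names depend on how the dataset is stored, so verify them before training.

```python
from datasets import load_dataset

# Load the 120-pair fine-tuning dataset from the Hugging Face Hub
dataset = load_dataset("friedrichor/PhotoChat_120_square_HQ")

print(dataset)  # available splits, row counts, and column names
first_split = next(iter(dataset.values()))
print(first_split[0])  # one image-text pair; column names vary by dataset
```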

# Simple usage example

Using the [🤗 Diffusers library](https://github.com/huggingface/diffusers):

```python
import torch
from diffusers import StableDiffusionPipeline

device = "cuda:0"
# Load the fine-tuned pipeline and move it to the GPU
pipe = StableDiffusionPipeline.from_pretrained("friedrichor/stable-diffusion-v2.1-portraiture", torch_dtype=torch.float32)
pipe.to(device)

# Base caption plus a style-oriented suffix (see the prompt templates below)
prompt = "a woman in a red and gold costume with feathers on her head"
extra_prompt = ", facing the camera, photograph, highly detailed face, depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, award winning photography"
negative_prompt = "cartoon, anime, ugly, (aged, white beard, black skin, wrinkle:1.1), (bad proportions, unnatural feature, incongruous feature:1.4), (blurry, un-sharp, fuzzy, un-detailed skin:1.2), (facial contortion, poorly drawn face, deformed iris, deformed pupils:1.3), (mutated hands and fingers:1.5), disconnected hands, disconnected limbs"

# Fix the seed for reproducible generations
generator = torch.Generator(device=device).manual_seed(42)
image = pipe(prompt + extra_prompt,
             negative_prompt=negative_prompt,
             height=768, width=768,
             num_inference_steps=20,
             guidance_scale=7.5,
             generator=generator).images[0]
image.save("image.png")
```

## Prompt template

**Applying a prompt template is helpful for improving image quality.**

If you want to generate real-world images containing humans, you can try the following prompt template:
```
{{caption}}, facing the camera, photograph, highly detailed face, depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, award winning photography
```

If you want to generate real-world images without humans, you can try the following prompt template:
```
{{caption}}, depth of field. bokeh. soft light. by Yasmin Albatoul, Harry Fayt. centered. extremely detailed. Nikon D850, (35mm|50mm|85mm). award winning photography.
```
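
As a sketch of how a template can be filled in programmatically before generation (the `caption` value is just an illustrative example, and `pipe` refers to the pipeline created in the usage example above):

```python
caption = "a woman in a red and gold costume with feathers on her head"
template = ("{caption}, facing the camera, photograph, highly detailed face, "
            "depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, "
            "centered, extremely detailed, Nikon D850, award winning photography")

# Substitute the caption into the template and generate
prompt = template.format(caption=caption)
image = pipe(prompt, height=768, width=768).images[0]
```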

For more prompt templates, see [Dalabad/stable-diffusion-prompt-templates](https://github.com/Dalabad/stable-diffusion-prompt-templates), [r/StableDiffusion](https://www.reddit.com/r/StableDiffusion/), etc.  

## Negative prompt

**Applying a negative prompt is also helpful for improving image quality.**

For example:
```
cartoon, anime, ugly, (aged, white beard, black skin, wrinkle:1.1), (bad proportions, unnatural feature, incongruous feature:1.4), (blurry, un-sharp, fuzzy, un-detailed skin:1.2), (facial contortion, poorly drawn face, deformed iris, deformed pupils:1.3), (mutated hands and fingers:1.5), disconnected hands, disconnected limbs
```