The friedrichor/stable-diffusion-2-1-realistic model is fine-tuned from stable-diffusion-2-1 on the friedrichor/PhotoChat_120_square_HQ dataset.

This model was not trained solely for text-to-image tasks; it is one component of the Tiger model for Multimodal Dialogue Response Generation (Tiger is not yet open source and is currently under submission).

Model Details

Model type: Diffusion-based text-to-image generation model
Language(s): English
License: CreativeML Open RAIL++-M License
Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H).

Dataset

friedrichor/PhotoChat_120_square_HQ was used for fine-tuning Stable Diffusion v2.1.

120 image-text pairs

Images were manually selected from the PhotoChat dataset, cropped to square, and enhanced with Gigapixel to improve their quality.
Image captions were generated by BLIP-2.

How to fine-tune

friedrichor/Text-to-Image-Summary/fine-tune/text2image

or Hugging Face diffusers

Simple use example

Using the 🤗 Diffusers library

import torch
from diffusers import StableDiffusionPipeline

# Assumes a CUDA GPU is available; change to "cpu" to run without one (much slower).
device = "cuda:0"
pipe = StableDiffusionPipeline.from_pretrained("friedrichor/stable-diffusion-2-1-realistic", torch_dtype=torch.float32)
pipe.to(device)

prompt = "a woman in a red and gold costume with feathers on her head"
extra_prompt = ", facing the camera, photograph, highly detailed face, depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, award winning photography"
negative_prompt = "cartoon, anime, ugly, (aged, white beard, black skin, wrinkle:1.1), (bad proportions, unnatural feature, incongruous feature:1.4), (blurry, un-sharp, fuzzy, un-detailed skin:1.2), (facial contortion, poorly drawn face, deformed iris, deformed pupils:1.3), (mutated hands and fingers:1.5), disconnected hands, disconnected limbs"

# Seed the generator so that results are repeatable.
generator = torch.Generator(device=device).manual_seed(42)
image = pipe(prompt + extra_prompt,
             negative_prompt=negative_prompt,
             height=768, width=768,
             num_inference_steps=20,
             guidance_scale=7.5,
             generator=generator).images[0]
image.save("image.png")

Prompt template

Applying a prompt template helps improve image quality.

If you want to generate realistic images that include people, you can try the following prompt template.

{{caption}}, facing the camera, photograph, highly detailed face, depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, award winning photography

If you want to generate realistic images without people, you can try the following prompt template.

{{caption}}, depth of field. bokeh. soft light. by Yasmin Albatoul, Harry Fayt. centered. extremely detailed. Nikon D850, (35mm|50mm|85mm). award winning photography.

For more prompt templates, see Dalabad/stable-diffusion-prompt-templates, r/StableDiffusion, etc.
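
The two templates above can be filled in programmatically. The following is a minimal sketch, not part of the original model card: the names `build_prompt`, `HUMAN_TEMPLATE`, `SCENE_TEMPLATE`, and `FOCAL_LENGTHS` are hypothetical, and it assumes that `(35mm|50mm|85mm)` means "pick one of these focal lengths".

```python
import random

# Template strings copied from the two prompt templates above.
HUMAN_TEMPLATE = (
    "{caption}, facing the camera, photograph, highly detailed face, "
    "depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, "
    "centered, extremely detailed, Nikon D850, award winning photography"
)
SCENE_TEMPLATE = (
    "{caption}, depth of field. bokeh. soft light. by Yasmin Albatoul, "
    "Harry Fayt. centered. extremely detailed. Nikon D850, {focal_length}. "
    "award winning photography."
)
# Assumption: "(35mm|50mm|85mm)" in the template means "choose one".
FOCAL_LENGTHS = ("35mm", "50mm", "85mm")


def build_prompt(caption, with_human=True, focal_length=None, rng=None):
    """Fill a caption into one of the two prompt templates."""
    # Drop trailing punctuation so the template's own separator is used.
    caption = caption.strip().rstrip(",.")
    if with_human:
        return HUMAN_TEMPLATE.format(caption=caption)
    if focal_length is None:
        # Pick a focal length at random; pass rng for repeatable choices.
        focal_length = (rng or random).choice(FOCAL_LENGTHS)
    return SCENE_TEMPLATE.format(caption=caption, focal_length=focal_length)
```

The returned string can be passed as the prompt to the pipeline call in the usage example, for instance `pipe(build_prompt("a woman in a red and gold costume with feathers on her head"), ...)`.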

Negative prompt

Applying a negative prompt also helps improve image quality.

For example,

cartoon, anime, ugly, (aged, white beard, black skin, wrinkle:1.1), (bad proportions, unnatural feature, incongruous feature:1.4), (blurry, un-sharp, fuzzy, un-detailed skin:1.2), (facial contortion, poorly drawn face, deformed iris, deformed pupils:1.3), (mutated hands and fingers:1.5), disconnected hands, disconnected limbs
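
The parenthesized groups in this example, such as `(mutated hands and fingers:1.5)`, use the emphasis syntax common in Stable Diffusion web UIs, where the number scales the weight of the enclosed terms. To the best of our knowledge, the plain `StableDiffusionPipeline` does not interpret this syntax and simply tokenizes it as literal text. The sketch below, using the hypothetical helpers `split_weighted_terms` and `strip_weights`, shows one way to inspect the weights or reduce the prompt to plain comma-separated terms. It assumes groups are not nested.

```python
import re

# Matches one "(term, term:1.3)" group; nested parentheses are not handled.
WEIGHTED_GROUP = re.compile(r"\(([^():]+):([0-9.]+)\)")


def _terms(text):
    """Split comma-separated text into non-empty, stripped terms."""
    return [t.strip() for t in text.split(",") if t.strip()]


def split_weighted_terms(prompt):
    """Return (term, weight) pairs; unweighted terms get weight 1.0."""
    pairs = []
    pos = 0
    for match in WEIGHTED_GROUP.finditer(prompt):
        # Plain terms that appear before this weighted group.
        pairs += [(t, 1.0) for t in _terms(prompt[pos:match.start()])]
        weight = float(match.group(2))
        pairs += [(t, weight) for t in _terms(match.group(1))]
        pos = match.end()
    # Plain terms after the last weighted group.
    pairs += [(t, 1.0) for t in _terms(prompt[pos:])]
    return pairs


def strip_weights(prompt):
    """Return the prompt as plain comma-separated terms, weights removed."""
    return ", ".join(term for term, _ in split_weighted_terms(prompt))
```

For example, `strip_weights("cartoon, (aged, wrinkle:1.1), disconnected limbs")` yields a plain string that can be passed as `negative_prompt` without the weight markup.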

Hosted inference API

You can try the model with the Hosted inference API widget on the right by entering a prompt.

For example,

a woman in a red and gold costume with feathers on her head, facing the camera, photograph, highly detailed face, depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, award winning photography
