SFCOCO Stable Diffusion Model Card

SFCOCO Stable Diffusion is a Japanese-specific latent text-to-image diffusion model capable of generating photo-realistic images given any text input.

This model was fine-tuned from Japanese Stable Diffusion, a Japanese-specific latent text-to-image diffusion model, using the Stable Diffusion text-to-image fine-tuning script of 🤗 Diffusers.

Model Details

Developed by: Atsumoto Ohashi
Model type: Diffusion-based text-to-image generation model
Language(s): Japanese
Model Description: This model can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model (LDM) that uses Japanese Stable Diffusion as its pre-trained base.
Resources for more information: Japanese Stable Diffusion GitHub Repository

Examples

First, install the japanese-stable-diffusion package as follows. It is a modified version of the 🤗 Diffusers library that can run Japanese Stable Diffusion.

pip install git+https://github.com/rinnakk/japanese-stable-diffusion

Run this command to log in with your HF Hub token if you haven't before:

huggingface-cli login

Running the pipeline with the k_lms scheduler:

import torch
from torch import autocast
from diffusers import LMSDiscreteScheduler
from japanese_stable_diffusion import JapaneseStableDiffusionPipeline

model_id = "nu-dialogue/sfc2022-stable-diffusion"
device = "cuda"

# Use the K-LMS scheduler instead of the pipeline's default scheduler
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
# use_auth_token=True reuses the token stored by `huggingface-cli login`
pipe = JapaneseStableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, use_auth_token=True, torch_dtype=torch.float16)
pipe = pipe.to(device)

prompt = "福澤諭吉像の写真"  # "A photo of a statue of Yukichi Fukuzawa"
with autocast("cuda"):
    image = pipe(prompt, guidance_scale=7.5)["sample"][0]

image.save("output.png")

Note: JapaneseStableDiffusionPipeline is almost the same as diffusers' StableDiffusionPipeline, with a few added lines to initialize the models properly.
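To make the scheduler arguments in the example above less opaque, here is a minimal sketch of what `beta_schedule="scaled_linear"` is generally understood to mean in Diffusers: the betas are spaced linearly in square-root space and then squared. The K-LMS scheduler then works with noise levels (sigmas) derived from the cumulative product of `1 - beta`. The function names `scaled_linear_betas` and `sigmas_from_betas` are hypothetical helpers written for this illustration and are not part of any library; the sketch uses only the standard library.

```python
import math


def scaled_linear_betas(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000):
    """Hypothetical helper: betas spaced linearly in sqrt-space, then squared."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    step = (hi - lo) / (num_train_timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(num_train_timesteps)]


def sigmas_from_betas(betas):
    """Hypothetical helper: sigma_t = sqrt((1 - abar_t) / abar_t),
    where abar_t is the running product of (1 - beta)."""
    sigmas = []
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
        sigmas.append(math.sqrt((1.0 - alpha_bar) / alpha_bar))
    return sigmas


# Same values as passed to LMSDiscreteScheduler in the example above
betas = scaled_linear_betas()
sigmas = sigmas_from_betas(betas)
print(f"first beta: {betas[0]:.5f}, last beta: {betas[-1]:.5f}")
print(f"smallest sigma: {sigmas[0]:.4f}, largest sigma: {sigmas[-1]:.2f}")
```

The sigmas grow monotonically with the timestep, so sampling starts from the largest noise level and steps down toward the smallest.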

Training

Training Data: We used the SFCOCO2021 and SFCOCO2022 datasets to train the model. Both datasets are available in this repository.

Training Procedure: SFCOCO Stable Diffusion has the same architecture as Japanese Stable Diffusion and was trained by fine-tuning Japanese Stable Diffusion with the Stable Diffusion text-to-image fine-tuning script of 🤗 Diffusers.
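For orientation, the objective that this kind of fine-tuning typically minimizes is the noise-prediction loss of latent diffusion models (Rombach et al., 2022, cited below). This is a sketch that assumes the script's default noise-prediction setting was used; the symbols are notation introduced here for illustration, not names taken from the training code.

```latex
\mathcal{L}_{\mathrm{LDM}}
  = \mathbb{E}_{z_0,\, c,\, \epsilon \sim \mathcal{N}(0, I),\, t}
    \left[ \left\lVert \epsilon - \epsilon_\theta\!\left(z_t,\, t,\, \tau(c)\right) \right\rVert_2^2 \right],
\qquad
z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon
```

Here $z_0$ is the autoencoder latent of a training image, $c$ is its Japanese caption, $\tau$ is the text encoder, $\epsilon_\theta$ is the denoising U-Net, and $\bar{\alpha}_t$ is the cumulative product of $1 - \beta_s$ up to timestep $t$.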

Citation

@InProceedings{Rombach_2022_CVPR,
      author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
      title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
      booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      month     = {June},
      year      = {2022},
      pages     = {10684-10695}
  }

@misc{japanese_stable_diffusion,
    author    = {Shing, Makoto and Sawada, Kei},
    title     = {Japanese Stable Diffusion},
    howpublished = {\url{https://github.com/rinnakk/japanese-stable-diffusion}},
    month     = {September},
    year      = {2022},
}

This model card was written by Atsumoto Ohashi and is based on the Japanese Stable Diffusion Model Card.
