can this vae be used in stable video diffusion?

#17
by jiagaoxiang - opened

I am trying to run Stable Video Diffusion with the code below, but I get an all-black video (every frame is black). It seems to be caused by the fp16 format. After swapping in your VAE, I got this error: "RuntimeError: Input type (float) and bias type (c10::Half) should be the same". Any suggestions for how to fix this?

```python
import torch

from diffusers import StableVideoDiffusionPipeline, AutoencoderKL
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# Load the conditioning image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

export_to_video(frames, "generated.mp4", fps=7)
```

sdxl-vae-fp16-fix cannot be used with SVD, because SVD uses the Stable Diffusion 1/2 latent space (see code, paper), whereas sdxl-vae-fp16-fix uses the SDXL latent space, and the SD1/2 and SDXL latent spaces are not compatible.
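One concrete symptom of the incompatibility is the different `scaling_factor` each VAE family uses (the values below are copied from the public `config.json` files; this is only an illustration, not the whole story, since the learned latent distributions also differ):

```python
# scaling_factor values from the respective VAE configs.
SD_SCALING_FACTOR = 0.18215    # Stable Diffusion 1.x/2.x VAE (what SVD expects)
SDXL_SCALING_FACTOR = 0.13025  # sdxl-vae / sdxl-vae-fp16-fix

# Even before accounting for the learned differences between the latent
# spaces, decoding an SD latent with the SDXL factor mis-scales it by ~40%.
ratio = SD_SCALING_FACTOR / SDXL_SCALING_FACTOR
print(f"magnitude mismatch: {ratio:.3f}x")  # prints "magnitude mismatch: 1.398x"
```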

Hopefully the stabilityai/stable-video-diffusion-img2vid thread can find a solution to the issue you're encountering.
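In the meantime, a commonly suggested mitigation for black frames with fp16 pipelines (a sketch only, untested here; it needs a CUDA GPU and network access, and behavior may vary by diffusers version) is to keep the UNet in fp16 but upcast just the VAE to float32 for decoding, which avoids the half-precision overflow without swapping latent spaces:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# Upcast only the VAE so decoding runs in float32. Depending on your
# diffusers version, you may also need to cast the latents to float32
# manually before decoding if the pipeline does not do it for you.
pipe.vae.to(torch.float32)

image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png"
).resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```

This trades some VRAM for numerical stability; lowering `decode_chunk_size` can offset the extra memory used by the fp32 decode.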

Thank you for the info!
