Hunyuan SkyReels I2V on Runpod H100
I managed to get this working on Runpod just to see how close the quality can get to closed-source models like Kling/Sora. On Runpod you can rent GPUs like the H100, so you're far less constrained on VRAM.
It took a few days to iron out all the bugs, so I'm sharing the steps I took here to save people some time figuring things out (and so I don't forget how to set up an instance myself).
I deployed using the ComfyUI template by aitrepreneur. There are some Hunyuan templates available, but I mainly wanted to start from a cleaner slate. The template also exposes ComfyUI on port 3000 and a filebrowser + console on port 7777, which comes in handy below.
Download the bf16 model from Kijai's SkyReels-V1-Hunyuan_comfy repository into ComfyUI/models/diffusion_models/.
Connect on port 7777, browse to ComfyUI/models/diffusion_models/ and right-click to run console, then enter the below command:
wget https://huggingface.co/Kijai/SkyReels-V1-Hunyuan_comfy/resolve/main/skyreels_hunyuan_i2v_bf16.safetensors
Download Kijai's hunyuan_video_vae_bf16.safetensors VAE to ComfyUI/models/vae (note it's slightly different from the official Hunyuan VAE).
Similar to above. Browse to ComfyUI/models/vae and right-click to run console, enter the below command:
wget https://huggingface.co/Kijai/HunyuanVideo_comfy/resolve/main/hunyuan_video_vae_bf16.safetensors
Launch ComfyUI (connect on port 3000). Drag and drop Kijai's SkyReels workflow into ComfyUI and install any missing nodes via the Manager. Also update ComfyUI itself from the Manager. Workflow download: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/tree/main/example_workflows
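At this point it's worth a quick check that both downloads completed fully (just my habit; a truncated safetensors file will only error out later when the workflow tries to load it). The paths below assume the template's /workspace layout:
ls -lh /workspace/ComfyUI/models/diffusion_models/skyreels_hunyuan_i2v_bf16.safetensors
ls -lh /workspace/ComfyUI/models/vae/hunyuan_video_vae_bf16.safetensors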
This is where it gets a little messy - Hunyuan needs newer dependencies like PyTorch 2.5. The update is tricky because the new torch build won't work at first; it only does once we copy the NVIDIA cuDNN package into the ComfyUI venv.
Connect on port 7777, browse to ComfyUI and right-click to run console. Enter the below commands:
source venv/bin/activate
python -c "import torch; print(torch.version)"
pip uninstall torch torchvision torchaudio xformers came-pytorch lion-pytorch torchsde -y
pip install --no-cache-dir torch==2.5.0+cu124 torchvision==0.20.0+cu124 torchaudio==2.5.0+cu124 xformers==0.0.28.post2 --index-url https://download.pytorch.org/whl/cu124
pip install --no-cache-dir torchsde==0.2.6
pip install sageattention
deactivate
cp -r /usr/local/lib/python3.11/dist-packages/nvidia/cudnn /workspace/ComfyUI/venv/lib/python3.11/site-packages/nvidia
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH
echo "export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH" >> /workspace/ComfyUI/venv/bin/activate
ldconfig
source venv/bin/activate
python -c "import torch; print(torch.version)"
pip list | grep -E "torch|torchvision|torchaudio|xformers"
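Optionally, while the venv is still active, you can also confirm CUDA is visible and SageAttention imports cleanly (my own extra sanity check, not something the wrapper requires):
python -c "import torch, sageattention; print(torch.__version__, torch.cuda.is_available())"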
After this, Kijai's workflow should work fine. I found that the video quality/motion wasn't good until the steps were increased to about 100. At 100 steps it's almost as good as Kling/Sora, as long as not too much is going on. CFG also matters a lot if you want it to follow your prompt fully - at 6 it's roughly 50/50, and that comes at the cost of an overbaked/airbrushed look. Alternatively, you can lower the CFG to around 3 and the steps to around 10-30 for something that looks as good as Kling but doesn't obey your prompt as much (more random behavior). So it's a tradeoff, but you want to keep CFG and steps either both high or both low; the default CFG of 6 is a bit too high for 30 steps, which results in a lower-quality look.
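For reference, the two combinations I keep coming back to (just my starting points from the runs above, not official recommendations):
steps=100, cfg=6.0 (best prompt adherence, slight overbaked look, slowest)
steps=10-30, cfg=3.0 (Kling-like look, weaker prompt adherence, much faster)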
I also found the SDPA attention mode to be the most reliable, even though SageAttention (SageAttn VarLen) is what's recommended; SageAttention introduced an ever-so-slight jitter to the motion that reduced the quality. The same goes for connecting the torch compile, blockswap, and enhance video nodes, so I leave them disconnected. However, SDPA is extremely slow (1 hour for 10 seconds even on an H100), whereas SageAttention cuts that in half and torch compile saves roughly another 30%, so it depends on how picky you are about the motion, I suppose.
One thing I'm still trying to figure out (and maybe someone else can provide input on) is how to sustain the video past 193 frames. I've been testing at 10 seconds since that's what is possible on skyreels.ai's commercial site, and while the video doesn't appear to degrade over time, there is suddenly a lot of static for a brief moment at roughly frame 193 - before and after that it continues along just fine, which is what makes it stranger. VRAM usage never maxes out, so it doesn't appear to be a memory limitation.
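If anyone wants to inspect the glitch directly, dumping the frames around the 193 mark with ffmpeg makes it easy to see exactly where it starts and ends (the input filename here is just a placeholder for whatever ComfyUI saved):
ffmpeg -i output_00001.mp4 -vf "select='between(n,185,200)'" -vsync 0 glitch_%03d.png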
Other results/numbers:
On an H100, about 1 hour of render time for 10 seconds of video at 512x512, 100 steps, with no speedups (SDPA attention, no torch compile, no Sage). About 30 minutes with Sage instead of SDPA and the torch compile settings enabled (but risking some slight jitter in the motion). However, there's the defect at roughly the 8-second mark (frame 193) that I'm still trying to pin down - whether it comes from the model itself or something else (it randomly becomes static for a split second but then continues normally with no defects after).
Render times drop by half if you're happy with 50 steps, and so on. At 720x720 it increases to about 3 hours, or 1.5 hours with SageAttention. The colors and motion stability get better the higher the resolution, even with SageAttention on.
The H100 GPU is roughly 2x faster than the RTX 4090, but it costs 4x more. The H200 GPU costs more than the H100 but oddly enough has exactly the same render time as the H100.
I also posted some samples at different CFG/Steps combinations as a response to someone else's thread here too in case it's of interest:
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/367#issuecomment-2675819574