For those who are wondering why Anima FP8 in ComfyUI only grants a minor speed increase

#4
by Zhijie-Chen - opened

See this issue
FWIW, I tested one of my suggested changes locally, and the generation speedup went from ~15% to ~50%, which is what one would expect from FP8 in a compute-bound application.

Great. I tested this with a simple AI-generated monkey patch, and it does boost generation speed.
I hope to see this optimization integrated soon.

| Model Quantization | Configuration | Before | After | Improvement |
|---|---|---|---|---|
| fp8tensorwise | 832×1216, 30 steps | 4.47s (6.95it/s) | 3.58s (8.74it/s) | +25.3% |
| fp8tensorwise | 1216×1856, 30 steps | 11.78s (2.62it/s) | 9.62s (3.22it/s) | +18.3% |
| mxfp8 | 832×1216, 30 steps | 4.69s (6.60it/s) | 3.75s (8.34it/s) | +20.0% |
| mxfp8 | 1216×1856, 30 steps | 11.95s (2.58it/s) | 9.99s (3.10it/s) | +16.4% |
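
The table does not say how the Improvement column is defined. As a rough sketch, two common definitions are the reduction in wall-clock time and the gain in throughput. The helper names below are hypothetical, and the timings are taken from the 1216×1856 fp8tensorwise row; for that row, the time-reduction definition gives 18.3%, which matches the listed figure.

```python
def time_reduction(before_s: float, after_s: float) -> float:
    """Percentage of wall-clock time saved relative to the baseline."""
    return (before_s - after_s) / before_s * 100.0


def throughput_gain(before_s: float, after_s: float) -> float:
    """Percentage increase in work done per second relative to the baseline."""
    return (before_s / after_s - 1.0) * 100.0


# Timings from the 1216×1856 fp8tensorwise row of the table above.
before_s, after_s = 11.78, 9.62

reduction = round(time_reduction(before_s, after_s), 1)
gain = round(throughput_gain(before_s, after_s), 1)

print(f"time reduction: {reduction}%  throughput gain: {gain}%")
```

The two numbers differ for the same measurement, so it helps to state which one is meant when comparing results across posts.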

ComfyUI/custom_nodes/patch_anima_mlp_patch.py

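import torch


def patch_fp8_mlp():
    # Import lazily so the patch is skipped cleanly if the module is unavailable.
    try:
        from comfy.ldm.cosmos.predict2 import GPT2FeedForward
    except ImportError:
        print("[FP8 Patch] GPT2FeedForward class not found. Skipping patch.")
        return

    def patched_forward(self, x: torch.Tensor) -> torch.Tensor:
        original_shape = x.shape

        # Flatten all leading dimensions so both linear layers receive a
        # 2D (tokens, features) input. reshape() also handles
        # non-contiguous tensors, where view() would raise.
        x_reshaped = x.reshape(-1, original_shape[-1])

        x_out = self.layer1(x_reshaped)
        x_out = self.activation(x_out)
        x_out = self.layer2(x_out)

        # Restore the leading dimensions; the feature size is read from
        # layer2's output rather than assumed equal to the input's.
        return x_out.reshape(*original_shape[:-1], x_out.shape[-1])

    # Replace the method on the class so every existing and future instance uses it.
    GPT2FeedForward.forward = patched_forward
    print("[FP8 Patch] Successfully patched GPT2FeedForward.forward for FP8 GEMM support.")


patch_fp8_mlp()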
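
The patch only changes the shape the linear layers see, so it should not change the result: a linear layer and an elementwise activation act on the last dimension only. Below is a minimal sanity-check sketch using plain `nn.Linear` layers on CPU. `TinyFeedForward` and `flattened_forward` are hypothetical stand-ins written for this example, not ComfyUI code, and the sketch says nothing about which code path ComfyUI takes for FP8 weights.

```python
import torch
from torch import nn


class TinyFeedForward(nn.Module):
    """Hypothetical stand-in with the same layer1/activation/layer2 layout."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.layer1 = nn.Linear(d_model, d_ff, bias=False)
        self.activation = nn.GELU()
        self.layer2 = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layer2(self.activation(self.layer1(x)))


def flattened_forward(module: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Run the same layers on a 2D view of x, then restore the leading dims."""
    lead_shape = x.shape[:-1]
    flat = x.reshape(-1, x.shape[-1])
    out = module.layer2(module.activation(module.layer1(flat)))
    return out.reshape(*lead_shape, out.shape[-1])


torch.manual_seed(0)
mlp = TinyFeedForward(d_model=16, d_ff=64)
x = torch.randn(2, 3, 4, 5, 16)  # arbitrary 5D activation tensor

with torch.no_grad():
    reference = mlp(x)
    flattened = flattened_forward(mlp, x)

same_shape = reference.shape == flattened.shape
max_abs_diff = (reference - flattened).abs().max().item()
print("same shape:", same_shape, "max abs diff:", max_abs_diff)
```

Any difference should be at the level of floating-point rounding, since the two paths may use different matrix-multiply blocking.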
