For those who are wondering why Anima FP8 in ComfyUI only grants a minor speed increase

by Zhijie-Chen - opened 4 days ago

See this issue
FWIW, I tested one of my suggested changes locally, and the generation speedup went from ~15% to ~50%, which is what one would expect from FP8 in a compute-bottlenecked application.
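A ~50% figure is consistent with a simple Amdahl's-law estimate. This is a minimal sketch under two assumptions. First, FP8 matmuls (GEMMs) run about 2× faster than BF16/FP16 ones, which is the nominal dense tensor-core ratio on recent NVIDIA GPUs. Second, nothing outside the GEMMs gets faster. The GEMM share of runtime is a free parameter here, not a measured value.

```python
def amdahl_speedup(gemm_fraction: float, gemm_speedup: float) -> float:
    """Overall speedup when only the GEMM share of the runtime is accelerated.

    gemm_fraction: share of baseline runtime spent in matmuls (0..1); an assumed value.
    gemm_speedup:  how much faster those matmuls run in FP8 than in BF16/FP16 (assumed ~2x).
    """
    if not 0.0 <= gemm_fraction <= 1.0:
        raise ValueError("gemm_fraction must be between 0 and 1")
    if gemm_speedup <= 0.0:
        raise ValueError("gemm_speedup must be positive")
    # Unaccelerated part runs at the same speed; accelerated part shrinks by gemm_speedup.
    return 1.0 / ((1.0 - gemm_fraction) + gemm_fraction / gemm_speedup)


# Assume 2x FP8 matmul throughput and sweep the share of time spent in GEMMs.
for fraction in (0.25, 0.5, 2 / 3, 0.9):
    print(f"GEMM share {fraction:.2f}: {amdahl_speedup(fraction, 2.0):.2f}x overall")
```

Under these assumptions, a 1.5× end-to-end gain corresponds to about two thirds of the runtime going through FP8 GEMMs. A gain of only ~15% corresponds to roughly a quarter of the runtime being accelerated.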

Bedovyy

Owner 4 days ago


edited 4 days ago

Great. I tested a simple monkey patch written by an AI, and it does boost generation speed.
Hope to see this optimization integrated soon.

| Model quantization | Configuration | Before | After | Improvement |
|---|---|---|---|---|
| fp8tensorwise | 832×1216, 30 steps | 4.47 s (6.95 it/s) | 3.58 s (8.74 it/s) | +25.3% |
| fp8tensorwise | 1216×1856, 30 steps | 11.78 s (2.62 it/s) | 9.62 s (3.22 it/s) | +18.3% |
| mxfp8 | 832×1216, 30 steps | 4.69 s (6.60 it/s) | 3.75 s (8.34 it/s) | +20.0% |
| mxfp8 | 1216×1856, 30 steps | 11.95 s (2.58 it/s) | 9.99 s (3.10 it/s) | +16.4% |
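The "Improvement" column can be read two ways: as the share of wall-clock time saved, or as the gain in iterations per second. A minimal sketch that recomputes both from the timings in the table. Which definition was used for the column is an assumption; the table does not say.

```python
# Timings copied from the table: (label, before_s, after_s, before_its, after_its)
RUNS = [
    ("fp8tensorwise 832x1216", 4.47, 3.58, 6.95, 8.74),
    ("fp8tensorwise 1216x1856", 11.78, 9.62, 2.62, 3.22),
    ("mxfp8 832x1216", 4.69, 3.75, 6.60, 8.34),
    ("mxfp8 1216x1856", 11.95, 9.99, 2.58, 3.10),
]


def time_reduction_pct(before_s: float, after_s: float) -> float:
    """Percentage of wall-clock time saved: (before - after) / before."""
    return 100.0 * (before_s - after_s) / before_s


def throughput_gain_pct(before_its: float, after_its: float) -> float:
    """Percentage increase in iterations per second: after / before - 1."""
    return 100.0 * (after_its / before_its - 1.0)


for label, b_s, a_s, b_its, a_its in RUNS:
    print(f"{label:26s} time -{time_reduction_pct(b_s, a_s):.1f}%  "
          f"it/s +{throughput_gain_pct(b_its, a_its):.1f}%")
```

Read as time saved, the 18.3%, 20.0%, and 16.4% entries are reproduced. For the first row the same formula gives about 19.9%, and its it/s gain is about 25.8%. The +25.3% entry therefore may have been computed on a different basis.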

ComfyUI/custom_nodes/patch_anima_mlp_patch.py

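```python
import torch


def patch_fp8_mlp():
    try:
        from comfy.ldm.cosmos.predict2 import GPT2FeedForward
    except ImportError:
        print("[FP8 Patch] GPT2FeedForward class not found. Skipping patch.")
        return

    def patched_forward(self, x: torch.Tensor) -> torch.Tensor:
        # Collapse all leading dimensions into one so both linear layers
        # receive a 2D (tokens, features) input, which is the shape the
        # FP8 GEMM path is expected to handle.
        leading_shape = x.shape[:-1]
        # reshape (unlike view) also works when x is not contiguous.
        x_2d = x.reshape(-1, x.shape[-1])

        x_out = self.layer1(x_2d)
        x_out = self.activation(x_out)
        x_out = self.layer2(x_out)

        # Restore the leading dimensions, using the output's own feature
        # size instead of assuming it equals the input's.
        return x_out.reshape(*leading_shape, x_out.shape[-1])

    GPT2FeedForward.forward = patched_forward
    print("[FP8 Patch] Successfully patched GPT2FeedForward.forward for FP8 GEMM support.")


patch_fp8_mlp()
```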
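A quick sanity check that flattening to 2D does not change what the MLP computes. `GPT2FeedForward` requires ComfyUI, so this uses a hypothetical stand-in, `ToyFeedForward`, with the same `layer1`, `activation`, `layer2` layout (assumed to mirror the real class). The 5D input layout is also an assumption. Everything runs in FP32 on CPU, so this checks numerical equivalence only, not the FP8 speedup.

```python
import torch
from torch import nn


class ToyFeedForward(nn.Module):
    """Hypothetical stand-in with the same layer1 -> activation -> layer2 layout."""

    def __init__(self, d_model: int, d_ff: int) -> None:
        super().__init__()
        self.layer1 = nn.Linear(d_model, d_ff, bias=False)
        self.activation = nn.GELU()
        self.layer2 = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layer2(self.activation(self.layer1(x)))


def flattened_forward(self, x: torch.Tensor) -> torch.Tensor:
    # Same flatten-to-2D, run the MLP, restore-shape logic as the patch.
    leading_shape = x.shape[:-1]
    x_out = self.layer2(self.activation(self.layer1(x.reshape(-1, x.shape[-1]))))
    return x_out.reshape(*leading_shape, x_out.shape[-1])


torch.manual_seed(0)
mlp = ToyFeedForward(d_model=64, d_ff=256)
# Assumed 5D activation layout: (batch, time, height, width, channels).
x = torch.randn(2, 1, 8, 8, 64)

with torch.no_grad():
    reference = mlp(x)
    flattened = flattened_forward(mlp, x)

print(reference.shape, flattened.shape)
print(torch.allclose(reference, flattened, atol=1e-6))
```

Both paths produce the same shape and matching values. This is consistent with the patch only changing the tensor shape the linear layers see, not the math.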
