Pipeline for low-latency webcam stream editing based on Flux.2-klein-4B

#14
by TensorForger - opened

Hi, I'm building a pipeline based on this model for low-latency streaming.
I've added several optimizations. One is a custom KV cache that recomputes only the changed regions of each frame by masking static tokens and reusing their KV pairs. This alone increases FPS by 1.5x to 2.5x, depending on how much motion is in the stream.
It now runs at around 30 FPS on a single RTX 5090 (with frame interpolation); the model's native rate is around 8 FPS.
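
To give a rough idea of the masking trick, here is a minimal toy sketch in plain Python. It is not the actual FluxRT code: `changed_mask`, `update_kv_cache`, `toy_project` and the `threshold` value are hypothetical names and numbers picked for illustration, and each "token" is just a short list of floats standing in for an image patch. It only shows the bookkeeping: compare each token with the previous frame, recompute K/V where it changed, and keep the cached pair everywhere else.

```python
from typing import Callable, List, Sequence, Tuple

Token = Sequence[float]                 # one patch/token as a flat feature vector
KV = Tuple[List[float], List[float]]    # hypothetical (key, value) pair for one token


def changed_mask(prev: Sequence[Token], curr: Sequence[Token],
                 threshold: float) -> List[bool]:
    """Mark a token as changed when its mean absolute difference exceeds threshold."""
    mask = []
    for p, c in zip(prev, curr):
        diff = sum(abs(a - b) for a, b in zip(p, c)) / len(c)
        mask.append(diff > threshold)
    return mask


def update_kv_cache(cache: Sequence[KV], curr: Sequence[Token],
                    mask: Sequence[bool],
                    project: Callable[[Token], KV]) -> List[KV]:
    """Recompute K/V only for changed tokens; reuse cached pairs for static ones."""
    return [project(tok) if changed else old
            for old, tok, changed in zip(cache, curr, mask)]


def toy_project(tok: Token) -> KV:
    # Stand-in for the model's key/value projections (assumption, not the real model).
    return ([2.0 * x for x in tok], [x + 1.0 for x in tok])


prev_frame = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
curr_frame = [[0.0, 0.0], [1.5, 1.5], [2.0, 2.01]]   # only the middle token moves noticeably

cache = [toy_project(t) for t in prev_frame]          # full computation on the first frame
mask = changed_mask(prev_frame, curr_frame, threshold=0.1)
new_cache = update_kv_cache(cache, curr_frame, mask, toy_project)
recomputed = sum(mask)                                # number of tokens that were recomputed
```

In this toy case only one of the three tokens is recomputed. Note that in a multi-layer transformer the deeper-layer K/V of a static token also depend on the other tokens through attention, so reusing them is an approximation, and the threshold trades quality against speed.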

https://github.com/tensorforger/FluxRT
