10x demo speedup

#1
by cbensimon HF Staff - opened

PR contents

  • Remove CPU offloading (not needed)
  • Load pre-compiled FLUX blocks (built with FlashAttention-3)
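
As a rough illustration of the first bullet, the sketch below contrasts offloading with keeping the whole pipeline on the GPU. It is not the code from this PR: `place_pipeline` and `RecordingPipe` are hypothetical names, and it is an assumption that the demo used a diffusers-style pipeline exposing `enable_model_cpu_offload()` and `to()`. The stand-in object only records calls, so the example runs without diffusers or a GPU.

```python
def place_pipeline(pipe, cpu_offload: bool = False, device: str = "cuda"):
    """Hypothetical helper: pick CPU offloading or full GPU residency."""
    if cpu_offload:
        # Offloading path: submodules are moved to the GPU only when needed,
        # which saves memory but adds transfer time on every call.
        pipe.enable_model_cpu_offload()
    else:
        # No offloading: move the whole pipeline to the device once.
        pipe.to(device)
    return pipe


class RecordingPipe:
    """Stand-in for a pipeline (assumption); records which method was called."""

    def __init__(self):
        self.calls = []

    def enable_model_cpu_offload(self):
        self.calls.append("enable_model_cpu_offload")

    def to(self, device):
        self.calls.append(f"to:{device}")
        return self


pipe = place_pipeline(RecordingPipe())  # offloading removed
print(pipe.calls)
```

With a real pipeline, the second branch only makes sense when the model fits in GPU memory, which is what "not needed" suggests for this demo.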

Before: 130s
After: 13s
(including ZeroGPU init)
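
The "10x" in the title follows directly from the two timings above. A minimal check, using only the reported numbers (the `speedup` helper is a hypothetical name introduced here):

```python
def speedup(before_s: float, after_s: float) -> float:
    """Return how many times faster the 'after' timing is than the 'before' timing."""
    if before_s <= 0 or after_s <= 0:
        raise ValueError("timings must be positive")
    return before_s / after_s


before_s = 130.0  # reported time before the PR, in seconds
after_s = 13.0    # reported time after the PR, in seconds

factor = speedup(before_s, after_s)
saved_s = before_s - after_s
print(f"{factor:.0f}x faster, {saved_s:.0f}s saved")
```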

cbensimon changed pull request title from Accelerate demo to 10x demo speedup
cbensimon changed pull request status to open
wanghaofan changed pull request status to merged
