• By wrapping the GPU functions like this, the pipeline no longer has to be transferred from the main process to the GPU worker, so GPU cold start should be much faster (measured between 3 s and 4 s on this Space)
  • prefetch_hf_cache actually calls the pipe, which can't be done outside of decorated functions on ZeroGPU, so it has to be decorated as well.
    When no cold start happens, execution time does not seem to change, with or without prefetch_hf_cache
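The diff itself is not shown in this thread, so the following is only a rough sketch of what "wrapping GPU functions like this" could look like, not the actual change. It assumes the pipeline is bound to the function before `spaces.GPU` is applied, so that callers pass only lightweight inputs. `DummyPipeline`, `process_image`, and `process_image_gpu` are hypothetical names, and the no-op fallback only exists so the sketch runs outside of a Space.

```python
import functools

try:
    import spaces  # Hugging Face `spaces` package, present on ZeroGPU Spaces
    gpu = spaces.GPU
except ImportError:
    # Assumption: outside of Spaces, fall back to a no-op decorator.
    def gpu(fn):
        return fn


class DummyPipeline:
    """Hypothetical stand-in for the real (heavy) pipeline object."""

    def __call__(self, image):
        # Placeholder for the actual inference step.
        return [px * 2 for px in image]


# Loaded once in the main process, at import time.
pipe = DummyPipeline()


def process_image(pipe, image):
    # Plain function: the pipeline is an explicit first argument.
    return pipe(image)


# Bind the pipeline first, then decorate, so that each call passes
# only the image to the decorated function (assumed pattern).
process_image_gpu = gpu(functools.partial(process_image, pipe))


@gpu
def prefetch_hf_cache():
    # Calls the pipe once; on ZeroGPU this must run inside a decorated function.
    process_image(pipe, [0])
```

With this shape, UI callbacks would call `process_image_gpu(image)` and never handle the pipeline themselves.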
Photogrammetry and Remote Sensing Lab of ETH Zurich org

Thank you @cbensimon -- this works great with ZERO A100.
In my private sandbox space, when the worker is warm, it is even faster than now with A10G: 3 sec -> 2 sec.
When the worker is cold, it adds ~4 seconds, which is much less than it used to be!

toshas changed pull request status to merged
