Apply for community grant: Company project (gpu)

#1
by listen2you - opened
StepFun org

Hello! First of all, thank you for your efforts and outstanding contributions to the open-source community.

This project is the official demo of the Step1X-Edit model (https://github.com/stepfun-ai/Step1X-Edit). We hope to utilize your resources to make the model more accessible and available for trial by a wider audience. Therefore, we would like to apply for access to ZeroGPU. Thank you very much!

Hi @listen2you , we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.

StepFun org

@hysts Thank you very much for your assistance. I am currently working on resolving some technical issues.

I noticed in the documentation that ZeroGPU is currently using A100 GPUs with 40GB of memory. Is this still the case? If the memory usage exceeds this value, will it be impossible to use ZeroGPU?

@listen2you

I noticed in the documentation that ZeroGPU is currently using A100 GPUs with 40GB of memory. Is this still the case?

Actually, we are in the process of migrating the underlying hardware of ZeroGPU Spaces from half an A100 to one-third or half of an H200. (Here, "half" and "one-third" refer to MIG configurations.)
One-third H200 will eventually be the default, but for now, to avoid breaking too many ZeroGPU Spaces at the same time, we are using half H200.

If the memory usage exceeds this value, will it be impossible to use ZeroGPU?

Yes, that's correct. CUDA OOM will occur if the memory usage exceeds the limit. But ZeroGPU Spaces are currently using half H200, so the VRAM limit is 71GB.
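As a rough rule of thumb, you can estimate whether a model's weights alone fit within such a limit from the parameter count and precision. The sketch below is illustrative (the 19B figure is a hypothetical pipeline size, not a confirmed number for Step1X-Edit), and activations, caches, and intermediate buffers add real overhead on top of the weights:

```python
def weights_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate VRAM needed just for the weights, in GiB."""
    return num_params * bytes_per_param / 1024**3

# Hypothetical example: a 19B-parameter pipeline in bfloat16
# (2 bytes/param) vs. float32 (4 bytes/param).
bf16 = weights_gb(19e9, 2)   # ≈ 35.4 GiB: fits in 71 GB with headroom
fp32 = weights_gb(19e9, 4)   # ≈ 70.8 GiB: right at the limit, OOM likely
print(f"bf16: {bf16:.1f} GiB, fp32: {fp32:.1f} GiB")
```

This is why loading weights in half precision is usually the first thing to check when a Space hits the VRAM ceiling.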

Hi @listen2you , congratulations on the release and on securing the ZeroGPU grants! We would love to promote the project on our Social Media channels. However, I believe the app is currently experiencing issues. I couldn't generate an edited image, as shown in my attached example. The error displayed in the logs is:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 256, in thread_wrapper
    res = future.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/app/app.py", line 392, in inference
    image = infer_func(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/app/app.py", line 321, in generate_image
    ref_images = self.ae.encode(ref_images_raw.to(self.device) * 2 - 1)
  File "/home/user/app/modules/autoencoder.py", line 317, in encode
    z = self.reg(self.encoder(x))
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/app/modules/autoencoder.py", line 164, in forward
    hs = [self.conv_in(x)]
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 554, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 549, in _conv_forward
    return F.conv2d(
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

image.png
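(For reference, the RuntimeError above is the classic symptom of an input tensor that has been moved to CUDA while the module's weights are still on the CPU; the usual fix is to move the module to the same device, e.g. something like `self.ae.to(device)`, before calling it. A torch-free sketch of a guard that fails early with a clearer message; all names here are illustrative, not taken from the app:)

```python
def assert_same_device(input_device: str, weight_device: str) -> None:
    """Fail fast with an actionable message instead of the opaque
    'Input type ... and weight type ... should be the same' error."""
    if input_device != weight_device:
        raise RuntimeError(
            f"Input is on '{input_device}' but weights are on "
            f"'{weight_device}'; call module.to('{input_device}') first."
        )

# Mirrors the failing case: input moved to CUDA, weights left on CPU.
try:
    assert_same_device("cuda:0", "cpu")
except RuntimeError as e:
    print(e)
```

On ZeroGPU this mismatch is easy to hit because the GPU only exists inside the `@spaces.GPU`-decorated function, so modules created at import time may silently stay on the CPU.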

StepFun org

Hi @listen2you , congratulations on the release and on securing the ZeroGPU grants! [...] The error displayed in the logs is: [same traceback as quoted above]

Hi @ysharma
Thank you very much for your help. I really need some support here. Specifically, I found that this Space repeatedly re-runs the broken examples, which has consumed all of my quota and makes it impossible for me to debug the Space properly. How can I resolve this issue?

Additionally, I found that resolving this technical issue may require me to recreate the Space. After recreating it, will I still have access to ZeroGPU?

StepFun org

@ysharma Hi, the Hugging Face Space is ready now. Many thanks for all your help!

StepFun org

Here is an example for Step1X-Edit.

image.png

The app is fantastic and working perfectly now. Thank you for addressing the issue, @listen2you ! We've amplified the project on X, LinkedIn, Bluesky, and YouTube through our accounts.

GpenZQPXEAAWVVj.jpeg

I have also opened a PR to add our new ImageSlider component to your app: https://huggingface.co/spaces/stepfun-ai/Step1X-Edit/discussions/2
I have duplicated your Space and added the ImageSlider just to demonstrate how it will look. You can access the app here: https://huggingface.co/spaces/ysharma/Step1X-Edit
Let me know your thoughts on this.

StepFun org

I have also opened a PR to add our new ImageSlider component to your app: https://huggingface.co/spaces/stepfun-ai/Step1X-Edit/discussions/2 [...]

Many thanks! It looks fantastic, and I am merging it!

Awesome, thanks!

A big congratulations to @listen2you and the amazing StepFun team! We're thrilled to share that the Step1X-Edit app is trending this week on Hugging Face!

image.png

StepFun org

@ysharma Thank you so much! My team and I are truly excited about this. It's an honor to contribute to the open-source community, and we're deeply grateful for your team's support. Without your help, it would have been impossible to share our work with so many community members.

@ysharma @hysts
Hi, we've noticed that ZeroGPU seems to be having some problems. In short, when we try to run an example, we see:

thread '' panicked at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/rayon-core-1.12.1/src/registry.rs:168:10:
The global thread pool has not been initialized.: ThreadPoolBuildError { kind: IOError(Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }) }
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Process ForkProcess-2095:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 256, in thread_wrapper
    res = future.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/app/app.py", line 434, in inference
    image = inference_func(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/app/app.py", line 373, in generate_image
    inputs = self.prepare([prompt, negative_prompt], x, ref_image=ref_images, ref_image_raw=ref_images_raw)
  File "/home/user/app/app.py", line 174, in prepare
    txt, mask = self.llm_encoder(prompt, ref_image_raw)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/app/modules/conditioner.py", line 158, in forward
    inputs = self.processor(
  File "/usr/local/lib/python3.10/site-packages/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py", line 177, in __call__
    text_inputs = self.tokenizer(text, **output_kwargs["text_kwargs"])
  File "/usr/local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2877, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2965, in _call_one
    return self.batch_encode_plus(
  File "/usr/local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3167, in batch_encode_plus
    return self._batch_encode_plus(
  File "/usr/local/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 539, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
pyo3_runtime.PanicException: The global thread pool has not been initialized.: ThreadPoolBuildError { kind: IOError(Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }) }

Is this expected? Many thanks!
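(For what it's worth, this panic originates in the Rust side of the `tokenizers` library: its rayon thread pool cannot be initialized inside the forked worker process that runs the GPU function. A commonly suggested mitigation, offered here as an assumption rather than a confirmed fix for this Space, is to disable tokenizer parallelism before `transformers` is imported:)

```python
import os

# Must be set before `transformers`/`tokenizers` are imported, so the
# Rust thread pool is never spawned in the parent process and the forked
# worker does not try to build one of its own.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
print(os.environ["TOKENIZERS_PARALLELISM"])
```

Placing these lines at the very top of `app.py` ensures the setting takes effect before any tokenizer is constructed.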

Hi @listen2you I just restarted the Space, and it looks like it's working again. I'm not sure what caused the issue, but it might be due to CUDA OOM.
