Fix the app

#13
by RefreshedCyberJohn - opened

So every time I try to generate a video, it runs for about 1.7 seconds and I get nothing. Can you please fix this?

RefreshedCyberJohn changed discussion title from "App is broken?" to "Fix the app"
Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Hi, @RefreshedCyberJohn
It seems that the app was not working properly due to a CUDA OOM (out-of-memory) error. I've just restarted the app, so it should start working again in about 30 minutes. Thanks.
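
For anyone hitting the same OOM, a common mitigation is to catch it in the generation handler and free the cached GPU memory instead of letting the Space get stuck; a minimal sketch, where safe_generate and generate_fn are hypothetical stand-ins rather than this Space's actual model.py code:

import torch

def safe_generate(generate_fn, *args, **kwargs):
    # Hypothetical wrapper; generate_fn stands in for the Space's
    # actual generation call (e.g. the run method in model.py).
    try:
        return generate_fn(*args, **kwargs)
    except RuntimeError as e:
        # PyTorch reports CUDA OOM as a RuntimeError whose message
        # contains "out of memory".
        if "out of memory" not in str(e):
            raise
        # Release cached allocator blocks so the next request can
        # retry instead of the whole Space needing a restart.
        torch.cuda.empty_cache()
        raise RuntimeError("CUDA out of memory; please try again.") from e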

You're welcome. ;)

It's still doing it; now it says 2.0 or so.

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Hmm, I'm not sure what's going on, but we are getting this error:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/gradio/routes.py", line 247, in run_predict
    output = await app.blocks.process_api(
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/gradio/blocks.py", line 640, in process_api
    predictions, duration = await self.call_function(fn_index, processed_input)
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/gradio/blocks.py", line 555, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/user/app/model.py", line 1241, in run_with_translation
    frames = self.run(text, seed, only_first_stage,image_prompt)
  File "/home/user/app/model.py", line 1178, in run
    set_random_seed(seed)
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/SwissArmyTransformer/arguments.py", line 429, in set_random_seed
    torch.manual_seed(seed)
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/random.py", line 40, in manual_seed
    torch.cuda.manual_seed_all(seed)
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/cuda/random.py", line 113, in manual_seed_all
    _lazy_call(cb, seed_all=True)
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/cuda/__init__.py", line 156, in _lazy_call
    callable()
  File "/home/user/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/cuda/random.py", line 111, in cb
    default_generator.manual_seed(seed)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I'll reboot the Space, but the same error could occur again.

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

@chris-rannou @akhaliq Any idea why this error occurs and how to fix it?

Okay, it's working now.

UPDATE: Now when I try to render, it says 0/193.0 and stops after about a second. Please fix it again.

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

I think there are some glitches in HF Spaces now. Some other Spaces are not working properly either.

It's doing it again

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Thanks. I restarted the Space. It will be up again in about 30 minutes.

It's still doing it

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Hmm, the factory reboot doesn't seem to be working.

A factory reboot is ongoing on the Space. Did you try setting the environment variable CUDA_LAUNCH_BLOCKING=1 to get more details about the error?
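
For reference, CUDA_LAUNCH_BLOCKING has to be in place before the CUDA context is created, so on a Space you would set it as a variable in the Space settings or at the very top of app.py; a minimal sketch, where the seeding call is just an illustration and not this Space's actual code:

import os

# CUDA_LAUNCH_BLOCKING must be set before the CUDA context is created,
# so set it before importing torch (or add it as a Space variable).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

# With blocking launches, a device-side assert is reported at the
# kernel that actually triggered it, rather than at a later unrelated
# call such as torch.cuda.manual_seed_all in the traceback above.
if torch.cuda.is_available():
    torch.manual_seed(12345)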

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

@chris-rannou
Thanks! It's working now.

did you try setting the environment variable CUDA_LAUNCH_BLOCKING=1 to get more details about the error?

Ah, sorry. No, I haven't tried it. I know the log said to do it, but when I pressed the "Restart this Space" or "Factory reboot this Space" button, the build ended unexpectedly fast and the log from the last run was still showing, so I thought the Space was not actually rebooting and decided to ask in the forum. But I should have checked first, just in case.

After a successful factory reboot, the Space seems to be working now.

@hysts you were right, there was an issue with the rebooting due to this Space's specific resource assignment.

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

I see. Thanks!

Okay, I'll try it.


Now it needs to be fixed again. It stops quickly after I press "generate".

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Hi, @RefreshedCyberJohn
Sorry for the late reply; I've been busy and away from the HF Hub for a while, and I just noticed your message. I factory-rebooted the Space and it seems to be working properly now.

:O Broken again! Oct 12. It stops after 5 seconds and shows just a cam. Nothing happens nearly 2 hours later either.

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Hi, @BladedSupernova
Thanks for reporting this. I factory-rebooted this Space just now. So I think it will be back up in 30-40 minutes.

(Sorry, but I'm not feeling very well today, so I'm not going to wait to see if the Space will be restarted successfully, but I think it will be fine. I'll check it again tomorrow.)

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Looks like it's working properly now.
