Spaces: zai-org/CogView2

About launch time out

by hysts - opened Jul 28, 2022

Jul 28, 2022

We are now updating this Space to make the second stage model available, but downloading and loading the second stage model increases the launch time, and we are getting the following error:

Runtime error
launch timed out, space was not healthy after 30 min

Could you make the launch time limit longer?

hysts

Jul 28, 2022 • edited Jul 28, 2022

Also, how much host memory is available in this Space? Using the second stage model will increase the memory usage too. As for GPU memory, I checked that this app works with 24 GB of VRAM, so I think it's OK with an A10, but I'm not sure if it will work with the current amount of host memory.
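For anyone wanting to check the host memory question from inside the app, here is a minimal sketch using only the Python standard library. The function name `total_host_memory_gb` is made up for illustration, and it assumes a Linux-like system where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`. Inside a container this may report the memory of the underlying machine rather than the limit assigned to the Space, so treat the result as an upper bound.

```python
import os


def total_host_memory_gb() -> float:
    """Return the total physical memory reported by the OS, in GB (10^9 bytes).

    Assumption: SC_PAGE_SIZE and SC_PHYS_PAGES are available (Linux, most Unix).
    This does not account for container memory limits.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / 1e9


if __name__ == "__main__":
    print(f"Host memory: {total_host_memory_gb():.1f} GB")
```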

chris-rannou

Jul 29, 2022 • edited Jul 29, 2022

@hysts

I increased the launch timeout, but you are right: the actual issue is an OOM issue. This Space is assigned 46 GB of memory. How much memory do you think you need?
Is the high memory usage only at startup to load the model, or does it also consume a lot of memory at runtime?

chris-rannou

Jul 29, 2022

I updated the error message to reflect the OOM and increased the memory for the Space to 64 GB.

hysts

Jul 29, 2022

@chris-rannou
Thanks a lot!

> How much memory do you think you need?

I tested this app on a GCP A100 instance with 85 GB of RAM before pushing it, so 85 GB is definitely sufficient, but I wasn't sure how much is actually necessary. It seems to be working with 64 GB of host memory now. Thanks.

> Is the high memory usage only at startup to load the model, or does it also consume a lot of memory at runtime?

It consumes a lot of memory at runtime too. When I run the app on the instance mentioned above, it consumes about 40-50 GB of memory.
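As a rough way to get numbers like these without watching the instance by hand, the process can report its own peak resident memory. The sketch below uses the standard `resource` module (Unix only). The helper names `peak_rss_gb` and `fits_in_limit` and the 10% headroom default are assumptions for illustration, not something used in this Space.

```python
import resource
import sys


def peak_rss_gb() -> float:
    """Peak resident set size of the current process, in GB (10^9 bytes)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    peak_bytes = peak if sys.platform == "darwin" else peak * 1024
    return peak_bytes / 1e9


def fits_in_limit(peak_gb: float, limit_gb: float, headroom: float = 0.1) -> bool:
    """True if the peak stays below the limit with some fraction kept free."""
    return peak_gb <= limit_gb * (1 - headroom)


if __name__ == "__main__":
    peak = peak_rss_gb()
    print(f"Peak RSS: {peak:.2f} GB, fits in 64 GB: {fits_in_limit(peak, 64)}")
```

Calling `peak_rss_gb()` after model loading and again after a few generations would show whether the peak comes from startup or from inference.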

hysts changed discussion status to closed Jul 29, 2022

chris-rannou

Jul 29, 2022

The Space seems to stabilize around 54 GB of memory, but with a few spikes that went beyond the 64 GB limit.

hysts

Jul 29, 2022

Thanks for the info. I was encountering CUDA OOM when I ran the app with a larger batch size, but now it's fixed and seems to be working. Thanks for your help.
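The thread does not say how the batch-size OOM was fixed. One common workaround, sketched here as an assumption and not as the fix applied in this Space, is to halve the batch size whenever an out-of-memory error is raised and retry. The names `run_with_backoff` and `generate` are hypothetical. The check relies on the error being a `RuntimeError` whose message contains "out of memory", which is how PyTorch reports CUDA OOM.

```python
from typing import Callable, List, Sequence, Tuple


def run_with_backoff(
    generate: Callable[[Sequence], List],
    prompts: Sequence,
    batch_size: int,
) -> Tuple[List, int]:
    """Process prompts in batches, halving the batch size on OOM errors.

    Returns the collected results and the batch size that finally worked.
    """
    results: List = []
    start = 0
    while start < len(prompts):
        batch = prompts[start:start + batch_size]
        try:
            results.extend(generate(batch))
            start += len(batch)  # advance only after the batch succeeded
        except RuntimeError as exc:
            is_oom = "out of memory" in str(exc).lower()
            if not is_oom or batch_size == 1:
                raise  # unrelated error, or nothing left to shrink
            batch_size = max(1, batch_size // 2)  # retry the same prompts
    return results, batch_size
```

With PyTorch, calling `torch.cuda.empty_cache()` before each retry may also help, but that is left out to keep the sketch free of dependencies.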
