A10G GPU assignment

#2
by hysts HF staff - opened
Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Hi, @chris-rannou cc: @akhaliq
Could you assign an A10G GPU to this space?

Hello @hysts, I assigned an A10G GPU to the Space, but it's getting OOMKilled after consuming more than 30 GB of RAM.

@chris-rannou Great! Thanks!

"but it's getting OOMKilled, consuming 30 GB+ RAM"

Hmm, that's weird. I thought it worked in my environment with 24 GB of RAM. I'll look into it.

@chris-rannou

The error says

Runtime error
Memory limit exceeded (16G)

I was thinking of GPU memory when I saw OOM, but maybe you were talking about regular non-GPU memory. Is it possible to assign an instance with more RAM? I'm not sure if it'll solve the problem, but I'd like to see what happens.

@hysts Yes, the OOM error is for regular memory; I should have been clearer. The error message says 16 GB, but that's a mistake (it's the default value). There are in fact 30 GB assigned to this Space.

@chris-rannou
Thanks, I see. Actually, it is expected for this program to temporarily consume about 30 GB of RAM: the model is about 14 GB, and the SwissArmyTransformer library, which this repo uses, first instantiates the model with random parameters and then loads the pretrained weights, as is often done.
30 GB of RAM looks sufficient on paper, but it seems the program actually needs a bit more memory.
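
The estimate above can be written out as a quick back-of-the-envelope calculation. This is only a sketch: the function name, the `resident_copies` parameter, and the `overhead_gb` parameter are hypothetical and not part of SwissArmyTransformer. It simply assumes that the randomly initialized model and the loaded checkpoint are both resident in RAM at the same moment.

```python
def estimate_peak_ram_gb(weights_gb: float,
                         resident_copies: int = 2,
                         overhead_gb: float = 0.0) -> float:
    """Rough peak RAM estimate while loading a model.

    Assumption: `resident_copies` full copies of the weights are in RAM
    at once (the randomly initialized model plus the loaded checkpoint),
    plus some fixed overhead for the runtime and libraries.
    """
    return weights_gb * resident_copies + overhead_gb


# Figures from this thread: ~14 GB of weights, 30 GB assigned to the Space.
peak_gb = estimate_peak_ram_gb(14.0)
headroom_gb = 30.0 - peak_gb

print(f"estimated peak: {peak_gb} GB, headroom: {headroom_gb} GB")
```

With zero overhead the estimate leaves only about 2 GB of headroom, so any additional allocation during startup (Python runtime, libraries, temporary buffers) could plausibly push the process over the limit.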

Is it possible to increase the amount of RAM?

Working on it

@chris-rannou Thanks a lot! Looks like the Space is working now.

hysts changed discussion status to closed

FYI, at startup the Space actually consumes up to about 46 GB of memory and then stabilizes at about 26 GB.
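
For anyone who wants to check the startup peak in their own environment, here is a minimal sketch using Python's standard `resource` module (Unix only). The helper name `peak_rss_gb` is made up for illustration, and the 100 MiB buffer merely stands in for a temporary allocation such as a second copy of the weights. Note that this reports the memory of the current process only, which may differ from what the platform measures for the whole container.

```python
import resource
import sys


def peak_rss_gb() -> float:
    """Return this process's peak resident set size in GB (10**9 bytes)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    peak_bytes = peak if sys.platform == "darwin" else peak * 1024
    return peak_bytes / 1e9


before = peak_rss_gb()

size = 100 * 1024 * 1024
buf = b"\x01" * size          # temporary allocation, filled so pages are touched
during = peak_rss_gb()

del buf                       # memory is released...
after = peak_rss_gb()         # ...but the recorded peak does not go down

print(f"peak before: {before:.2f} GB, during: {during:.2f} GB, after: {after:.2f} GB")
```

Because the value is a high-water mark, it stays at the startup peak even after usage settles at a lower level, which is the number that matters for sizing the instance.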

Oh, that much? In my GCP environment, the app seemed to consume only about 32 GB of RAM, so that's unexpected, but thanks.
