Apply for community grant: Academic project (GPU and storage)

by wdplx - opened
CMU-LTI org

The Sotopia series is an expanding line of research at the Language Technologies Institute at Carnegie Mellon University, featuring a dynamic social simulation framework that evaluates LLMs' social intelligence through roleplay and interaction. The work earned a Spotlight award at ICLR for advancing the benchmarking and improvement of socially intelligent AI.
Our latest work, Sotopia-pi, also showed that social intelligence improves in the situated social environments created by Sotopia: we trained a Mistral-7B model through reinforcement learning to be on par with GPT-3.5.
To make our platform more accessible to the social AI community, we plan to publish a demo through Hugging Face Spaces. The demo will allow users to interact with different LLMs and provide feedback across various social scenarios and roleplay characters, akin to an open-source version of Character AI.
We require access to GPU instances (at minimum an NVIDIA A10G or A100) and persistent storage to host our model and persist users' interaction data. We believe this demo will be valuable for our ongoing research and the broader community.
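For illustration, the demo could write each conversation to the Space's persistent storage, which is mounted at /data when enabled. A minimal sketch, with a hypothetical logging helper and record schema (not our actual implementation):

```python
import json
import os
from datetime import datetime, timezone

# Spaces persistent storage, when enabled, is mounted at /data.
DATA_DIR = os.environ.get("PERSIST_DIR", "/data/interactions")
os.makedirs(DATA_DIR, exist_ok=True)

def log_interaction(session_id: str, scenario: str, turns: list[dict]) -> None:
    """Append one conversation record as a JSON line (hypothetical schema)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "scenario": scenario,
        "turns": turns,
    }
    with open(os.path.join(DATA_DIR, "log.jsonl"), "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```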

Website:
https://www.sotopia.world/

Github:
https://github.com/sotopia-lab

Publications:
https://arxiv.org/pdf/2310.11667.pdf
https://arxiv.org/pdf/2403.08715.pdf
https://arxiv.org/pdf/2403.05020.pdf

Hi @wdplx, we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.
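For reference, a typical ZeroGPU app loads the model at import time and wraps the GPU-bound function with the @spaces.GPU decorator, so a GPU is attached only while that function runs. A minimal sketch, assuming a transformers causal LM (the Mistral checkpoint below is a placeholder, not the actual Sotopia demo code):

```python
import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; the Space would load its own model instead.
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.to("cuda")  # under ZeroGPU, weights land on the GPU only during @spaces.GPU calls

@spaces.GPU
def generate(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```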

Hi @hysts, would you mind checking whether ZeroGPU has already been granted to us? I don't see an option for running on ZeroGPU in my settings, and when I use the free-tier CPU, the Space errors out despite using the ZeroGPU decorators. Thank you!

@wdplx Have you switched the hardware back to cpu-basic? Not sure why, but looks like the Space is currently running on cpu-basic.

CMU-LTI org

That's interesting. I don't actually have the Zero Nvidia A100 option.

[Screenshot: Space hardware settings, 2024-04-29]

Alright, looks like you switched the hardware to cpu-basic on April 19th. Maybe it was by accident? Anyway, I just switched the hardware to ZeroGPU again.

> That's interesting. I don't actually have the Zero Nvidia A100 option.

Yeah, sorry about that. The option is visible only for people in the ZeroGPU explorers org (and HF members).

CMU-LTI org

Got it, no problem. I will not switch resources from now on. Thanks for the help!

I just sent you an invitation to join the ZeroGPU explorers org.

CMU-LTI org

Received it. Thanks!

Feel free to ask if you have any questions regarding ZeroGPU. Also, some libraries are not compatible with ZeroGPU, so if your app doesn't work with Zero, let us know. The default hardware for grants is now ZeroGPU, but we can still assign a normal GPU as a grant with a shorter sleep time if necessary.

CMU-LTI org

Thanks for the support! I'd like to share a few problems we've run into on ZeroGPU.

  1. There is a chance that my generate request raises a "GPU task aborted" error; I think this is because of a timeout.
  2. When I make consecutive requests, there is a chance of running into the following error.

[Screenshot: error message, 2024-05-01]

  3. When I set a duration larger than 60, like 120, the above problems happen more often, and the latency of a single generation can be higher (though we haven't measured this empirically).

Given these problems, would you mind assigning a normal GPU to our Space? We intend this Space to let people have consecutive conversations with our model, which is crucial data for our future research. An uninterrupted, lower-latency experience is really important for this purpose.

Thanks!

Hmm, I see. If ZeroGPU doesn't work for your use case, maybe we can assign an a10g-small with a shorter sleep time.
But I'm not sure that would really improve the overall UX. It's true that a normal GPU grant doesn't have the GPU quota issue, but the underlying hardware of Zero is the A100, while the best hardware available for a normal grant is the A10G. Also, ZeroGPU can run jobs in parallel in the background, meaning multiple GPU instances are used behind the load balancer, which is likewise not available with a normal GPU grant.

FYI, our ZeroGPU Space grants are intended to let as many users as possible try out Spaces: each user has a few minutes' worth of ZeroGPU quota every few hours, which recharges gradually. We limit the amount of inference time per user so that Spaces are not flooded with requests from a small number of users. Without the GPU quota, some people might use a Space abusively, e.g. by sending a massive number of requests via the API, which would prevent other users from using it.
Also, FYI, the duration parameter of the @spaces.GPU decorator specifies the expected maximum execution time of the function. The "GPU task aborted" error is raised when a function decorated with @spaces.GPU takes longer than this duration. The parameter is also used to check whether a user can run the function: if their remaining quota is smaller than duration, they can't run it, and the error in your screenshot is raised. So it should be set as close as possible to the maximum expected inference time of the function; setting it too long is not recommended.
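In code, that looks roughly like this (the 60-second figure is an assumed estimate; set it to your measured worst-case inference time):

```python
import spaces

# duration approximates the worst-case runtime of the function. If a user's
# remaining ZeroGPU quota is below this value, the call is rejected up front,
# so setting it much larger than the real inference time hurts availability.
@spaces.GPU(duration=60)  # assumed estimate; tune to measured latency
def generate(prompt: str) -> str:
    # ... run model inference here, as in the earlier sketch ...
    return prompt  # placeholder body for illustration
```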

CMU-LTI org

Thanks for the explanation! Now I understand how it works. I will stick to ZeroGPU then to support multiple users.
