GPU requirement for simply running the model

#9
by abhi24 - opened

Hello good people of Databricks!

I'm a grad student trying out Dolly v2 for a summarization problem on an AWS EC2 instance. I have a limited budget for AWS, so I cannot afford to experiment much. Can you please guide me?

  1. What is the GPU requirement for running the model?
  2. The input prompts are going to be longer (since it's a summarization task). Would a longer input require more memory?

I have used Dolly v1. It's great but slow, probably because I'm running it with 16 GB of GPU memory provided by two Tesla M60s.
Thanks,
Abhilash

Databricks org

Please see https://github.com/databrickslabs/dolly
An A100, though it can work on an A10 in 8-bit.
Yes, longer prompts require more memory. I think you really want at least an A10. M60s aren't really meant for deep learning, though they might work with more memory.
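(Rough arithmetic, for intuition: 12B parameters × 2 bytes each in fp16 is about 24 GB for the weights alone, before activations, which is why a 40 GB A100 is comfortable; 8-bit halves that to about 12 GB, which is how it fits on a 24 GB A10.)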

Thank you! Helps a lot.

@abhi24 you can load the model on a Tesla T4 when using load_in_8bit=True; I was seeing around 13 GB of VRAM usage after loading it. This means you can do it either in Google Colab or on any AWS instance with a basic GPU (like a Tesla T4).
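For reference, a quick way to check actual VRAM usage after loading (a minimal sketch; assumes a single CUDA device, and numbers will vary a bit by setup):

import torch
# report the VRAM torch has actually allocated/reserved, in GB
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.1f} GB")
print(f"reserved: {torch.cuda.memory_reserved() / 1024**3:.1f} GB")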

@dfurman where do you set load_in_8bit? Is that in the config.json? Thanks!

@srowen , I tried to run the model on a workstation last night (8-core Ryzen CPU, 32GB RAM, and an RTX 3090 GPU). The model appears to load correctly, but RAM quickly saturates to 100% while VRAM consumption idles at 2GB (Windows and background apps). I am using the GPU build of torch and set the CUDA device ID to force use of the GPU; torch also correctly identifies the CUDA device. Is more RAM required to first load the model, prior to it being transferred to the GPU? Or is the model held in RAM either way, with inference running solely on the GPU (i.e. time for a RAM upgrade :) )?

Databricks org

We've released some smaller models trained on the same data if you'd like to try them. These are 2.8B and 6.9B parameters, respectively, compared to the current model's 12B parameters.

https://huggingface.co/databricks/dolly-v2-2-8b
https://huggingface.co/databricks/dolly-v2-6-9b
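As a rough sketch (assuming a ~16 GB GPU; exact memory use will vary), these should load in fp16 without needing 8-bit:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# 2.8B parameters × 2 bytes in fp16 is only ~5.6 GB of weights
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-2-8b")
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-2-8b", torch_dtype=torch.float16, device_map="auto")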

@nicolaschaillan: load_in_8bit is an argument to from_pretrained, not a setting in config.json (it also needs the bitsandbytes and accelerate packages installed):

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-6-9b", device_map="auto", load_in_8bit=True)

@KanonKop I have been able to load and run the 12b model on a g5.2xlarge instance on AWS, which has 32GB of RAM and an A10 GPU.

with:

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b")
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-12b", device_map="auto", load_in_8bit=True)

When loading the model, it leaked a couple of GB into swap, but then dumped the model onto the GPU and RAM usage went down to below 10GB.
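Once it's on the GPU, inference is just the usual generate call. A minimal sketch (the prompt is only illustrative):

# replace the "..." with your own text; dolly responds best to instruction-style prompts
inputs = tokenizer("Summarize the following text: ...", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))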

Databricks org

@dfurman good to know 12b works on the T4! I hadn't tried that yet. The smaller models Matt just put out should be totally viable on these GPUs without 8-bit.

@jacobgoss thanks for the feedback; the reduced-parameter model worked correctly. I will try to rerun the full 12b model with 8-bit quantization soon.

@jacobgoss If that's a Linux host, maybe it helps to turn swap off with:
$ sudo swapoff -a


I can't run the 12b or 7b models on a Google Cloud GPU instance with a T4, 7.5G of memory, and a 100G disk, using the Debian 10 based Deep Learning VM image (M107) with CUDA 11.3 preinstalled.

It always fails with a MemoryError. So frustrating.

Does this work as expected for anyone?

Databricks org

This is documented in the repo https://github.com/databrickslabs/dolly#training-on-other-instances
A T4 isn't nearly enough, and 7.5GB of memory won't work.
You want an A100 for the largest model; there are notes there for smaller GPUs.

srowen changed discussion status to closed

What is Google Colab, and how can it be used to run these models?

@srowen Why close this discussion?

Databricks org

I think the question is answered, no? you seem to be asking something unrelated, too. I'm not sure I understand

Based on @srowen's answers, the minimum GPU requirement to even run this model is an A100, which costs $10k+, so you might not want to call this model "runnable on a home PC" in the future; I bet no one has a $10k GPU in their home PC.

Powering many of these applications is a roughly $10,000 chip that’s become one of the most critical tools in the artificial intelligence industry: The Nvidia A100.


@srowen 1. Where is the topic of this discussion answered?

@srowen 2. So you are closing this because I am asking something unrelated? How is asking about possible GPUs on which to run this unrelated, when the topic is "GPU requirements to run this model"?

@srowen 3. What didn't you understand? Is something wrong with the language or the questions?

Also, most of the comments in this discussion are about not being able to run any of the models with any GPU, or about failing to even load a model; there are no comments about successful runs and inference from the model with any of the GPUs they are trying, and no successful training runs either.

And you conclude this can be closed as resolved; it seems you couldn't care less whether the community is actually able to run and use these models.

Databricks org

Hey @jaklan, slow your roll there.
I think you're not reading the docs and the discussion above. You don't need an A100, and you certainly don't need to buy one to start using this. These are, obviously, available in the cloud. You can run the 12B model on an A10, a V100, or a T4 (16GB) with 8-bit. In fact, that's what was discussed above; that's about all this thread is about. That's why I don't know how to answer "where is the answer".
You're asking things like "what is Colab?", which is unrelated, and then re-asking the same question.
It's general best practice anywhere to just start new threads for different questions, if your question isn't already answered.

Why so abusive, @srowen?

If you make claims about other accounts, you are stepping out of scope. And you are constantly insulting towards me specifically: you closed this discussion because I am asking the wrong questions, and you are acting as an admin who doesn't like the community commenting or asking questions.

Reporting you for abuse!

Databricks org

(This is actually Hugging Face.) I don't understand your tone or complaint here. This isn't your question that I answered and deemed finished. You added both the same, and a different, question after. Just don't see any other way to read the timeline?
I am an 'admin' for these repos.
"Closing" a discussion is like marking an issue resolved. I don't get why that's perceived as negative.
You are welcome to report whatever you want, but, I think the discussion speaks for itself.
I will not interact with you more on this. I will interact with normal boring civil threads that 99.9% of people manage here.

I think the question is answered, no? you seem to be asking something unrelated, too. I'm not sure I understand

No it isn't.

No, I am not asking something unrelated; I am asking something on topic: what is the minimum GPU requirement to run this model, and how?

If you don't understand, then why did you answer? I didn't ask you; I asked the forum/community!

Databricks org

Last time: https://huggingface.co/databricks/dolly-v2-12b/discussions/9#643fc2866fd05d823065341b
I do feel it is appropriate to close discussions that have concluded, where further comments aren't adding anything: re-asking what's been answered, "me too" posts, different questions. Of course, anyone is welcome to start a new discussion, hopefully not a duplicate. It keeps the list of active discussions clean, and keeps separate threads separate.
I understand the question, and your question. I don't understand your puzzlement at the above.

(This is actually Hugging Face.) I don't understand your tone or complaint here. This isn't your question that I answered and deemed finished. You added both the same, and a different, question after. Just don't see any other way to read the timeline?
I am an 'admin' for these repos.
"Closing" a discussion is like marking an issue resolved. I don't get why that's perceived as negative.
You are welcome to report whatever you want, but, I think the discussion speaks for itself.
I will not interact with you more on this. I will interact with normal boring civil threads that 99.9% of people manage here.


As you wish 💪 (because of the posts where @srowen claims I've done things which I haven't: I haven't asked unrelated questions, haven't re-asked them, etc., and because of the posts where @srowen insults and attacks me personally) ‼️
https://huggingface.co/databricks/dolly-v2-12b/discussions/47#6440c9417841867cd5b7a068

Try one of these; you can pick any size you want according to your budget:

https://cloud.lambdalabs.com/
https://cloud.coreweave.com/
https://paperspace.com/
