Why not add system requirements on the model card?

#28
by johnjohndoedoe - opened

Hi
I had to search for a while to find any info about the requirements to run this; it would be nice to have more info on the model card!

thx

Using the newest transformers & accelerate libraries from pip/GitHub plus a bitsandbytes config (load_in_4bit, bfloat16, and the nf4 quant type), I am able to run this on a single A100 40 GB. It uses about 80 GB of disk space to save the pretrained model.
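
For reference, a minimal sketch of the kind of 4-bit loading described above (the `tiiuae/falcon-40b` repo id and the exact argument values are my assumptions, not something stated in this thread):

```python
# Minimal sketch: load Falcon-40B in 4-bit (nf4) with bfloat16 compute via bitsandbytes.
# Assumes recent transformers, accelerate, and bitsandbytes installs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-40b"  # assumed repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",        # lets accelerate place the quantized weights on the GPU(s)
    trust_remote_code=True,   # Falcon originally shipped custom modeling code
)
```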

Hi @Ichsan2895, I’m pretty new to this model and to LLMs in general. I’d like to test this in an Azure environment, and there are many VM sizes available. Do you by any chance know which of the VM sizes can be used? Many are unavailable due to shortages and high demand.

Thank you

Hello, sorry, I never tested it on Azure.
I tested it on a RunPod environment. It costs $0.85/hour for an instance with an A6000 (48 GB VRAM) + 58 GB RAM + 200 GB disk while running, and $0.03/hour when the system is idle, because I save the pretrained model on their disk too.

Thank you for the response. How was the performance on this machine (tokens/sec)?

Pretty slow... about 0.5–1 token/second. BTW, Guanaco-65B-GPTQ is faster, but unfortunately it cannot be used commercially.

@Ichsan2895 I was able to run this on a Standard_NC48ads_A100_v4, which has 160 GiB of GPU memory. I wasn't able to use the bitsandbytes module (some issue I couldn't debug). The results were surprisingly good. I could only use it for a very short time because it's pretty expensive. See https://twitter.com/this_is_tckb/status/1665814803829473280/

Is it possible to run it on an RTX 4090?
Sorry guys, but can someone tell me what 40B means? What I know is 40B × 4 bytes = 160 GB, right?
Does that mean one GPU with a total of 160 GB can load this model?
Or do I need 160 GB+ for training? And is training different from just using it?

40B means 40 billion parameters, but that does not mean it needs exactly 40 GB of GPU RAM. I used a 48 GB A6000 to run this model. Memory consumption can be lowered by enabling a bitsandbytes config with bfloat16 and load_in_4bit. Unfortunately, even then it won't run in 24 GB of VRAM (OOM).

Sorry, I don't know the consumption when training / fine-tuning on a new dataset.
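
To make those numbers concrete, here is a rough back-of-the-envelope estimate of the weight memory alone at different precisions (activations and KV cache add overhead on top, which is part of why a 24 GB card can still OOM even at 4-bit):

```python
# Rough weight-memory estimate for a 40B-parameter model; lower bounds only,
# since activation and KV-cache overhead are ignored.
params = 40e9

bytes_per_param = {
    "fp32": 4.0,
    "bf16": 2.0,
    "nf4 ": 0.5,
}

for dtype, nbytes in bytes_per_param.items():
    gb = params * nbytes / 1024**3
    print(f"{dtype}: ~{gb:.0f} GB for the weights alone")

# fp32 -> ~149 GB, bf16 -> ~75 GB, nf4 -> ~19 GB (plus runtime overhead in all cases)
```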

Technology Innovation Institute org

We have added some basic info on running the model to the card. It takes ~80–100 GB of GPU memory to comfortably run inference with Falcon-40B. There has also been some work with FalconTune on 4-bit quantization.
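
For anyone landing on this thread, a minimal bf16 inference sketch along the lines of what the card now shows (the exact snippet on the card may differ; the prompt and generation settings here are just placeholders):

```python
# Minimal bf16 inference sketch; device_map="auto" lets accelerate shard the
# ~80 GB of bf16 weights across the available GPUs.
import torch
import transformers
from transformers import AutoTokenizer

model_id = "tiiuae/falcon-40b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

out = pipeline("Falcon-40B is", max_new_tokens=50, do_sample=True, top_k=10)
print(out[0]["generated_text"])
```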
