Can I run the model on my desktop?

#14
by mitu820

Hi, I have an 8-core Ryzen 7 desktop PC with 64GB of RAM, and I also have an old 4GB GPU. Is it possible to run this model on my PC?

If yes, is there a guide for this?

BigScience Workshop org

You can try running it on your CPUs but it will be extremely slow.
If you want to run it on a single GPU, I'd recommend at least a 40GB GPU with FP16 support.

There's some inference information here for the 176B model: https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/main/scripts/inference
It should also apply to this model; you just need less memory and FP16 instead of BF16.
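
A minimal sketch of what loading the 7B1 checkpoint in FP16 with transformers can look like (the prompt and generation settings are illustrative; for a CPU-only run drop `torch_dtype` and the `.to("cuda")` calls, with the speed caveat above):

```python
# Sketch: BLOOM-7B1 inference in FP16 on a single GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # FP16 roughly halves the memory footprint vs FP32
).to("cuda")

inputs = tokenizer("The BigScience workshop", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```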

Thank you, I thought the 7B model would need fewer resources.

Is this the largest version I can do inference on using, say, 4 x NVIDIA A10s? The total GPU memory would be 96GB. It would be nice to have a breakdown of the system requirements for each model.

BigScience Workshop org

You may be able to run the 176B model by sacrificing performance or time; see https://huggingface.co/bigscience/bloom/discussions/87 or https://huggingface.co/bigscience/bloom/discussions/88

The thing is, there are no hard system requirements. It depends on how fast you want it to be and how much performance you're willing to sacrifice (e.g. by reducing precision).
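
For reference, a rough sketch of spreading a large checkpoint across several GPUs and offloading the rest to CPU RAM via accelerate's `device_map`, in the spirit of the threads linked above. The memory budgets below are made-up numbers for 4 x 24GB cards, not measured requirements:

```python
# Sketch: multi-GPU inference with CPU/disk offload, trading speed for memory.
import torch
from transformers import AutoModelForCausalLM

# Illustrative budgets: leave headroom on each GPU, spill the rest to CPU RAM.
max_memory = {0: "20GiB", 1: "20GiB", 2: "20GiB", 3: "20GiB", "cpu": "200GiB"}

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",          # 176B checkpoint; the same call works for bloom-7b1
    torch_dtype=torch.bfloat16,
    device_map="auto",           # let accelerate place layers across the GPUs
    max_memory=max_memory,
    offload_folder="offload",    # spill remaining weights to disk if CPU RAM runs out
)
```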

If you want to run it on a single GPU, I'd recommend at least a 40GB GPU with FP16 support.

This is not true. I was able to run the 7b1 model in fp16 on my GPU with 24GB VRAM.

BigScience Workshop org

Very nice! Didn't say it wasn't possible 👍

I ran into a strange issue after a few runs though. Plenty of VRAM is free, yet PyTorch reports OOM.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 24.00 GiB total capacity; 7.04 GiB already allocated; 15.73 GiB free; 7.04 GiB reserved in total by PyTorch)

My guess is this is some sort of memory leak, since the issue doesn't occur after a system restart. I know this might not be the right place to ask, but is this a known issue?
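
Not sure about the root cause, but one thing worth trying before a full restart is explicitly releasing the model between runs. This is a hedged workaround sketch, not a confirmed fix; `model` here is whatever you loaded in the previous run:

```python
# Sketch: reclaim VRAM between runs without rebooting.
import gc
import torch

del model                  # drop the last Python reference to the model
gc.collect()               # let Python free the underlying tensors
torch.cuda.empty_cache()   # return cached blocks to the CUDA driver

print(torch.cuda.memory_summary())  # inspect allocated vs reserved memory

# If OOM persists despite free memory, fragmentation may be the culprit; setting
# PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 before launching can help.
```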

I can run BLOOM 7B1 with LoRA on a 24GB GPU; it takes about 17GB.
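
For anyone curious, a hedged sketch of what attaching LoRA adapters to BLOOM-7B1 with the peft library can look like; the rank, alpha, and target modules below are illustrative choices, not necessarily the setup used above:

```python
# Sketch: LoRA adapters on BLOOM-7B1 via peft; only the small adapter matrices train.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1",
    torch_dtype=torch.float16,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                # low-rank update dimension (illustrative)
    lora_alpha=32,
    target_modules=["query_key_value"],  # BLOOM's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()       # shows how few parameters are trainable
```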
