Spaces:

juanma1907
/

la-llama-que-llama

Paused

App Files Files Community

la-llama-que-llama / docs /System-requirements.md

juanma1907's picture

Upload folder using huggingface_hub

462dacf over 1 year ago

|

history blame contribute delete

1.75 kB

A newer version of the Gradio SDK is available: 5.4.0

Upgrade

These are the VRAM and RAM requirements (in MiB) to run some examples of models in 16-bit (default) precision:

model	VRAM (GPU)	RAM
arxiv_ai_gpt2	1512.37	5824.2
blenderbot-1B-distill	2441.75	4425.91
opt-1.3b	2509.61	4427.79
gpt-neo-1.3b	2605.27	5851.58
opt-2.7b	5058.05	4863.95
gpt4chan_model_float16	11653.7	4437.71
gpt-j-6B	11653.7	5633.79
galactica-6.7b	12697.9	4429.89
opt-6.7b	12700	4368.66
bloomz-7b1-p3	13483.1	4470.34

GPU mode with 8-bit precision

Allows you to load models that would not normally fit into your GPU. Enabled by default for 13b and 20b models in this web UI.

model	VRAM (GPU)	RAM
opt-13b	12528.1	1152.39
gpt-neox-20b	20384	2291.7

CPU mode (32-bit precision)

A lot slower, but does not require a GPU.

On my i5-12400F, 6B models take around 10-20 seconds to respond in chat mode, and around 5 minutes to generate a 200 tokens completion.

model	RAM
arxiv_ai_gpt2	4430.82
gpt-neo-1.3b	6089.31
opt-1.3b	8411.12
blenderbot-1B-distill	8508.16
opt-2.7b	14969.3
bloomz-7b1-p3	21371.2
gpt-j-6B	24200.3
gpt4chan_model	24246.3
galactica-6.7b	26561.4
opt-6.7b	29596.6