
These are the VRAM and RAM requirements (in MiB) to run some example models in 16-bit (default) precision:

8-bit precision mode allows you to load models that would not normally fit into your GPU. It is enabled by default for 13b and 20b models in this web UI.

| model        | VRAM (GPU) |     RAM |
|:-------------|-----------:|--------:|
| opt-13b      |    12528.1 | 1152.39 |
| gpt-neox-20b |    20384   | 2291.7  |
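As a rough sanity check on figures like these, the size of the weights alone can be estimated as parameter count times bytes per parameter. The sketch below is only an approximation: the helper `estimate_weight_mib` is hypothetical, the parameter counts (13e9 and 20e9) are nominal values read off the model names, and one byte per parameter for 8-bit weights is an assumption. Real usage is higher because of activations, buffers, and framework overhead.

```python
MIB = 1024 ** 2  # bytes per MiB


def estimate_weight_mib(num_params: float, bits_per_param: int) -> float:
    """Rough size of the model weights alone, in MiB (hypothetical helper)."""
    return num_params * bits_per_param / 8 / MIB


# Nominal parameter counts taken from the model names (assumption).
models = [("opt-13b", 13e9), ("gpt-neox-20b", 20e9)]

for name, params in models:
    for bits in (16, 8):
        size = estimate_weight_mib(params, bits)
        print(f"{name} @ {bits}-bit: ~{size:,.0f} MiB for weights only")
```

Halving the bits per parameter halves the estimate, which is why a lower-precision mode can fit a model that would not otherwise fit on the GPU.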

CPU mode is a lot slower, but does not require a GPU.

On my i5-12400F, 6B models take around 10-20 seconds to respond in chat mode, and around 5 minutes to generate a 200-token completion.
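To put the completion timing in more familiar terms, it can be converted to a throughput figure. This is a back-of-the-envelope sketch using only the 200 tokens and roughly 5 minutes quoted above; `tokens_per_second` is a hypothetical helper, and actual speed will vary with hardware, model, and settings.

```python
def tokens_per_second(num_tokens: int, seconds: float) -> float:
    """Average generation throughput (hypothetical helper)."""
    return num_tokens / seconds


# 200 tokens in about 5 minutes, as quoted above.
rate = tokens_per_second(200, 5 * 60)
print(f"{rate:.2f} tokens/s, {1 / rate:.1f} s per token")
# prints "0.67 tokens/s, 1.5 s per token"
```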