System requirements

#1
by gaboukassm - opened

Hi,

What kind of setup would be required to run your model?
I am new to this and would appreciate your help.
I am planning to run this on a GPU setup, but even with 4x RTX 4090 or 2x A100 the system kept running out of memory, so I was wondering what kind of setup I need for this to run efficiently?

Thank you

It depends on the settings. Two 4090s should be more than enough for a lot of use cases.
Without fused attention, it takes 27 GB of VRAM, and you will need some more on top of that if you actually do stuff with it.

image.png

You can also leave fused attention enabled and just reduce max_seq_length to something much smaller that is still useful.

image.png
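
To see why a smaller max_seq_length helps, here is a rough back-of-the-envelope sketch of how a key/value cache grows with sequence length. It assumes the cache is allocated up front for the full sequence length and stored in fp16. The thread does not state the model's architecture, so the layer count, head count and head size below are hypothetical placeholders, and the function name `kv_cache_bytes` is just for illustration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate the size of a preallocated key/value cache in bytes.

    One key tensor and one value tensor are kept per layer, each of shape
    (batch_size, n_kv_heads, seq_len, head_dim).
    bytes_per_elem=2 corresponds to fp16.
    """
    per_tensor = batch_size * n_kv_heads * seq_len * head_dim * bytes_per_elem
    return 2 * n_layers * per_tensor  # factor 2: keys and values


# Hypothetical architecture values, NOT taken from the thread.
HYPOTHETICAL = dict(n_layers=80, n_kv_heads=8, head_dim=128)

for seq_len in (4096, 2048, 512):
    size_gib = kv_cache_bytes(seq_len=seq_len, **HYPOTHETICAL) / 2**30
    print(f"seq_len={seq_len:5d}: about {size_gib:.2f} GiB of cache")
```

Under these assumptions the cache scales linearly with the sequence length, so cutting the length by a factor of 8 cuts the cache by the same factor. The model weights themselves are not affected by this setting.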

KnutJaegersberg changed discussion status to closed
gaboukassm changed discussion status to open

Thank you so much for your reply, and sorry for my silly question, but what is the tool you're using (not nvtop) to load the model and specify the sequence length, etc.?

It's https://github.com/oobabooga/text-generation-webui
These options are part of AutoAWQ:
https://github.com/casper-hansen/AutoAWQ
fuse_layers=True
max_new_tokens=seq_len
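
For anyone who wants to set these two options from a script rather than the web UI, a minimal sketch could look like the following. It assumes the `autoawq` package is installed and that your version of `AutoAWQForCausalLM.from_quantized` accepts the two keyword names quoted above; the sequence-length argument may be named differently in other releases, so check the signature for your installed version. The helper names and the model path are hypothetical.

```python
def build_load_kwargs(seq_len, fuse_layers=True):
    """Collect the two options mentioned above into keyword arguments.

    The keyword names are copied from the reply in this thread; they may
    differ between AutoAWQ releases.
    """
    return {"fuse_layers": fuse_layers, "max_new_tokens": seq_len}


def load_awq_model(model_path, seq_len, fuse_layers=True):
    """Load an AWQ-quantized model with a capped sequence length.

    Requires the autoawq package and a CUDA GPU. The import is kept inside
    the function so the helper above can be used without it.
    """
    from awq import AutoAWQForCausalLM

    kwargs = build_load_kwargs(seq_len, fuse_layers)
    return AutoAWQForCausalLM.from_quantized(model_path, **kwargs)


# Example call (hypothetical path, not from the thread):
# model = load_awq_model("path/to/awq-model", seq_len=2048)
print(build_load_kwargs(2048))
```

Passing `fuse_layers=False` here corresponds to the "without fused attention" case described earlier in the thread.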

KnutJaegersberg changed discussion status to closed
