How to perform inference over a multi-GPU setup

#2
by fcakyon - opened

As given in the README of https://huggingface.co/THUDM/cogvlm-chat-hf:

device_map = infer_auto_device_map(model, max_memory={0:'20GiB',1:'20GiB','cpu':'16GiB'}, no_split_module_classes=['CogVLMDecoderLayer', 'TransformerLayer'])

How can the THUDM/cogagent-vqa-hf model be dispatched across multiple GPUs?

cc: @qingsonglv @chenkq

I managed to perform inference on multiple GPUs as well by following the example from https://huggingface.co/THUDM/cogvlm-chat-hf and replacing device_map with:

device_map = infer_auto_device_map(model, max_memory={0:'18GiB', 1:'18GiB', 'cpu':'16GiB'}, no_split_module_classes=['CogAgentDecoderLayer'])
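
For reference, here is a minimal end-to-end sketch of how that device_map fits into the loading flow, following the same pattern as the cogvlm-chat-hf README. The local checkpoint path and the 18GiB memory limits are placeholders you should adjust to your own GPUs; the vicuna tokenizer is the one used in the official CogVLM/CogAgent examples.

```python
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer
from accelerate import init_empty_weights, infer_auto_device_map, load_checkpoint_and_dispatch

MODEL_PATH = 'THUDM/cogagent-vqa-hf'
TOKENIZER_PATH = 'lmsys/vicuna-7b-v1.5'          # tokenizer used in the CogVLM/CogAgent examples
CHECKPOINT_DIR = '/path/to/local/cogagent-vqa-hf'  # placeholder: local snapshot of the model weights

tokenizer = LlamaTokenizer.from_pretrained(TOKENIZER_PATH)

# Build the model structure on the meta device (no real weight allocation),
# so the device map can be planned before loading anything.
with init_empty_weights():
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    )

# Plan the placement across two GPUs plus CPU overflow; each CogAgentDecoderLayer
# stays on a single device so its internal connections are never split.
device_map = infer_auto_device_map(
    model,
    max_memory={0: '18GiB', 1: '18GiB', 'cpu': '16GiB'},
    no_split_module_classes=['CogAgentDecoderLayer'],
)

# Load the real weights and dispatch them according to the plan.
model = load_checkpoint_and_dispatch(model, CHECKPOINT_DIR, device_map=device_map)
model = model.eval()
```

The key point is `no_split_module_classes=['CogAgentDecoderLayer']`: it tells accelerate the smallest unit it may move between devices is a whole decoder layer, which is what changes compared to the CogVLM snippet above.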

@teo96 thanks a lot!
