Is it possible to run the model on 2 GPUs?

#5
by thanhnew2001 - opened

I tried to pass a max_memory mapping to enable 2 GPUs, but it does not seem to be supported:

from transformers import AutoModelForCausalLM

# Cap per-GPU memory; device_map="auto" distributes layers within these limits.
max_memory_mapping = {0: "600MB", 1: "1GB"}
model_name = "bigscience/bloom-3b"
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", load_in_4bit=True, max_memory=max_memory_mapping
)

CTranslate2 has not supported this in the past. If you enable multiple GPUs, each GPU will hold an entire copy of the model.
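For reference, multi-GPU replication in CTranslate2 is requested through the device_index argument; a minimal sketch, assuming the model has already been converted to the CTranslate2 format in a directory named bloom-3b-ct2 (hypothetical path):

import ctranslate2

# Each entry in device_index loads a full replica of the model on that GPU;
# incoming batches are dispatched across the replicas (data parallelism),
# rather than sharding a single model across both devices.
generator = ctranslate2.Generator(
    "bloom-3b-ct2",  # hypothetical: directory produced by ct2-transformers-converter
    device="cuda",
    device_index=[0, 1],
)

So adding a second GPU this way increases throughput, not the maximum model size that fits in memory.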

michaelfeil changed discussion status to closed
