Loading and Inferencing the Model on Multiple GPUs

#6
by V1shwanath - opened

When loading and running inference with the model, I get the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

AIDC-AI org

Below is an example of running AIDC-AI/Ovis1.6-Gemma2-9B on two GPUs:

import torch
from PIL import Image
from transformers import AutoModelForCausalLM

# Keep the visual tokenizer, embeddings, final norm and LM head on GPU 0,
# together with the first 20 decoder layers; the remaining layers go to GPU 1.
device_map = {
    "visual_tokenizer": 0,
    "vte": 0,
    "llm.model.embed_tokens": 0,
    "llm.model.norm": 0,
    "llm.lm_head": 0,
}
# Gemma2-9B has 42 decoder layers (0-41); split them across the two GPUs
device_map.update({f"llm.model.layers.{i}": 0 if i < 20 else 1 for i in range(42)})

# load model
model = AutoModelForCausalLM.from_pretrained("AIDC-AI/Ovis1.6-Gemma2-9B",
                                             torch_dtype=torch.bfloat16,
                                             multimodal_max_length=8192,
                                             device_map=device_map,
                                             trust_remote_code=True)
text_tokenizer = model.get_text_tokenizer()
visual_tokenizer = model.get_visual_tokenizer()

# enter image path and prompt
image_path = input("Enter image path: ")
image = Image.open(image_path)
text = input("Enter prompt: ")
query = f'<image>\n{text}'
# format conversation
prompt, input_ids, pixel_values = model.preprocess_inputs(query, [image])
attention_mask = torch.ne(input_ids, text_tokenizer.pad_token_id)
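# text inputs go to model.device (GPU 0, where embed_tokens lives per the map above);
# pixel_values follow the visual tokenizer's dtype and device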
input_ids = input_ids.unsqueeze(0).to(device=model.device)
attention_mask = attention_mask.unsqueeze(0).to(device=model.device)
pixel_values = [pixel_values.to(dtype=visual_tokenizer.dtype, device=visual_tokenizer.device)]

# generate output
with torch.inference_mode():
    gen_kwargs = dict(
        max_new_tokens=1024,
        do_sample=False,
        top_p=None,
        top_k=None,
        temperature=None,
        repetition_penalty=None,
        eos_token_id=model.generation_config.eos_token_id,
        pad_token_id=text_tokenizer.pad_token_id,
        use_cache=True
    )
    output_ids = model.generate(input_ids, pixel_values=pixel_values, attention_mask=attention_mask, **gen_kwargs)[0]
    output = text_tokenizer.decode(output_ids, skip_special_tokens=True)
    print(f'Output:\n{output}')
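
For reference, you can also print where each module actually ended up (a minimal sketch; transformers attaches hf_device_map to the model when it is loaded with a device_map):

# inspect the module-to-device assignment recorded at load time
print(model.hf_device_map)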

I'm getting this error now:
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:0)

AIDC-AI org

I'm getting this error now:
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:0)

I executed the above code locally, and it runs fine. You might want to check if there is an issue with your Python environment.
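
If it helps to compare the two environments, a quick diagnostic along these lines (a minimal sketch; it only prints package versions and the visible GPUs) can show what differs between the working and failing setups:

import torch
import transformers

# print the versions and GPU count so the old and new environments can be compared
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("visible GPUs:", torch.cuda.device_count())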

Thanks. Reran it in a new env and it worked.

V1shwanath changed discussion status to closed

This discussion was very helpful!
One insight: the error above depends on the transformers version.

As of 12/04/2024, the latest transformers release still produces the same error:
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:0)

Version 4.44.2 of transformers is safe.
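
If you also hit this, one option (a sketch, assuming a pip-based environment) is to pin the release reported here with pip install transformers==4.44.2 and confirm it before loading the model:

import transformers

# make sure the pinned release reported as working in this thread is the one actually imported
assert transformers.__version__ == "4.44.2", transformers.__version__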

Thanks!!
