Splitting the model over multiple GPUs

#4
by dnhkng - opened

Is there any documentation on splitting such models up for inference over multiple GPUs?
Second-hand 3090 Tis are getting quite affordable now, and I believe the model would fit in the VRAM of two such cards, at least purely by size.
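For a rough back-of-the-envelope check (assuming the ~20B parameters of the google/ul2 checkpoint; these numbers are my own estimate, not from any documentation):

# Weights alone, ignoring activations and the KV cache
num_params = 20e9          # assumed parameter count for google/ul2
bytes_per_param = 2        # bfloat16 / float16
print(num_params * bytes_per_param / 1e9)   # ~40 GB, vs. 2 x 24 GB = 48 GB of VRAM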

This comment has been hidden

Hi! I think I managed to run UL2 on 2 RTX 3090 GPUs. I deleted my last comment to avoid confusion.
I configured Hugging Face Accelerate and ran the code snippet below. It worked! It seems that the assignment of layers across GPUs is handled by the auto device map. Hope it helps.

import logging
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

logging.info('build tokenizer')
tokenizer = AutoTokenizer.from_pretrained("google/ul2")

logging.info('build model')
# device_map='auto' lets Accelerate split the layers across the available GPUs
model = T5ForConditionalGeneration.from_pretrained(
    "google/ul2",
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
    device_map='auto',
)

input_string = "[S2S] Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, solid man with a bald head. Mrs. Dursley was thin and blonde and more than the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere <extra_id_0>"

inputs = tokenizer(input_string, return_tensors="pt").input_ids.to("cuda")

logging.info('generate output')
outputs = model.generate(inputs, max_length=200)

logging.info(tokenizer.decode(outputs[0]))
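If you want to see how the layers ended up being assigned, the resolved device map can be printed after loading (a minimal check, run right after the from_pretrained call above):

# Shows which GPU each module was placed on by device_map='auto'
print(model.hf_device_map)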

I have it running on the RTX Titans, but the output looks very weird.
I first had to load the model and then save it as FP16, since my cards do not support bfloat16.
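For reference, that conversion step looked roughly like this (a minimal sketch, assuming the bfloat16 weights are first loaded on CPU; the output directory name is just an example):

import torch
from transformers import T5ForConditionalGeneration

# Load the original bfloat16 checkpoint on CPU, cast to float16, and save it locally
model = T5ForConditionalGeneration.from_pretrained(
    "google/ul2", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16
)
model = model.half()
model.save_pretrained("ul2-fp16")   # hypothetical output directory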

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoConfig
from accelerate import init_empty_weights, infer_auto_device_map, load_checkpoint_and_dispatch
import torch

model_name = "google/ul2"
config = AutoConfig.from_pretrained(model_name)

# Build the model skeleton without allocating any weights
with init_empty_weights():
    model = AutoModelForSeq2SeqLM.from_config(config)

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Work out which module goes on which GPU for FP16 weights
device_map = infer_auto_device_map(model, dtype=torch.float16)

# Load the FP16 checkpoint saved earlier and dispatch it across the GPUs
weights_path = '.'
model = load_checkpoint_and_dispatch(
    model,
    weights_path,
    device_map=device_map,
    offload_folder=None,
    offload_state_dict=False,
    dtype="float16",
)

prompt = 'Machine learning is the '
input_tokenized = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    input_tokenized["input_ids"].to(1),   # inputs placed on cuda:1 here
    do_sample=True,
    max_length=100,
    temperature=0.9,
    top_k=50,
    top_p=0.9,
)
output_text = tokenizer.decode(output[0].tolist())
print(output_text)

Generation with this script is quite fast once the model is loaded (a few tens of seconds), but the output reads as:

'<pad><extra_id_0> uimitmplă luminăn<pad><pad><pad><extra_id_0> uimit uimit<pad><extra_id_0> 
for the incendiu Project uimit uimitmplă machine learning uimit<extra_id_10> is<pad><pad><extra_id_0> 
qEinen incendiu în incendiu incendiu<pad><pad><extra_id_0>recommending<pad><extra_id_0><pad>
<extra_id_0> uimit lumină combining data and acţiune acţiune code uimit learning lumină presiune presiune 
acţiune; incendiu<extra_id_7> uimitency lumină lumină Artificial<extra_id_7> lumină incendiu incendiu 
incendiu<pad><extra_id_0><extra_id_74><extra_id_7><extra_id_9>a lumină lumină<extra_id_9>
<extra_id_7> uimit gradini treptat lumină presiune deep incendiu lumină lumină knowing that works 
is târziu<pad><extra_id_0><pad><pad>'

This might be due to overflows in FP16 vs. BF16, maybe?
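One quick way to confirm the hardware side of this (a small check; it only tests compute-capability support, not the model itself):

import torch

# Titan RTX (Turing) should report False here; Ampere cards such as the RTX 3090 report True
print(torch.cuda.is_bf16_supported())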

I am not an expert in LMs, so I'm sorry I can't reason about this from your code and output.
My suggestion is to first reproduce the output of the given example; I could, with the code I provided.

Ouch, thanks for the link!

Looks like I either need 2 more cards to run in FP32, or have to replace my cards with bfloat16-capable ones, damn...
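Another option that might be worth trying before buying hardware (a sketch, assuming enough CPU RAM or disk space for the offloaded FP32 weights; the 'offload' folder name is just an example):

import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/ul2")

# Let Accelerate fill both GPUs in FP32 and spill the remaining weights to CPU/disk
model = T5ForConditionalGeneration.from_pretrained(
    "google/ul2",
    torch_dtype=torch.float32,
    device_map="auto",
    low_cpu_mem_usage=True,
    offload_folder="offload",   # hypothetical scratch directory for offloaded weights
)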

dnhkng changed discussion status to closed
