Memory leak.

#2
by Yurkoff - opened

There is a memory leak somewhere. After the calculations are completed, the model does not return the used memory to the pool.

[screenshot: GPU memory usage after the calculations finish]

This is what it looks like when the model boots up:
[screenshot: GPU memory usage at model startup]
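
One way to tell whether this is a real leak or just PyTorch's caching allocator holding on to blocks (nvidia-smi only shows the reserved total) is to compare allocated vs. reserved memory after a run. A minimal diagnostic sketch:

import torch

# Memory actually occupied by live tensors vs. memory held by the caching allocator.
# nvidia-smi reports roughly the reserved figure, which normally stays high after inference.
print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.0f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.0f} MiB")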

That's a code problem, nothing to do with the model itself, and no one can help with it without seeing the code being run.


Might want to report this to the dev of whatever app/module/etc you are using.

My code:

import torch
from transformers import LlamaTokenizerFast, LlamaForCausalLM


tokenizer = LlamaTokenizerFast.from_pretrained(model_dir)
model = LlamaForCausalLM.from_pretrained(model_dir,
                                         load_in_8bit=True,
                                         device_map='sequential',
                                         torch_dtype=torch.float16,
                                         low_cpu_mem_usage=True,
                                         )


# Tokenize the prompt(s) and run generation on the model's device
inputs = tokenizer(prompts)
output_ids = model.generate(torch.as_tensor(inputs.input_ids).to(model.device),
                            do_sample=True,
                            temperature=0.8,
                            max_new_tokens=512,
                            top_p=0.95,
                            # synced_gpus=True,
                            )
results = tokenizer.batch_decode(output_ids,
                                 skip_special_tokens=True,
                                 clean_up_tokenization_spaces=False)[0]

Versions of my packages:

torch==2.0.1+cu118; sys_platform == 'linux'
torchvision==0.15.2+cu118; sys_platform == 'linux'
torchtext==0.15.2; sys_platform == 'linux'
torchaudio==2.0.2+cu118; sys_platform == 'linux'
psutil==5.9.5
requests==2.31.0
captum==0.6.0
packaging==23.1
pynvml==11.4.1
pyyaml==6.0
nvgpu
cython==0.29.34
wheel==0.40.0
pillow==9.3.0
numpy==1.24.3
torchtext==0.15.2
torchserve==0.7.1
torch-model-archiver==0.7.1
transformers==4.31.0
tokenizers==0.13.3
sentencepiece==0.1.99
bitsandbytes==0.41.1
accelerate==0.21.0
scipy==1.10.1

I solved the problem. After each inference I call:

import gc

gc.collect()               # drop unreachable Python objects that still reference GPU tensors
torch.cuda.empty_cache()   # release unused cached blocks so nvidia-smi shows the memory as freed
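
For completeness, here is a minimal sketch of how that cleanup can be wired around the generation call from above (the generate_answer wrapper and its prompts argument are illustrative, not part of the original handler):

import gc
import torch

def generate_answer(prompts):
    # Tokenize and generate exactly as in the code above
    inputs = tokenizer(prompts)
    output_ids = model.generate(torch.as_tensor(inputs.input_ids).to(model.device),
                                do_sample=True,
                                temperature=0.8,
                                max_new_tokens=512,
                                top_p=0.95)
    results = tokenizer.batch_decode(output_ids,
                                     skip_special_tokens=True,
                                     clean_up_tokenization_spaces=False)[0]
    # Drop references to the GPU tensors, then free unused blocks from the caching allocator
    del output_ids
    gc.collect()
    torch.cuda.empty_cache()
    return results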

https://github.com/huggingface/transformers/issues/25690

Yurkoff changed discussion status to closed
