Jul 21, 2023

So I have 8x 1080 Ti in my machine (plus an i5 and 16 GB of RAM).
The 1080 Ti is an 11 GB graphics card. Falcon 7B comes in 2 parts, so it should work. A Vicuna model works on this machine.
This is my Python code:
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model_name = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to("cuda") # Move the model to the GPU

# Wrap the model with DataParallel to use multiple GPUs

if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

sequences = pipeline(
    "tell me a joke.",
    max_length=100,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

When I run it, I just get the message "Killed".
Thanks for the help!

jurecucek

Jul 21, 2023

Falcon 7B does load using FastChat, so I guess my code is wrong :D

adam-zettafi

Jul 28, 2023

In my experience, "Killed" usually means the process used too much RAM and was shut down. Are there specific reasons you are providing so many configuration options, or did this come from a code snippet? My best experiences with Hugging Face libraries have come from starting with only the bare necessities to get it running and then modifying from there for tuning. I suggest removing all the extras and letting Transformers and the pipeline figure it out. You can always add more for tuning later.
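
To make the RAM point concrete, here is a rough sketch, not a verified fix. It assumes Falcon 7B has about 7 billion parameters and that calling `from_pretrained` without a `torch_dtype` puts the weights into system RAM as float32 before `.to("cuda")` runs. Both are assumptions, not measurements from this machine. `estimate_weights_gb` and `load_minimal_pipeline` are hypothetical helper names. The loader hands the model name straight to `transformers.pipeline` with the same `torch_dtype` and `device_map="auto"` as in the post (the latter needs the `accelerate` package), and drops the manual `.to("cuda")` and `DataParallel` wrapping.

```python
def estimate_weights_gb(num_params: int, bytes_per_param: int) -> float:
    """Rough size of the model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


ASSUMED_PARAMS = 7_000_000_000  # assumption: "7B" taken literally
SYSTEM_RAM_GB = 16              # from the original post

fp32_gb = estimate_weights_gb(ASSUMED_PARAMS, 4)  # 4 bytes per float32 value
bf16_gb = estimate_weights_gb(ASSUMED_PARAMS, 2)  # 2 bytes per bfloat16 value
print(f"float32 weights:  ~{fp32_gb:.1f} GB (system RAM: {SYSTEM_RAM_GB} GB)")
print(f"bfloat16 weights: ~{bf16_gb:.1f} GB")


def load_minimal_pipeline(model_name: str = "tiiuae/falcon-7b-instruct"):
    """Hypothetical helper: build the pipeline with only the essentials.

    Imports are done lazily so the estimate above runs without
    transformers or torch installed.
    """
    import torch
    import transformers

    return transformers.pipeline(
        "text-generation",
        model=model_name,            # let the pipeline load model and tokenizer
        torch_dtype=torch.bfloat16,  # dtype carried over from the original post
        trust_remote_code=True,
        device_map="auto",           # requires the accelerate package
    )
```

Under these assumptions the float32 weights alone come to about 28 GB, well above the 16 GB of system RAM, which would be consistent with the process being killed. Calling `load_minimal_pipeline()("tell me a joke.", max_length=100)` would take the place of the manual loading code. Whether bfloat16 or float16 is the better dtype for the 1080 Ti has not been checked here.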


Getting message Killed when loading on multi-gpu
