Falcon-7B-Instruct using CPU for inference even on NVIDIA A40 cards with 48 GB VRAM

#70
by Akshadv

import torch
from transformers import AutoTokenizer, pipeline

model_path = "tiiuae/falcon-7b-instruct"  # local path or Hub model id

tokenizer = AutoTokenizer.from_pretrained(model_path)

falcon_pipeline = pipeline(
    "text-generation",
    model=model_path,
    tokenizer=tokenizer,
    max_new_tokens=256,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    do_sample=True,
    top_k=10,
    temperature=0.7,
    eos_token_id=tokenizer.eos_token_id,
)

I'm using this code together with an LLMChain for inference. Am I doing something wrong, or does anything need to change to run inference fully on the GPU?

CPU usage is always at 100%.
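As a first diagnostic, it can help to confirm that CUDA is actually visible to the Python process and to check where the pipeline placed the model weights; with `device_map="auto"`, Accelerate silently falls back to CPU when no GPU is detected. This is a minimal sketch, assuming `falcon_pipeline` was built as in the snippet above (the pipeline-specific lines are commented out so the check runs on its own):

```python
import torch

def cuda_visible() -> bool:
    """Return True if at least one CUDA device is visible to this process."""
    return torch.cuda.is_available() and torch.cuda.device_count() > 0

print("CUDA visible:", cuda_visible())

# With a loaded pipeline, you can inspect placement directly:
# print(next(falcon_pipeline.model.parameters()).device)        # expect cuda:0, not cpu
# print(getattr(falcon_pipeline.model, "hf_device_map", None))  # per-module placement
```

If `cuda_visible()` prints `False`, the issue is the environment (driver, CUDA build of PyTorch, or `CUDA_VISIBLE_DEVICES`) rather than the pipeline arguments.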
