Code snippet incredibly slow - can someone give some tips?
#77 opened by NatanRajch
This code took 3 hours on a machine with 13 GB of RAM to produce an answer. I'm new to the field, so can someone tell me whether this is expected behaviour, or whether I should tweak something in the code?
How do HuggingChat, Replicate, and other chat services respond so quickly?
Thank you in advance for your help!
import transformers
import torch

access_token = 'MY_TOKEN'
model_id = "meta-llama/Meta-Llama-3-8B"

# Load the model in bfloat16; device_map="auto" puts it on a GPU if one is available, otherwise on CPU
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
    token=access_token,
)

# Generate a completion for a single prompt
pipeline("Hey how are you doing today?")