A noob trying to run the model with the transformers library

#125
by Alain44 - opened

I'm a newbie with LLMs.
I've tried to use the code from the model card to launch the model on my laptop running Ubuntu (NVIDIA A500 GPU with 4 GB of VRAM, AMD Ryzen™ 7 7840HS CPU with 8 cores/16 threads, 32 GB of DDR5 RAM).
pipeline = transformers.pipeline("text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto")

When I then run:
pipeline("what is LLM ?")
it takes forever, with a single CPU thread at 100%.

Is there a way to make this run faster, or is my laptop too slow?
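
For reference, here is how I'm trying to check whether the model actually ends up on the GPU (if I understand the accelerate docs right, hf_device_map shows which device each block landed on, but I may be misreading it):

import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B"

# Is the GPU visible to PyTorch at all?
print("CUDA available:", torch.cuda.is_available())

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# With device_map="auto", accelerate records where each block was placed.
# Blocks that did not fit in the 4 GB of VRAM are assigned to "cpu" (or "disk"),
# which would explain why generation runs on the CPU.
print(pipeline.model.hf_device_map)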

I then tried:
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "quantization_config": {"load_in_4bit": True},
        "low_cpu_mem_usage": True,
    },
)
I got this error: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 28.00 MiB. GPU.

My GPU has 4 GB of memory; maybe that's not enough?
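
For reference, a quick way to see how much VRAM is actually free before loading anything (I assume the desktop and other processes also take a slice of the 4 GB):

import torch

# Free vs. total memory on the current CUDA device, in bytes.
# Even in 4-bit, an 8B model needs roughly 8e9 params * 0.5 bytes ≈ 4 GB
# for the weights alone, before activations and the KV cache.
free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"free: {free_bytes / 1e9:.2f} GB, total: {total_bytes / 1e9:.2f} GB")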

Hey Alain,

For Llama 3 8B, your GPU does not have enough memory, so try more powerful hardware. You could start with Google Colab, where you can use resources on Google's servers for free; it's still not the best, but it should work for your 8B model. Sharding might also help at this point.
If you are running inference, choose a lower sequence length to reduce memory usage.
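
If you still want to try on your own machine, below is roughly how I would pass the 4-bit option through BitsAndBytesConfig and keep the generated sequence short with max_new_tokens. It's an untested sketch, and even in 4-bit a 4 GB card will most likely be too small for the 8B weights:

import torch
import transformers
from transformers import BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B"

# Explicit 4-bit quantization config (needs the bitsandbytes package and a CUDA GPU),
# instead of passing a plain dict in model_kwargs.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={
        "quantization_config": bnb_config,
        "low_cpu_mem_usage": True,
    },
    device_map="auto",
)

# A short generation keeps the KV cache small, which is where a lot of the
# extra memory goes during inference.
print(pipeline("what is LLM ?", max_new_tokens=64)[0]["generated_text"])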

Sorry for my bad English, and good luck!
