Text Generation
Transformers
Safetensors
Turkish
English
llama
conversational
Inference Endpoints
text-generation-inference

How can I run it with 8GB of RAM?

#2
by whatnext - opened

I have a low-end Ryzen 5 laptop with 8GB of RAM. I want to build a Turkish chat AI.
I tried running the model by following the example in the README file. At first I got an error message like this:

  ValueError: Input length of input_ids is 21, but `max_length` is set to 20. This can lead to unexpected behavior. You should 
 consider increasing `max_length` or, better yet, setting `max_new_tokens`.

ChatGPT helped me work around it.
The process then ran for dozens of minutes but never produced any output.
I also tried setting max_new_tokens, but the process kept getting killed a while after I ran it.
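For context, the error above comes from `max_length` (default 20) counting the prompt tokens plus the generated tokens, so any prompt of 21 or more tokens fails immediately, while `max_new_tokens` bounds only the newly generated text. A minimal sketch of the fix the error message suggests; the model id is left as a placeholder since the thread does not spell out the README example:

```python
# `max_length` counts prompt + generated tokens; `max_new_tokens` counts
# only the generated tokens, so it works for any prompt length.
gen_kwargs = {
    "max_new_tokens": 256,  # generate at most 256 new tokens
    "do_sample": True,
    "temperature": 0.7,
}

# With the transformers pipeline (not run here: it downloads the 13GB weights):
# from transformers import pipeline
# pipe = pipeline("text-generation", model="<model-id-from-the-readme>")
# print(pipe("Merhaba, nasılsın?", **gen_kwargs)[0]["generated_text"])
```

Note this only fixes the length error; it does not reduce the memory the model needs.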

I am a noob when it comes to LLMs.
I need your guidance.
How can I run it properly?

SambaNova Systems org
edited Mar 21

Hey @whatnext , it looks like you do not have the proper hardware setup to run this model: the model weights are 13GB, so 8GB of RAM will not be enough.
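A back-of-the-envelope check of those numbers (assuming the 13GB figure is fp16 weights, i.e. roughly 6.5 billion parameters at 2 bytes each):

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, ignoring activations
    and the KV cache, which add more memory at inference time."""
    return n_params * bits_per_weight / 8 / 1e9

n_params = 6.5e9  # assumption: 13 GB / 2 bytes per fp16 weight

print(weight_size_gb(n_params, 16))  # fp16: 13.0 GB -- far more than 8 GB of RAM
print(weight_size_gb(n_params, 4))   # 4-bit quantized: 3.25 GB -- could fit
```

This is why quantization (as in llama.cpp below) is the usual route for small machines, though even 4-bit weights leave limited headroom on 8GB once the OS and runtime overhead are counted.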

I have heard of llama.cpp, which can help you run models on a laptop, but I am not too familiar with it and am not sure whether it would work: https://github.com/ggerganov/llama.cpp.

If I were you I would

  1. Use our API, which we provide for free: https://sambaverse.sambanova.net/
    documentation: https://docs.sambanova.ai/sambaverse/latest/use-sambaverse.html#:~:text=Task%2Dspecific%20model-,Your%20API%20key,%27https%3A//sambaverse.sambanova.net/api/predict%27,-Rate%20limits
  2. Run the model in the cloud. For example, you can use some free GPU/TPU time on Google Colab, or pay for more compute: https://colab.google/. There are probably many other cloud offerings; I am not familiar with what the best options are.

We hope you enjoy our model, and we would appreciate it if you like and share our project for more visibility. Thank you!

@zolicsaki Thank you for your help. I want an offline chat model. I'll try the quantization approach in llama.cpp, though I don't know if it will work.
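For anyone following along, the usual llama.cpp quantization workflow looks roughly like this. All paths and the output filenames are hypothetical, and this assumes a built llama.cpp checkout with its Python requirements installed; it is a sketch, not instructions from this model's authors:

```shell
# Convert the downloaded Hugging Face checkpoint to GGUF (fp16, ~13GB):
python convert_hf_to_gguf.py /path/to/downloaded/model --outfile model-f16.gguf

# Quantize to 4 bits (roughly 4x smaller than fp16):
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M

# Chat with the quantized model fully offline:
./llama-cli -m model-q4_k_m.gguf -cnv
```

The conversion step still needs the full fp16 file on disk, so expect to need ~15-20GB of free storage even though the final quantized model is only a few GB.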
