The Serverless Inference API: "The model meta-llama/Meta-Llama-3-8B is too large to be loaded automatically (16GB > 10GB)"

#31
by michaelpope - opened

I get the error "The model meta-llama/Meta-Llama-3-8B is too large to be loaded automatically (16GB > 10GB)" when using the Serverless Inference API.
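For reference, a minimal call that reproduces this might look like the sketch below. The URL is the public serverless endpoint for this model; the token is a placeholder for a real Hugging Face access token with access to the gated repo.

```python
import requests

# Serverless Inference API endpoint for the base model.
API_URL = "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B"
headers = {"Authorization": "Bearer hf_..."}  # placeholder: use your own HF token

# The request itself is well-formed; the API responds with the
# "too large to be loaded automatically" error for this model.
resp = requests.post(API_URL, headers=headers, json={"inputs": "Hello, my name is"})
print(resp.status_code, resp.json())
```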

Any way to use Meta-Llama-3-8B with the Serverless Inference API?

Thank you!

Same question here!


It's ironic because the error says "The model meta-llama/Meta-Llama-3-8B is too large to be loaded automatically (16GB > 10GB). Please use Spaces (https://huggingface.co/spaces) or Inference Endpoints (https://huggingface.co/inference-endpoints)." but I am using Inference Endpoints?

Got it working. On the website, the right-hand column specifically says:

Model is too large to load in Inference API (serverless). To try the model, launch it on Inference Endpoints (dedicated) instead.

After creating a dedicated endpoint, it works.
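For anyone going this route, querying a dedicated endpoint is a plain HTTP call. A minimal sketch, assuming a text-generation endpoint; the ENDPOINT_URL is a placeholder you copy from your endpoint's page, and HF_TOKEN is assumed to be set in your environment:

```python
import os
import requests

# Placeholder: replace with the URL shown on your Inference Endpoints dashboard.
ENDPOINT_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

payload = {
    "inputs": "The capital of France is",
    "parameters": {"max_new_tokens": 20},
}
resp = requests.post(ENDPOINT_URL, headers=headers, json=payload)
print(resp.json())
```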

I get the same error message.

I want to use Meta-Llama-3-8B with the Serverless Inference API.

Same problem here, even with a Pro account.

Meta Llama org

Hey all. This model is not available in the Serverless Inference API, but the Instruct version is: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
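A minimal sketch of calling the Instruct model through the serverless API, using the `InferenceClient` from `huggingface_hub`; it assumes a reasonably recent version of the library and a token with access to the gated repo (the token shown is a placeholder):

```python
from huggingface_hub import InferenceClient

# Placeholder token: use your own HF access token.
client = InferenceClient(model="meta-llama/Meta-Llama-3-8B-Instruct", token="hf_...")

# chat_completion formats the messages with the model's chat template server-side.
out = client.chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out.choices[0].message.content)
```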

for the beginners : what's the difference between regular and Instruct model?

Meta Llama org

Base models are optimized to generate the next token. If you want a chat-like model (à la ChatGPT), you want to use an Instruct version, which is the base model further trained on chat-like behavior (with a series of alignment techniques).
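One way to see the difference concretely: the Instruct model expects its inputs wrapped in a chat template, while the base model simply continues raw text. A small sketch using the `transformers` tokenizer (assumes you have access to the gated repo):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [{"role": "user", "content": "What is 2 + 2?"}]
# Renders the special header tokens the Instruct model was fine-tuned on;
# the base model has no such template and just does plain completion.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```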

osanseviero changed discussion status to closed
