The Serverless Inference API: "The model meta-llama/Meta-Llama-3-8B is too large to be loaded automatically (16GB > 10GB)"

#31
by michaelpope - opened

I get the error "The model meta-llama/Meta-Llama-3-8B is too large to be loaded automatically (16GB > 10GB)" when using the Serverless Inference API.
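For reference, a minimal call that reproduces this might look like the sketch below. The URL is the public serverless endpoint for this model; the token is a placeholder for a real Hugging Face access token with access to the gated repo.

```python
import requests

# Serverless Inference API endpoint for the base model.
API_URL = "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B"
headers = {"Authorization": "Bearer hf_..."}  # placeholder: use your own HF token

# The request itself is well-formed; the API responds with the
# "too large to be loaded automatically" error for this model.
resp = requests.post(API_URL, headers=headers, json={"inputs": "Hello, my name is"})
print(resp.status_code, resp.json())
```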

Any way to use Meta-Llama-3-8B with the Serverless Inference API?

Thank you!

Same question here!


It's ironic because the error says "The model meta-llama/Meta-Llama-3-8B is too large to be loaded automatically (16GB > 10GB). Please use Spaces (https://huggingface.co/spaces) or Inference Endpoints (https://huggingface.co/inference-endpoints)." but I am using Inference Endpoints?

Got it working. On the website, the right-hand column specifically says:

Model is too large to load in Inference API (serverless). To try the model, launch it on Inference Endpoints (dedicated) instead.

After creating a dedicated endpoint, it works.
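For anyone going this route, querying a dedicated endpoint is a plain HTTP call. A minimal sketch, assuming a text-generation endpoint; the ENDPOINT_URL is a placeholder you copy from your endpoint's page, and HF_TOKEN is assumed to be set in your environment:

```python
import os
import requests

# Placeholder: replace with the URL shown on your Inference Endpoints dashboard.
ENDPOINT_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

payload = {
    "inputs": "The capital of France is",
    "parameters": {"max_new_tokens": 20},
}
resp = requests.post(ENDPOINT_URL, headers=headers, json=payload)
print(resp.json())
```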

I get the same error message.

I want to use Meta-Llama-3-8B with the Serverless Inference API.

Same problem here, even with a Pro account.

Meta Llama org

Hey all. This model is not available in the Serverless Inference API, but the Instruct version is: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
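A minimal sketch of calling the Instruct model through the serverless API, using the `InferenceClient` from `huggingface_hub`; it assumes a reasonably recent version of the library and a token with access to the gated repo (the token shown is a placeholder):

```python
from huggingface_hub import InferenceClient

# Placeholder token: use your own HF access token.
client = InferenceClient(model="meta-llama/Meta-Llama-3-8B-Instruct", token="hf_...")

# chat_completion formats the messages with the model's chat template server-side.
out = client.chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out.choices[0].message.content)
```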

for the beginners : what's the difference between regular and Instruct model?

Meta Llama org

Base models are optimized to generate the next token. If you want a chat-like model (à la ChatGPT), you want to use an Instruct version, which is the base model further trained on chat-like behavior (with a series of alignment techniques).
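One way to see the difference concretely: the Instruct model expects its inputs wrapped in a chat template, while the base model simply continues raw text. A small sketch using the `transformers` tokenizer (assumes you have access to the gated repo):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [{"role": "user", "content": "What is 2 + 2?"}]
# Renders the special header tokens the Instruct model was fine-tuned on;
# the base model has no such template and just does plain completion.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```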

osanseviero changed discussion status to closed
