Issue with using the model in Spaces

#13
by gospacedev - opened
Cognitive Computations org
edited Dec 19, 2023
from huggingface_hub import InferenceClient

client = InferenceClient(
    "ehartford/dolphin-2.5-mixtral-8x7b"
)

When I try to use ehartford/dolphin-2.5-mixtral-8x7b in Spaces, I get these errors:

huggingface_hub.utils._errors.HfHubHTTPError: 403 Client Error: Forbidden for url: https://api-inference.huggingface.co/models/ehartford/dolphin-2.5-mixtral-8x7b (Request ID: bttrYLuVoD5jjxUm9RxFm)

The model ehartford/dolphin-2.5-mixtral-8x7b is too large to be loaded automatically (93GB > 10GB). Please use Spaces (https://huggingface.co/spaces) or Inference Endpoints (https://huggingface.co/inference-endpoints).

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api-inference.huggingface.co/models/ehartford/dolphin-2.5-mixtral-8x7b

Is API access to this model restricted, or is it just too large? When I tried mistralai/Mixtral-8x7B-Instruct-v0.1 with InferenceClient, it worked.


You may be launching it in a Space, but you are still using InferenceClient, which calls the hosted Inference API. Models larger than 10 GB cannot be served through the free Inference API, and a free Space does not have the hardware to run a model this size either.
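The 93 GB figure in the error message is consistent with the model's checkpoint size. A rough back-of-the-envelope sketch (the ~46.7B parameter count for Mixtral 8x7B is approximate; the experts share the attention layers, so the total is well under 8 × 7B):

```python
# Rough estimate of the dolphin-2.5-mixtral-8x7b checkpoint size.
params = 46.7e9          # approximate total parameter count (assumption)
bytes_per_param = 2      # fp16 / bf16 weights
size_gb = params * bytes_per_param / 1e9
print(f"{size_gb:.0f} GB")  # ≈ 93 GB, far above the free API's 10 GB limit
```

That is why the error says "93GB > 10GB": the free Inference API only auto-loads checkpoints up to 10 GB.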

If you ever want to REALLY run the model itself, in a Space or on your own machine, you will not be using the Inference API; instead you will use transformers, i.e. download the model and run it on your own hardware (or the Space's hardware if you run it there).
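A minimal sketch of that transformers route, assuming a machine (or paid Space) with enough GPU memory; the 4-bit quantization option (via bitsandbytes) is one common way to shrink the ~93 GB fp16 checkpoint to a more manageable footprint:

```python
# Sketch: run the model with transformers instead of the Inference API.
# The heavy download/load only happens when executed as a script.

MODEL_ID = "ehartford/dolphin-2.5-mixtral-8x7b"

def load_kwargs(four_bit: bool = True) -> dict:
    """Keyword arguments for AutoModelForCausalLM.from_pretrained()."""
    kwargs = {"device_map": "auto"}   # spread layers across available GPUs
    if four_bit:
        kwargs["load_in_4bit"] = True  # requires the bitsandbytes package
    return kwargs

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **load_kwargs())
    inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(out[0]))
```

This is a sketch, not a drop-in solution: even 4-bit, the model needs a GPU with on the order of 25+ GB of memory, which free Space hardware does not provide.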

I hope I managed to be of some help!


As for Mixtral, I was surprised too, but Mixtral uses a sparse mixture-of-experts architecture that activates only a fraction of its parameters for each token, which makes inference considerably cheaper; that may be why it works through the free API.


Thank you for helping me!
