Inconsistent Response Format for Inference Chat Completion Endpoint

#56
by MrGeniusProgrammer - opened

Description:
The response format for the Mistral-7B-Instruct-v0.3 inference chat completion endpoint is inconsistent with the documentation. Older versions (v0.2 and v0.1) align with the expected format. The issue occurs regardless of whether the stream parameter is set to true or false.

Endpoint:
https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3/v1/chat/completions

cURL Command:

curl 'https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3/v1/chat/completions' \
-H "Authorization: Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
-H 'Content-Type: application/json' \
-d '{
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "max_tokens": 500,
    "stream": false
}'

Expected Response:

{
    "object": "chat.completion",
    // other fields
}

Actual Response:

{
    "object": "text_completion",
    // other fields
}

Note: Older versions (v0.2 and v0.1) correctly return "object": "chat.completion".
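The mismatch can be demonstrated without hitting the network by checking the "object" field of each payload. A minimal sketch (the two response bodies are placeholders shaped like the examples above; is_chat_completion is a hypothetical helper, not part of any API):

```python
import json

# Placeholder payloads mirroring the "object" fields shown above.
v03_response = json.loads('{"object": "text_completion"}')
v02_response = json.loads('{"object": "chat.completion"}')

def is_chat_completion(resp: dict) -> bool:
    """Return True if the payload matches the documented chat format."""
    return resp.get("object") == "chat.completion"

print(is_chat_completion(v02_response))  # v0.2 matches the docs: True
print(is_chat_completion(v03_response))  # v0.3 does not: False
```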

Please update the documentation or fix the response format in v0.3.

Mistral AI org
(pandora-s)

Hi @MrGeniusProgrammer, the documentation at https://docs.mistral.ai/api/ covers the official Mistral API, not the Hugging Face API, which has its own documentation: https://huggingface.co/docs/api-inference/index. Also, I experimented with v1 and v2, and both of them also returned a text_completion object. Did I misunderstand the issue?

Hi @pandora-s ,

Thank you for your response. I am currently seeing the chat.completion object on my end for v1 and v2. Could you please provide a detailed guide on how you encountered the text_completion object, so I can replicate the issue in my environment?

I understand that the Hugging Face API and the official Mistral API are different, but I believe the response formats should be consistent. If there is a discrepancy, I would appreciate it if you could reference the specific documentation for the Mistral-7B-Instruct chat completion inference API.

Guide to Recreating the Behavior of v1 and v2:

curl 'https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2/v1/chat/completions' \
-H "Authorization: Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
-H 'Content-Type: application/json' \
-d '{
    "model": "mistralai/Mistral-7B-Instruct-v0.2",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "max_tokens": 50,
    "stream": false
}'

Response from My End:

{
  "object": "chat.completion",
  "id": "",
  "created": 1720524192,
  "model": "mistralai/Mistral-7B-Instruct-v0.2",
  "system_fingerprint": "2.1.1-dev0-sha-4327210",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": " The capital city of France is Paris. Paris is one of the most famous cities in the world and is known for its iconic landmarks such as the Eiffel Tower, Louvre Museum, Notre Dame Cathedral, and the Arc"
      },
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 50,
    "total_tokens": 65
  }
}
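Until the discrepancy is resolved, a client can tolerate both "object" tags and still pull out the assistant text, since the "choices" structure is the same in both cases. A sketch (the payload below is a trimmed copy of the response above; extract_reply is a hypothetical helper):

```python
import json

# Trimmed copy of the v0.2 response shown above (fields abbreviated).
raw = '''{
  "object": "chat.completion",
  "model": "mistralai/Mistral-7B-Instruct-v0.2",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant",
                 "content": " The capital city of France is Paris."},
     "finish_reason": "length"}
  ]
}'''

def extract_reply(resp: dict) -> str:
    """Return the assistant text, accepting either "object" tag,
    since v0.3 mislabels chat payloads as text_completion."""
    if resp.get("object") not in ("chat.completion", "text_completion"):
        raise ValueError(f"unexpected object type: {resp.get('object')}")
    return resp["choices"][0]["message"]["content"]

print(extract_reply(json.loads(raw)))
```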
