Instructions to use moonshotai/Kimi-K2.5 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use moonshotai/Kimi-K2.5 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="moonshotai/Kimi-K2.5", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("moonshotai/Kimi-K2.5", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use moonshotai/Kimi-K2.5 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "moonshotai/Kimi-K2.5"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "moonshotai/Kimi-K2.5",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
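
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on localhost:8000, and the helper names `build_chat_payload` and `post_chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat payload with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url: str, payload: dict, timeout: float = 120.0) -> dict:
    """POST the payload to <base_url>/v1/chat/completions and decode the JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Same model, prompt, and image as the curl example; building it needs no server.
payload = build_chat_payload(
    "moonshotai/Kimi-K2.5",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# With the server running:
#   reply = post_chat("http://localhost:8000", payload)
#   print(reply["choices"][0]["message"]["content"])
```

Keeping payload construction separate from the network call makes it easy to print and inspect the exact JSON before sending it.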

Use Docker

docker model run hf.co/moonshotai/Kimi-K2.5

SGLang

How to use moonshotai/Kimi-K2.5 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "moonshotai/Kimi-K2.5" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "moonshotai/Kimi-K2.5",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "moonshotai/Kimi-K2.5" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "moonshotai/Kimi-K2.5",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use moonshotai/Kimi-K2.5 with Docker Model Runner:
```
docker model run hf.co/moonshotai/Kimi-K2.5
```

How to return the structured output

#107

by Cripple-Lee - opened Apr 5


from typing import List

from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from pydantic import BaseModel, Field

# `token` (Hugging Face API token) and `base64_numerical_feature_univariate_score`
# (a base64-encoded PNG) are assumed to be defined earlier in the notebook.
llm = HuggingFaceEndpoint(
    task='conversational',
    repo_id="moonshotai/Kimi-K2.5",
    temperature=0.5,
    huggingfacehub_api_token=token,
    provider="auto",
)

chat = ChatHuggingFace(llm=llm)

class FeatureSelectionResponse(BaseModel):
    """The response of feature selection"""
    content: str = Field(description="The content field contains a detailed explanation of the feature selection process")
    chosen_features: List[int] = Field(description="The chosen_features field is a list of indices representing the selected features")

# Ask the agent to return its answer as a FeatureSelectionResponse via a tool call
agent = create_agent(
    model=chat,
    tools=[],
    system_prompt="""
You are a professional data analyst.
The business problem is to predict the class of customers based on the customer data. This is a binary classification problem.
""",
    response_format=ToolStrategy(FeatureSelectionResponse)
)

# Multimodal user message: the plot as a base64 data URL plus the instruction
message = {
    "role": "user",
    "content": [
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64, {base64_numerical_feature_univariate_score}"},
        },
        {"type": "text", "text": "according to the picture, give me a reasonable number of features chosen. The result which is returned is an array of index of features chosen."}
    ]
}
response = agent.invoke(
    {"messages": [message]}
)

With the code shown above, a 400 Client Error occurs when I call the agent's invoke function.
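
A side note on the snippet: the data URL is built with a space after `base64,`. RFC 2397 data URLs do not include whitespace there, and some servers reject it while others tolerate it. Nothing in the traceback confirms that this is the cause of the 400, so treat it only as one thing to rule out. A minimal sketch, with hypothetical helper names:

```python
import base64


def to_png_data_url(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a data URL with no whitespace after the comma."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def normalize_data_url(url: str) -> str:
    """Remove whitespace that crept in around the base64 payload of a data URL."""
    header, sep, payload = url.partition(",")
    if not sep or not header.startswith("data:"):
        raise ValueError("not a data URL")
    # base64 text never contains whitespace, so dropping all of it is safe.
    return f"{header.strip()},{''.join(payload.split())}"
```

For the message above, this would mean passing `normalize_data_url(...)` around the existing f-string, or dropping the space from the f-string directly.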

Here is the full error traceback:

HTTPError Traceback (most recent call last)
File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/huggingface_hub/utils/_http.py:403, in hf_raise_for_status(response, endpoint_name)
402 try:
--> 403 response.raise_for_status()
404 except HTTPError as e:

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/requests/models.py:1026, in Response.raise_for_status(self)
1025 if http_error_msg:
-> 1026 raise HTTPError(http_error_msg, response=self)

HTTPError: 400 Client Error: Bad Request for url: https://router.huggingface.co/novita/v3/openai/chat/completions

The above exception was the direct cause of the following exception:

BadRequestError Traceback (most recent call last)
Cell In[56], line 12
1 # From base64 data
2 message = {
3 "role": "user",
4 "content": [
(...) 10 ]
11 }
---> 12 response = agent.invoke(
13 {"messages": [message]}
14 )

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/langgraph/pregel/main.py:3071, in Pregel.invoke(self, input, config, context, stream_mode, print_mode, output_keys, interrupt_before, interrupt_after, durability, **kwargs)
3068 chunks: list[dict[str, Any] | Any] = []
3069 interrupts: list[Interrupt] = []
-> 3071 for chunk in self.stream(
3072 input,
3073 config,
3074 context=context,
3075 stream_mode=["updates", "values"]
3076 if stream_mode == "values"
3077 else stream_mode,
3078 print_mode=print_mode,
3079 output_keys=output_keys,
3080 interrupt_before=interrupt_before,
3081 interrupt_after=interrupt_after,
3082 durability=durability,
3083 **kwargs,
3084 ):
3085 if stream_mode == "values":
3086 if len(chunk) == 2:

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/langgraph/pregel/main.py:2646, in Pregel.stream(self, input, config, context, stream_mode, print_mode, output_keys, interrupt_before, interrupt_after, durability, subgraphs, debug, **kwargs)
2644 for task in loop.match_cached_writes():
2645 loop.output_writes(task.id, task.writes, cached=True)
-> 2646 for _ in runner.tick(
2647 [t for t in loop.tasks.values() if not t.writes],
2648 timeout=self.step_timeout,
2649 get_waiter=get_waiter,
2650 schedule_task=loop.accept_push,
2651 ):
2652 # emit output
2653 yield from _output(
2654 stream_mode, print_mode, subgraphs, stream.get, queue.Empty
2655 )
2656 loop.after_tick()

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/langgraph/pregel/_runner.py:167, in PregelRunner.tick(self, tasks, reraise, timeout, retry_policy, get_waiter, schedule_task)
165 t = tasks[0]
166 try:
--> 167 run_with_retry(
168 t,
169 retry_policy,
170 configurable={
171 CONFIG_KEY_CALL: partial(
172 _call,
173 weakref.ref(t),
174 retry_policy=retry_policy,
175 futures=weakref.ref(futures),
176 schedule_task=schedule_task,
177 submit=self.submit,
178 ),
179 },
180 )
181 self.commit(t, None)
182 except Exception as exc:

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/langgraph/pregel/_retry.py:42, in run_with_retry(task, retry_policy, configurable)
40 task.writes.clear()
41 # run the task
---> 42 return task.proc.invoke(task.input, config)
43 except ParentCommand as exc:
44 ns: str = config[CONF][CONFIG_KEY_CHECKPOINT_NS]

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/langgraph/_internal/_runnable.py:656, in RunnableSeq.invoke(self, input, config, **kwargs)
654 # run in context
655 with set_config_context(config, run) as context:
--> 656 input = context.run(step.invoke, input, config, **kwargs)
657 else:
658 input = step.invoke(input, config)

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/langgraph/_internal/_runnable.py:400, in RunnableCallable.invoke(self, input, config, **kwargs)
398 run_manager.on_chain_end(ret)
399 else:
--> 400 ret = self.func(*args, **kwargs)
401 if self.recurse and isinstance(ret, Runnable):
402 return ret.invoke(input, config)

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/langchain/agents/factory.py:1261, in create_agent.<locals>.model_node(state, runtime)
1249 request = ModelRequest(
1250 model=model,
1251 tools=default_tools,
(...) 1257 runtime=runtime,
1258 )
1260 if wrap_model_call_handler is None:
-> 1261 model_response = _execute_model_sync(request)
1262 return _build_commands(model_response)
1264 result = wrap_model_call_handler(request, _execute_model_sync)

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/langchain/agents/factory.py:1233, in create_agent.<locals>._execute_model_sync(request)
1230 if request.system_message:
1231 messages = [request.system_message, *messages]
-> 1233 output = model.invoke(messages)
1234 if name:
1235 output.name = name

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/langchain_core/runnables/base.py:5695, in RunnableBindingBase.invoke(self, input, config, **kwargs)
5688 @override
5689 def invoke(
5690 self,
(...) 5693 **kwargs: Any | None,
5694 ) -> Output:
-> 5695 return self.bound.invoke(
5696 input,
5697 self._merge_configs(config),
5698 **{**self.kwargs, **kwargs},
5699 )

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py:402, in BaseChatModel.invoke(self, input, config, stop, **kwargs)
388 @override
389 def invoke(
390 self,
(...) 395 **kwargs: Any,
396 ) -> AIMessage:
397 config = ensure_config(config)
398 return cast(
399 "AIMessage",
400 cast(
401 "ChatGeneration",
--> 402 self.generate_prompt(
403 [self._convert_input(input)],
404 stop=stop,
405 callbacks=config.get("callbacks"),
406 tags=config.get("tags"),
407 metadata=config.get("metadata"),
408 run_name=config.get("run_name"),
409 run_id=config.pop("run_id", None),
410 **kwargs,
411 ).generations[0][0],
412 ).message,
413 )

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py:1123, in BaseChatModel.generate_prompt(self, prompts, stop, callbacks, **kwargs)
1114 @override
1115 def generate_prompt(
1116 self,
(...) 1120 **kwargs: Any,
1121 ) -> LLMResult:
1122 prompt_messages = [p.to_messages() for p in prompts]
-> 1123 return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py:933, in BaseChatModel.generate(self, messages, stop, callbacks, tags, metadata, run_name, run_id, **kwargs)
930 for i, m in enumerate(input_messages):
931 try:
932 results.append(
--> 933 self._generate_with_cache(
934 m,
935 stop=stop,
936 run_manager=run_managers[i] if run_managers else None,
937 **kwargs,
938 )
939 )
940 except BaseException as e:
941 if run_managers:

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py:1235, in BaseChatModel._generate_with_cache(self, messages, stop, run_manager, **kwargs)
1233 result = generate_from_stream(iter(chunks))
1234 elif inspect.signature(self._generate).parameters.get("run_manager"):
-> 1235 result = self._generate(
1236 messages, stop=stop, run_manager=run_manager, **kwargs
1237 )
1238 else:
1239 result = self._generate(messages, stop=stop, **kwargs)

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/langchain_huggingface/chat_models/huggingface.py:750, in ChatHuggingFace._generate(self, messages, stop, run_manager, stream, **kwargs)
743 message_dicts, params = self._create_message_dicts(messages, stop)
744 params = {
745 "stop": stop,
746 **params,
747 **({"stream": stream} if stream is not None else {}),
748 **kwargs,
749 }
--> 750 answer = self.llm.client.chat_completion(messages=message_dicts, **params)
751 return self._create_chat_result(answer)
752 llm_input = self._to_chat_prompt(messages)

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/huggingface_hub/inference/_client.py:915, in InferenceClient.chat_completion(self, messages, model, stream, frequency_penalty, logit_bias, logprobs, max_tokens, n, presence_penalty, response_format, seed, stop, stream_options, temperature, tool_choice, tool_prompt, tools, top_logprobs, top_p, extra_body)
887 parameters = {
888 "model": payload_model,
889 "frequency_penalty": frequency_penalty,
(...) 906 **(extra_body or {}),
907 }
908 request_parameters = provider_helper.prepare_request(
909 inputs=messages,
910 parameters=parameters,
(...) 913 api_key=self.token,
914 )
--> 915 data = self._inner_post(request_parameters, stream=stream)
917 if stream:
918 return _stream_chat_completion_response(data) # type: ignore[arg-type]

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/huggingface_hub/inference/_client.py:275, in InferenceClient._inner_post(self, request_parameters, stream)
272 raise InferenceTimeoutError(f"Inference call timed out: {request_parameters.url}") from error # type: ignore
274 try:
--> 275 hf_raise_for_status(response)
276 return response.iter_lines() if stream else response.content
277 except HTTPError as error:

File ~/projects/auto-train-agent/.venv/lib/python3.12/site-packages/huggingface_hub/utils/_http.py:459, in hf_raise_for_status(response, endpoint_name)
455 elif response.status_code == 400:
456 message = (
457 f"\n\nBad request for {endpoint_name} endpoint:" if endpoint_name is not None else "\n\nBad request:"
458 )
--> 459 raise _format(BadRequestError, message, response) from e
461 elif response.status_code == 403:
462 message = (
463 f"\n\n{response.status_code} Forbidden: {error_message}."
464 + f"\nCannot access content at: {response.url}."
465 + "\nMake sure your token has the correct permissions."
466 )
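
The traceback shows where the 400 was raised but not the body the provider sent back, which is usually where the actual reason is stated. A minimal sketch for printing it follows. It is duck-typed on the assumption that the exceptions in the chain carry a requests-style `response` attribute with `status_code` and `text`; `describe_http_error` is a hypothetical helper, not part of huggingface_hub or LangChain.

```python
def describe_http_error(exc: BaseException) -> list[str]:
    """Walk an exception chain and report every attached HTTP response.

    Any exception with a `response` attribute is reported; exceptions
    without one (for example wrapper errors) are skipped.
    """
    lines = []
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against cyclic chains
        response = getattr(current, "response", None)
        if response is not None:
            status = getattr(response, "status_code", "?")
            body = getattr(response, "text", "")
            lines.append(f"{type(current).__name__}: HTTP {status}: {body}")
        # Follow explicit causes first, then implicit context.
        current = current.__cause__ or current.__context__
    return lines


# Usage with the agent from the post:
#   try:
#       response = agent.invoke({"messages": [message]})
#   except Exception as exc:
#       print("\n".join(describe_http_error(exc)))
```

If the body names an unsupported parameter or a malformed message part, that narrows down whether the tool schema or the image payload is being rejected.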
