Instructions for using moonshotai/Kimi-K2.7-Code with libraries, inference providers, notebooks, and local apps.

Libraries

How to use moonshotai/Kimi-K2.7-Code with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="moonshotai/Kimi-K2.7-Code", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("moonshotai/Kimi-K2.7-Code", trust_remote_code=True, dtype="auto")
```

Inference: HuggingChat
Notebooks: Google Colab, Kaggle

Local Apps

vLLM

How to use moonshotai/Kimi-K2.7-Code with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "moonshotai/Kimi-K2.7-Code"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "moonshotai/Kimi-K2.7-Code",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
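
For scripted use, the curl call above can be mirrored in Python. What follows is a minimal sketch using only the standard library, not an official client. The helper names `build_chat_request`, `send_chat_request`, and `extract_reply` are hypothetical, and the response layout is assumed to follow the usual OpenAI chat-completions shape (`choices[0].message.content`), since the server is described as OpenAI-compatible.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, image_url: str) -> dict:
    """Build the same JSON body as the curl example: one user turn with text plus an image URL."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat_request(base_url: str, payload: dict, timeout: float = 120.0) -> dict:
    """POST the payload to <base_url>/v1/chat/completions and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_reply(response: dict) -> str:
    """Return the assistant text, assuming the OpenAI chat-completions response layout."""
    return response["choices"][0]["message"]["content"]


# Example usage (requires a server already running, e.g. via `vllm serve`):
#   payload = build_chat_request(
#       "moonshotai/Kimi-K2.7-Code",
#       "Describe this image in one sentence.",
#       "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
#   )
#   print(extract_reply(send_chat_request("http://localhost:8000", payload)))
```

Because `base_url` is a parameter, the same sketch should work against any OpenAI-compatible endpoint by changing the host and port.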

Use Docker

```
docker model run hf.co/moonshotai/Kimi-K2.7-Code
```

SGLang

How to use moonshotai/Kimi-K2.7-Code with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "moonshotai/Kimi-K2.7-Code" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "moonshotai/Kimi-K2.7-Code",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "moonshotai/Kimi-K2.7-Code" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as in the pip instructions above
# (OpenAI-compatible API at http://localhost:30000/v1/chat/completions).
```

Docker Model Runner
How to use moonshotai/Kimi-K2.7-Code with Docker Model Runner:
```
docker model run hf.co/moonshotai/Kimi-K2.7-Code
```

Consumer-sized versions? (26B A3B versions)

by CYISNOTHERE - opened 3 days ago

Discussion

CYISNOTHERE

3 days ago

I was wondering if it would be possible for moonshotai to create consumer GPU versions of the Kimi model? This model is so good that it is sad we don't get smaller versions for consumers to run themselves. Right now this class of model seems to be usable only if you have enterprise funding or are wealthy with disposable income.

I know this isn't the typical pattern of releases you guys do, but it would be really appreciated. 🙏

aviallon

2 days ago

Even a version a quarter of the size would at least allow running it on small enterprise setups. I don't have dozens of H200s lying around, even in our small DC.

CYISNOTHERE

2 days ago

> Even a version a quarter of the size would at least allow running it on small enterprise setups. I don't have dozens of H200s lying around, even in our small DC.

If there's a will, there's a way

daniel-dona

1 day ago

> I was wondering if it would be possible for moonshotai to create consumer GPU versions of the Kimi model? This model is so good that it is sad we don't get smaller versions for consumers to run themselves. Right now this class of model seems to be usable only if you have enterprise funding or are wealthy with disposable income.
>
> I know this isn't the typical pattern of releases you guys do, but it would be really appreciated. 🙏

You already have Qwen 3.6 models.

aviallon

1 day ago

@daniel-dona To be fair, no 135B- or 397B-sized Qwen models were released last time, and I'd love to have those. This is kind of a forgotten size right now: you either have 40B-ish models (and lower), or 750B and above.

The middle is empty.
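
To put the sizes mentioned in this thread in perspective, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone. It assumes memory is simply parameter count times bytes per parameter, and it ignores KV cache, activations, and runtime overhead, so real requirements are higher. The function name `weight_gb` and the precision table are illustrative and not taken from any library. For a mixture-of-experts model such as the "26B A3B" in the title (commonly read as 26B total parameters with about 3B active per token), all weights still have to be stored, so the total count is what is used here.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate weight memory in GB (10^9 bytes) for a parameter count given in billions."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    # Sizes taken from the discussion above: 26B, 40B-ish, 135B, 397B, 750B.
    for size in (26, 40, 135, 397, 750):
        row = ", ".join(f"{p}: {weight_gb(size, p):.0f} GB" for p in BYTES_PER_PARAM)
        print(f"{size}B -> {row}")
```

Under these assumptions, a 26B model at 4-bit precision needs roughly 13 GB for its weights, while a 750B model at bf16 needs about 1500 GB, which is why the largest class is out of reach of consumer GPUs.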
