
Libraries

How to use Edmon02/Kimi-K2.7-Code-GGUF with llama-cpp-python:

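# !pip install llama-cpp-python

from llama_cpp import Llama

# The quant is split into 14 shards. `filename` selects the first shard;
# `additional_files` (available in recent llama-cpp-python releases) fetches the rest.
llm = Llama.from_pretrained(
	repo_id="Edmon02/Kimi-K2.7-Code-GGUF",
	filename="UD-Q4_K_XL/Kimi-K2.7-Code-UD-Q4_K_XL-00001-of-00014.gguf",
	additional_files=[
		f"UD-Q4_K_XL/Kimi-K2.7-Code-UD-Q4_K_XL-{i:05d}-of-00014.gguf"
		for i in range(2, 15)
	],
)

# Image parts are only interpreted when a multimodal chat handler with the
# mmproj weights is configured; a text-only request works without one.
response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)

# Print only the assistant's reply text.
print(response["choices"][0]["message"]["content"])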

Local Apps

llama.cpp

How to use Edmon02/Kimi-K2.7-Code-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
llama-cli -hf Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
llama-cli -hf Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
./llama-cli -hf Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL

Use Docker

docker model run hf.co/Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL


vLLM

How to use Edmon02/Kimi-K2.7-Code-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Edmon02/Kimi-K2.7-Code-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Edmon02/Kimi-K2.7-Code-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL

Ollama
How to use Edmon02/Kimi-K2.7-Code-GGUF with Ollama:
```
ollama run hf.co/Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL
```

Unsloth Studio

How to use Edmon02/Kimi-K2.7-Code-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Edmon02/Kimi-K2.7-Code-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Edmon02/Kimi-K2.7-Code-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Edmon02/Kimi-K2.7-Code-GGUF to start chatting

Pi

How to use Edmon02/Kimi-K2.7-Code-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL"
        }
      ]
    }
  }
}
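
If you would rather generate this entry than edit the file by hand, the following is a minimal Python sketch. It assumes the `~/.pi/agent/models.json` path and the schema shown above. The helper name `add_llama_cpp_provider` and its merge behaviour (keeping any providers already in the file) are illustrative assumptions, not part of Pi.

```python
import json
from pathlib import Path

MODEL_ID = "Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL"


def add_llama_cpp_provider(config_path: Path, model_id: str = MODEL_ID) -> dict:
    """Merge a llama-cpp provider entry into a Pi models.json file.

    Hypothetical helper: it mirrors the JSON shown above and keeps any
    providers that already exist in the file.
    """
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))

    providers = config.setdefault("providers", {})
    providers["llama-cpp"] = {
        "baseUrl": "http://localhost:8080/v1",
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }

    # Create ~/.pi/agent (or the given parent directory) if it is missing.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_llama_cpp_provider(Path.home() / ".pi" / "agent" / "models.json")` writes the entry in place.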

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Edmon02/Kimi-K2.7-Code-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL

Run Hermes

hermes

Docker Model Runner
How to use Edmon02/Kimi-K2.7-Code-GGUF with Docker Model Runner:
```
docker model run hf.co/Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL
```

Lemonade

How to use Edmon02/Kimi-K2.7-Code-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Edmon02/Kimi-K2.7-Code-GGUF:UD-Q4_K_XL

Run and chat with the model

lemonade run user.Kimi-K2.7-Code-GGUF-UD-Q4_K_XL

List all available models

lemonade list

Kimi K2.7-Code — GGUF (coding agent MoE)

Community GGUF mirror of moonshotai/Kimi-K2.7-Code for llama.cpp-compatible runtimes on server-grade hardware.

Released June 12, 2026 by Moonshot AI. A coding-focused agent model built on Kimi K2.6, with a 21.8% relative improvement on Kimi Code Bench v2 (50.9 to 62.0).


| Property | Value |
|---|---|
| Architecture | 1T MoE (32B active), DeepSeek2 / MLA |
| Context | 256K tokens (262144 in GGUF) |
| Modalities | Text, image, video (API-first; vision via mmproj in GGUF) |
| License | Modified MIT |
| Thinking | Forced `preserve_thinking`; reasoning retained across turns |

Important: server-class model

This is not a consumer-laptop model. Even the smallest GGUF quants are hundreds of GB. Plan for:

- Multi-GPU or high-RAM server (512 GB+ system RAM typical for Q4-class quants)
- Fast NVMe scratch space
- Latest llama.cpp with DeepSeek2 / Kimi K2.5+ support
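
As a rough back-of-envelope check on these requirements, the sketch below estimates weight size from the 1T parameter count stated in this card. The average bits-per-weight values are illustrative assumptions, not measured figures for these files, and the estimate covers weights only (no KV cache or runtime overhead).

```python
TOTAL_PARAMS = 1.0e12  # "1T MoE" as stated in this model card


def estimate_weights_gb(bits_per_weight: float, total_params: float = TOTAL_PARAMS) -> float:
    """Approximate weight size in decimal gigabytes for an average bit width."""
    return total_params * bits_per_weight / 8 / 1e9


# Assumed average bit widths, for illustration only. Real GGUF files mix
# tensor types, so check the actual file sizes on the Hub before planning.
for label, bpw in [("~2 bits/weight", 2.0), ("~4.5 bits/weight", 4.5), ("~8.5 bits/weight", 8.5)]:
    print(f"{label}: roughly {estimate_weights_gb(bpw):.0f} GB of weights")
```

Even at the low end of these assumed bit widths the result stays in the hundreds of GB.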

See docs/kimi-k27-code-analysis.md for full analysis.

Why this repo exists

- One download hub for Unsloth UD quants (Q2–Q8, IQ variants) plus mmproj.
- Hub-side sync from unsloth/Kimi-K2.7-Code-GGUF, with no re-upload from your laptop.
- Maintainer script: scripts/sync_kimi_k27_code_gguf_quants.py

Available files

See gguf-manifest.json for the live file list.

Essential tier (recommended start)

| Path | Use |
|---|---|
| `UD-Q4_K_XL/` (14 shards) | Recommended; maps to Kimi native int4 quality |
| `mmproj-F16.gguf` | Vision encoder weights for llama.cpp multimodal |
| `config.json` | Model metadata |

Full tier

All Unsloth UD quants (UD-IQ1_M, UD-IQ3_XXS, UD-IQ4_XS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q8_K_XL) plus mmproj in BF16, F16, and F32. To mirror the full tier, run `make sync-kimi-k27-gguf-full`.

Download

pip install -U huggingface_hub

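# Essential: Q4 XL + vision mmproj (hundreds of GB)
# Pass every selection through --include: when filenames are given
# positionally, huggingface-cli ignores --include and skips the shards.
huggingface-cli download Edmon02/Kimi-K2.7-Code-GGUF \
  --include "config.json" "mmproj-F16.gguf" "UD-Q4_K_XL/*" \
  --local-dir ./models/kimi-k27-code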
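
The same selection can be expressed from Python with `huggingface_hub.snapshot_download` and its `allow_patterns` argument. This is a minimal sketch: the helper names `is_selected` and `download_essential` are hypothetical, and the local preview assumes `fnmatch`-style pattern matching.

```python
from fnmatch import fnmatch

REPO_ID = "Edmon02/Kimi-K2.7-Code-GGUF"
ESSENTIAL_PATTERNS = ["config.json", "mmproj-F16.gguf", "UD-Q4_K_XL/*"]


def is_selected(path: str) -> bool:
    """Preview locally whether a repo path matches the essential-tier patterns."""
    return any(fnmatch(path, pattern) for pattern in ESSENTIAL_PATTERNS)


def download_essential(local_dir: str = "./models/kimi-k27-code") -> str:
    """Download the essential tier (hundreds of GB) and return the local path."""
    # Imported lazily so the preview helper works without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=ESSENTIAL_PATTERNS,
        local_dir=local_dir,
    )
```

Run `download_essential()` only on a machine with enough disk space; `is_selected` is a cheap way to sanity-check the patterns first.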

Quick start (llama.cpp)

Requires a recent llama.cpp build with Kimi K2.5 / DeepSeek2 MoE support.

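# Thinking mode with Moonshot's recommended sampling; --mmproj enables image input.
# For a sharded quant, -m must point at the first shard file, not the folder.
llama-server -m ./models/kimi-k27-code/UD-Q4_K_XL/Kimi-K2.7-Code-UD-Q4_K_XL-00001-of-00014.gguf \
  --mmproj ./models/kimi-k27-code/mmproj-F16.gguf \
  --ctx-size 32768 \
  --temp 1.0 --top-p 0.95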

Moonshot recommends temperature=1.0, top_p=0.95, and thinking enabled. Instant mode is not supported.
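
Sampling settings can also be sent per request. The sketch below posts to the server's OpenAI-compatible chat endpoint with the recommended values. It assumes the server listens on `http://localhost:8080/v1`, the address used in the agent configurations earlier in this card; `build_payload` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumed llama-server address


def build_payload(prompt: str) -> dict:
    """Build a chat request with the sampling settings Moonshot recommends."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
    }


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 600.0) -> str:
    """POST the request to the local server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Write a function that reverses a linked list."))` returns the model's answer. The generous timeout is a guess, since thinking-mode replies on a model this size can take a while.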

Benchmark highlights (Moonshot-reported)

| Benchmark | K2.6 | K2.7-Code | Δ vs K2.6 (relative) |
|---|---|---|---|
| Kimi Code Bench v2 | 50.9 | 62.0 | +21.8% |
| Program Bench | 48.3 | 53.6 | +11.0% |
| MLS Bench Lite | 26.7 | 35.1 | +31.5% |
| MCP Atlas | 69.4 | 76.0 | +9.5% |
| MCP Mark Verified | 72.8 | 81.1 | +11.4% |
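
The Δ column is a relative improvement over K2.6, not a difference in score points (Kimi Code Bench v2 rises by 11.1 points, which is 21.8% of 50.9). A short sketch that recomputes the column from the two score columns:

```python
# Scores copied from the table above: (K2.6, K2.7-Code).
SCORES = {
    "Kimi Code Bench v2": (50.9, 62.0),
    "Program Bench": (48.3, 53.6),
    "MLS Bench Lite": (26.7, 35.1),
    "MCP Atlas": (69.4, 76.0),
    "MCP Mark Verified": (72.8, 81.1),
}


def relative_gain(old: float, new: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


for name, (k26, k27) in SCORES.items():
    print(f"{name}: {k27 - k26:+.1f} points, {relative_gain(k26, k27):+.1f}% relative")
```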

Deployment alternatives

| Path | When |
|---|---|
| Kimi API (`kimi-k2.7-code`) | Production agents, Kimi Code CLI |
| vLLM / SGLang / KTransformers | Self-host from safetensors |
| GGUF + llama.cpp | Offline / custom infra with enough RAM |

API pricing (Moonshot): about $0.95 per 1M input tokens and $4.00 per 1M output tokens.
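
To compare the API against self-hosting, a per-request cost can be estimated from these approximate prices. This is a sketch only: the token counts in the example are hypothetical and the rates may change.

```python
INPUT_USD_PER_MILLION = 0.95   # approximate price quoted above
OUTPUT_USD_PER_MILLION = 4.00  # approximate price quoted above


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate API cost in USD for one request or a batch of requests."""
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    )


# Hypothetical agent session: 200K tokens read, 20K tokens generated.
print(f"${estimate_cost_usd(200_000, 20_000):.2f}")
```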

Provenance

| Item | Source |
|---|---|
| Base model | `moonshotai/Kimi-K2.7-Code` |
| GGUF quants | Mirrored from `unsloth/Kimi-K2.7-Code-GGUF` |
| Maintainer | Edmon02/audio_set |

Limitations

- Sharded GGUF folders: download the entire quant prefix, not individual shards.
- Video input in GGUF may lag official API support.
- Benchmarks are vendor-run; validate on your own coding and agent workloads.
- These are community GGUF quants; compare against the native int4 safetensors when possible.
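
Because a partial shard set will not load, it can help to verify a download before starting the server. The sketch below assumes shard files follow the `-00001-of-00014.gguf` naming seen in this repo; `missing_shards` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Matches the "-00001-of-00014.gguf" suffix used by sharded GGUF files.
SHARD_RE = re.compile(r"-(\d{5})-of-(\d{5})\.gguf$")


def missing_shards(filenames: Iterable[str]) -> List[int]:
    """Return the sorted shard indices that are absent from a sharded set."""
    present = set()
    total = None
    for name in filenames:
        match = SHARD_RE.search(name)
        if match is None:
            continue  # ignore non-shard files such as config.json
        index, count = int(match.group(1)), int(match.group(2))
        if total is None:
            total = count
        elif count != total:
            raise ValueError("files from different shard sets are mixed together")
        present.add(index)
    if total is None:
        raise ValueError("no sharded GGUF files found")
    return [i for i in range(1, total + 1) if i not in present]
```

For example, `missing_shards(p.name for p in Path("./models/kimi-k27-code/UD-Q4_K_XL").glob("*.gguf"))` (with `from pathlib import Path`) returns an empty list when all 14 shards are present.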

Citation

@misc{kimi_k27_code_2026,
  title={Kimi K2.7-Code},
  author={Moonshot AI},
  year={2026},
  url={https://huggingface.co/moonshotai/Kimi-K2.7-Code}
}
