Kimi K2.7-Code — GGUF (coding agent MoE)

Community GGUF mirror of moonshotai/Kimi-K2.7-Code for llama.cpp-compatible runtimes on server-grade hardware.

Released June 12, 2026 by Moonshot AI. Coding-focused agent built on Kimi K2.6 with +21.8% on Kimi Code Bench v2.

Architecture 1T MoE (32B active), DeepSeek2 / MLA
Context 256K tokens (262144 in GGUF)
Modalities Text, image, video (API-first; vision via mmproj in GGUF)
License Modified MIT
Thinking Forced preserve_thinking — reasoning retained across turns

Important: server-class model

This is not a consumer-laptop model. Even the smallest GGUF quants are hundreds of GB. Plan for:

  • Multi-GPU or high-RAM server (512 GB+ system RAM typical for Q4-class quants)
  • Fast NVMe scratch space
  • Latest llama.cpp with DeepSeek2 / Kimi K2.5+ support

See docs/kimi-k27-code-analysis.md for full analysis.

Why this repo exists

  • One download hub for unsloth UD quants (Q2–Q8, IQ variants) + mmproj.
  • Hub-side sync from unsloth/Kimi-K2.7-Code-GGUF — no re-upload from your laptop.
  • Maintainer script: scripts/sync_kimi_k27_code_gguf_quants.py

Available files

See gguf-manifest.json for the live file list.

Essential tier (recommended start)

Path Use
UD-Q4_K_XL/ (14 shards) Recommended — maps to Kimi native int4 quality
mmproj-F16.gguf Vision encoder weights for llama.cpp multimodal
config.json Model metadata

Full tier

All unsloth UD quants (UD-IQ1_M, UD-IQ3_XXS, UD-IQ4_XS, UD-Q2_K_XL, UD-Q3_K_XL, UD-Q4_K_XL, UD-Q8_K_XL) + mmproj BF16/F16/F32 — run make sync-kimi-k27-gguf-full.

Download

pip install -U huggingface_hub

# Essential: Q4 XL + vision mmproj (hundreds of GB)
huggingface-cli download Edmon02/Kimi-K2.7-Code-GGUF \
  config.json mmproj-F16.gguf \
  --include "UD-Q4_K_XL/*" \
  --local-dir ./models/kimi-k27-code

Quick start (llama.cpp)

Requires a recent llama.cpp build with Kimi K2.5 / DeepSeek2 MoE support.

# Text + tools (thinking mode — match Moonshot API defaults)
llama-server -m ./models/kimi-k27-code/UD-Q4_K_XL \
  --mmproj ./models/kimi-k27-code/mmproj-F16.gguf \
  --ctx-size 32768 \
  --temp 1.0 --top-p 0.95

Moonshot recommends temperature=1.0, top_p=0.95, and thinking enabled. Instant mode is not supported.

Benchmark highlights (Moonshot-reported)

Benchmark K2.6 K2.7-Code Δ vs K2.6
Kimi Code Bench v2 50.9 62.0 +21.8%
Program Bench 48.3 53.6 +11.0%
MLS Bench Lite 26.7 35.1 +31.5%
MCP Atlas 69.4 76.0 +9.5%
MCP Mark Verified 72.8 81.1 +11.4%

Deployment alternatives

Path When
Kimi API (kimi-k2.7-code) Production agents, Kimi Code CLI
vLLM / SGLang / KTransformers Self-host from safetensors
GGUF + llama.cpp Offline / custom infra with enough RAM

API pricing (Moonshot): ~$0.95 / $4.00 per 1M tokens in/out.

Provenance

Item Source
Base model moonshotai/Kimi-K2.7-Code
GGUF quants Mirrored from unsloth/Kimi-K2.7-Code-GGUF
Maintainer Edmon02/audio_set

Limitations

  • Sharded GGUF folders — download entire quant prefix, not individual shards only.
  • Video input in GGUF may lag official API support.
  • Vendor-run benchmarks; validate on your coding/agent workloads.
  • GGUF community quants — compare against native int4 safetensors when possible.

Citation

@misc{kimi_k27_code_2026,
  title={Kimi K2.7-Code},
  author={Moonshot AI},
  year={2026},
  url={https://huggingface.co/moonshotai/Kimi-K2.7-Code}
}
Downloads last month
2,864
GGUF
Model size
1T params
Architecture
deepseek2
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Edmon02/Kimi-K2.7-Code-GGUF

Quantized
(15)
this model