Instructions for using CosineAI/lumen-outpost-2026-04-27 with libraries and local apps.

Libraries

How to use CosineAI/lumen-outpost-2026-04-27 with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="CosineAI/lumen-outpost-2026-04-27", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("CosineAI/lumen-outpost-2026-04-27", trust_remote_code=True, dtype="auto")
```

Local Apps

vLLM

How to use CosineAI/lumen-outpost-2026-04-27 with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "CosineAI/lumen-outpost-2026-04-27"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CosineAI/lumen-outpost-2026-04-27",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
```

Use Docker

```
docker model run hf.co/CosineAI/lumen-outpost-2026-04-27
```

SGLang

How to use CosineAI/lumen-outpost-2026-04-27 with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "CosineAI/lumen-outpost-2026-04-27" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CosineAI/lumen-outpost-2026-04-27",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "CosineAI/lumen-outpost-2026-04-27" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CosineAI/lumen-outpost-2026-04-27",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
```

Docker Model Runner
How to use CosineAI/lumen-outpost-2026-04-27 with Docker Model Runner:
```
docker model run hf.co/CosineAI/lumen-outpost-2026-04-27
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Lumen Outpost

Lumen Outpost is a fine-tuned variant of Kimi-K2.6 produced by Cosine AI. It is trained on Cosine's proprietary dataset using custom-built graders designed to improve output quality across several dimensions:

Stronger capability on niche and low-resource languages. Fine-tuning on targeted multilingual data improves fluency and correctness in languages that are underserved by the base model's pretraining distribution.
Reduced slop. Trained against code quality graders that penalize dead code, duplication, unnecessary abstractions, and noisy comments. The model produces cleaner patches with less residual noise.
Better conversational feel. Trained against conversational quality graders that reward concise and substantive updates, professional tone, and alignment between what the model says and what it does.

The base model is Moonshot AI's Kimi-K2.6, a 1T-parameter native multimodal MoE model with 32B active parameters per token. For full details on the base architecture, capabilities, and benchmarks, see the Kimi-K2.6 model card.

Model Details

| Field | Value |
| --- | --- |
| Base model | Kimi-K2.6 |
| Architecture | Mixture-of-Experts (MoE), same architecture family as upstream Kimi-K2.6 |
| Total parameters | 1T |
| Active parameters | 32B per token |
| Layers | 61 (including 1 dense layer) |
| Experts | 384 routed + 1 shared, 8 selected per token |
| Context length | 256K tokens |
| Weight format | BF16 + INT4 packed MoE experts |
| Size on disk | ~595 GB |
| Tokenizer | TikToken-based, 163,840 vocab |
| Vision encoder | MoonViT (400M params) |
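
The sparsity implied by these figures can be checked with a little arithmetic. The sketch below is only an illustration: it uses the numbers from the table, treats "1T" as exactly 10^12 parameters, and assumes that the 8 selected experts are all drawn from the 384 routed experts (with the shared expert always active).

```python
# Figures copied from the Model Details table.
# Assumption: "1T" and "32B" are treated as exact round numbers.
TOTAL_PARAMS = 1_000_000_000_000
ACTIVE_PARAMS = 32_000_000_000
ROUTED_EXPERTS = 384
SELECTED_PER_TOKEN = 8  # assumption: all 8 are routed experts


def fraction(part: int, whole: int) -> float:
    """Return part / whole as a float."""
    return part / whole


param_share = fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
expert_share = fraction(SELECTED_PER_TOKEN, ROUTED_EXPERTS)

print(f"parameters active per token: {param_share:.1%}")
print(f"routed experts used per token: {expert_share:.2%}")
```

Under these assumptions, roughly 3.2% of the parameters and about 2.08% of the routed experts take part in processing any single token.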

BF16 is used for attention and shared MLP weights. Routed MoE experts are stored as packed INT4 tensors. This checkpoint merges the lumen-outpost-2026-04-27 LoRA into Kimi-K2.6, including re-AWQ'd routed expert INT4 LoRA deltas packed back into Kimi-compatible INT4 tensors.
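
To illustrate what "packed INT4" means in general, the sketch below packs eight 4-bit values into one 32-bit word and unpacks them again. It is a generic illustration in pure Python, not the layout used by this checkpoint: the nibble order, the unsigned 0..15 value range, and the function names `pack_int4` and `unpack_int4` are all assumptions.

```python
def pack_int4(values):
    """Pack unsigned 4-bit ints (0..15) into 32-bit words, 8 values per word."""
    if len(values) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    words = []
    for start in range(0, len(values), 8):
        word = 0
        for slot, value in enumerate(values[start:start + 8]):
            if not 0 <= value <= 15:
                raise ValueError(f"value {value} does not fit in 4 bits")
            # Assumed layout: slot 0 occupies the lowest 4 bits of the word.
            word |= value << (4 * slot)
        words.append(word)
    return words


def unpack_int4(words):
    """Inverse of pack_int4: recover the 4-bit values from each 32-bit word."""
    values = []
    for word in words:
        for slot in range(8):
            values.append((word >> (4 * slot)) & 0xF)
    return values


weights = [0, 1, 2, 3, 12, 13, 14, 15]
packed = pack_int4(weights)
assert unpack_int4(packed) == weights  # the round trip is lossless
print(f"{len(weights)} values -> {len(packed)} word(s): {packed[0]:#010x}")
```

At 4 bits per weight, a packed tensor takes a quarter of the space of the same tensor in BF16, before counting any quantization scales or zero points stored alongside it.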

Serving

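At roughly 595 GB on disk, the checkpoint has to be sharded across several GPUs, so serve it with multi-GPU tensor parallelism. The command below uses 8-way tensor parallelism: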

```
vllm serve CosineAI/lumen-outpost-2026-04-27 \
  --served-model-name lumen-outpost \
  --api-key "$VLLM_API_KEY" \
  --trust-remote-code \
  --tensor-parallel-size 8 \
  --mm-encoder-tp-mode data \
  --enable-auto-tool-choice \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k2 \
  --gpu-memory-utilization 0.95 \
  --dtype bfloat16
```
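
Once the server is up, it can be queried through the OpenAI-compatible chat completions endpoint. The sketch below builds such a request with the Python standard library only. It assumes the server listens on vLLM's default `http://localhost:8000` and that the `VLLM_API_KEY` environment variable holds the key passed to `--api-key`; the helper name `build_chat_request` is illustrative. The model name matches `--served-model-name` in the command above.

```python
import json
import os
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM's default host and port
MODEL_NAME = "lumen-outpost"           # matches --served-model-name


def build_chat_request(prompt, image_url=None, api_key=""):
    """Build (but do not send) a chat completion request with optional image input."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    "What animal is on the candy?",
    image_url="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG",
    api_key=os.environ.get("VLLM_API_KEY", ""),
)

# Uncomment to send the request to a running server:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a live deployment.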

For additional deployment options (SGLang, KTransformers), refer to the base model deployment guide.

The chat template is included (chat_template.jinja) and supports both thinking and instant modes, with the same usage as the base model. See the base model README for API usage examples covering chat completion, vision input, tool calling, thinking-mode toggles, and preserve-thinking mode.

Requirements

Software:

Python >= 3.10
transformers >= 4.57.1, <5.0.0 (same requirement as base model)
tiktoken — required by the custom tokenizer (tokenization_kimi.py)
tokenizers — required by tiktoken tokenizer internals
numpy, Pillow, pydantic — required by vision processing code
flash-attn >= 2.1 — optional but strongly recommended for attention performance. Without it, the model falls back to eager attention (functional but slow).
mecord — optional, only needed for video input processing. Image-only and text-only usage does not require it.
vllm, sglang, or ktransformers — for serving. Direct transformers generation is possible but not practical at this model size.
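
A quick way to compare an environment against the list above is to query the installed package versions. The sketch below uses `importlib.metadata` from the standard library. The helper names and the simplified version parsing (leading numeric components only, pre-release suffixes ignored) are assumptions for illustration; only the Python and transformers bounds come from the list.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def transformers_version_ok(version):
    """True if the version satisfies >= 4.57.1 and < 5.0.0."""
    return (4, 57, 1) <= parse_version(version) < (5, 0, 0)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python >= 3.10:", sys.version_info >= (3, 10))
for package in ("transformers", "tiktoken", "tokenizers", "numpy", "Pillow", "pydantic"):
    print(f"{package}: {installed_version(package) or 'not installed'}")

found = installed_version("transformers")
if found is not None:
    print("transformers in supported range:", transformers_version_ok(found))
```

For strict PEP 440 comparisons, including pre-releases, a dedicated version-parsing library would be a better fit than this simplified parser.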

Hardware (minimum for inference):

See the Kimi-K2.6 deployment guide.

License

Please refer to the LICENSE.md file in this repository.

Acknowledgements

Lumen Outpost is built on Moonshot AI's Kimi-K2.6 base model. Credit to Moonshot AI for the Kimi-K2.6 architecture, training, and release. See the Kimi-K2.6 model card and technical blog for details. The underlying DeepSeek-V3 architecture is credited to DeepSeek.


Safetensors

Model size: 1.1T params
Tensor types: I32, BF16, F32

Model tree for CosineAI/lumen-outpost-2026-04-27

Base model: moonshotai/Kimi-K2.6
Finetuned (12): this model