Lumen Outpost
Lumen Outpost is a fine-tuned variant of Kimi-K2.6 produced by Cosine AI. It is trained on Cosine's proprietary dataset using custom-built graders designed to improve output quality across several dimensions:
- Stronger capability on niche and low-resource languages. Fine-tuning on targeted multilingual data improves fluency and correctness in languages that are underserved by the base model's pretraining distribution.
- Reduced slop. Trained against code quality graders that penalize dead code, duplication, unnecessary abstractions, and noisy comments. The model produces cleaner patches with less residual noise.
- Better conversational feel. Trained against conversational quality graders that reward concise and substantive updates, professional tone, and alignment between what the model says and what it does.
The base model is Moonshot AI's Kimi-K2.6, a 1T-parameter native multimodal MoE model with 32B active parameters per token. For full details on the base architecture, capabilities, and benchmarks, see the Kimi-K2.6 model card.
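To make "active parameters per token" concrete, here is a minimal sketch of top-k expert routing using the expert counts from the Model Details table below (384 routed experts, 8 selected per token). The softmax-over-selected gating and the random router logits are assumptions made for illustration; they are not taken from the Kimi-K2.6 implementation.

```python
import heapq
import math
import random

NUM_ROUTED_EXPERTS = 384  # from the Model Details table
TOP_K = 8                 # routed experts selected per token


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_id, gate_weight) pairs for the top_k experts of one token.

    Gate weights are a softmax over the selected logits only. This gating
    rule is an assumption for the sketch, not the model's actual scoring.
    """
    # heapq.nlargest returns the pairs sorted by logit, highest first.
    top = heapq.nlargest(top_k, enumerate(router_logits), key=lambda pair: pair[1])
    max_logit = top[0][1]
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(logit - max_logit) for _, logit in top]
    total = sum(exps)
    return [(expert_id, e / total) for (expert_id, _), e in zip(top, exps)]


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]  # stand-in router output
selected = route_token(logits)

print("selected experts:", sorted(expert_id for expert_id, _ in selected))
print(f"routed experts used per token: {TOP_K / NUM_ROUTED_EXPERTS:.1%}")
```

Only the selected experts run for a given token, which is why the active parameter count is a small fraction of the total.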
Model Details
| Field | Value |
|---|---|
| Base model | Kimi-K2.6 |
| Architecture | Mixture-of-Experts (MoE), same architecture family as upstream Kimi-K2.6 |
| Total parameters | 1T |
| Active parameters | 32B per token |
| Layers | 61 (including 1 dense layer) |
| Experts | 384 routed + 1 shared, 8 selected per token |
| Context length | 256K tokens |
| Weight format | BF16 + INT4 packed MoE experts |
| Size on disk | ~595 GB |
| Tokenizer | TikToken-based, 163,840 vocab |
| Vision encoder | MoonViT (400M params) |
BF16 is used for attention and shared MLP weights. Routed MoE experts are stored as packed INT4 tensors. This checkpoint merges the lumen-outpost-2026-04-27 LoRA into Kimi-K2.6, including re-AWQ'd routed expert INT4 LoRA deltas packed back into Kimi-compatible INT4 tensors.
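The merge-then-repack step can be illustrated with a small sketch. It folds a LoRA update into a weight matrix, quantizes one group to signed INT4, and packs two codes per byte. Everything here is an assumption for illustration: the quantizer is plain round-to-nearest (not AWQ, which also uses activation statistics), and the nibble order and grouping are hypothetical rather than the layout used in this checkpoint.

```python
def merge_lora(weight, lora_a, lora_b, scale):
    """Return weight + scale * (lora_b @ lora_a) for small nested-list matrices."""
    rows, cols, rank = len(weight), len(weight[0]), len(lora_a)
    return [
        [
            weight[i][j] + scale * sum(lora_b[i][r] * lora_a[r][j] for r in range(rank))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def quantize_int4(values):
    """Symmetric round-to-nearest INT4 quantization of one group of floats."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def pack_int4(codes):
    """Pack signed INT4 codes two per byte, low nibble first (hypothetical layout)."""
    if len(codes) % 2:
        raise ValueError("need an even number of codes")
    return bytes(((hi & 0xF) << 4) | (lo & 0xF) for lo, hi in zip(codes[0::2], codes[1::2]))


def unpack_int4(packed):
    """Inverse of pack_int4: recover the signed INT4 codes."""
    codes = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            codes.append(nibble - 16 if nibble >= 8 else nibble)
    return codes


# Toy 2x4 weight with a rank-1 LoRA update (all values are made up).
weight = [[0.5, -0.25, 0.125, 1.0], [0.0, 0.75, -1.0, 0.25]]
lora_a = [[0.1, 0.2, -0.1, 0.0]]  # shape (rank, cols)
lora_b = [[1.0], [-2.0]]          # shape (rows, rank)
merged = merge_lora(weight, lora_a, lora_b, scale=0.5)

codes, group_scale = quantize_int4(merged[0])
packed = pack_int4(codes)
restored = [code * group_scale for code in unpack_int4(packed)]
print(len(packed), "bytes for", len(codes), "weights")
```

Quantizing after the merge, rather than adding a quantized delta to quantized weights, keeps rounding error to a single step per weight in this simplified setting.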
Serving
Serve the model with multi-GPU tensor parallelism. The following vLLM command uses 8-way tensor parallelism:
vllm serve CosineAI/lumen-outpost-2026-04-27 \
--served-model-name lumen-outpost \
--api-key "$VLLM_API_KEY" \
--trust-remote-code \
--tensor-parallel-size 8 \
--mm-encoder-tp-mode data \
--enable-auto-tool-choice \
--tool-call-parser kimi_k2 \
--reasoning-parser kimi_k2 \
--gpu-memory-utilization 0.95 \
--dtype bfloat16
For additional deployment options (SGLang, KTransformers), refer to the base model deployment guide.
The chat template is included (chat_template.jinja) and supports both thinking and instant modes, with the same usage as the base model. See the base model README for API usage examples including chat completion, vision input, tool calling, thinking mode toggles, and preserve-thinking mode.
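As a rough sketch, a client for the server started by the vllm serve command above might look like the following. It assumes the server listens on vLLM's default port 8000 on localhost and that the key passed to --api-key is available to the client; the helper names build_chat_request and chat are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM default port, local host
MODEL_NAME = "lumen-outpost"           # matches --served-model-name in the serve command


def build_chat_request(prompt, api_key, base_url=BASE_URL):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The server checks this against the value given to --api-key.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def chat(prompt, api_key, timeout=120.0):
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, calling chat("Hello", os.environ["VLLM_API_KEY"]) would return the reply text. Vision input, tool calling, and thinking-mode options are covered in the base model README and are not shown here.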
Requirements
Software:
- Python >= 3.10
- transformers >= 4.57.1, <5.0.0 (same requirement as base model)
- tiktoken: required by the custom tokenizer (tokenization_kimi.py)
- tokenizers: required by tiktoken tokenizer internals
- numpy, Pillow, pydantic: required by vision processing code
- flash-attn >= 2.1: optional but strongly recommended for attention performance. Without it, the model falls back to eager attention (functional but slow).
- mecord: optional, only needed for video input processing. Image-only and text-only usage does not require it.
- vllm, sglang, or ktransformers: for serving. Direct transformers generation is possible but not practical at this model size.
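The required software versions above can be checked before attempting to load the model. The following is a minimal sketch using only the standard library. The version parser is deliberately simplified (it ignores pre-release and post-release suffixes), the distribution names are assumed to match the names listed above, and the helper names are hypothetical.

```python
import sys
from importlib import metadata

# package name -> (minimum version, exclusive upper bound); None means "any".
REQUIRED = {
    "transformers": ("4.57.1", "5.0.0"),
    "tiktoken": (None, None),
    "tokenizers": (None, None),
    "numpy": (None, None),
    "Pillow": (None, None),
    "pydantic": (None, None),
}


def parse_version(text):
    """Turn '4.57.1' or '2.7.4.post1' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def in_range(version, minimum=None, below=None):
    """Check minimum <= version < below, skipping bounds that are None."""
    parsed = parse_version(version)
    if minimum is not None and parsed < parse_version(minimum):
        return False
    if below is not None and parsed >= parse_version(below):
        return False
    return True


def check_environment():
    """Return a dict mapping each requirement to True if it is satisfied."""
    report = {"python": sys.version_info >= (3, 10)}
    for name, (minimum, below) in REQUIRED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = False
            continue
        report[name] = in_range(installed, minimum, below)
    return report


for name, ok in check_environment().items():
    print(f"{name:14s} {'ok' if ok else 'missing or out of range'}")
```

Optional packages (flash-attn, mecord) and the serving engines are left out of the table because their absence is not an error for every use case.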
Hardware (minimum for inference):
- See the Kimi-K2.6 deployment guide
License
Please refer to the LICENSE.md file in this repository.
Acknowledgements
Lumen Outpost is built on Moonshot AI's Kimi-K2.6 base model. Credit to Moonshot AI for the Kimi-K2.6 architecture, training, and release. See the Kimi-K2.6 model card and technical blog for details. The underlying DeepSeek-V3 architecture is credited to DeepSeek.