north-code-43b-a5b-coder
north-code-43b-a5b-coder is a code and agentic software-engineering model
built from CohereLabs/North-Mini-Code-1.0.
The model keeps the North/Cohere chat format, interleaved reasoning behavior, tool-calling support, and long-context coding workflow focus.
Model Details
- Architecture: decoder-only sparse Mixture-of-Experts model
- Base model:
CohereLabs/North-Mini-Code-1.0 - Parameters: approximately 43B total
- Active parameters: approximately 4.7B at the default top-14 expert setting
- Experts: 184 total, 14 active per token
- Quantization: MXFP4 MoE expert weights
- Recommended runtime: vLLM with Cohere Command 4 parsing
Recommended vLLM Command
VLLM_USE_FLASHINFER_SAMPLER=0 \
VLLM_USE_FLASHINFER_MOE_FP16=0 \
vllm serve LLMWildling/north-code-43b-a5b-coder \
--served-model-name vllm/doobee \
--host 0.0.0.0 \
--port 23333 \
--dtype bfloat16 \
--tensor-parallel-size 1 \
--max-model-len 200000 \
--gpu-memory-utilization 0.96 \
--trust-remote-code \
--tool-call-parser cohere_command4 \
--reasoning-parser cohere_command4 \
--enable-auto-tool-choice \
--max-num-seqs 1 \
--max-num-batched-tokens 8192 \
--moe-backend auto
Use a vLLM build with Cohere2 MoE MXFP4 support. For tool and reasoning parsing, install the Cohere melody parser package if your vLLM build requires it.
Chat Format
Use the bundled tokenizer chat template. The model supports interleaved reasoning and tool-use workflows. For best multi-turn agentic behavior, preserve assistant reasoning and tool-call outputs in conversation history.
Intended Use
This model is intended for coding, software-engineering assistance, terminal workflow automation, and agentic tool-use experiments.
License
This model is released under the Apache 2.0 license, following the base model.
- Downloads last month
- 8
Model tree for LLMWildling/North-Mini-Code-1.0-43B-a5b
Base model
CohereLabs/North-Mini-Code-1.0