north-code-43b-a5b-coder

north-code-43b-a5b-coder is a code and agentic software-engineering model built from CohereLabs/North-Mini-Code-1.0.

The model keeps the North/Cohere chat format, interleaved reasoning behavior, tool-calling support, and long-context coding workflow focus.

Model Details

Architecture: decoder-only sparse Mixture-of-Experts model
Base model: CohereLabs/North-Mini-Code-1.0
Parameters: approximately 43B total
Active parameters: approximately 4.7B at the default top-14 expert setting
Experts: 184 total, 14 active per token
Quantization: MXFP4 MoE expert weights
Recommended runtime: vLLM with Cohere Command 4 parsing

Recommended vLLM Command

VLLM_USE_FLASHINFER_SAMPLER=0 \
VLLM_USE_FLASHINFER_MOE_FP16=0 \
vllm serve LLMWildling/north-code-43b-a5b-coder \
  --served-model-name vllm/doobee \
  --host 0.0.0.0 \
  --port 23333 \
  --dtype bfloat16 \
  --tensor-parallel-size 1 \
  --max-model-len 200000 \
  --gpu-memory-utilization 0.96 \
  --trust-remote-code \
  --tool-call-parser cohere_command4 \
  --reasoning-parser cohere_command4 \
  --enable-auto-tool-choice \
  --max-num-seqs 1 \
  --max-num-batched-tokens 8192 \
  --moe-backend auto

Use a vLLM build with Cohere2 MoE MXFP4 support. For tool and reasoning parsing, install the Cohere melody parser package if your vLLM build requires it.

Chat Format

Use the bundled tokenizer chat template. The model supports interleaved reasoning and tool-use workflows. For best multi-turn agentic behavior, preserve assistant reasoning and tool-call outputs in conversation history.

Intended Use

This model is intended for coding, software-engineering assistance, terminal workflow automation, and agentic tool-use experiments.

License

This model is released under the Apache 2.0 license, following the base model.

Downloads last month: 8

Safetensors

Model size

55B params

Tensor type

BF16

Model tree for LLMWildling/North-Mini-Code-1.0-43B-a5b

Base model

CohereLabs/North-Mini-Code-1.0

Quantized

(26)

this model