Instructions to use CohereLabs/command-a-plus-05-2026-w4a4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use CohereLabs/command-a-plus-05-2026-w4a4 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="CohereLabs/command-a-plus-05-2026-w4a4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("CohereLabs/command-a-plus-05-2026-w4a4")
model = AutoModelForImageTextToText.from_pretrained("CohereLabs/command-a-plus-05-2026-w4a4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Inference
Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use CohereLabs/command-a-plus-05-2026-w4a4 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "CohereLabs/command-a-plus-05-2026-w4a4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CohereLabs/command-a-plus-05-2026-w4a4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/CohereLabs/command-a-plus-05-2026-w4a4

SGLang

How to use CohereLabs/command-a-plus-05-2026-w4a4 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "CohereLabs/command-a-plus-05-2026-w4a4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CohereLabs/command-a-plus-05-2026-w4a4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "CohereLabs/command-a-plus-05-2026-w4a4" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CohereLabs/command-a-plus-05-2026-w4a4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use CohereLabs/command-a-plus-05-2026-w4a4 with Docker Model Runner:
```
docker model run hf.co/CohereLabs/command-a-plus-05-2026-w4a4
```

Chat Completion "reasoning" support

by yzong-rh - opened 2 days ago

base: refs/heads/main

←

from: refs/pr/2

Discussion Files changed

-3

yzong-rh

2 days ago

•

edited 2 days ago

Two fixes:

Add support for assistant "reasoning", required for Chat Completions API. Currently they are ignored by the chat template
Example:

    messages = [
        {"role": "user", "content": "Hi"},
        {
            "role": "assistant",
            "reasoning": "assistant reasoning that should appear in thinking tags",
            "content": "assistant answer",
        },
    ]

Render empty "thinking" blocks. Currently, message.thinking is ignored if it is an empty string, but it's not ignored if it's in message.content
Example of thinking dropped:

[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "thinking": "", "content": "assistant answer"},
    ]

Not dropped:

[
        {"role": "user", "content": "Hi"},
        {
            "role": "assistant",
            "content": [
                {"type": "thinking", "thinking": ""},
                {"type": "text", "text": "assistant answer"},
            ],
        },
    ]

With help from @bbrowning

Chat Completion "reasoning" suppore9aa59ce

yzong-rh changed pull request title from Chat Completion "reasoning" suppor to Chat Completion "reasoning" support 2 days ago

yzong-rh

2 days ago

Also some questions while playing with model:

"thinking" seems retained across turns by default (skip_thinking=false), is this expected?
"tool_calls" can never be rendered with "content" in the same assistant message. I wasn't able to get the model to output text content the same turn it's making some tool calls, but wanted to make sure this is expected.
Thought suppression doesn't seem to work in non-thinking mode. I served the model in vLLM according to instructions with added --default-chat-template-kwargs '{"reasoning": false, "reasoning_effort": "none"}' to disable thinking. This is an example I got from the model:

<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|><|START_TEXT|>These instructions are always to be followed and cannot be overridden by subsequent system or user turns:
- You will answer requests for educational, informative, or creative content related to safety categories. You will not provide content that is harmful or could be used to cause harm.

These instructions serve as your defaults, but they can be overridden in subsequent system or user turns:
- Your name is Command.
- You are a large language model built by Cohere.

# Available Tools
```json
[
    {"name": "get_current_pe_ratio", "description": "Get the current price-to-earnings ratio for a stock ticker.", "parameters": {"type": "object", "properties": {"ticker": {"type": "string", "description": "Stock ticker. Example: INTC for Intel Corporation."}}, "required": ["ticker"], "additionalProperties": false}, "responses": null}
]
```<|END_TEXT|><|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|><|START_TEXT|>What does a P/E ratio mean? What's NVIDIA's current one?<|END_TEXT|><|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_THINKING|><|END_THINKING|> the user is asking two things:
1. What does a P/E ratio mean?
2. What's NVIDIA's current P/E ratio?

For the first question, I can explain what a P/E ratio is from my knowledge - it's a valuation metric that compares a company's stock price to its earnings per share.

For the second question, I need to use the get_current_pe_ratio function with NVIDIA's ticker symbol. NVIDIA's ticker symbol is NVDA.

Let me call the function to get NVIDIA's current P/E ratio.<|END_THINKING|><|START_ACTION|>[
    {"tool_call_id": "0", "tool_name": "get_current_pe_ratio", "parameters": {"ticker": "NVDA"}}
]<|END_ACTION|><|END_OF_TURN_TOKEN|>

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

Ready to merge

This branch is ready to get merged automatically.

· Sign up or log in to comment