CohereForAI/c4ai-command-r-v01 · Update Readme with tool-use/RAG prompting guide

patrick-s-h-lewis

Mar 18

Update Readme with tool-use/RAG prompting guide (e1f09213)

patrick-s-h-lewis

Mar 18

edited Mar 18

If and when this is merged, I will open a PR to copy the changes over to the quantized model card.

Cyleux

Mar 18

Thank you so very much!

Bullish on Cohere

sarahooker

Cohere For AI org Mar 18

This is great, @patrick-s-h-lewis! One suggestion: right now the documentation at https://docs.cohere.com/docs/prompting-command-r walks through how to use the tokenizer (screenshot attached) but only refers to it as the Hugging Face tokenizer. Now that our PR has been merged and our code is more accessible in the transformers library, I suggest you explicitly link to the relevant code section: https://github.com/huggingface/transformers/blob/838b87abe231fd70be5132088d0dee72a7bb8d62/src/transformers/models/cohere/tokenization_cohere_fast.py#L420

Merging now.

sarahooker changed pull request status to merged Mar 18

Cyleux

Mar 19

According to the prompt structure described in the instructions, the tool outputs are not included directly in the chat history itself, but rather in a separate dedicated section.

Specifically, the tool outputs are inserted into the {TOOL_OUTPUTS} placeholder, which comes after the {CHAT_HISTORY} section in the augmented generation prompt template:

augmented_gen_prompt_template = """
...
<|END_OF_TURN_TOKEN|> {CHAT_HISTORY} <|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|> {TOOL_OUTPUTS}<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|> {INSTRUCTIONS}<|END_OF_TURN_TOKEN|>"""

The {TOOL_OUTPUTS} section contains the results from all the tool calls, with each result prefixed by a Document: {n} identifier.

For example:

<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>

Document: 0
[Tool 1 output]
Document: 1
[Tool 2 output]

<|END_OF_TURN_TOKEN|>

So even if multiple tools are used at different points in the conversation, their outputs are aggregated together in this single {TOOL_OUTPUTS} section, separate from the actual back-and-forth utterances captured in {CHAT_HISTORY}. This allows the model to reference the tool results when formulating its response, without muddling the conversation history.
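
To make the question concrete, here is a minimal Python sketch of how I read the layout. It is not taken from the model card or the tokenizer code: the helper names `render_tool_outputs` and `render_prompt_tail` are hypothetical, and the exact newlines and the closing <|END_OF_TURN_TOKEN|> after the tool outputs are assumptions based on the example above, so the README remains the authority on the real format.

```python
from typing import List

# Special tokens as they appear in the prompt template quoted above.
START_TURN = "<|START_OF_TURN_TOKEN|>"
END_TURN = "<|END_OF_TURN_TOKEN|>"
SYSTEM = "<|SYSTEM_TOKEN|>"


def render_tool_outputs(outputs: List[str]) -> str:
    """Aggregate all tool outputs into one system turn.

    Each output is prefixed with 'Document: n', numbered from 0.
    (Hypothetical helper; whitespace is an assumption.)
    """
    body = "\n".join(f"Document: {i}\n{out}" for i, out in enumerate(outputs))
    return f"{START_TURN}{SYSTEM}\n{body}\n{END_TURN}"


def render_prompt_tail(chat_history: str, outputs: List[str], instructions: str) -> str:
    """Place the tool-output turn after the chat history and before the instructions."""
    return (
        chat_history
        + render_tool_outputs(outputs)
        + f"{START_TURN}{SYSTEM}{instructions}{END_TURN}"
    )


# Illustrative values only; the chat history string is a placeholder.
prompt_tail = render_prompt_tail(
    chat_history="{CHAT_HISTORY}",
    outputs=["[Tool 1 output]", "[Tool 2 output]"],
    instructions="{INSTRUCTIONS}",
)
print(prompt_tail)
```

In this sketch the conversation turns are never modified; the tool results only appear in the separate system turn that follows them.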

Is this accurate? Thank you.