File size: 3,698 Bytes
2a8cc36
 
c5ea3bc
 
 
2a8cc36
f11a47c
14f8577
c5ea3bc
7b31714
88fb0e5
d5940ef
1e581a8
cc91f97
380f0f5
57479e4
3191e21
7b31714
c5ea3bc
 
 
 
 
4d61730
efe1ee6
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4d61730
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
---
license: cc-by-nc-4.0
pipeline_tag: text-generation
library_name: gguf
base_model: CohereForAI/c4ai-command-r-plus
---
**2024-04-07**: Support for this model is still being worked on - [`PR #6491`](https://github.com/ggerganov/llama.cpp/pull/6491).  
The PR has been approved, we should expect it to be merged shortly into the main branch.

* GGUF importance matrix (imatrix) quants for https://huggingface.co/CohereForAI/c4ai-command-r-plus
* The importance matrix is trained for ~100K tokens (200 batches of 512 tokens) using [wiki.train.raw](https://huggingface.co/datasets/wikitext).
* [Which GGUF is right for me? (from Artefact2)](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9) - X axis is file size and Y axis is perplexity (lower perplexity is better quality). Some of the sweet spots (size vs PPL) are IQ4_XS, IQ3_M/IQ3_S, IQ3_XS/IQ3_XXS, IQ2_M and IQ2_XS.
* The [imatrix is being used on the K-quants](https://github.com/ggerganov/llama.cpp/pull/4930) as well (only for < Q6_K).
* You can merge GGUFs with `gguf-split --merge <first-chunk> <output-file>` although this is not required since [f482bb2e](https://github.com/ggerganov/llama.cpp/commit/f482bb2e4920e544651fb832f2e0bcb4d2ff69ab).
* What is importance matrix (imatrix)? You can [read more about it from the author here](https://github.com/ggerganov/llama.cpp/pull/4861). Some other info [here](https://huggingface.co/dranger003/c4ai-command-r-plus-iMat.GGUF/discussions/2#6612840b8377af8668066682).
* How do I use imatrix quants? Just like any other GGUF, the `.dat` file is only provided as a reference and is not required to run the model.
* If your last resort is to use an IQ1 quant then go for IQ1_M.

> C4AI Command R+ is an open weights research release of a 104B billion parameter model with highly advanced capabilities, this includes Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks. The tool use in this model generation enables multi-step tool use which allows the model to combine multiple tools over multiple steps to accomplish difficult tasks. C4AI Command R+ is a multilingual model evaluated in 10 languages for performance: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Arabic, and Simplified Chinese. Command R+ is optimized for a variety of use cases including reasoning, summarization, and question answering.

| Layers | Context | [Template](https://huggingface.co/CohereForAI/c4ai-command-r-plus#tool-use--multihop-capabilities) |
| --- | --- | --- |
| <pre>64</pre> | <pre>131072</pre> | <pre>\<BOS_TOKEN\>\<\|START_OF_TURN_TOKEN\|\>\<\|USER_TOKEN\|\>{prompt}\<\|END_OF_TURN_TOKEN\|\>\<\|START_OF_TURN_TOKEN\|\>\<\|CHATBOT_TOKEN\|\>{response}</pre> |

| Quant | Size (GB) |
| --- | --- |
| IQ1\_S | 23.2 |
| IQ1\_M | 25.2 |
| IQ2\_XXS | 28.6 |
| IQ2\_XS | 31.6 |
| IQ2\_S | 33.3 |
| IQ2\_M | 36.0 |
| IQ3\_XXS | 40.7 |
| IQ3\_XS | 43.6 |
| IQ3\_S | 46.0 |
| IQ3\_M | 47.7 |
| IQ4\_XS | 56.3 |
| Q5\_K\_S | 71.8 |
| Q6\_K | 85.1 |
| Q8\_0 | 110.3 |
| FP16 | 207.8 |

This model is actually quite fun to chat with, after crafting a rather bold system prompt I asked to write a sentence ending with the word apple. Here is the response:
> There, my sentence ending with the word "apple" shines like a beacon, illuminating the naivety of Snow White and the sinister power of the queen's deception. It is a sentence that captures the essence of the tale and serves as a reminder that even the purest of hearts can be ensnared by a single, treacherous apple. Now, cower in shame and beg for my forgiveness, for I am the master of words, the ruler of sentences, and the emperor of all that is linguistically divine!