File size: 8,959 Bytes
09f3fbd
 
 
 
 
 
 
 
 
 
 
b0d3447
 
 
590a8e6
67fc543
 
592db59
67fc543
f404f44
 
06b30d2
67fc543
590a8e6
 
 
 
b0d3447
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
09f3fbd
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
---
inference: false
license: other
license_name: mnpl
license_link: https://mistral.ai/licences/MNPL-0.1.md
tags:
- code
language:
- code
---


---
### Basic Context Obedient Prompt that works great for RAG

Note: Its pretty PG when it comes to its responses.

Example Video: 
https://imgur.com/LGuC1I0

Example Video 2: Further testing with more key value pairs
https://imgur.com/xYyYRgz


Ram Usage for Q8 Quantized model on an RTX3090 24GB
https://imgur.com/GPlGLme

```
Contextual-Request:
BEGININPUT
BEGINCONTEXT
date: 2024-05-03
url: https://web.site.thisshitsbadouthereboys/123
ENDCONTEXT
Pandemic Warning Notice there has been a huge issue with Zombie humans that are passing on a new disease that appeared to be similar to the symptoms of covid but when a host dies they reanimate as a zombie corpse. 
ENDINPUT
BEGININSTRUCTION
What is the the pandemic about? cite your sources
ENDINSTRUCTION

### Contextual Response:
```
---

Overview
This model is meant to enhance adherence to provided context (e.g., for RAG applications) and reduce hallucinations, inspired by airoboros context-obedient question answer format.

## Overview

The format for a contextual prompt is as follows:
```
Contextual-Request:
BEGININPUT
BEGINCONTEXT
[key0: value0]
[key1: value1]
... other metdata ... like character Mood: Scared, Tone of the scene: Spooky, anything that will enhance your RAG / experience, maybe even use small Mermaid Knowledge Graphs as core memories stored as events like I do in my assistant I am building.
ENDCONTEXT
[insert your text blocks here, this is where RAG content goes]
ENDINPUT
[add as many other blocks, in the exact same format]
BEGININSTRUCTION
[insert your instruction(s).  The model was tuned with single questions, paragraph format, lists, etc.]
ENDINSTRUCTION
```

I know it's a bit verbose and annoying, but after much trial and error, using these explicit delimiters helps the model understand where to find the responses and how to associate specific sources with it.
- `Contextual-Request:` - denotes the type of request pattern the model is to follow for consistency
- `BEGININPUT` - denotes a new input block
- `BEGINCONTEXT` - denotes the block of context (metadata key/value pairs) to associate with the current input block
- `ENDCONTEXT` - denotes the end of the metadata block for the current input
- [text] - Insert whatever text you want for the input block, as many paragraphs as can fit in the context.
- `ENDINPUT` - denotes the end of the current input block
- [repeat as many input blocks in this format as you want]
- `BEGININSTRUCTION` - denotes the start of the list (or one) instruction(s) to respond to for all of the input blocks above.
- [instruction(s)]
- `ENDINSTRUCTION` - denotes the end of instruction set

Here's a trivial, but important example to prove the point:
```
Contextual-Request:
BEGININPUT
BEGINCONTEXT
date: 2021-01-01
url: https://web.site/123
ENDCONTEXT
In a shocking turn of events, blueberries are now green, but will be sticking with the same name.
ENDINPUT
BEGININSTRUCTION
What color are bluberries?  Source?
ENDINSTRUCTION
```

And the expected response:
```
### Contextual Response:
Blueberries are now green.
Source:
date: 2021-01-01
url: https://web.site/123
```

### References in response

As shown in the example, the dataset includes many examples of including source details in the response, when the question asks for source/citation/references.

Why do this?  Well, the R in RAG seems to be the weakest link in the chain.
Retrieval accuracy, depending on many factors including the overall dataset size, can be quite low.
This accuracy increases when retrieving more documents, but then you have the issue of actually using
the retrieved documents in prompts. If you use one prompt per document (or document chunk), you know
exactly which document the answer came from, so there's no issue.  If, however, you include multiple
chunks in a single prompt, it's useful to include the specific reference chunk(s) used to generate the
response, rather than naively including references to all of the chunks included in the prompt.

For example, suppose I have two documents:
```
url: http://foo.bar/1
Strawberries are tasty.

url: http://bar.foo/2
The cat is blue.
```

If the question being asked is `What color is the cat?`, I would only expect the 2nd document to be referenced in the response, as the other link is irrelevant.


# BASE MODEL: Model Card for Codestral-22B-v0.1

Codestrall-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash (more details in the [Blogpost](https://mistral.ai/news/codestral/)). The model can be queried:
- As instruct, for instance to answer any questions about a code snippet (write documentation, explain, factorize) or to generate code following specific indications
- As Fill in the Middle (FIM), to predict the middle tokens between a prefix and a suffix (very useful for software development add-ons like in VS Code)

## Installation

It is recommended to use `mistralai/Codestral-22B-v0.1` with [mistral-inference](https://github.com/mistralai/mistral-inference).

```
pip install mistral_inference
```

## Download

```py
from huggingface_hub import snapshot_download
from pathlib import Path

mistral_models_path = Path.home().joinpath('mistral_models', 'Codestral-22B-v0.1')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(repo_id="mistralai/Codestral-22B-v0.1", allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"], local_dir=mistral_models_path)
```

### Chat

After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment.

```
mistral-chat $HOME/mistral_models/Codestral-22B-v0.1 --instruct --max_tokens 256
```

Will generate an answer to "Write me a function that computes fibonacci in Rust" and should give something along the following lines:

```
Sure, here's a simple implementation of a function that computes the Fibonacci sequence in Rust. This function takes an integer `n` as an argument and returns the `n`th Fibonacci number.

fn fibonacci(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn main() {
    let n = 10;
    println!("The {}th Fibonacci number is: {}", n, fibonacci(n));
}

This function uses recursion to calculate the Fibonacci number. However, it's not the most efficient solution because it performs a lot of redundant calculations. A more efficient solution would use a loop to iteratively calculate the Fibonacci numbers.
```


### Fill-in-the-middle (FIM)

After installing `mistral_inference` and running `pip install --upgrade mistral_common` to make sure to have mistral_common>=1.2 installed:

```py
from mistral_inference.model import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.tokens.instruct.request import FIMRequest

tokenizer = MistralTokenizer.v3()
model = Transformer.from_folder("~/codestral-22B-240529")

prefix = """def add("""
suffix = """    return sum"""

request = FIMRequest(prompt=prefix, suffix=suffix)

tokens = tokenizer.encode_fim(request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])

middle = result.split(suffix)[0].strip()
print(middle)
```

Should give something along the following lines:

```
num1, num2):

    # Add two numbers
    sum = num1 + num2

    # return the sum
```

## Limitations

The Codestral-22B-v0.1 does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to
make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.

## License

Codestral-22B-v0.1 is released under the `MNLP-0.1` license.

## The Mistral AI Team

Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Jean-Malo Delignon, Jia Li, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickael Seznec, Nicolas Schuhl, Patrick von Platen, Romain Sauvestre, Pierre Stock, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Thibault Schueller, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall