CISCai committed
Commit b1cba1f
1 Parent(s): 95493fe

Upload 13 files
.gitattributes CHANGED
@@ -33,3 +33,15 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Codestral-22B-v0.1.imatrix.dat filter=lfs diff=lfs merge=lfs -text
+ Codestral-22B-v0.1.IQ1_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Codestral-22B-v0.1.IQ1_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Codestral-22B-v0.1.IQ2_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Codestral-22B-v0.1.IQ2_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Codestral-22B-v0.1.IQ2_XS.gguf filter=lfs diff=lfs merge=lfs -text
+ Codestral-22B-v0.1.IQ2_XXS.gguf filter=lfs diff=lfs merge=lfs -text
+ Codestral-22B-v0.1.IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Codestral-22B-v0.1.IQ3_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Codestral-22B-v0.1.IQ3_XS.gguf filter=lfs diff=lfs merge=lfs -text
+ Codestral-22B-v0.1.IQ3_XXS.gguf filter=lfs diff=lfs merge=lfs -text
+ Codestral-22B-v0.1.IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
Codestral-22B-v0.1.IQ1_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f0c88dd12f0ba45c2cc8f4d4a0653ad46a054bcd75e482e87d3bf8f17d499929
+ size 5267138368
Codestral-22B-v0.1.IQ1_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:698d70014cbe5a039273865a0adf8b9ad9950966f782c05ff4364b018e1e0eb2
+ size 4829488960
Codestral-22B-v0.1.IQ2_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65ca54784460e929114bb0992a2c1357633284799888244c969b9f7ad4491b30
+ size 7618963264
Codestral-22B-v0.1.IQ2_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6f7d038a6e6921d579c45da9707a70886e50fc3036afb5690ab916ba72da98d5
+ size 7035430720
Codestral-22B-v0.1.IQ2_XS.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:deed9845852b1a6ce603a2904aa87637dcb2eebd6885f589f911f814b04d25fe
+ size 6646146880
Codestral-22B-v0.1.IQ2_XXS.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc6d1bcd43075fdfa5c44f27a049b2b8822d2d950fa4f3840b9942dd95e9a258
+ size 5996554048
Codestral-22B-v0.1.IQ3_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7143bf5722b359e278c0cc0f9223c420779a222c52fff6eae56e7efc5f54ce03
+ size 10062407488
Codestral-22B-v0.1.IQ3_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f18717bfb590f5a74ab49f4206515750069dd0d0a8987322a456453c3714f68f
+ size 9688065856
Codestral-22B-v0.1.IQ3_XS.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:22f5c2ef5c2ea7bf662831a7fd8699cb49e8949d88fcf7a76c0df9f0d7a6ae23
+ size 9176098624
Codestral-22B-v0.1.IQ3_XXS.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0005a6db79cab533b8b7dba3e7fca688273f0f474b18c0eed10966f67ef712e6
+ size 8598857536
Codestral-22B-v0.1.IQ4_XS.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3ec951f01871217e211b6038cc3079f7dfae51f7f54b7a32dc4db37ec64ec72a
+ size 11935295296
Codestral-22B-v0.1.imatrix.dat ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9601a2b8c35b80e5e02804b1a5d145c34f1460941619425c1110c2007be0f050
+ size 11940568
README.md CHANGED
@@ -1,5 +1,372 @@
- ---
- license: other
- license_name: mnpl
- license_link: https://mistral.ai/licenses/MNPL-0.1.md
- ---
+ ---
+ inference: false
+ license: other
+ license_name: mnpl
+ license_link: https://mistral.ai/licenses/MNPL-0.1.md
+ tags:
+ - code
+ language:
+ - code
+ base_model: mistralai/Codestral-22B-v0.1
+ model_creator: Mistral AI
+ model_name: Codestral-22B-v0.1
+ model_type: mistral
+ datasets:
+ - m-a-p/CodeFeedback-Filtered-Instruction
+ quantized_by: CISC
+ ---
+
+ # Codestral-22B-v0.1 - SOTA GGUF
+ - Model creator: [Mistral AI](https://huggingface.co/mistralai)
+ - Original model: [Codestral-22B-v0.1](https://huggingface.co/mistralai/Codestral-22B-v0.1)
+
+ <!-- description start -->
+ ## Description
+
+ This repo contains State Of The Art quantized GGUF format model files for [Codestral-22B-v0.1](https://huggingface.co/mistralai/Codestral-22B-v0.1).
+
+ Quantization was done with an importance matrix that was trained for ~1M tokens (256 batches of 4096 tokens) of answers from the [CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction) dataset.
+
+ The embedded chat template has been extended to support function calling via the OpenAI-compatible `tools` parameter, and Fill-in-the-Middle (FIM) token metadata has been added; see the [example](#simple-llama-cpp-python-example-fill-in-middle-code). NOTE: Mistral's FIM requires support for [SPM infill mode](https://github.com/abetlen/llama-cpp-python/pull/1492)!
+
+ <!-- description end -->
+
+
+ <!-- prompt-template start -->
+ ## Prompt template: Mistral v3
+
+ ```
+ [AVAILABLE_TOOLS] [{"name": "function_name", "description": "Description", "parameters": {...}}, ...][/AVAILABLE_TOOLS][INST] {prompt}[/INST]
+ ```
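+
+ If you drive the model with raw prompts (e.g. with the `llama.cpp` command shown further down), the template can also be rendered by hand. Below is a minimal Python sketch of that formatting; `format_mistral_v3` and the example tool are illustrative only, and when using llama-cpp-python the embedded chat template normally takes care of this for you.
+
+ ```python
+ import json
+
+ def format_mistral_v3(prompt, tools=None):
+     """Render a single-turn Mistral v3 prompt string (hypothetical helper)."""
+     parts = []
+     if tools:
+         # Tool definitions are serialized as a JSON list inside [AVAILABLE_TOOLS]...[/AVAILABLE_TOOLS]
+         parts.append(f"[AVAILABLE_TOOLS] {json.dumps(tools)}[/AVAILABLE_TOOLS]")
+     parts.append(f"[INST] {prompt}[/INST]")
+     return "".join(parts)
+
+ tools = [{
+     "name": "calculate_density",
+     "description": "Calculates the density of an object.",
+     "parameters": {
+         "type": "object",
+         "properties": {
+             "mass": {"type": "integer"},
+             "volume": {"type": "integer"}
+         }
+     }
+ }]
+ print(format_mistral_v3("What is the density of a 50 kg object with a volume of 10 cubic meters?", tools))
+ ```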
+
+ <!-- prompt-template end -->
+
+
+ <!-- compatibility_gguf start -->
+ ## Compatibility
+
+ These quantised GGUFv3 files are compatible with llama.cpp from February 27th 2024 onwards, as of commit [0becb22](https://github.com/ggerganov/llama.cpp/commit/0becb22ac05b6542bd9d5f2235691aa1d3d4d307).
+
+ They are also compatible with many third-party UIs and libraries, provided they are built against a recent llama.cpp.
+
+ ## Explanation of quantisation methods
+
+ <details>
+ <summary>Click to see details</summary>
+
+ The new methods available are:
+
+ * GGML_TYPE_IQ1_S - 1-bit quantization in super-blocks with an importance matrix applied, effectively using 1.56 bits per weight (bpw)
+ * GGML_TYPE_IQ1_M - 1-bit quantization in super-blocks with an importance matrix applied, effectively using 1.75 bpw
+ * GGML_TYPE_IQ2_XXS - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.06 bpw
+ * GGML_TYPE_IQ2_XS - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.31 bpw
+ * GGML_TYPE_IQ2_S - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.5 bpw
+ * GGML_TYPE_IQ2_M - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.7 bpw
+ * GGML_TYPE_IQ3_XXS - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.06 bpw
+ * GGML_TYPE_IQ3_XS - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.3 bpw
+ * GGML_TYPE_IQ3_S - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.44 bpw
+ * GGML_TYPE_IQ3_M - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.66 bpw
+ * GGML_TYPE_IQ4_XS - 4-bit quantization in super-blocks with an importance matrix applied, effectively using 4.25 bpw
+ * GGML_TYPE_IQ4_NL - 4-bit non-linearly mapped quantization with an importance matrix applied, effectively using 4.5 bpw
+
+ Refer to the Provided Files table below to see what files use which methods, and how.
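+
+ As a rough sanity check, the bpw figures above map almost directly onto the file sizes in that table. The small sketch below assumes roughly 22.2 billion parameters for this model (an approximation); the remaining difference comes from tensors kept at higher precision and from metadata.
+
+ ```python
+ # Rough file-size estimate from bits-per-weight (bpw); illustrative only.
+ N_PARAMS = 22.2e9  # approximate parameter count for Codestral-22B (assumption)
+
+ def estimated_size_gb(bpw):
+     # bits -> bytes -> gigabytes
+     return N_PARAMS * bpw / 8 / 1e9
+
+ for name, bpw in [("IQ2_M", 2.7), ("IQ3_M", 3.66), ("IQ4_XS", 4.25)]:
+     print(f"{name}: ~{estimated_size_gb(bpw):.1f} GB")
+ # IQ4_XS works out to ~11.8 GB, close to the size of the provided file.
+ ```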
+ </details>
+ <!-- compatibility_gguf end -->
+
+ <!-- README_GGUF.md-provided-files start -->
+ ## Provided files
+
+ | Name | Quant method | Bits | Size | Max RAM required | Use case |
+ | ---- | ---- | ---- | ---- | ---- | ----- |
+ | [Codestral-22B-v0.1.IQ1_S.gguf](https://huggingface.co/CISCai/Codestral-22B-v0.1-SOTA-GGUF/blob/main/Codestral-22B-v0.1.IQ1_S.gguf) | IQ1_S | 1 | 4.3 GB | 5.3 GB | smallest, significant quality loss - **TBD**: Waiting for [this issue](https://github.com/ggerganov/llama.cpp/issues/5996) to be resolved |
+ | [Codestral-22B-v0.1.IQ1_M.gguf](https://huggingface.co/CISCai/Codestral-22B-v0.1-SOTA-GGUF/blob/main/Codestral-22B-v0.1.IQ1_M.gguf) | IQ1_M | 1 | 4.8 GB | 5.8 GB | very small, significant quality loss |
+ | [Codestral-22B-v0.1.IQ2_XXS.gguf](https://huggingface.co/CISCai/Codestral-22B-v0.1-SOTA-GGUF/blob/main/Codestral-22B-v0.1.IQ2_XXS.gguf) | IQ2_XXS | 2 | 5.4 GB | 6.4 GB | very small, high quality loss |
+ | [Codestral-22B-v0.1.IQ2_XS.gguf](https://huggingface.co/CISCai/Codestral-22B-v0.1-SOTA-GGUF/blob/main/Codestral-22B-v0.1.IQ2_XS.gguf) | IQ2_XS | 2 | 6.0 GB | 7.0 GB | very small, high quality loss |
+ | [Codestral-22B-v0.1.IQ2_S.gguf](https://huggingface.co/CISCai/Codestral-22B-v0.1-SOTA-GGUF/blob/main/Codestral-22B-v0.1.IQ2_S.gguf) | IQ2_S | 2 | 6.4 GB | 7.4 GB | small, substantial quality loss |
+ | [Codestral-22B-v0.1.IQ2_M.gguf](https://huggingface.co/CISCai/Codestral-22B-v0.1-SOTA-GGUF/blob/main/Codestral-22B-v0.1.IQ2_M.gguf) | IQ2_M | 2 | 6.9 GB | 7.9 GB | small, greater quality loss |
+ | [Codestral-22B-v0.1.IQ3_XXS.gguf](https://huggingface.co/CISCai/Codestral-22B-v0.1-SOTA-GGUF/blob/main/Codestral-22B-v0.1.IQ3_XXS.gguf) | IQ3_XXS | 3 | 7.9 GB | 8.9 GB | very small, high quality loss |
+ | [Codestral-22B-v0.1.IQ3_XS.gguf](https://huggingface.co/CISCai/Codestral-22B-v0.1-SOTA-GGUF/blob/main/Codestral-22B-v0.1.IQ3_XS.gguf) | IQ3_XS | 3 | 8.4 GB | 9.4 GB | small, substantial quality loss |
+ | [Codestral-22B-v0.1.IQ3_S.gguf](https://huggingface.co/CISCai/Codestral-22B-v0.1-SOTA-GGUF/blob/main/Codestral-22B-v0.1.IQ3_S.gguf) | IQ3_S | 3 | 8.9 GB | 9.9 GB | small, greater quality loss |
+ | [Codestral-22B-v0.1.IQ3_M.gguf](https://huggingface.co/CISCai/Codestral-22B-v0.1-SOTA-GGUF/blob/main/Codestral-22B-v0.1.IQ3_M.gguf) | IQ3_M | 3 | 9.2 GB | 10.2 GB | medium, balanced quality - recommended |
+ | [Codestral-22B-v0.1.IQ4_XS.gguf](https://huggingface.co/CISCai/Codestral-22B-v0.1-SOTA-GGUF/blob/main/Codestral-22B-v0.1.IQ4_XS.gguf) | IQ4_XS | 4 | 11.5 GB | 12.5 GB | small, substantial quality loss |
+
+ Generated importance matrix file: [Codestral-22B-v0.1.imatrix.dat](https://huggingface.co/CISCai/Codestral-22B-v0.1-SOTA-GGUF/blob/main/Codestral-22B-v0.1.imatrix.dat)
+
+ **Note**: the above RAM figures assume no GPU offloading with 4K context. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
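+
+ If you would rather fetch a single quant programmatically than download it via the links above, here is a minimal sketch using `huggingface_hub` (the IQ4_XS file is just an example choice):
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Downloads one GGUF file from this repo into the local Hugging Face cache and returns its path.
+ model_path = hf_hub_download(
+     repo_id="CISCai/Codestral-22B-v0.1-SOTA-GGUF",
+     filename="Codestral-22B-v0.1.IQ4_XS.gguf",
+ )
+ print(model_path)
+ ```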
+
+ <!-- README_GGUF.md-provided-files end -->
+
+ <!-- README_GGUF.md-how-to-run start -->
+ ## Example `llama.cpp` command
+
+ Make sure you are using `llama.cpp` from commit [0becb22](https://github.com/ggerganov/llama.cpp/commit/0becb22ac05b6542bd9d5f2235691aa1d3d4d307) or later.
+
+ ```shell
+ ./main -ngl 57 -m Codestral-22B-v0.1.IQ4_XS.gguf --color -c 32768 --temp 0 --repeat-penalty 1.1 -p "[AVAILABLE_TOOLS] {tools}[/AVAILABLE_TOOLS][INST] {prompt}[/INST]"
+ ```
+
+ Change `-ngl 57` to the number of layers to offload to the GPU. Remove it if you don't have GPU acceleration.
+
+ Change `-c 32768` to the desired sequence length.
+
+ If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
+
+ If you are low on VRAM/RAM, try quantizing the K-cache with `-ctk q8_0`, or even `-ctk q4_0`, for big memory savings (depending on context size).
+ There is a similar option for the V-cache (`-ctv`), however that is [not working yet](https://github.com/ggerganov/llama.cpp/issues/4425). (The equivalent llama-cpp-python settings are sketched after the first Python example below.)
+
+ For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md).
+
+ ## How to run from Python code
+
+ You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) module.
+
+ ### How to load this model in Python code, using llama-cpp-python
+
+ For full documentation, please see: [llama-cpp-python docs](https://llama-cpp-python.readthedocs.io/en/latest/).
+
+ #### First install the package
+
+ Run one of the following commands, according to your system:
+
+ ```shell
+ # Prebuilt wheel with basic CPU support
+ pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+ # Prebuilt wheel with NVidia CUDA acceleration (use cu121, cu122, etc. to match your CUDA version)
+ pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
+ # Prebuilt wheel with Metal GPU acceleration
+ pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
+ # Build base version with no GPU acceleration
+ pip install llama-cpp-python
+ # With NVidia CUDA acceleration
+ CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
+ # Or with OpenBLAS acceleration
+ CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
+ # Or with CLBLast acceleration
+ CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
+ # Or with AMD ROCm GPU acceleration (Linux only)
+ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
+ # Or with Metal GPU acceleration for macOS systems only
+ CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
+ # Or with Vulkan acceleration
+ CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python
+ # Or with Kompute acceleration
+ CMAKE_ARGS="-DLLAMA_KOMPUTE=on" pip install llama-cpp-python
+ # Or with SYCL acceleration
+ CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python
+
+ # On Windows, to set the CMAKE_ARGS variable in PowerShell, follow this format; e.g. for NVidia CUDA:
+ $env:CMAKE_ARGS = "-DLLAMA_CUDA=on"
+ pip install llama-cpp-python
+ ```
+
+ #### Simple llama-cpp-python example code
+
+ ```python
+ from llama_cpp import Llama
+
+ # Chat Completion API
+
+ llm = Llama(model_path="./Codestral-22B-v0.1.IQ4_XS.gguf", n_gpu_layers=57, n_ctx=32768)
+ print(llm.create_chat_completion(
+     repeat_penalty = 1.1,
+     messages = [
+         {
+             "role": "user",
+             "content": "Pick a LeetCode challenge and solve it in Python."
+         }
+     ]
+ ))
+ ```
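+
+ The CLI flags from the `llama.cpp` example map onto `Llama()` constructor arguments: `-ngl` is `n_gpu_layers`, `-c` is `n_ctx`, and the `-ctk` K-cache quantization corresponds to `type_k`. The sketch below assumes a recent llama-cpp-python build that exposes `type_k`/`type_v`; the value 8 is GGML's `GGML_TYPE_Q8_0` enum entry.
+
+ ```python
+ from llama_cpp import Llama
+
+ # Same model as above, but with the K-cache quantized to q8_0 to save V/RAM.
+ # type_k takes a ggml type id; 8 corresponds to GGML_TYPE_Q8_0 (assumption noted above).
+ llm = Llama(
+     model_path="./Codestral-22B-v0.1.IQ4_XS.gguf",
+     n_gpu_layers=57,   # like -ngl 57
+     n_ctx=32768,       # like -c 32768
+     type_k=8,          # like -ctk q8_0
+ )
+ print(llm.create_chat_completion(
+     repeat_penalty=1.1,
+     messages=[{"role": "user", "content": "Explain what K-cache quantization trades off."}],
+ ))
+ ```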
+
+ #### Simple llama-cpp-python example fill-in-middle code
+
+ ```python
+ from llama_cpp import Llama
+
+ # Completion API
+
+ llm = Llama(model_path="./Codestral-22B-v0.1.IQ4_XS.gguf", n_gpu_layers=57, n_ctx=32768, spm_infill=True)
+ print(llm.create_completion(
+     temperature = 0.0,
+     repeat_penalty = 1.1,
+     prompt = "def add(",
+     suffix = " return sum"
+ ))
+ ```
+
+ #### Simple llama-cpp-python example function calling code
+
+ ```python
+ from llama_cpp import Llama
+
+ # Chat Completion API
+
+ llm = Llama(model_path="./Codestral-22B-v0.1.IQ4_XS.gguf", n_gpu_layers=57, n_ctx=32768)
+ print(llm.create_chat_completion(
+     temperature = 0.0,
+     repeat_penalty = 1.1,
+     messages = [
+         {
+             "role": "user",
+             "content": "In a physics experiment, you are given an object with a mass of 50 kilograms and a volume of 10 cubic meters. Can you use the 'calculate_density' function to determine the density of this object?"
+         },
+         { # The tool_calls is from the response to the above with tool_choice active
+             "role": "assistant",
+             "content": None,
+             "tool_calls": [
+                 {
+                     "id": "call__0_calculate_density_cmpl-...",
+                     "type": "function",
+                     "function": {
+                         "name": "calculate_density",
+                         "arguments": '{"mass": "50", "volume": "10"}'
+                     }
+                 }
+             ]
+         },
+         { # The tool_call_id is from tool_calls and content is the result from the function call you made
+             "role": "tool",
+             "content": "5.0",
+             "tool_call_id": "call__0_calculate_density_cmpl-..."
+         }
+     ],
+     tools=[{
+         "type": "function",
+         "function": {
+             "name": "calculate_density",
+             "description": "Calculates the density of an object.",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "mass": {
+                         "type": "integer",
+                         "description": "The mass of the object."
+                     },
+                     "volume": {
+                         "type": "integer",
+                         "description": "The volume of the object."
+                     }
+                 },
+                 "required": [ "mass", "volume" ]
+             }
+         }
+     }],
+     #tool_choice={
+     #    "type": "function",
+     #    "function": {
+     #        "name": "calculate_density"
+     #    }
+     #}
+ ))
+ ```
+
+ <!-- README_GGUF.md-how-to-run end -->
+
+ <!-- original-model-card start -->
+ # Model Card for Codestral-22B-v0.1
+
+ Codestral-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash (more details in the [Blogpost](https://mistral.ai/news/codestral/)). The model can be queried:
+ - As instruct, for instance to answer any questions about a code snippet (write documentation, explain, factorize) or to generate code following specific indications
+ - As Fill-in-the-Middle (FIM), to predict the middle tokens between a prefix and a suffix (very useful for software development add-ons like in VS Code)
+
+
+ ## Installation
+
+ It is recommended to use `mistralai/Codestral-22B-v0.1` with [mistral-inference](https://github.com/mistralai/mistral-inference).
+
+ ```
+ pip install mistral_inference
+ ```
+
+ ## Download
+
+ ```py
+ from huggingface_hub import snapshot_download
+ from pathlib import Path
+
+ mistral_models_path = Path.home().joinpath('mistral_models', 'Codestral-22B-v0.1')
+ mistral_models_path.mkdir(parents=True, exist_ok=True)
+
+ snapshot_download(repo_id="mistralai/Codestral-22B-v0.1", allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"], local_dir=mistral_models_path)
+ ```
+
+ ### Chat
+
+ After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment.
+
+ ```
+ mistral-chat $HOME/mistral_models/Codestral-22B-v0.1 --instruct --max_tokens 256
+ ```
+
+ This will generate an answer to "Write me a function that computes fibonacci in Rust" and should give something along the following lines:
+
+ ```
+ Sure, here's a simple implementation of a function that computes the Fibonacci sequence in Rust. This function takes an integer `n` as an argument and returns the `n`th Fibonacci number.
+
+ fn fibonacci(n: u32) -> u32 {
+     match n {
+         0 => 0,
+         1 => 1,
+         _ => fibonacci(n - 1) + fibonacci(n - 2),
+     }
+ }
+
+ fn main() {
+     let n = 10;
+     println!("The {}th Fibonacci number is: {}", n, fibonacci(n));
+ }
+
+ This function uses recursion to calculate the Fibonacci number. However, it's not the most efficient solution because it performs a lot of redundant calculations. A more efficient solution would use a loop to iteratively calculate the Fibonacci numbers.
+ ```
+
+
+ ### Fill-in-the-middle (FIM)
+
+ After installing `mistral_inference` and running `pip install --upgrade mistral_common` to make sure you have mistral_common >= 1.2 installed:
+
+ ```py
+ from mistral_inference.model import Transformer
+ from mistral_inference.generate import generate
+ from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
+ from mistral_common.tokens.instruct.request import FIMRequest
+
+ tokenizer = MistralTokenizer.v3()
+ model = Transformer.from_folder("~/codestral-22B-240529")
+
+ prefix = """def add("""
+ suffix = """ return sum"""
+
+ request = FIMRequest(prompt=prefix, suffix=suffix)
+
+ tokens = tokenizer.encode_fim(request).tokens
+
+ out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
+ result = tokenizer.decode(out_tokens[0])
+
+ middle = result.split(suffix)[0].strip()
+ print(middle)
+ ```
+
+ This should give something along the following lines:
+
+ ```
+ num1, num2):
+
+ # Add two numbers
+ sum = num1 + num2
+
+ # return the sum
+ ```
+
+ ## Limitations
+
+ Codestral-22B-v0.1 does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to
+ make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
+
+ ## License
+
+ Codestral-22B-v0.1 is released under the `MNPL-0.1` license.
+
+ ## The Mistral AI Team
+
+ Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Jean-Malo Delignon, Jia Li, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickael Seznec, Nicolas Schuhl, Patrick von Platen, Romain Sauvestre, Pierre Stock, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Thibault Schueller, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall