evilsocket and Iheb-Chaabane committed · Commit 550ee97 · 0 parent(s)

Duplicate from tiiuae/Falcon3-1B-Instruct

Co-authored-by: Iheb Chaabane <Iheb-Chaabane@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
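
The patterns above route matching files through Git LFS instead of regular Git storage. As a rough illustration only, the Python sketch below uses `fnmatch` to approximate which files in this commit would match. Git's real attribute matching follows gitignore-style rules, which `fnmatch` does not fully reproduce, and the helper `is_lfs_tracked` and the shortened pattern list are hypothetical.

```python
from fnmatch import fnmatchcase

# A subset of the patterns from the .gitattributes diff above (not the full list).
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.tar.*", "*.zip", "*tfevents*"]


def is_lfs_tracked(filename: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does the file's base name match any LFS pattern?

    Assumption: only base names are compared, so directory patterns such as
    `saved_model/**/*` are deliberately not handled here.
    """
    basename = filename.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in patterns)


for name in ["model.safetensors", "config.json", "tokenizer.json"]:
    storage = "Git LFS" if is_lfs_tracked(name) else "regular Git"
    print(f"{name}: {storage}")
```

Under these assumptions, `model.safetensors` matches `*.safetensors`, which is consistent with the LFS pointer that appears for that file later in the commit, while the JSON files are stored as ordinary text.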
README.md ADDED
@@ -0,0 +1,262 @@
+ ---
+ language:
+ - en
+ - fr
+ - es
+ - pt
+ tags:
+ - falcon3
+ base_model: tiiuae/Falcon3-1B-Base
+ license: other
+ license_name: falcon-llm-license
+ license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
+ library_name: transformers
+ ---
+
+ <div align="center">
+     <img src="https://huggingface.co/datasets/tiiuae/documentation-images/resolve/main/general/falco3-logo.png" alt="drawing" width="500"/>
+ </div>
+
+ # Falcon3-1B-Instruct
+
+ The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
+
+ This repository contains **Falcon3-1B-Instruct**. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks.
+ Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K.
+
+ ## Model Details
+ - Architecture
+   - Transformer-based causal decoder-only architecture
+   - 18 decoder blocks
+   - Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads
+   - Wider head dimension: 256
+   - High RoPE base value (theta) to support long-context understanding: 1000042
+   - Uses SwiGLU and RMSNorm
+   - 8K context length
+   - 131K vocab size
+ - Pruned and healed using larger Falcon models (3B and 7B, respectively) on only 80 gigatokens of data comprising web, code, STEM, high-quality and multilingual data, using 256 H100 GPUs
+ - Post-trained on 1.2 million samples of STEM, conversational, code, safety and function call data
+ - Supports EN, FR, ES, PT
+ - Developed by [Technology Innovation Institute](https://www.tii.ae)
+ - License: TII Falcon-LLM License 2.0
+ - Model Release Date: December 2024
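
To make the effect of Grouped Query Attention in the list above concrete, here is a minimal Python sketch that estimates the key-value cache size for one sequence at full context. It takes the layer count, head counts and head dimension from the list, reads "8K" as 8192 tokens (matching `max_position_embeddings` in `config.json`), and assumes 2-byte bfloat16 cache entries; the function name `kv_cache_bytes` is hypothetical.

```python
# Values quoted in the Model Details list above.
NUM_LAYERS = 18        # decoder blocks
NUM_QUERY_HEADS = 8    # query heads
NUM_KV_HEADS = 4       # key-value heads under GQA
HEAD_DIM = 256
CONTEXT_LENGTH = 8192  # "8K context length" read as 8192 tokens
BYTES_PER_VALUE = 2    # assumption: cache stored in bfloat16


def kv_cache_bytes(num_tokens: int, kv_heads: int) -> int:
    """Bytes needed to cache keys and values for a single sequence."""
    keys_and_values = 2  # one tensor for K, one for V
    return keys_and_values * NUM_LAYERS * kv_heads * HEAD_DIM * BYTES_PER_VALUE * num_tokens


gqa = kv_cache_bytes(CONTEXT_LENGTH, NUM_KV_HEADS)
mha = kv_cache_bytes(CONTEXT_LENGTH, NUM_QUERY_HEADS)  # hypothetical: one KV head per query head
print(f"GQA cache at full context:       {gqa / 2**20:.0f} MiB")
print(f"Same model without GQA (8 KV):   {mha / 2**20:.0f} MiB")
```

With 4 key-value heads instead of 8, the cache is half the size it would be if every query head had its own key-value head, which is one reason GQA helps inference.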
+
+
+ ## Getting started
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+
+ model_name = "tiiuae/Falcon3-1B-Instruct"
+
+ # Load the weights in their stored dtype and place them on the available device(s).
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ prompt = "How many hours in one day?"
+ messages = [
+     {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
+     {"role": "user", "content": prompt}
+ ]
+ # Render the conversation with the model's chat template, ending with the assistant turn prefix.
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=1024
+ )
+ # Drop the prompt tokens so that only the newly generated tokens are decoded.
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(response)
+ ```
+
+ </details>
+
+ <br>
+
+ ## Benchmarks
+ The following table reports results from our internal benchmarking pipeline.
+ - We use [lm-evaluation harness](https://github.com/EleutherAI/lm-evaluation-harness).
+ - We report **raw scores** obtained by applying the chat template and `fewshot_as_multiturn`.
+ - We use the same batch size across all models.
+
+ <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
+     <colgroup>
+         <col style="width: 10%;">
+         <col style="width: 10%;">
+         <col style="width: 7%;">
+         <col style="width: 7%;">
+         <col style="width: 7%;">
+         <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
+     </colgroup>
+     <thead>
+         <tr>
+             <th>Category</th>
+             <th>Benchmark</th>
+             <th>Llama-3.2-1B</th>
+             <th>Qwen2.5-1.5B</th>
+             <th>SmolLM2-1.7B</th>
+             <th>Falcon3-1B-Instruct</th>
+         </tr>
+     </thead>
+     <tbody>
+         <tr>
+             <td rowspan="3">General</td>
+             <td>MMLU (5-shot)</td>
+             <td><b>68.2</b></td>
+             <td>59.8</td>
+             <td>49.2</td>
+             <td>46.1</td>
+         </tr>
+         <tr>
+             <td>MMLU-PRO (5-shot)</td>
+             <td>16</td>
+             <td><b>28.2</b></td>
+             <td>20</td>
+             <td>18.6</td>
+         </tr>
+         <tr>
+             <td>IFEval</td>
+             <td><b>55.3</b></td>
+             <td>44.2</td>
+             <td>53</td>
+             <td>54.4</td>
+         </tr>
+         <tr>
+             <td rowspan="3">Math</td>
+             <td>GSM8K (5-shot)</td>
+             <td><b>82.6</b></td>
+             <td>57.8</td>
+             <td>47.6</td>
+             <td>43.9</td>
+         </tr>
+         <tr>
+             <td>GSM8K (8-shot, COT)</td>
+             <td>46.6</td>
+             <td><b>58.8</b></td>
+             <td>46.3</td>
+             <td>45.8</td>
+         </tr>
+         <tr>
+             <td>MATH Lvl-5 (4-shot)</td>
+             <td><b>5.2</b></td>
+             <td>1.1</td>
+             <td>3.1</td>
+             <td>1</td>
+         </tr>
+         <tr>
+             <td rowspan="5">Reasoning</td>
+             <td>Arc Challenge (25-shot)</td>
+             <td><b>58.6</b></td>
+             <td>50.7</td>
+             <td>49.7</td>
+             <td>47.7</td>
+         </tr>
+         <tr>
+             <td>GPQA (0-shot)</td>
+             <td>24.4</td>
+             <td><b>29.6</b></td>
+             <td>28.6</td>
+             <td>26.5</td>
+         </tr>
+         <tr>
+             <td>GPQA (0-shot, COT)</td>
+             <td>13.2</td>
+             <td>9.2</td>
+             <td>16</td>
+             <td><b>21.3</b></td>
+         </tr>
+         <tr>
+             <td>MUSR (0-shot)</td>
+             <td>32</td>
+             <td>36.5</td>
+             <td>32.9</td>
+             <td><b>40.7</b></td>
+         </tr>
+         <tr>
+             <td>BBH (3-shot)</td>
+             <td>33.8</td>
+             <td><b>39.2</b></td>
+             <td>34</td>
+             <td>35.1</td>
+         </tr>
+         <tr>
+             <td rowspan="5">CommonSense Understanding</td>
+             <td>PIQA (0-shot)</td>
+             <td>72.1</td>
+             <td>73.2</td>
+             <td><b>74.4</b></td>
+             <td>72</td>
+         </tr>
+         <tr>
+             <td>SciQ (0-shot)</td>
+             <td>61.8</td>
+             <td>69.5</td>
+             <td>71.4</td>
+             <td><b>86.8</b></td>
+         </tr>
+         <tr>
+             <td>Winogrande (0-shot)</td>
+             <td>-</td>
+             <td>-</td>
+             <td>-</td>
+             <td><b>60.2</b></td>
+         </tr>
+         <tr>
+             <td>OpenbookQA (0-shot)</td>
+             <td>40.2</td>
+             <td>40.4</td>
+             <td><b>42.8</b></td>
+             <td>40</td>
+         </tr>
+         <tr>
+             <td>MT-Bench (avg)</td>
+             <td>5.4</td>
+             <td><b>7.1</b></td>
+             <td>6.1</td>
+             <td>5.5</td>
+         </tr>
+         <tr>
+             <td rowspan="1">Instructions following</td>
+             <td>Alpaca (WC)</td>
+             <td><b>8.6</b></td>
+             <td><b>8.6</b></td>
+             <td>5.4</td>
+             <td>6.1</td>
+         </tr>
+     </tbody>
+ </table>
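
The evaluation settings listed before the table (chat template applied, few-shot examples passed as multi-turn, fixed batch size) could be expressed with the lm-evaluation-harness Python API roughly as follows. This is a sketch, not the authors' pipeline: the helper names are hypothetical, the batch size of 8 is an assumption because the README does not state it, and the `apply_chat_template` and `fewshot_as_multiturn` arguments are only available in reasonably recent harness releases.

```python
def build_eval_kwargs(task: str, num_fewshot: int, batch_size: int = 8) -> dict:
    """Collect keyword arguments for lm_eval.simple_evaluate (hypothetical helper)."""
    return {
        "model": "hf",
        "model_args": "pretrained=tiiuae/Falcon3-1B-Instruct,dtype=bfloat16",
        "tasks": [task],
        "num_fewshot": num_fewshot,
        "batch_size": batch_size,        # assumption: the README does not give the value
        "apply_chat_template": True,     # "applying the chat template"
        "fewshot_as_multiturn": True,    # few-shot examples sent as separate chat turns
    }


def run_eval(task: str, num_fewshot: int) -> dict:
    """Run one task; downloads the model, so it is not called in this sketch."""
    import lm_eval  # imported lazily; requires the lm-eval package

    results = lm_eval.simple_evaluate(**build_eval_kwargs(task, num_fewshot))
    return results["results"]


# Example settings for the "GSM8K (5-shot)" row of the table.
print(build_eval_kwargs("gsm8k", 5))
```

Scores obtained this way may not match the table exactly, since the task variants and harness version used for the reported numbers are not specified.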
+
+ ## Useful links
+ - View our [release blogpost](https://huggingface.co/blog/falcon3).
+ - Feel free to join [our discord server](https://discord.gg/fwXpMyGc) if you have any questions or want to interact with our researchers and developers.
+
+ ## Technical Report
+ Coming soon.
+
+ ## Citation
+ If the Falcon3 family of models was helpful to your work, feel free to cite us.
+
+ ```
+ @misc{Falcon3,
+     title = {The Falcon 3 Family of Open Models},
+     url = {https://huggingface.co/blog/falcon3},
+     author = {Falcon-LLM Team},
+     month = {December},
+     year = {2024}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "eos_token_id": 11,
+   "head_dim": 256,
+   "hidden_act": "silu",
+   "hidden_size": 2048,
+   "initializer_range": 0.02,
+   "intermediate_size": 8192,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "model_type": "llama",
+   "num_attention_heads": 8,
+   "num_hidden_layers": 18,
+   "num_key_value_heads": 4,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 1000042,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.46.1",
+   "use_cache": true,
+   "vocab_size": 131072
+ }
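
As a sanity check on the configuration above, the following Python sketch estimates the parameter count implied by these fields. It assumes the standard Llama layout suggested by `"architectures": ["LlamaForCausalLM"]`: bias-free q/k/v/o projections, a gated MLP with three projections, two RMSNorm weights per block, one final norm, and an untied output head. The function name `count_parameters` is hypothetical.

```python
# Values copied from the config.json diff above.
config = {
    "hidden_size": 2048,
    "intermediate_size": 8192,
    "num_hidden_layers": 18,
    "num_attention_heads": 8,
    "num_key_value_heads": 4,
    "head_dim": 256,
    "vocab_size": 131072,
    "tie_word_embeddings": False,
}


def count_parameters(cfg: dict) -> int:
    """Estimate the parameter count under the assumed Llama layout."""
    hidden, head_dim = cfg["hidden_size"], cfg["head_dim"]
    q_out = cfg["num_attention_heads"] * head_dim
    kv_out = cfg["num_key_value_heads"] * head_dim
    attention = hidden * q_out + 2 * hidden * kv_out + q_out * hidden  # q, k, v, o
    mlp = 3 * hidden * cfg["intermediate_size"]                        # gate, up, down
    norms = 2 * hidden                                                 # two RMSNorms per block
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * hidden
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * hidden
    final_norm = hidden
    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


total = count_parameters(config)
print(f"Estimated parameters: {total:,}")
print(f"Estimated bfloat16 weight size: {2 * total:,} bytes")
```

Under these assumptions the estimate is about 1.67 billion parameters, or roughly 3.34 GB in bfloat16, which is in line with the `size` field recorded for `model.safetensors` in this commit. Much of that total comes from the two 131072 x 2048 embedding matrices, since `tie_word_embeddings` is `false`.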
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 11,
+   "eos_token_id": 11,
+   "transformers_version": "4.46.1"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3f551b3271b550bfcf7c65282cdc8b3627c6ba8887e6dfa7492809f7b16cb087
+ size 3338836632
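
The three lines above are a Git LFS pointer, not the weights themselves: the repository stores only the spec version, the SHA-256 object ID and the byte size, while the actual file lives in LFS storage. A minimal Python sketch of reading such a pointer is shown below; the function name `parse_lfs_pointer` and the returned field names are hypothetical.

```python
# Pointer contents copied from the model.safetensors diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:3f551b3271b550bfcf7c65282cdc8b3627c6ba8887e6dfa7492809f7b16cb087
size 3338836632
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a small dictionary."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(f"Object size: {pointer['size_bytes'] / 1e9:.2f} GB")
print(f"Digest ({pointer['hash_algorithm']}): {pointer['digest'][:12]}...")
```

After downloading the real file, its SHA-256 hash and size can be compared against these fields to confirm the download is complete.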
special_tokens_map.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "additional_special_tokens": [
+     ">>TITLE<<",
+     ">>ABSTRACT<<",
+     ">>INTRODUCTION<<",
+     ">>SUMMARY<<",
+     ">>COMMENT<<",
+     ">>ANSWER<<",
+     ">>QUESTION<<",
+     ">>DOMAIN<<",
+     ">>EMAIL_ADDRESS<<",
+     ">>IP_ADDRESS<<",
+     "<|startoftext|>",
+     ">>IP_ADDRESS_0<<",
+     ">>IP_ADDRESS_1<<",
+     ">>IP_ADDRESS_2<<",
+     ">>IP_ADDRESS_3<<",
+     ">>IP_ADDRESS_4<<",
+     ">>IP_ADDRESS_5<<",
+     ">>IP_ADDRESS_6<<",
+     ">>IP_ADDRESS_7<<",
+     ">>IP_ADDRESS_8<<",
+     ">>IP_ADDRESS_9<<",
+     ">>PASSWORD<<",
+     ">>KEY<<"
+   ],
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|pad|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff