ThomasTheMaker committed
Commit b83c9e0 · verified · 1 Parent(s): 348b900

Upload folder using huggingface_hub
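For context, uploads like this are normally made with `huggingface_hub`'s `upload_folder` API. The sketch below shows the typical call; the local folder path and target repo id are illustrative assumptions, not values recorded in this commit.

```python
# Minimal sketch of a folder upload with huggingface_hub (illustrative values only).
from huggingface_hub import HfApi

api = HfApi()  # picks up the token from `huggingface-cli login` by default
api.upload_folder(
    folder_path="./Falcon3-1B-Base-rkllm",            # hypothetical local folder
    repo_id="ThomasTheMaker/Falcon3-1B-Base-rkllm",   # hypothetical target repo
    repo_type="model",
    commit_message="Upload folder using huggingface_hub",
)
```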

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+Falcon3-1B-Base-1.2.0.rkllm filter=lfs diff=lfs merge=lfs -text
Falcon3-1B-Base-1.2.0.rkllm ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bc9a395698ea958b41bfc8c0ee69d9373ea1c93246d1600bdbd4865cdec808a0
size 1960232838
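The three lines above are a Git LFS pointer, not the model itself: the actual RKLLM binary (about 1.96 GB, per the `size` field) is stored in LFS and resolved at download time. A minimal sketch of fetching it with `huggingface_hub`, using a hypothetical repo id for illustration:

```python
# Sketch: download the LFS-backed .rkllm file referenced by the pointer above.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="ThomasTheMaker/Falcon3-1B-Base-rkllm",  # assumption, not recorded in this diff
    filename="Falcon3-1B-Base-1.2.0.rkllm",
)
print(local_path)  # local cache path of the ~1.96 GB file
```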
README.md ADDED
@@ -0,0 +1,213 @@
---
language:
- en
- fr
- es
- pt
tags:
- falcon3
license: other
license_name: falcon-llm-license
license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
library_name: transformers
---

<div align="center">
<img src="https://huggingface.co/datasets/tiiuae/documentation-images/resolve/main/general/falco3-logo.png" alt="drawing" width="500"/>
</div>

# Falcon3-1B-Base

The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.

This repository contains **Falcon3-1B-Base**. It achieves strong results on reasoning, language understanding, instruction following, code, and mathematics tasks.
Falcon3-1B-Base supports four languages (English, French, Spanish, Portuguese) and a context length of up to 4K tokens.
It was pruned in depth, width, number of heads, and embedding channels from a larger 3B Falcon model, then efficiently trained on only 80 gigatokens (GT) using a knowledge-distillation objective.

⚠️ **This is a raw, pretrained model, which should be further finetuned using SFT, RLHF, continued pretraining, etc. for most use cases.**

## Model Details
- Architecture
  - Transformer-based causal decoder-only architecture
  - 18 decoder blocks
  - Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads (a quick shape check follows this list)
  - Wider head dimension: 256
  - High RoPE value to support long context understanding: 1000042
  - Uses SwiGLU and RMSNorm
  - 4K context length
  - 131K vocab size
- Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 gigatokens of datasets comprising web, code, STEM, high-quality, and multilingual data, using 256 H100 GPUs
- Supports EN, FR, ES, PT
- Developed by [Technology Innovation Institute](https://www.tii.ae)
- License: TII Falcon-LLM License 2.0
- Model Release Date: December 2024
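As a quick sanity check (not an official recipe), the attention shapes implied by the bullets above can be verified in a few lines; the hidden size of 2048 is taken from the `config.json` added later in this commit.

```python
# Arithmetic check of the GQA layout described in the model details.
num_query_heads = 8    # "8 query heads"
num_kv_heads = 4       # "4 key-value heads"
head_dim = 256         # "wider head dimension: 256"
hidden_size = 2048     # from config.json

# The query projection covers the full hidden size: 8 heads x 256 dims each.
assert num_query_heads * head_dim == hidden_size

# With GQA, each key/value head serves 2 query heads, so the KV cache is
# half the size of full multi-head attention at the same hidden size.
print(num_query_heads // num_kv_heads)  # 2
```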

## Getting started

<details>
<summary> Click to expand </summary>

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-1B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
response = pipe("Question: How many hours in one day? Answer: ")
print(response[0]['generated_text'])
```

</details>

<br>

## Benchmarks
We report our internal pipeline benchmarks in the following table.
- We use the [lm-evaluation harness](https://github.com/EleutherAI/lm-evaluation-harness) (a minimal reproduction sketch follows this list).
- We report **raw scores**.
- We use the same batch size across all models.
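The exact harness version and task configurations are not recorded here; the sketch below shows one plausible way to reproduce a single row (ARC Challenge, 25-shot) through the harness's Python API, assuming lm-evaluation-harness ≥ 0.4 and its `simple_evaluate` entry point.

```python
# Illustrative sketch only: score Falcon3-1B-Base on ARC Challenge (25-shot)
# with lm-evaluation-harness. Settings such as batch size are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tiiuae/Falcon3-1B-Base,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```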

<table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
  <colgroup>
    <col style="width: 10%;">
    <col style="width: 10%;">
    <col style="width: 7%;">
    <col style="width: 7%;">
    <col style="width: 7%;">
    <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
  </colgroup>
  <thead>
    <tr>
      <th>Category</th>
      <th>Benchmark</th>
      <th>Llama-3.2-1B</th>
      <th>Qwen2.5-1.5B</th>
      <th>SmolLM2-1.7B</th>
      <th>Falcon3-1B-Base</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan="3">General</td>
      <td>MMLU (5-shot)</td>
      <td>31.1</td>
      <td><b>61.0</b></td>
      <td>50.1</td>
      <td>42.5</td>
    </tr>
    <tr>
      <td>MMLU-PRO (5-shot)</td>
      <td>11.7</td>
      <td><b>28.4</b></td>
      <td>21.3</td>
      <td>16.1</td>
    </tr>
    <tr>
      <td>IFEval</td>
      <td>14.8</td>
      <td><b>26.0</b></td>
      <td>24.2</td>
      <td>25.2</td>
    </tr>
    <tr>
      <td rowspan="2">Math</td>
      <td>GSM8K (5-shot)</td>
      <td>6.6</td>
      <td><b>62.2</b></td>
      <td>31.0</td>
      <td>34.3</td>
    </tr>
    <tr>
      <td>MATH Lvl-5 (4-shot)</td>
      <td>0.2</td>
      <td><b>6.7</b></td>
      <td>1.4</td>
      <td>2.2</td>
    </tr>
    <tr>
      <td rowspan="4">Reasoning</td>
      <td>Arc Challenge (25-shot)</td>
      <td>40.2</td>
      <td><b>54.8</b></td>
      <td>54.1</td>
      <td>48.1</td>
    </tr>
    <tr>
      <td>GPQA (0-shot)</td>
      <td>24.2</td>
      <td>28.1</td>
      <td><b>28.9</b></td>
      <td>28.1</td>
    </tr>
    <tr>
      <td>MUSR (0-shot)</td>
      <td>34.5</td>
      <td>35.5</td>
      <td>34.7</td>
      <td><b>41.9</b></td>
    </tr>
    <tr>
      <td>BBH (3-shot)</td>
      <td>31.2</td>
      <td><b>41.1</b></td>
      <td>34.2</td>
      <td>36.0</td>
    </tr>
    <tr>
      <td rowspan="4">CommonSense Understanding</td>
      <td>PIQA (0-shot)</td>
      <td>74.5</td>
      <td>76.0</td>
      <td><b>77.5</b></td>
      <td>74.5</td>
    </tr>
    <tr>
      <td>SciQ (0-shot)</td>
      <td>88.5</td>
      <td><b>93.1</b></td>
      <td>90.8</td>
      <td>91.1</td>
    </tr>
    <tr>
      <td>Winogrande (0-shot)</td>
      <td>60.4</td>
      <td>63.0</td>
      <td><b>66.1</b></td>
      <td>61.2</td>
    </tr>
    <tr>
      <td>OpenbookQA (0-shot)</td>
      <td>37.4</td>
      <td>40.4</td>
      <td><b>44.0</b></td>
      <td>41.0</td>
    </tr>
  </tbody>
</table>

## Useful links
- View our [release blogpost](https://huggingface.co/blog/falcon3).
- Feel free to join [our discord server](https://discord.gg/fwXpMyGc) if you have any questions or want to interact with our researchers and developers.

## Technical Report
Coming soon.

## Citation
If the Falcon3 family of models was helpful to your work, feel free to cite it.

```
@misc{Falcon3,
    title = {The Falcon 3 Family of Open Models},
    url = {https://huggingface.co/blog/falcon3},
    author = {Falcon-LLM Team},
    month = {December},
    year = {2024}
}
```
config.json ADDED
@@ -0,0 +1,28 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "activation": "swiglu",
  "attention_bias": false,
  "attention_dropout": 0.0,
  "eos_token_id": 11,
  "head_dim": 256,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "intermediate_size": 8192,
  "max_position_embeddings": 4096,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 8,
  "num_hidden_layers": 18,
  "num_key_value_heads": 4,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000042,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.46.1",
  "use_cache": true,
  "vocab_size": 131072
}
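The config above maps Falcon3's dimensions onto the Llama architecture (`"model_type": "llama"`). A short sketch of loading it and checking the values called out in the model card; the upstream `tiiuae/Falcon3-1B-Base` repo id is used because this commit's own repo id is not shown here.

```python
# Sketch: load the config and confirm the architecture numbers from the card.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("tiiuae/Falcon3-1B-Base")
print(cfg.model_type)                # "llama" -- Falcon3 reuses the Llama code path
print(cfg.num_hidden_layers)         # 18 decoder blocks
print(cfg.num_attention_heads)       # 8 query heads
print(cfg.num_key_value_heads)       # 4 key-value heads (GQA)
print(cfg.head_dim)                  # 256
print(cfg.rope_theta)                # 1000042
print(cfg.max_position_embeddings)   # 4096 (4K context)
print(cfg.vocab_size)                # 131072
```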
generation_config.json ADDED
@@ -0,0 +1,5 @@
{
  "_from_model_config": true,
  "eos_token_id": 11,
  "transformers_version": "4.46.1"
}
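A brief sketch of how this file is consumed: `generate()` reads its stop token from the generation config, whose `eos_token_id` of 11 pairs with the `<|endoftext|>` EOS token declared in `special_tokens_map.json` below. The repo id is again assumed to be the upstream base model.

```python
# Sketch: the eos_token_id above is what generate() treats as the stop token.
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("tiiuae/Falcon3-1B-Base")
print(gen_cfg.eos_token_id)  # 11
```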
special_tokens_map.json ADDED
@@ -0,0 +1,41 @@
{
  "additional_special_tokens": [
    ">>TITLE<<",
    ">>ABSTRACT<<",
    ">>INTRODUCTION<<",
    ">>SUMMARY<<",
    ">>COMMENT<<",
    ">>ANSWER<<",
    ">>QUESTION<<",
    ">>DOMAIN<<",
    ">>EMAIL_ADDRESS<<",
    ">>IP_ADDRESS<<",
    "<|startoftext|>",
    ">>IP_ADDRESS_0<<",
    ">>IP_ADDRESS_1<<",
    ">>IP_ADDRESS_2<<",
    ">>IP_ADDRESS_3<<",
    ">>IP_ADDRESS_4<<",
    ">>IP_ADDRESS_5<<",
    ">>IP_ADDRESS_6<<",
    ">>IP_ADDRESS_7<<",
    ">>IP_ADDRESS_8<<",
    ">>IP_ADDRESS_9<<",
    ">>PASSWORD<<",
    ">>KEY<<"
  ],
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|pad|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
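A short sketch of inspecting these tokens once the tokenizer is loaded (again using the upstream repo id as an assumption):

```python
# Sketch: confirm the eos/pad tokens and the extra Falcon special tokens
# declared in special_tokens_map.json.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("tiiuae/Falcon3-1B-Base")
print(tok.eos_token)                      # <|endoftext|>
print(tok.pad_token)                      # <|pad|>
print(tok.additional_special_tokens[:5])  # >>TITLE<<, >>ABSTRACT<<, ...
```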
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff