Safetensors · llama · falcon3 · 4-bit precision · gptq

Commit fa2b885 (0 parents): slimfrikha-tii committed

falcon3 release
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,155 @@
+ ---
+ language:
+ - en
+ - fr
+ - es
+ - pt
+ tags:
+ - falcon3
+ base_model: tiiuae/Falcon3-1B-Instruct
+ license: other
+ license_name: falcon-llm-license
+ license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
+ ---
+
+ <div align="center">
+ <img src="https://huggingface.co/datasets/tiiuae/documentation-images/resolve/main/general/falco3-logo.png" alt="drawing" width="500"/>
+ </div>
+
+ # Falcon3-1B-Instruct-GPTQ-Int4
+
+ The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
+
+ **Falcon3-1B-Instruct** achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks.
+ Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K tokens.
+
+ ## Model Details
+ - Architecture
+   - Transformer-based causal decoder-only architecture
+   - 18 decoder blocks
+   - Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads
+   - Wider head dimension: 256
+   - High RoPE value to support long context understanding: 1000042
+   - Uses SwiGLU and RMSNorm
+   - 8K context length
+   - 131K vocab size
+ - Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 gigatokens of data comprising web, code, STEM, high-quality and multilingual content, on 256 H100 GPUs
+ - Post-trained on 1.2 million samples of STEM, conversational, code, safety and function-call data
+ - Supports EN, FR, ES, PT
+ - Developed by [Technology Innovation Institute](https://www.tii.ae)
+ - License: TII Falcon-LLM License 2.0
+ - Model Release Date: December 2024
+ - Quantization: GPTQ 4-bit
+
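+ The architecture numbers above can be read straight off this repo's `config.json`. A minimal sanity-check sketch (our illustration, not part of the official card; the field names are those in the shipped config):
+
+ ```python
+ from transformers import AutoConfig
+
+ config = AutoConfig.from_pretrained("tiiuae/Falcon3-1B-Instruct-GPTQ-Int4")
+
+ assert config.num_hidden_layers == 18          # 18 decoder blocks
+ assert config.num_attention_heads == 8         # GQA: 8 query heads
+ assert config.num_key_value_heads == 4         # GQA: 4 key-value heads
+ assert config.head_dim == 256                  # wider head dimension
+ assert config.rope_theta == 1000042            # high RoPE base for long context
+ assert config.max_position_embeddings == 8192  # 8K context length
+ assert config.vocab_size == 131072             # 131K vocab
+ print(config.quantization_config)              # GPTQ 4-bit settings
+ ```
+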
+ ## Getting started
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_name = "tiiuae/Falcon3-1B-Instruct-GPTQ-Int4"
+
+ # Load the 4-bit GPTQ checkpoint; device_map="auto" places it on the available GPU(s)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ prompt = "How many hours in one day?"
+
+ messages = [
+     {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
+     {"role": "user", "content": prompt}
+ ]
+ # Render the conversation with the model's chat template
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=1024
+ )
+ # Strip the prompt tokens, keeping only the newly generated completion
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(response)
+ ```
+
+ </details>
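+
+ Engines with GPTQ kernel support can typically serve this checkpoint as well. A hedged sketch using vLLM (our illustration, not an officially documented path for this model; vLLM reads the quantization settings from `config.json`):
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ # vLLM auto-detects GPTQ from the checkpoint's quantization_config
+ llm = LLM(model="tiiuae/Falcon3-1B-Instruct-GPTQ-Int4")
+ params = SamplingParams(temperature=0.0, max_tokens=256)
+
+ # For an instruct model, apply the chat template to the prompt first,
+ # as in the transformers example above; a raw prompt is used here for brevity.
+ outputs = llm.generate(["How many hours in one day?"], params)
+ print(outputs[0].outputs[0].text)
+ ```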
+
+ <br>
+
+ # Benchmarks
+ We report our internal pipeline benchmark results in the following table:
+
+ <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
+     <colgroup>
+         <col style="width: 10%;">
+         <col style="width: 10%;">
+         <col style="width: 10%;">
+         <col style="width: 10%;">
+         <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
+     </colgroup>
+     <thead>
+         <tr>
+             <th>Benchmark</th>
+             <th>Falcon3-1B-Instruct</th>
+             <th>Falcon3-1B-Instruct-GPTQ-Int8</th>
+             <th>Falcon3-1B-Instruct-AWQ</th>
+             <th>Falcon3-1B-Instruct-GPTQ-Int4</th>
+         </tr>
+     </thead>
+     <tbody>
+         <tr>
+             <td>MMLU</td>
+             <td>43.6</td>
+             <td>43.5</td>
+             <td>43.0</td>
+             <td>42.6</td>
+         </tr>
+         <tr>
+             <td>MMLU-PRO</td>
+             <td>18.5</td>
+             <td>18.5</td>
+             <td>17.3</td>
+             <td>17.7</td>
+         </tr>
+         <tr>
+             <td>IFEval</td>
+             <td>54.9</td>
+             <td>56.1</td>
+             <td>51.2</td>
+             <td>51.4</td>
+         </tr>
+     </tbody>
+ </table>
+
+ ## Useful links
+ - View our [release blogpost](https://huggingface.co/blog/falcon3).
+ - Feel free to join [our discord server](https://discord.gg/fwXpMyGc) if you have any questions or want to interact with our researchers and developers.
+
+ ## Technical Report
+ Coming soon.
+
+ ## Citation
+ If the Falcon3 family of models was helpful to your work, feel free to cite us.
+
+ ```
+ @misc{Falcon3,
+     title = {The Falcon 3 Family of Open Models},
+     url = {https://huggingface.co/blog/falcon3},
+     author = {Falcon-LLM Team},
+     month = {December},
+     year = {2024}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "_attn_implementation_autoset": true,
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "eos_token_id": 11,
+   "head_dim": 256,
+   "hidden_act": "silu",
+   "hidden_size": 2048,
+   "initializer_range": 0.02,
+   "intermediate_size": 8192,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "model_type": "llama",
+   "num_attention_heads": 8,
+   "num_hidden_layers": 18,
+   "num_key_value_heads": 4,
+   "pretraining_tp": 1,
+   "quantization_config": {
+     "bits": 4,
+     "checkpoint_format": "gptq",
+     "damp_percent": 0.01,
+     "desc_act": false,
+     "group_size": 128,
+     "model_file_base_name": "model",
+     "model_name_or_path": null,
+     "quant_method": "gptq",
+     "static_groups": false,
+     "sym": true,
+     "true_sequential": true
+   },
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 1000042,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.47.0",
+   "use_cache": true,
+   "vocab_size": 131072
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ce8416f6a24dcf3b1d7a791a9a4cbf1cc2cf17ed905d2b30dce3b13b78f37314
+ size 1663777688
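The LFS pointer above records a weight file of 1,663,777,688 bytes (~1.66 GB). A rough back-of-the-envelope check (our estimate, assuming the common GPTQ layout: linear layers packed to 4 bits with one fp16 scale per 128-weight group, embeddings and untied lm_head left in fp16) reproduces that size from the values in `config.json`:

```python
# Size estimate for the 4-bit GPTQ checkpoint (assumptions noted above)
hidden, inter, layers = 2048, 8192, 18   # from config.json
heads, kv_heads, head_dim = 8, 4, 256
vocab = 131072

attn = 2 * hidden * heads * head_dim + 2 * hidden * kv_heads * head_dim  # q,o + k,v projections
mlp = 3 * hidden * inter                                                 # gate, up, down projections
body = layers * (attn + mlp)   # ~1.13e9 weights in quantized linear layers
embed = 2 * vocab * hidden     # embed_tokens + lm_head (tie_word_embeddings: false)

total = body * 0.5 + (body / 128) * 2 + embed * 2  # int4 packing + fp16 scales + fp16 embeddings
print(f"{total / 1e9:.2f} GB")  # ~1.66 GB, close to the pointer size above
```

The small remainder is metadata such as packed zero-points and norm weights.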
quantize_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "bits": 4,
+   "group_size": 128,
+   "damp_percent": 0.01,
+   "desc_act": false,
+   "static_groups": false,
+   "sym": true,
+   "true_sequential": true,
+   "model_name_or_path": null,
+   "model_file_base_name": "model",
+   "quant_method": "gptq",
+   "checkpoint_format": "gptq"
+ }
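These settings describe the stored format: symmetric round-to-nearest int4 (`sym: true`, `bits: 4`) with one scale per group of 128 weights along the input dimension (`group_size: 128`); `desc_act: false` keeps columns in their natural order, and `damp_percent` is the dampening added to the Hessian diagonal in GPTQ's error-compensating solver. A toy sketch of the group-wise storage format only (not the GPTQ algorithm itself):

```python
import torch

bits, group_size = 4, 128
qmax = 2 ** (bits - 1) - 1  # 7; symmetric signed int4 covers [-8, 7]

def quantize_groupwise(w: torch.Tensor):
    """Round-to-nearest with one scale per group of input weights."""
    out_f, in_f = w.shape
    g = w.reshape(out_f, in_f // group_size, group_size)
    scale = g.abs().amax(dim=-1, keepdim=True) / qmax   # per-group scale
    q = torch.clamp(torch.round(g / scale), -qmax - 1, qmax)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return (q * scale).reshape(q.shape[0], -1)

w = torch.randn(256, 512)                   # a toy weight matrix
q, s = quantize_groupwise(w)
print((w - dequantize(q, s)).abs().mean())  # small round-trip error
```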
special_tokens_map.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "additional_special_tokens": [
+     ">>TITLE<<",
+     ">>ABSTRACT<<",
+     ">>INTRODUCTION<<",
+     ">>SUMMARY<<",
+     ">>COMMENT<<",
+     ">>ANSWER<<",
+     ">>QUESTION<<",
+     ">>DOMAIN<<",
+     ">>EMAIL_ADDRESS<<",
+     ">>IP_ADDRESS<<",
+     "<|startoftext|>",
+     ">>IP_ADDRESS_0<<",
+     ">>IP_ADDRESS_1<<",
+     ">>IP_ADDRESS_2<<",
+     ">>IP_ADDRESS_3<<",
+     ">>IP_ADDRESS_4<<",
+     ">>IP_ADDRESS_5<<",
+     ">>IP_ADDRESS_6<<",
+     ">>IP_ADDRESS_7<<",
+     ">>IP_ADDRESS_8<<",
+     ">>IP_ADDRESS_9<<",
+     ">>PASSWORD<<",
+     ">>KEY<<"
+   ],
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|pad|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff