instruct model - initial commit
README.md
CHANGED
@@ -1,3 +1,316 @@
|
|
1 |
-
---
|
2 |
-
|
3 |
-
1 |
+
<!-- ---
|
2 |
+
pipeline_tag: text-generation
|
3 |
+
inference: false
|
4 |
+
license: apache-2.0
|
5 |
+
# datasets:
|
6 |
+
# metrics:
|
7 |
+
# - code_eval
|
8 |
+
library_name: transformers
|
9 |
+
tags:
|
10 |
+
- language
|
11 |
+
- granite-3.0
|
12 |
+
model-index:
|
13 |
+
- name: granite-3.0-2b-instruct
|
14 |
+
results:
|
15 |
+
- task:
|
16 |
+
type: text-generation
|
17 |
+
dataset:
|
18 |
+
type: human-exams
|
19 |
+
name: MMLU
|
20 |
+
metrics:
|
21 |
+
- name: pass@1
|
22 |
+
type: pass@1
|
23 |
+
value:
|
24 |
+
verified: false
|
25 |
+
- task:
|
26 |
+
type: text-generation
|
27 |
+
dataset:
|
28 |
+
type: human-exams
|
29 |
+
name: MMLU-Pro
|
30 |
+
metrics:
|
31 |
+
- name: pass@1
|
32 |
+
type: pass@1
|
33 |
+
value:
|
34 |
+
verified: false
|
35 |
+
- task:
|
36 |
+
type: text-generation
|
37 |
+
dataset:
|
38 |
+
type: human-exams
|
39 |
+
name: AGI-Eval
|
40 |
+
metrics:
|
41 |
+
- name: pass@1
|
42 |
+
type: pass@1
|
43 |
+
value:
|
44 |
+
verified: false
|
45 |
+
- task:
|
46 |
+
type: text-generation
|
47 |
+
dataset:
|
48 |
+
type: commonsense
|
49 |
+
name: WinoGrande
|
50 |
+
metrics:
|
51 |
+
- name: pass@1
|
52 |
+
type: pass@1
|
53 |
+
value:
|
54 |
+
verified: false
|
55 |
+
- task:
|
56 |
+
type: text-generation
|
57 |
+
dataset:
|
58 |
+
type: commonsense
|
59 |
+
name: OBQA
|
60 |
+
metrics:
|
61 |
+
- name: pass@1
|
62 |
+
type: pass@1
|
63 |
+
value:
|
64 |
+
verified: false
|
65 |
+
- task:
|
66 |
+
type: text-generation
|
67 |
+
dataset:
|
68 |
+
type: commonsense
|
69 |
+
name: SIQA
|
70 |
+
metrics:
|
71 |
+
- name: pass@1
|
72 |
+
type: pass@1
|
73 |
+
value:
|
74 |
+
verified: false
|
75 |
+
- task:
|
76 |
+
type: text-generation
|
77 |
+
dataset:
|
78 |
+
type: commonsense
|
79 |
+
name: PIQA
|
80 |
+
metrics:
|
81 |
+
- name: pass@1
|
82 |
+
type: pass@1
|
83 |
+
value:
|
84 |
+
verified: false
|
85 |
+
- task:
|
86 |
+
type: text-generation
|
87 |
+
dataset:
|
88 |
+
type: commonsense
|
89 |
+
name: Hellaswag
|
90 |
+
metrics:
|
91 |
+
- name: pass@1
|
92 |
+
type: pass@1
|
93 |
+
value:
|
94 |
+
verified: false
|
95 |
+
- task:
|
96 |
+
type: text-generation
|
97 |
+
dataset:
|
98 |
+
type: commonsense
|
99 |
+
name: TruthfulQA
|
100 |
+
metrics:
|
101 |
+
- name: pass@1
|
102 |
+
type: pass@1
|
103 |
+
value:
|
104 |
+
verified: false
|
105 |
+
- task:
|
106 |
+
type: text-generation
|
107 |
+
dataset:
|
108 |
+
type: reading-comprehension
|
109 |
+
name: BoolQ
|
110 |
+
metrics:
|
111 |
+
- name: pass@1
|
112 |
+
type: pass@1
|
113 |
+
value:
|
114 |
+
verified: false
|
115 |
+
- task:
|
116 |
+
type: text-generation
|
117 |
+
dataset:
|
118 |
+
type: reading-comprehension
|
119 |
+
name: SQuAD v2
|
120 |
+
metrics:
|
121 |
+
- name: pass@1
|
122 |
+
type: pass@1
|
123 |
+
value:
|
124 |
+
verified: false
|
125 |
+
- task:
|
126 |
+
type: text-generation
|
127 |
+
dataset:
|
128 |
+
type: reasoning
|
129 |
+
name: ARC-C
|
130 |
+
metrics:
|
131 |
+
- name: pass@1
|
132 |
+
type: pass@1
|
133 |
+
value:
|
134 |
+
verified: false
|
135 |
+
- task:
|
136 |
+
type: text-generation
|
137 |
+
dataset:
|
138 |
+
type: reasoning
|
139 |
+
name: GPQA
|
140 |
+
metrics:
|
141 |
+
- name: pass@1
|
142 |
+
type: pass@1
|
143 |
+
value:
|
144 |
+
verified: false
|
145 |
+
- task:
|
146 |
+
type: text-generation
|
147 |
+
dataset:
|
148 |
+
type: reasoning
|
149 |
+
name: BBH
|
150 |
+
metrics:
|
151 |
+
- name: pass@1
|
152 |
+
type: pass@1
|
153 |
+
value:
|
154 |
+
verified: false
|
155 |
+
- task:
|
156 |
+
type: text-generation
|
157 |
+
dataset:
|
158 |
+
type: code
|
159 |
+
name: HumanEval
|
160 |
+
metrics:
|
161 |
+
- name: pass@1
|
162 |
+
type: pass@1
|
163 |
+
value:
|
164 |
+
verified: false
|
165 |
+
- task:
|
166 |
+
type: text-generation
|
167 |
+
dataset:
|
168 |
+
type: code
|
169 |
+
name: MBPP
|
170 |
+
metrics:
|
171 |
+
- name: pass@1
|
172 |
+
type: pass@1
|
173 |
+
value:
|
174 |
+
verified: false
|
175 |
+
- task:
|
176 |
+
type: text-generation
|
177 |
+
dataset:
|
178 |
+
type: math
|
179 |
+
name: GSM8K
|
180 |
+
metrics:
|
181 |
+
- name: pass@1
|
182 |
+
type: pass@1
|
183 |
+
value:
|
184 |
+
verified: false
|
185 |
+
- task:
|
186 |
+
type: text-generation
|
187 |
+
dataset:
|
188 |
+
type: math
|
189 |
+
name: MATH
|
190 |
+
metrics:
|
191 |
+
- name: pass@1
|
192 |
+
type: pass@1
|
193 |
+
value:
|
194 |
+
verified: false
|
195 |
+
- task:
|
196 |
+
type: text-generation
|
197 |
+
dataset:
|
198 |
+
type: multilingual
|
199 |
+
name: MGSM
|
200 |
+
metrics:
|
201 |
+
- name: pass@1
|
202 |
+
type: pass@1
|
203 |
+
value:
|
204 |
+
verified: false
|
205 |
+
--- -->
|
206 |
+
<!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png) -->
|
207 |
+
|
208 |
+
# Granite-3.0-2B-Instruct
|
209 |
+
|
210 |
+
## Model Summary
|
211 |
+
**Granite-3.0-2B-Instruct** is a lightweight, open-source 2B-parameter model fine-tuned from *Granite-3.0-2B-Base* on a combination of open-source and proprietary instruction data, and it is released under a **permissive license** (Apache 2.0). This language model is designed to excel in instruction-following tasks such as summarization, problem-solving, text translation, reasoning, code tasks, function-calling, and more.
|
212 |
+
<!-- The lightweight and open-source nature of this model makes it an excellent choice to serve as backbone of real-time applications such as chatbots and conversational agents. -->
|
213 |
+
|
214 |
+
- **Developers:** IBM Research
|
215 |
+
- **GitHub Repository:** [ibm-granite/granite-language-models](https://github.com/ibm-granite/granite-language-models)
|
216 |
+
- **Website:** [Granite Docs](https://www.ibm.com/granite/docs/)
|
217 |
+
- **Paper:** [Granite Language Models](https://) <!-- TO DO: Update github repo link when it is ready -->
|
218 |
+
- **Release Date:** October 21st, 2024
|
219 |
+
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).
|
220 |
+
|
221 |
+
## Supported Languages
|
222 |
+
English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese (Simplified)
|
223 |
+
|
224 |
+
## Usage
|
225 |
+
### Intended use
|
226 |
+
The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications.
|
227 |
+
|
228 |
+
### Capabilities
|
229 |
+
* Summarization
|
230 |
+
* Text classification
|
231 |
+
* Text extraction
|
232 |
+
* Question-answering
|
233 |
+
* Retrieval Augmented Generation (RAG)
|
234 |
+
* Code-related tasks
|
235 |
+
* Function-calling
|
236 |
+
* Multilingual dialog use cases
|
237 |
+
|
238 |
+
### Generation
|
239 |
+
This is a simple example of how to use the **Granite-3.0-2B-Instruct** model.
|
240 |
+
|
241 |
+
Install the following libraries:
|
242 |
+
|
243 |
+
```shell
|
244 |
+
pip install torch torchvision torchaudio
|
245 |
+
pip install accelerate
|
246 |
+
pip install transformers
|
247 |
+
```
|
248 |
+
Then, copy the snippet from the section that is relevant for your use case.
|
249 |
+
|
250 |
+
```python
|
251 |
+
import torch
|
252 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
253 |
+
|
254 |
+
device = "auto"
|
255 |
+
model_path = "ibm-granite/granite-3.0-2b-instruct"
|
256 |
+
tokenizer = AutoTokenizer.from_pretrained(model_path)
|
257 |
+
# drop device_map if running on CPU
|
258 |
+
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
|
259 |
+
model.eval()
|
260 |
+
# change input text as desired
|
261 |
+
chat = [
|
262 |
+
{ "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
|
263 |
+
]
|
264 |
+
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
|
265 |
+
# tokenize the text
|
266 |
+
input_tokens = tokenizer(chat, return_tensors="pt").to(model.device)
|
267 |
+
# generate output tokens
|
268 |
+
output = model.generate(**input_tokens,
|
269 |
+
max_new_tokens=100)
|
270 |
+
# decode output tokens into text
|
271 |
+
output = tokenizer.batch_decode(output)
|
272 |
+
# print output
|
273 |
+
print(output)
|
274 |
+
```
|
275 |
+
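For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so decoding the full output repeats the prompt. The sketch below illustrates one way to keep only the new tokens. It uses plain Python lists as stand-ins for tensors, and `strip_prompt` is a hypothetical helper that is not part of `transformers`.

```python
def strip_prompt(sequences, prompt_length):
    """Drop the first `prompt_length` token ids from every generated sequence."""
    return [seq[prompt_length:] for seq in sequences]


# Toy token ids standing in for real tensors:
# three prompt tokens followed by two newly generated tokens.
prompt_ids = [101, 7, 42]
generated = [prompt_ids + [900, 2]]

new_tokens = strip_prompt(generated, len(prompt_ids))
print(new_tokens)
```

With the real objects from the example above, the analogous step would be to slice the tensor returned by `model.generate` with `[:, input_tokens["input_ids"].shape[1]:]` before calling `tokenizer.batch_decode`. Passing `skip_special_tokens=True` to `batch_decode` additionally hides special tokens in the printed text.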
|
276 |
+
<!-- TO DO: function-calling-example
|
277 |
+
-->
|
278 |
+
|
279 |
+
## Model Architecture
|
280 |
+
**Granite-3.0-2B-Instruct** is based on a decoder-only dense transformer architecture. Core components of this architecture are: grouped-query attention (GQA) and rotary position embeddings (RoPE), MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
|
281 |
+
|
282 |
+
| Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE |
|
283 |
+
| :-------- | :-------- | :--------| :--------| :--------|
|
284 |
+
| Embedding size | **2048** | 4096 | 1024 | 1536 |
|
285 |
+
| Number of layers | **40** | 40 | 24 | 32 |
|
286 |
+
| Attention head size | **64** | 128 | 64 | 64 |
|
287 |
+
| Number of attention heads | **32** | 32 | 16 | 24 |
|
288 |
+
| Number of KV heads | **8** | 8 | 8 | 8 |
|
289 |
+
| MLP hidden size | **8192** | 12800 | 512 | 512 |
|
290 |
+
| MLP activation | **SwiGLU** | SwiGLU | SwiGLU | SwiGLU |
|
291 |
+
| Number of Experts | **—** | — | 32 | 40 |
|
292 |
+
| MoE TopK | **—** | — | 8 | 8 |
|
293 |
+
| Initialization std | **0.1** | 0.1 | 0.1 | 0.1 |
|
294 |
+
| Sequence Length | **4096** | 4096 | 4096 | 4096 |
|
295 |
+
| Position Embedding | **RoPE** | RoPE | RoPE | RoPE |
|
296 |
+
| # Parameters | **2.5B** | 8.1B | 1.3B | 3.3B |
|
297 |
+
| # Active Parameters | **2.5B** | 8.1B | 400M | 800M |
|
298 |
+
| # Training tokens | **12T** | 12T | 10T | 10T |
|
299 |
+
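As a rough sanity check, the 2B Dense column can be turned into a parameter estimate. The sketch below is a back-of-the-envelope calculation, not the official accounting. It assumes bias-free linear layers, standard GQA projection shapes, and a SwiGLU MLP with three weight matrices (gate, up, down). It leaves out normalization weights and the embedding matrix, because the vocabulary size is not listed in the table.

```python
# Values copied from the "2B Dense" column of the table above.
embedding_size = 2048
num_layers = 40
head_size = 64
num_heads = 32
num_kv_heads = 8
mlp_hidden = 8192

# The attention heads should tile the embedding dimension exactly.
assert num_heads * head_size == embedding_size

# GQA: several query heads share each key/value head.
queries_per_kv_head = num_heads // num_kv_heads

# Attention projections (assumed bias-free): Q and O are full width,
# K and V are narrower because there are fewer KV heads.
kv_width = num_kv_heads * head_size
attention_params = 2 * embedding_size * embedding_size + 2 * embedding_size * kv_width

# SwiGLU MLP (assumed three matrices: gate, up, down).
mlp_params = 3 * embedding_size * mlp_hidden

per_layer_params = attention_params + mlp_params
non_embedding_params = num_layers * per_layer_params

print(f"query heads per KV head: {queries_per_kv_head}")
print(f"parameters per layer:    {per_layer_params:,}")
print(f"non-embedding total:     {non_embedding_params / 1e9:.2f}B")
```

Under these assumptions the layers alone account for roughly 2.43B parameters, which is consistent with the 2.5B total in the table once the shared input/output embedding matrix is added.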
|
300 |
+
<!-- TO DO: To be completed once the paper is ready, we may change the title to Supervised Finetuning -->
|
301 |
+
## Training Data
|
302 |
+
This model is trained on a mix of open-source and proprietary datasets.
|
303 |
+
<!-- ### Instruction Datasets
|
304 |
+
* Language Instruction Datasets: We include high-quality datasets such as [TO DO: List of datasets]
|
305 |
+
* Synthetic Instruction Datasets: [TO DO: paragraph about synthetic data]
|
306 |
+
### Processing
|
307 |
+
* [TO DO: Data annotation with MagPie pipeline: quality, duplicates] -->
|
308 |
+
|
309 |
+
<!-- CHECK: removed Vela, only talk about blue-vela-->
|
310 |
+
## Infrastructure
|
311 |
+
We train the Granite language models using IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
|
312 |
+
|
313 |
+
<!-- TO DO: Check multilingual statement once the paper is ready -->
|
314 |
+
## Ethical Considerations and Limitations
|
315 |
+
Granite instruct models are primarily finetuned using instruction-response pairs mostly in English, but also in German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese (Simplified). As this model has been exposed to multilingual data, it can handle multilingual dialog use cases, though with limited performance on non-English tasks. In such cases, introducing a small number of examples (few-shot) can help the model generate more accurate outputs. The model also inherits ethical considerations and limitations from its base model. For more information, please refer to the *[Granite-3.0-2B-Base](https://huggingface.co/ibm-granite/granite-3.0-2b-base)* model card.
|
316 |
+
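The few-shot suggestion above can be expressed in the same chat format used in the Generation example. The following is a minimal sketch: `build_few_shot_chat` is a hypothetical helper, the Spanish prompts are made-up illustrations, and it assumes the model's chat template accepts alternating `user` and `assistant` turns.

```python
def build_few_shot_chat(examples, query):
    """Turn (user_text, assistant_text) pairs plus a final query into a chat list."""
    chat = []
    for user_text, assistant_text in examples:
        chat.append({"role": "user", "content": user_text})
        chat.append({"role": "assistant", "content": assistant_text})
    # The real question goes last so the model answers it in the demonstrated style.
    chat.append({"role": "user", "content": query})
    return chat


# Illustrative examples only; replace with pairs from your own task.
examples = [
    ("Traduce al inglés: Buenos días.", "Good morning."),
    ("Traduce al inglés: Muchas gracias.", "Thank you very much."),
]
chat = build_few_shot_chat(examples, "Traduce al inglés: ¿Dónde está la estación?")

for message in chat:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)` exactly as in the Generation example.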
|