data update
README.md CHANGED
@@ -209,7 +209,7 @@ model-index:
 # Granite-3.0-3B-A800M-Base
 
 ## Model Summary
-**Granite-3.0-3B-A800M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-3B-A800M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains
+**Granite-3.0-3B-A800M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-3B-A800M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
 
 
 - **Developers:** IBM Research
@@ -281,9 +281,10 @@ print(output)
 
 <!-- TO DO: To be completed once the paper is ready -->
 ## Training Data
-This model is trained on a mix of open-source and proprietary
+This model is trained on a mix of open-source and proprietary data following a two-phase training strategy.
+* Phase 1 data: The data for phase 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
+* Phase 2 data: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model's performance on specific tasks.
 
-<!-- CHECK: removed Vela, only talk about blue-vela-->
 ## Infrastructure
 We train the Granite Language models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
 
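For context on the text-completion usage that the updated summary names (and that the card's own snippet, elided here apart from its closing `print(output)`, demonstrates), here is a minimal sketch. It is an illustration rather than the card's exact example: the checkpoint id `ibm-granite/granite-3.0-3b-a800m-base`, the prompt, and the generation settings are assumptions, and it presumes the standard transformers `AutoModelForCausalLM`/`AutoTokenizer` API.

```python
# Minimal text-completion sketch; checkpoint id, prompt, and generation
# settings below are assumptions, not taken from the card's own example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-3.0-3b-a800m-base"  # assumed HF model id
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).to(device)
model.eval()

# A base model completes raw text, so a plain prompt is the natural input.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)

output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output)
```

Since this is the base (non-instruct) variant, plain completion prompts like the one above are the natural interface; chat templating applies only to the instruct models.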