nicholasKluge committed
Commit 7e9b032
Parent(s): d538c84
Update README.md

README.md CHANGED
@@ -43,7 +43,7 @@ Teeny-tiny-llama has been trained by leveraging scaling laws to determine the op
 
 - **Compact Design:** Teeny-tiny-llama is a downsized version of the Llama 2 architecture, making it suitable for applications with limited computational resources.
 
-- **Optimized Scaling:** The model has been pre-trained using scaling
+- **Optimized Scaling:** The model has been pre-trained using scaling laws to identify the ideal token-to-parameter ratio.
 
 - **Custom Portuguese Dataset:** Teeny-tiny-llama has been trained on a custom Portuguese dataset. This dataset includes diverse linguistic contexts and preference pre-training, allowing the model to better cater to Portuguese language nuances and be better suited for fine-tuning tasks like instruction-tuning.
 
@@ -54,13 +54,13 @@ This repository has 21 checkpoints, saved as revisions, that were logged during
 - **Size:** 162,417,408 parameters
 - **Dataset:** [Portuguese-Corpus-v3](https://huggingface.co/datasets/nicholasKluge/portuguese-corpus-v3)
 - **Language:** Portuguese
-- **Number of steps:** 457,969
+- **Number of steps:** 457,969 (3.7B tokens)
-- **Batch size:** 4
+- **Batch size:** 4 (8192 tokens)
 - **Optimizer:** `torch.optim.AdamW` (warmup_ratio = 0.01, learning_rate = 6e-4, epsilon = 1e-8)
 - **GPU:** 1 NVIDIA A100-SXM4-40GB
 - **Training time**: ~ 36 hours
 - **Emissions:** 5.6 KgCO2 (Germany)
-- **Total
+- **Total energy consumption:** 15.5 kWh
 
 This repository has the [source code](https://github.com/Nkluge-correa/Aira) used to train this model.
 
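To make the optimizer settings in the hunk above easier to read as code, here is a minimal sketch of that configuration. It is not the authors' training script (that lives in the linked Aira repository); the linear warmup/decay schedule and the tiny placeholder module standing in for the real model are assumptions for illustration only.

```python
# Minimal sketch of the optimizer settings listed in the model card above.
# NOT the authors' training script; schedule shape and placeholder model are assumptions.
import torch
from transformers import get_linear_schedule_with_warmup

total_steps = 457_969                    # "Number of steps" from the model card
warmup_steps = int(0.01 * total_steps)   # warmup_ratio = 0.01

model = torch.nn.Linear(8, 8)            # placeholder; stands in for the 162M-parameter Llama

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=6e-4,    # learning_rate
    eps=1e-8,   # epsilon
)

# The card only states the warmup ratio; a linear warmup-then-decay schedule is assumed here.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
)
```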
@@ -163,7 +163,7 @@ for i, completion in enumerate(completions):
 
 | Models | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
 |-------------------------------------------------------------------------------------|---------|-----------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
-| [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22 | 22.48
+| [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22 | 22.48 | 29.62 | 27.36 | 41.44 |
 
 * Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness.
 
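For context on the benchmark table above, here is a hedged sketch of how such an evaluation is typically launched through the harness's Python API. The model id, backend name, few-shot setting, and task identifiers are assumptions (the Laiviet fork's translated tasks use their own names), not the exact configuration behind the reported numbers.

```python
# Rough illustration of running the EleutherAI lm-evaluation-harness via its Python API.
# Model id and task names are assumptions, not the exact ones used for the table above;
# few-shot settings also vary per benchmark in practice.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",                                        # Hugging Face causal-LM backend
    model_args="pretrained=nicholasKluge/Teeny-tiny-llama",   # placeholder model id
    tasks=["arc_challenge", "hellaswag", "truthfulqa_mc"],    # standard English task names
    num_fewshot=0,
    batch_size=1,
    device="cuda:0",
)

# Print per-task metrics returned by the harness.
for task, metrics in results["results"].items():
    print(task, metrics)
```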