nicholasKluge committed
Commit 7e9b032
Parent(s): d538c84
Update README.md

README.md CHANGED
@@ -43,7 +43,7 @@ Teeny-tiny-llama has been trained by leveraging scaling laws to determine the op
 
 - **Compact Design:** Teeny-tiny-llama is a downsized version of the Llama 2 architecture, making it suitable for applications with limited computational resources.
 
-- **Optimized Scaling:** The model has been pre-trained using scaling
+- **Optimized Scaling:** The model has been pre-trained using scaling laws to identify the ideal token-to-parameter ratio.
 
 - **Custom Portuguese Dataset:** Teeny-tiny-llama has been trained on a custom Portuguese dataset. This dataset includes diverse linguistic contexts and preference pre-training, allowing the model to better cater to Portuguese language nuances and be better suited for fine-tuning tasks like instruction-tuning.
 
@@ -54,13 +54,13 @@ This repository has 21 checkpoints, saved as revisions, that were logged during
 - **Size:** 162,417,408 parameters
 - **Dataset:** [Portuguese-Corpus-v3](https://huggingface.co/datasets/nicholasKluge/portuguese-corpus-v3)
 - **Language:** Portuguese
-- **Number of steps:** 457,969
+- **Number of steps:** 457,969 (3.7B tokens)
-- **Batch size:** 4
+- **Batch size:** 4 (8192 tokens)
 - **Optimizer:** `torch.optim.AdamW` (warmup_ratio = 0.01, learning_rate = 6e-4, epsilon = 1e-8)
 - **GPU:** 1 NVIDIA A100-SXM4-40GB
 - **Training time**: ~ 36 hours
 - **Emissions:** 5.6 KgCO2 (Germany)
-- **Total
+- **Total energy consumption:** 15.5 kWh
 
 This repository has the [source code](https://github.com/Nkluge-correa/Aira) used to train this model.
 
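To make the optimizer settings in the hunk above easier to read as code, here is a minimal sketch of that configuration. It is not the authors' training script (that lives in the linked Aira repository); the linear warmup/decay schedule and the tiny placeholder module standing in for the real model are assumptions for illustration only.

```python
# Minimal sketch of the optimizer settings listed in the model card above.
# NOT the authors' training script; schedule shape and placeholder model are assumptions.
import torch
from transformers import get_linear_schedule_with_warmup

total_steps = 457_969                    # "Number of steps" from the model card
warmup_steps = int(0.01 * total_steps)   # warmup_ratio = 0.01

model = torch.nn.Linear(8, 8)            # placeholder; stands in for the 162M-parameter Llama

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=6e-4,    # learning_rate
    eps=1e-8,   # epsilon
)

# The card only states the warmup ratio; a linear warmup-then-decay schedule is assumed here.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
)
```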
@@ -163,7 +163,7 @@ for i, completion in enumerate(completions):
 
 | Models | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
 |-------------------------------------------------------------------------------------|---------|-----------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
-| [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22 | 22.48
+| [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22 | 22.48 | 29.62 | 27.36 | 41.44 |
 
 * Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness.
 
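For context on the benchmark table above, here is a hedged sketch of how such an evaluation is typically launched through the harness's Python API. The model id, backend name, few-shot setting, and task identifiers are assumptions (the Laiviet fork's translated tasks use their own names), not the exact configuration behind the reported numbers.

```python
# Rough illustration of running the EleutherAI lm-evaluation-harness via its Python API.
# Model id and task names are assumptions, not the exact ones used for the table above;
# few-shot settings also vary per benchmark in practice.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",                                        # Hugging Face causal-LM backend
    model_args="pretrained=nicholasKluge/Teeny-tiny-llama",   # placeholder model id
    tasks=["arc_challenge", "hellaswag", "truthfulqa_mc"],    # standard English task names
    num_fewshot=0,
    batch_size=1,
    device="cuda:0",
)

# Print per-task metrics returned by the harness.
for task, metrics in results["results"].items():
    print(task, metrics)
```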