nicholasKluge committed on
Commit
7e9b032
1 Parent(s): d538c84

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -43,7 +43,7 @@ Teeny-tiny-llama has been trained by leveraging scaling laws to determine the op
 
 - **Compact Design:** Teeny-tiny-llama is a downsized version of the Llama 2 architecture, making it suitable for applications with limited computational resources.
 
-- **Optimized Scaling:** The model has been pre-trained using scaling logs to identify the ideal token-to-parameter ratio.
+- **Optimized Scaling:** The model has been pre-trained using scaling laws to identify the ideal token-to-parameter ratio.
 
 - **Custom Portuguese Dataset:** Teeny-tiny-llama has been trained on a custom Portuguese dataset. This dataset includes diverse linguistic contexts and preference pre-training, allowing the model to better cater to Portuguese language nuances and be better suited for fine-tuning tasks like instruction-tuning.
 
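To put a number on the token-to-parameter ratio that the "Optimized Scaling" bullet refers to, here is a rough back-of-the-envelope check using the step count and tokens-per-step figures that appear in the training-details hunk below. This is an illustrative calculation, not a statement from the README itself.

```python
# Rough check of the token-to-parameter ratio behind the "Optimized Scaling" bullet.
# Figures come from the training-details hunk below (457,969 steps, 8,192 tokens/step).
n_params = 162_417_408       # model size in parameters
steps = 457_969              # optimizer steps
tokens_per_step = 8_192      # batch size of 4, reported as 8,192 tokens per step

total_tokens = steps * tokens_per_step
print(f"Total training tokens: {total_tokens / 1e9:.2f}B")      # ~3.75B, matching the "3.7B tokens" note
print(f"Tokens per parameter:  {total_tokens / n_params:.1f}")  # ~23, near the ~20 rule of thumb from Chinchilla-style scaling laws
```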
@@ -54,13 +54,13 @@ This repository has 21 checkpoints, saved as revisions, that were logged during
 - **Size:** 162,417,408 parameters
 - **Dataset:** [Portuguese-Corpus-v3](https://huggingface.co/datasets/nicholasKluge/portuguese-corpus-v3)
 - **Language:** Portuguese
-- **Number of steps:** 457,969
-- **Batch size:** 4
+- **Number of steps:** 457,969 (3.7B tokens)
+- **Batch size:** 4 (8192 tokens)
 - **Optimizer:** `torch.optim.AdamW` (warmup_ratio = 0.01, learning_rate = 6e-4, epsilon = 1e-8)
 - **GPU:** 1 NVIDIA A100-SXM4-40GB
 - **Training time**: ~ 36 hours
 - **Emissions:** 5.6 KgCO2 (Germany)
-- **Total Energy Consumption:** 15.5 kWh
+- **Total energy consumption:** 15.5 kWh
 
 This repository has the [source code](https://github.com/Nkluge-correa/Aira) used to train this model.
 
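For readers mapping the optimizer line above onto code: `warmup_ratio` is not an argument of `torch.optim.AdamW` itself but a learning-rate-schedule setting, so a configuration in the style of Hugging Face `TrainingArguments` might look roughly like the sketch below. This is an illustration only; the repository's actual training script (linked above) is the authoritative source, and `output_dir` and the 2,048-token sequence length are placeholders/assumptions.

```python
from transformers import TrainingArguments

# Illustrative configuration mirroring the hyperparameters listed in the hunk above.
# Not the repository's actual script; see the linked source code for the real setup.
training_args = TrainingArguments(
    output_dir="checkpoints",        # placeholder output path
    per_device_train_batch_size=4,   # 4 sequences/step; at an assumed 2,048-token context this is the 8,192 tokens/step noted above
    max_steps=457_969,
    learning_rate=6e-4,              # passed through to torch.optim.AdamW
    adam_epsilon=1e-8,
    warmup_ratio=0.01,               # warmup is a scheduler setting, not an AdamW argument
    optim="adamw_torch",             # select torch.optim.AdamW as the optimizer
)
```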
@@ -163,7 +163,7 @@ for i, completion in enumerate(completions):
 
 | Models | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
 |-------------------------------------------------------------------------------------|---------|-----------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
-| [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22 | 22.48 | 29.62 | 27.36 | 41.44 |
+| [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22 | 22.48 | 29.62 | 27.36 | 41.44 |
 
 * Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness.
 
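As a usage note for the evaluation table above: scores from the Language Model Evaluation Harness are typically produced through its `simple_evaluate` entry point. The sketch below is illustrative, not the exact setup behind the reported numbers; the `"hf"` backend name and the English task names are assumptions tied to recent harness versions (older releases use `"hf-causal"`), and the Portuguese-translated tasks from the Laiviet fork may be registered under different names.

```python
from lm_eval import evaluator

# Illustrative evaluation sketch; backend and task names are assumptions based on
# recent lm-evaluation-harness releases, not the exact command used for the table.
model_id = "path/or/hub-id-of-this-model"  # placeholder: substitute the actual checkpoint

results = evaluator.simple_evaluate(
    model="hf",                            # Hugging Face causal-LM backend
    model_args=f"pretrained={model_id}",
    tasks=["arc_challenge", "hellaswag"],  # example tasks matching two of the table's columns
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)
```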