---
license: apache-2.0
datasets:
- nicholasKluge/Pt-Corpus-Instruct
language:
- pt
metrics:
- perplexity
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
widget:
- text: "A PUCRS é uma universidade "
  example_title: Exemplo
- text: "A muitos anos atrás, em uma galáxia muito distante, vivia uma raça de"
  example_title: Exemplo
- text: "Em meio a um escândalo, a frente parlamentar pediu ao Senador Silva para"
  example_title: Exemplo
inference:
  parameters:
    repetition_penalty: 1.2
    temperature: 0.2
    top_k: 20
    top_p: 0.2
    max_new_tokens: 150
co2_eq_emissions:
  emissions: 5.6
  source: CodeCarbon
  training_type: pre-training
  geographical_location: Germany
  hardware_used: NVIDIA A100-SXM4-40GB
---
# TeenyTinyLlama-160m

<img src="./logo.png" alt="A little llama wearing a mushroom hat and a monocle." height="200">

## Model Summary

Monolingual foundational models for non-English languages remain scarce, and many of the models most used and downloaded by the community are those small enough for individual researchers and hobbyists to run in low-resource environments. With both points in mind, we developed TeenyTinyLlama: _a pair of small foundational models trained in Brazilian Portuguese._

## Details

- **Architecture:** a Transformer-based model pre-trained via causal language modeling
- **Size:** 162,417,408 parameters
- **Context length:** 2048 tokens
- **Dataset:** [Pt-Corpus Instruct](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct) (6.2B tokens)
- **Language:** Portuguese
- **Number of steps:** 458,000 
- **GPU:** 1 NVIDIA A100-SXM4-40GB
- **Training time:** ~36 hours
- **Emissions:** 5.6 KgCO2eq (Germany)
- **Total energy consumption:** 15.5 kWh
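
The reported size can be cross-checked against the hyperparameters listed in the Training Set-up table of this card. The sketch below is a back-of-the-envelope count that assumes a standard Llama layout: bias-free linear layers, a gated MLP with three projections, two RMSNorm weight vectors per layer, and an output head that is not tied to the input embeddings. These layout details are assumptions and are not stated in this card.

```python
# Hyperparameters taken from the Training Set-up table.
vocab_size = 32_000
hidden = 768
intermediate = 3_072
n_layers = 12

# Assumed Llama-style layout: no biases, untied input/output embeddings.
embeddings = vocab_size * hidden
attention = 4 * hidden * hidden      # q, k, v, o projections (12 KV heads = full multi-head attention)
mlp = 3 * hidden * intermediate      # gate, up, and down projections
norms = 2 * hidden                   # two RMSNorm weight vectors per layer
per_layer = attention + mlp + norms

final_norm = hidden
lm_head = vocab_size * hidden

total = embeddings + n_layers * per_layer + final_norm + lm_head
print(f"{total:,}")  # 162,417,408
```

Under these assumptions the count matches the size reported above.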

This repository has the [source code](https://github.com/Nkluge-correa/TeenyTinyLlama) used to train this model. The main libraries used are:

- [Transformers](https://github.com/huggingface/transformers)
- [PyTorch](https://github.com/pytorch/pytorch)
- [Datasets](https://github.com/huggingface/datasets)
- [Tokenizers](https://github.com/huggingface/tokenizers)
- [Sentencepiece](https://github.com/google/sentencepiece)
- [Accelerate](https://github.com/huggingface/accelerate)
- [Codecarbon](https://github.com/mlco2/codecarbon)

Check out the training logs in [Weights and Biases](https://api.wandb.ai/links/nkluge-correa/vws4g032).

## Training Set-up

These are the main arguments used in the training of this model:

| Arguments                     | Value                                |
|-------------------------------|--------------------------------------|
| vocabulary size               | 32000                                |
| hidden dimension size         | 768                                  |
| intermediate dimension size   | 3072                                 |
| context length                | 2048                                 |
| nº attention heads            | 12                                   |
| nº hidden layers              | 12                                   |
| nº key value heads            | 12                                   |
| nº training samples           | 1831873                              |
| nº validation samples         | 18000                                |
| nº epochs                     | 1                                    |
| evaluation steps              | 100000                               |
| train batch size              | 4                                    |
| eval batch size               | 4                                    |
| gradient accumulation steps   | 1                                    |
| optimizer                     | torch.optim.AdamW                    |
| learning rate                 | 0.0006                               |
| adam epsilon                  | 0.00000001                           |
| weight decay                  | 0.01                                 |
| scheduler type                | "cosine"                             |
| warmup steps                  | 5000                                 | 
| gradient checkpointing        | false                                |
| seed                          | 42                                   |
| mixed precision               | 'no'                                 |
| torch dtype                   | "float32"                            |
| tf32                          | true                                 |
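
As a rough illustration of how these arguments fit together, the sketch below derives the number of optimizer steps in one epoch and evaluates a learning-rate curve. It assumes the `"cosine"` scheduler means linear warmup to the peak learning rate followed by cosine decay to zero; that shape, and the helper name `lr_at`, are assumptions rather than details confirmed by this card.

```python
import math

# Values from the table above.
train_samples = 1_831_873
batch_size = 4
grad_accum = 1
peak_lr = 6e-4
warmup_steps = 5_000

# One epoch of optimizer steps; close to the ~458,000 steps reported in Details.
total_steps = math.ceil(train_samples / (batch_size * grad_accum))


def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_steps)
for step in (0, 2_500, 5_000, 230_000, total_steps):
    print(f"{step:>7,}  {lr_at(step):.2e}")
```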

## Intended Uses

The primary intended use of TeenyTinyLlama is to research the behavior, functionality, and limitations of large language models. Checkpoints saved during training are intended to provide a controlled setting for performing scientific experiments. You may also further fine-tune and adapt TeenyTinyLlama-160m for deployment, as long as your use is in accordance with the Apache 2.0 license. If you decide to use pre-trained TeenyTinyLlama-160m as a basis for your fine-tuned model, please conduct your own risk and bias assessment. 

## Basic usage

Using the `pipeline`:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="nicholasKluge/TeenyTinyLlama-160m")

# do_sample=True lets the two returned sequences differ from each other
completions = generator("Astronomia é a ciência", do_sample=True, num_return_sequences=2, max_new_tokens=100)

for comp in completions:
    print(f"🤖 {comp['generated_text']}")
```

Using the `AutoTokenizer` and `AutoModelForCausalLM`:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and the tokenizer
tokenizer = AutoTokenizer.from_pretrained("nicholasKluge/TeenyTinyLlama-160m", revision='main')
model = AutoModelForCausalLM.from_pretrained("nicholasKluge/TeenyTinyLlama-160m", revision='main')

# Pass the model to your device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model.eval()
model.to(device)

# Tokenize the inputs and pass them to the device
inputs = tokenizer("Astronomia é a ciência", return_tensors="pt").to(device)

# Generate some text (sampling is enabled so the two sequences can differ)
completions = model.generate(**inputs, do_sample=True, num_return_sequences=2, max_new_tokens=100)

# Print the generated text
for completion in completions:
    print(f"🤖 {tokenizer.decode(completion, skip_special_tokens=True)}")
```
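
The metadata of this model card lists the inference parameters used by the hosted widget (`repetition_penalty: 1.2`, `temperature: 0.2`, `top_k: 20`, `top_p: 0.2`, `max_new_tokens: 150`). The sketch below passes the same values to `generate`. `do_sample=True` is an assumption added here, since temperature, top-k, and top-p only take effect when sampling is enabled, and the helper name `complete` is hypothetical.

```python
# Values copied from the `inference.parameters` block of this model card.
GENERATION_KWARGS = {
    "repetition_penalty": 1.2,
    "temperature": 0.2,
    "top_k": 20,
    "top_p": 0.2,
    "max_new_tokens": 150,
    "do_sample": True,  # assumption: not listed in the card
}


def complete(prompt: str, model_id: str = "nicholasKluge/TeenyTinyLlama-160m") -> str:
    """Hypothetical helper: generate one completion with the settings above."""
    # Imported lazily so the settings can be inspected without loading the model.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the model on first use):
# print(complete("Astronomia é a ciência"))
```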

## Limitations

- **Hallucinations:** This model can produce content that can be mistaken for truth but is, in fact, misleading or entirely false, i.e., hallucination.

- **Biases and Toxicity:** This model inherits the social and historical stereotypes from the data used to train it. Given these biases, the model can produce toxic content, i.e., harmful, offensive, or detrimental to individuals, groups, or communities.

- **Unreliable Code:** The model may produce incorrect code snippets and statements. These code generations should not be treated as suggestions or accurate solutions.

- **Language Limitations:** The model is primarily designed to understand standard Brazilian Portuguese. Other languages might challenge its comprehension, leading to potential misinterpretations or errors in response.

- **Repetition and Verbosity:** The model may get stuck in repetition loops (especially if the repetition penalty used during generation is set too low) or produce verbose responses unrelated to the prompt it was given.

## Evaluations

| Steps   | Evaluation Loss | Perplexity | Total Energy Consumption | Emissions    |
|---------|-----------------|------------|--------------------------|--------------|
| 100,000 | 3.19            | 24.52      | 3.75 kWh                 | 1.28 KgCO2eq |
| 200,000 | 3.02            | 20.58      | 7.51 kWh                 | 2.56 KgCO2eq |
| 300,000 | 2.83            | 16.98      | 11.25 kWh                | 3.84 KgCO2eq |
| 400,000 | 2.79            | 16.41      | 14.52 kWh                | 5.11 KgCO2eq |
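
For reference, perplexity under a causal language modeling objective is the exponential of the mean cross-entropy loss. The sketch below recomputes it from the rounded losses in the table. The small gaps to the reported values are what one would expect if the reported perplexities were derived from unrounded losses, which is an assumption here.

```python
import math

# (steps, evaluation loss, reported perplexity) from the table above.
rows = [
    (100_000, 3.19, 24.52),
    (200_000, 3.02, 20.58),
    (300_000, 2.83, 16.98),
    (400_000, 2.79, 16.41),
]

for steps, loss, reported in rows:
    recomputed = math.exp(loss)  # perplexity = exp(cross-entropy loss)
    print(f"{steps:>7,}  exp({loss}) = {recomputed:.2f}  reported = {reported}")
```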

## Benchmarks

Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness. The results of models marked with an "*" were extracted from the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

| Models                                                                              | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
|-------------------------------------------------------------------------------------|---------|-----------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
| [Pythia-410m](https://huggingface.co/EleutherAI/pythia-410m-deduped)                | 33.26   | 24.83*                                  | 41.29*                                        | 25.99*                                   | 40.95*                                         |
| [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m)     | 33.01   | 29.40                                   | 33.00                                         | 28.55                                    | 41.10                                          |
| [Bloom-560m](https://huggingface.co/bigscience/bloom-560m)                          | 32.13   | 24.74*                                  | 37.15*                                        | 24.22*                                   | 42.44*                                         |
| [Xglm-564M](https://huggingface.co/facebook/xglm-564M)                              | 31.97   | 25.56                                   | 34.64*                                        | 25.18*                                   | 42.53                                          |
| [OPT-350m](https://huggingface.co/facebook/opt-350m)                                | 31.78   | 23.55*                                  | 36.73*                                        | 26.02*                                   | 40.83*                                         |
| [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m)     | 31.16   | 26.15                                   | 29.29                                         | 28.11                                    | 41.12                                          |
| [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped)                | 31.16   | 24.06*                                  | 31.39*                                        | 24.86*                                   | 44.34*                                         |
| [OPT-125m](https://huggingface.co/facebook/opt-125m)                                | 30.80   | 22.87                                   | 31.47                                         | 26.02                                    | 42.87                                          |
| [Gpt2-small-portuguese](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22   | 22.48*                                  | 29.62*                                        | 27.36*                                   | 41.44*                                         |
| [Gpt2-small](https://huggingface.co/gpt2)                                           | 29.97   | 21.48*                                  | 31.60*                                        | 25.79*                                   | 40.65*                                         | 
| [Multilingual GPT](https://huggingface.co/ai-forever/mGPT)                          | 29.45   | 24.79                                   | 26.37*                                        | 25.17*                                   | 41.50                                          |
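
The "Average" column appears to be the unweighted mean of the four task scores. The card does not state this explicitly, so the sketch below treats it as an assumption and checks it against three rows of the table. The helper name `mean_score` is hypothetical.

```python
# (ARC, Hellaswag, MMLU, TruthfulQA, reported average) for three rows of the table.
scores = {
    "Pythia-410m": (24.83, 41.29, 25.99, 40.95, 33.26),
    "TeenyTinyLlama-460m": (29.40, 33.00, 28.55, 41.10, 33.01),
    "TeenyTinyLlama-160m": (26.15, 29.29, 28.11, 41.12, 31.16),
}


def mean_score(row):
    """Unweighted mean of the task scores, ignoring the reported average."""
    *tasks, _reported = row
    return sum(tasks) / len(tasks)


for name, row in scores.items():
    print(f"{name}: computed {mean_score(row):.4f}, reported {row[-1]}")
```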

## Fine-Tuning Comparisons

| Models                                                                                      | Average | [IMDB](https://huggingface.co/datasets/christykoh/imdb_pt) | [FaQuAD-NLI](https://huggingface.co/datasets/ruanchaves/faquad-nli) | [HateBr](https://huggingface.co/datasets/ruanchaves/hatebr) | [Assin2](https://huggingface.co/datasets/assin2) | [AgNews](https://huggingface.co/datasets/maritaca-ai/ag_news_pt) |
|---------------------------------------------------------------------------------------------|---------|------------------------------------------------------------|---------------------------------------------------------------------|-------------------------------------------------------------|--------------------------------------------------|------------------------------------------------------------------|
| [Bert-large-portuguese-cased](https://huggingface.co/neuralmind/bert-large-portuguese-cased) | 92.09   | 93.58                                                      | 92.26                                                               | 91.57                                                       | 88.97                                            | 94.11                                                            |
| [Bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased)  | 91.64   | 92.22                                                      | 93.07                                                               | 91.28                                                       | 87.45                                            | 94.19                                                            |
| [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m)             | 91.19   | 91.64                                                      | 91.18                                                               | 92.28                                                       | 86.43                                            | 94.42                                                            |
| [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m)             | 90.33   | 91.14                                                      | 90.00                                                               | 90.71                                                       | 85.78                                            | 94.05                                                            |
| [Gpt2-small-portuguese](https://huggingface.co/pierreguillou/gpt2-small-portuguese)         | 89.13   | 91.60                                                      | 86.46                                                               | 87.42                                                       | 86.11                                            | 94.07                                                            |

## Cite as 🤗
 
```bibtex
@misc{nicholas22llama,
  doi = {10.5281/zenodo.6989727},
  url = {https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m},
  author = {Nicholas Kluge Corrêa},
  title = {TeenyTinyLlama},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
}
```

## Funding

This repository was built as part of the RAIES ([Rede de Inteligência Artificial Ética e Segura](https://www.raies.org/)) initiative, a project supported by FAPERGS ([Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul](https://fapergs.rs.gov.br/inicial)), Brazil.

## License

TeenyTinyLlama-160m is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.