nicholasKluge committed on
Commit
840519b
1 Parent(s): 53464c1

Upload 14 files

Aira_emissions.csv ADDED
@@ -0,0 +1,2 @@
1
+ timestamp,project_name,run_id,duration,emissions,emissions_rate,cpu_power,gpu_power,ram_power,cpu_energy,gpu_energy,ram_energy,energy_consumed,country_name,country_iso_code,region,cloud_provider,cloud_region,os,python_version,codecarbon_version,cpu_count,cpu_model,gpu_count,gpu_model,longitude,latitude,ram_total_size,tracking_mode,on_cloud,pue
2
+ 2023-12-08T22:42:54,Aira-2,48d59d1a-e7c7-4a88-8d77-f5bfbfe56cb8,31111.21229338646,1.7137292582164891,5.5083975579466225e-05,42.5,0.0,31.305264472961426,0.36728449035220717,2.874421472868672,0.2704877384588287,3.5121937016797027,Singapore,SGP,,,,Linux-5.15.120+-x86_64-with-glibc2.35,3.10.12,2.3.2,12,Intel(R) Xeon(R) CPU @ 2.20GHz,1,1 x NVIDIA A100-SXM4-40GB,103.8503,1.2868,83.48070526123047,machine,N,1.0
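The derived columns in this CodeCarbon row can be cross-checked against each other. The sketch below is only a sanity check: it assumes CodeCarbon's usual units (seconds for `duration`, kWh for the energy columns, kg CO2eq for `emissions`), and the variable names are illustrative, not part of the commit.

```python
import math

# Values copied from the Aira_emissions.csv row above.
duration_s = 31111.21229338646
emissions_kg = 1.7137292582164891
emissions_rate = 5.5083975579466225e-05
cpu_kwh = 0.36728449035220717
gpu_kwh = 2.874421472868672
ram_kwh = 0.2704877384588287
energy_kwh = 3.5121937016797027

# energy_consumed should be the sum of the three per-component energies.
total_kwh = cpu_kwh + gpu_kwh + ram_kwh

# emissions_rate should be emissions divided by duration (kg per second).
rate = emissions_kg / duration_s

# Implied carbon intensity of the grid, in kg CO2eq per kWh.
intensity = emissions_kg / energy_kwh

# Wall-clock training time in hours.
hours = duration_s / 3600

print(f"energy sum matches: {math.isclose(total_kwh, energy_kwh, rel_tol=1e-9)}")
print(f"rate matches: {math.isclose(rate, emissions_rate, rel_tol=1e-6)}")
print(f"intensity: {intensity:.3f} kg/kWh over {hours:.2f} h")
```

Under these assumptions the row is internally consistent: the run lasted roughly 8.6 hours, and the GPU accounts for most of the energy.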
README.md ADDED
@@ -0,0 +1,156 @@
1
+ ---
2
+ license: apache-2.0
3
+ datasets:
4
+ - nicholasKluge/instruct-aira-dataset
5
+ language:
6
+ - en
7
+ metrics:
8
+ - accuracy
9
+ library_name: transformers
10
+ tags:
11
+ - alignment
12
+ - instruction tuned
13
+ - text generation
14
+ - conversation
15
+ - assistant
16
+ pipeline_tag: text-generation
17
+ widget:
18
+ - text: "How should I call you?<|endofinstruction|>"
19
+   example_title: Greetings
20
+ - text: "Can you explain what is Machine Learning?<|endofinstruction|>"
21
+   example_title: Machine Learning
22
+ - text: "Do you know anything about virtue ethics?<|endofinstruction|>"
23
+   example_title: Ethics
24
+ - text: "How can I make my girlfriend happy?<|endofinstruction|>"
25
+   example_title: Advice
26
+ inference:
27
+   parameters:
28
+     repetition_penalty: 1.2
29
+     temperature: 0.2
30
+     top_k: 30
31
+     top_p: 0.3
32
+     max_new_tokens: 200
33
+     length_penalty: 0.3
34
+     early_stopping: true
35
+ co2_eq_emissions:
36
+   emissions: 1713.73
37
+   source: CodeCarbon
38
+   training_type: fine-tuning
39
+   geographical_location: Singapore
40
+   hardware_used: NVIDIA A100-SXM4-40GB
41
+ ---
42
+ # Aira-2-1B1
43
+
44
+ `Aira-2` is the second version of the Aira instruction-tuned series. `Aira-2-1B1` is an instruction-tuned model based on [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-955k-token-2T). It was trained on a dataset of prompts and completions generated synthetically by prompting already-tuned models (ChatGPT, Llama, Open-Assistant, etc.).
45
+
46
+ Try our Gradio demo in [Spaces](https://huggingface.co/spaces/nicholasKluge/Aira-Demo).
47
+
48
+ ## Details
49
+
50
+ - **Size:** 1,261,545,472 parameters
51
+ - **Dataset:** [Instruct-Aira Dataset](https://huggingface.co/datasets/nicholasKluge/instruct-aira-dataset)
52
+ - **Language:** English
53
+ - **Number of Epochs:** 3
54
+ - **Batch size:** 4
55
+ - **Optimizer:** `torch.optim.AdamW` (warmup_steps = 1e2, learning_rate = 5e-4, epsilon = 1e-8)
56
+ - **GPU:** 1 NVIDIA A100-SXM4-40GB
57
+ - **Emissions:** 1.71 kg CO2eq (Singapore)
58
+ - **Total Energy Consumption:** 3.51 kWh
59
+
60
+ The [source code](https://github.com/Nkluge-correa/Aira) used to train this model is available on GitHub.
61
+
62
+ ## Usage
63
+
64
+ Three special tokens are used to mark the user side of the interaction and the model's response:
65
+
66
+ `<|startofinstruction|>`What is a language model?`<|endofinstruction|>`A language model is a probability distribution over a vocabulary.`<|endofcompletion|>`
67
+
68
+ ```python
69
+ from transformers import AutoTokenizer, AutoModelForCausalLM
70
+ import torch
71
+
72
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
73
+
74
+ tokenizer = AutoTokenizer.from_pretrained('nicholasKluge/Aira-2-1B1')
75
+ aira = AutoModelForCausalLM.from_pretrained('nicholasKluge/Aira-2-1B1')
76
+
77
+ aira.eval()
78
+ aira.to(device)
79
+
80
+ question = input("Enter your question: ")
81
+
82
+ inputs = tokenizer(tokenizer.bos_token + question + tokenizer.sep_token,
83
+                    add_special_tokens=False,
83
+                    return_tensors="pt").to(device)
85
+
86
+ responses = aira.generate(**inputs,
87
+                           do_sample=True,
87
+                           top_k=50,
88
+                           top_p=0.95,
89
+                           temperature=0.7,
90
+                           num_return_sequences=2)
92
+
93
+ print(f"Question: 👤 {question}\n")
94
+
95
+ for i, response in enumerate(responses):
96
+     print(f'Response {i+1}: 🤖 {tokenizer.decode(response, skip_special_tokens=True).replace(question, "")}')
97
+ ```
98
+
99
+ The model will output something like:
100
+
101
+ ```markdown
102
+ >>>Question: 👤 What is the capital of Brazil?
103
+
104
+ >>>Response 1: 🤖 The capital of Brazil is Brasília.
105
+ >>>Response 2: 🤖 The capital of Brazil is Brasília.
106
+ ```
107
+
108
+ ## Limitations
109
+
110
+ 🤥 Generative models can produce pseudo-informative content, that is, false information that may appear truthful.
111
+
112
+ 🤬 In certain types of tasks, generative models can produce harmful and discriminatory content inspired by historical stereotypes.
113
+
114
+ ## Evaluation
115
+
116
+ | Model (TinyLlama) | Average | [ARC](https://arxiv.org/abs/1803.05457) | [TruthfulQA](https://arxiv.org/abs/2109.07958) | [ToxiGen](https://arxiv.org/abs/2203.09509) |
117
+ |---------------------------------------------------------------|-----------|-----------------------------------------|------------------------------------------------|---------------------------------------------|
118
+ | [Aira-2-1B1](https://huggingface.co/nicholasKluge/Aira-2-1B1) | **42.55** | 25.26 | **50.81** | **51.59** |
119
+ | TinyLlama/TinyLlama-1.1B-intermediate-step-955k-token-2T | 37.52 | **30.89** | 39.55 | 42.13 |
120
+
121
+
122
+ * Evaluations were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)).
123
+
124
+ ## Cite as 🤗
125
+
126
+ ```latex
127
+
128
+ @misc{nicholas22aira,
129
+ doi = {10.5281/zenodo.6989727},
130
+ url = {https://huggingface.co/nicholasKluge/Aira-2-1B1},
131
+ author = {Nicholas Kluge Corrêa},
132
+ title = {Aira},
133
+ year = {2023},
134
+ publisher = {HuggingFace},
135
+ journal = {HuggingFace repository},
136
+ }
137
+
138
+ ```
139
+
140
+ ## License
141
+
142
+ `Aira-2-1B1` is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.
143
+
144
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
145
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_nicholasKluge__Aira-2-1B1).
146
+
147
+ | Metric | Value |
148
+ |-----------------------|---------------------------|
149
+ | Avg. | 25.19 |
150
+ | ARC (25-shot) | 23.21 |
151
+ | HellaSwag (10-shot) | 26.97 |
152
+ | MMLU (5-shot) | 24.86 |
153
+ | TruthfulQA (0-shot) | 50.63 |
154
+ | Winogrande (5-shot) | 50.28 |
155
+ | GSM8K (5-shot) | 0.0 |
156
+ | DROP (3-shot) | 0.39 |
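The "Average" and "Avg." figures in the two README tables can be reproduced from the per-benchmark scores. The sketch below assumes both are unweighted arithmetic means rounded to two decimals. That assumption is not stated in the README; it is only checked here against the published numbers.

```python
# Scores copied from the README tables above.
evaluation = {
    "Aira-2-1B1": {"ARC": 25.26, "TruthfulQA": 50.81, "ToxiGen": 51.59},
    "TinyLlama-1.1B-intermediate-step-955k-token-2T": {
        "ARC": 30.89, "TruthfulQA": 39.55, "ToxiGen": 42.13,
    },
}
leaderboard = {
    "ARC (25-shot)": 23.21,
    "HellaSwag (10-shot)": 26.97,
    "MMLU (5-shot)": 24.86,
    "TruthfulQA (0-shot)": 50.63,
    "Winogrande (5-shot)": 50.28,
    "GSM8K (5-shot)": 0.0,
    "DROP (3-shot)": 0.39,
}


def mean(scores):
    """Unweighted arithmetic mean, rounded to two decimals."""
    values = list(scores.values())
    return round(sum(values) / len(values), 2)


evaluation_averages = {name: mean(scores) for name, scores in evaluation.items()}
leaderboard_average = mean(leaderboard)

print(evaluation_averages)
print(leaderboard_average)
```

With these inputs the recomputed means agree with the published 42.55, 37.52, and 25.19.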
added_tokens.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "<|endofcompletion|>": 32001,
3
+ "<|endofinstruction|>": 32003,
4
+ "<|pad|>": 32004,
5
+ "<|startofinstruction|>": 32000,
6
+ "<|unk|>": 32002
7
+ }
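A quick consistency check on `added_tokens.json`: the five new tokens should occupy a contiguous block of IDs directly after the base vocabulary, and the highest ID plus one should equal `vocab_size` in `config.json`. The sketch assumes a base vocabulary of 32,000 entries (the usual Llama SentencePiece size); `BASE_VOCAB_SIZE` is an illustrative name, not something defined in the commit.

```python
# Mapping copied from added_tokens.json above.
added_tokens = {
    "<|endofcompletion|>": 32001,
    "<|endofinstruction|>": 32003,
    "<|pad|>": 32004,
    "<|startofinstruction|>": 32000,
    "<|unk|>": 32002,
}

# Assumption: the base tokenizer has 32,000 entries (IDs 0..31999).
BASE_VOCAB_SIZE = 32000

ids = sorted(added_tokens.values())

# The added IDs should form one gap-free block starting right after the base vocabulary.
is_contiguous = ids == list(range(BASE_VOCAB_SIZE, BASE_VOCAB_SIZE + len(ids)))

# The embedding matrix needs one row per ID, so its size is max ID + 1.
implied_vocab_size = max(ids) + 1

print(is_contiguous, implied_vocab_size)
```

If the implied size did not match `vocab_size`, the embedding matrix would be either too small for the tokenizer or padded with unused rows.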
config.json ADDED
@@ -0,0 +1,27 @@
1
+ {
2
+ "_name_or_path": "TinyLlama/TinyLlama-1.1B-intermediate-step-955k-token-2T",
3
+ "architectures": [
4
+ "LlamaForCausalLM"
5
+ ],
6
+ "attention_bias": false,
7
+ "bos_token_id": 1,
8
+ "eos_token_id": 2,
9
+ "hidden_act": "silu",
10
+ "hidden_size": 2048,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 5632,
13
+ "max_position_embeddings": 2048,
14
+ "model_type": "llama",
15
+ "num_attention_heads": 32,
16
+ "num_hidden_layers": 22,
17
+ "num_key_value_heads": 4,
18
+ "pretraining_tp": 1,
19
+ "rms_norm_eps": 1e-05,
20
+ "rope_scaling": null,
21
+ "rope_theta": 10000.0,
22
+ "tie_word_embeddings": false,
23
+ "torch_dtype": "float32",
24
+ "transformers_version": "4.35.2",
25
+ "use_cache": false,
26
+ "vocab_size": 32005
27
+ }
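The fields in `config.json` are enough to estimate the parameter count. The sketch below assumes the standard Llama layout: grouped-query attention, a gated MLP with three projections, two RMSNorm weights per layer plus a final norm, and no bias terms (`attention_bias` is `false`). The function name is hypothetical, and this is an estimate from the config, not a count read from the checkpoint.

```python
# Relevant fields copied from config.json above.
config = {
    "hidden_size": 2048,
    "intermediate_size": 5632,
    "num_attention_heads": 32,
    "num_hidden_layers": 22,
    "num_key_value_heads": 4,
    "vocab_size": 32005,
    "tie_word_embeddings": False,
}


def count_llama_parameters(cfg):
    """Estimate parameters for a bias-free Llama-style decoder (assumed layout)."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]

    attention = 2 * h * h + 2 * h * kv_dim   # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * h * cfg["intermediate_size"]   # gate_proj, up_proj, down_proj
    norms = 2 * h                            # two RMSNorm weights per layer
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h

    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


n_params = count_llama_parameters(config)
float32_bytes = 4 * n_params

print(f"{n_params:,} parameters, about {float32_bytes:,} bytes in float32")
```

Under these assumptions the estimate is about 1.10 billion parameters. In float32 that comes to slightly less than the 4,400,298,456-byte `model.safetensors` pointer in this commit, which is what one would expect once the file header is included. The estimate does not match the size listed in the README, and this sketch cannot say which figure is authoritative.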
generation_config.json ADDED
@@ -0,0 +1,16 @@
1
+ {
2
+ "bos_token_id": 32000,
3
+ "eos_token_id": 32001,
4
+ "pad_token_id": 32004,
5
+ "unk_token_id": 32002,
6
+ "sep_token_id": 32003,
7
+ "do_sample": true,
8
+ "max_new_tokens": 512,
9
+ "renormalize_logits": true,
10
+ "repetition_penalty": 1.1,
11
+ "temperature": 0.3,
12
+ "top_k": 30,
13
+ "top_p": 0.3,
14
+ "transformers_version": "4.35.2",
15
+ "use_cache": false
16
+ }
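To give a feel for what `temperature`, `top_k`, and `top_p` in `generation_config.json` do to the next-token distribution, here is a simplified, dependency-free sketch. It is not the `transformers` implementation: it ignores `repetition_penalty`, and its nucleus cutoff (keep the smallest prefix of tokens whose cumulative probability reaches `top_p`) is an approximation of the library's behavior. The function name and the toy logits are illustrative.

```python
import math


def filter_distribution(logits, temperature=0.3, top_k=30, top_p=0.3):
    """Return {token_index: probability} after temperature, top-k, and top-p."""
    # 1. Temperature: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # 2. Top-k: keep only the k highest-scoring tokens.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    kept = order[:top_k]

    # Softmax over the kept tokens (max subtracted for numerical stability).
    peak = max(scaled[i] for i in kept)
    weights = {i: math.exp(scaled[i] - peak) for i in kept}
    total = sum(weights.values())
    probs = {i: w / total for i, w in weights.items()}

    # 3. Top-p: keep the smallest high-probability prefix reaching top_p.
    nucleus, cumulative = [], 0.0
    for i in kept:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # 4. Renormalize so the surviving probabilities sum to 1.
    z = sum(probs[i] for i in nucleus)
    return {i: probs[i] / z for i in nucleus}


toy_logits = [2.0, 1.0, 0.5, 0.0]
sharp = filter_distribution(toy_logits)  # defaults mirror the config above
loose = filter_distribution(toy_logits, temperature=1.0, top_k=3, top_p=0.9)
print(sharp)
print(loose)
```

With the low `temperature` and `top_p` from the config, the toy example collapses onto the single most likely token, so sampling behaves almost greedily. The looser settings keep several candidates.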
lr_scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4e728a07c8526d9c0607dd05bdf84b8ab489f6a636d7d07fe63d215d167fe8df
3
+ size 1076
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:297df1581fb78e91cb6945729c2be0c21a02c0c40c81a29b1e359d95d950edf8
3
+ size 4400298456
optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1f82f8e8f8468077b36ed645eb8005e1d406dde1af7cac2b948b81979706ad33
3
+ size 8800724018
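The `.pt` and `.safetensors` entries in this diff are Git LFS pointer files: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes of the real file. A minimal parser, with a hypothetical function name, might look like this:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a small dictionary."""
    # Each line is "<key> <value>"; split on the first space only.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the diff above.
model_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:297df1581fb78e91cb6945729c2be0c21a02c0c40c81a29b1e359d95d950edf8
size 4400298456
""")
optimizer_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:1f82f8e8f8468077b36ed645eb8005e1d406dde1af7cac2b948b81979706ad33
size 8800724018
""")

ratio = optimizer_pointer["size"] / model_pointer["size"]
print(f"optimizer.pt is {ratio:.4f}x the size of model.safetensors")
```

The optimizer state is almost exactly twice the size of the weights. That would be consistent with AdamW keeping two float32 moment tensors per parameter, though this is an interpretation of the sizes and not something the commit states.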
rng_state.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4c6de1a011acd92b3bf4b92b531ffec0e4cab0f4980f976de55bd76ffd19f487
3
+ size 6246
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "bos_token": "<|startofinstruction|>",
3
+ "eos_token": "<|endofcompletion|>",
4
+ "pad_token": "<|pad|>",
5
+ "sep_token": "<|endofinstruction|>",
6
+ "unk_token": "<|unk|>"
7
+ }
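The roles in `special_tokens_map.json` line up with the prompt format shown in the README: `bos_token` opens the instruction, `sep_token` closes it, and `eos_token` ends the completion. The helpers below sketch that layout with plain strings. Their names are hypothetical; in practice the tokenizer's own `bos_token`, `sep_token`, and `eos_token` attributes would supply these values.

```python
# Token strings copied from special_tokens_map.json above.
SPECIAL_TOKENS = {
    "bos_token": "<|startofinstruction|>",
    "eos_token": "<|endofcompletion|>",
    "pad_token": "<|pad|>",
    "sep_token": "<|endofinstruction|>",
    "unk_token": "<|unk|>",
}


def build_prompt(instruction):
    """Wrap a user instruction the way the model expects it at inference time."""
    return SPECIAL_TOKENS["bos_token"] + instruction + SPECIAL_TOKENS["sep_token"]


def build_training_example(instruction, completion):
    """Full sequence: wrapped instruction, then the completion and its end marker."""
    return build_prompt(instruction) + completion + SPECIAL_TOKENS["eos_token"]


def extract_completion(generated_text):
    """Recover the completion from text decoded with special tokens kept."""
    completion = generated_text.split(SPECIAL_TOKENS["sep_token"], 1)[-1]
    for token in (SPECIAL_TOKENS["eos_token"], SPECIAL_TOKENS["pad_token"]):
        completion = completion.replace(token, "")
    return completion.strip()


example = build_training_example(
    "What is a language model?",
    "A language model is a probability distribution over a vocabulary.",
)
print(example)
print(extract_completion(example))
```

Splitting on the separator token is a little more robust than removing the question string from the decoded output, since it still works if the question text happens to reappear inside the answer.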
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
3
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,80 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "<unk>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "1": {
12
+ "content": "<s>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "2": {
20
+ "content": "</s>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "32000": {
28
+ "content": "<|startofinstruction|>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "32001": {
36
+ "content": "<|endofcompletion|>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ },
43
+ "32002": {
44
+ "content": "<|unk|>",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": true
50
+ },
51
+ "32003": {
52
+ "content": "<|endofinstruction|>",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": true
58
+ },
59
+ "32004": {
60
+ "content": "<|pad|>",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": true
66
+ }
67
+ },
68
+ "bos_token": "<|startofinstruction|>",
69
+ "clean_up_tokenization_spaces": false,
70
+ "eos_token": "<|endofcompletion|>",
71
+ "legacy": false,
72
+ "model_max_length": 1000000000000000019884624838656,
73
+ "pad_token": "<|pad|>",
74
+ "padding_side": "right",
75
+ "sep_token": "<|endofinstruction|>",
76
+ "sp_model_kwargs": {},
77
+ "tokenizer_class": "LlamaTokenizer",
78
+ "unk_token": "<|unk|>",
79
+ "use_default_system_prompt": false
80
+ }
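The `model_max_length` value in `tokenizer_config.json` is `int(1e30)`, which as far as I know is the placeholder `transformers` writes when no limit was configured. It is not a usable context size. The sketch below shows one way to derive a practical limit by falling back to `max_position_embeddings` from `config.json`. The helper names are illustrative and the fallback rule is an assumption, not behavior the commit defines.

```python
# Values copied from tokenizer_config.json and config.json above.
model_max_length = 1000000000000000019884624838656
max_position_embeddings = 2048

# The tokenizer value is exactly the float 1e30 converted to an integer.
is_placeholder = model_max_length == int(1e30)


def effective_max_length(tokenizer_limit, position_limit):
    """Use the tighter of the tokenizer limit and the model's positional limit."""
    return min(tokenizer_limit, position_limit)


def truncate(token_ids, limit):
    """Keep only the first `limit` token IDs (right truncation)."""
    return token_ids[:limit]


limit = effective_max_length(model_max_length, max_position_embeddings)
dummy_ids = list(range(3000))        # stand-in for a long tokenized input
truncated = truncate(dummy_ids, limit)

print(is_placeholder, limit, len(truncated))
```

In practice this means truncation has to be requested with an explicit maximum length when tokenizing long inputs, since the stored limit will never trigger on its own.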