nicholasKluge committed
Commit 6c5d3de
1 Parent(s): 9bb929e

Upload 12 files

AIRA_FineTuning.ipynb CHANGED
The diff for this file is too large to render. See raw diff
 
Aira_emissions.csv CHANGED
@@ -1,2 +1,2 @@
 timestamp,project_name,run_id,duration,emissions,emissions_rate,cpu_power,gpu_power,ram_power,cpu_energy,gpu_energy,ram_energy,energy_consumed,country_name,country_iso_code,region,cloud_provider,cloud_region,os,python_version,codecarbon_version,cpu_count,cpu_model,gpu_count,gpu_model,longitude,latitude,ram_total_size,tracking_mode,on_cloud,pue
-2023-06-11T00:44:09,Aira_emissions,cf6bd6e6-4983-41ba-b2a3-bea0e4ca0a1b,4011.559634208679,0.15083361627850095,3.7599744247165954e-05,42.5,343.846,31.30528450012207,0.04735859059956342,0.37325643035497413,0.034865123505880974,0.4554801444604184,Netherlands,NLD,groningen,,,Linux-5.15.107+-x86_64-with-glibc2.31,3.10.12,2.2.3,12,Intel(R) Xeon(R) CPU @ 2.20GHz,1,1 x NVIDIA A100-SXM4-40GB,6.5765,53.2157,83.48075866699219,machine,N,1.0
+2023-06-26T22:38:01,Aira_emissions,bd08affb-b1e2-4849-8513-a85a02cf0f84,3690.1905386447906,0.0009893192359507477,2.6809435057358087e-07,42.5,296.394,31.30528450012207,0.04356464091208248,0.34052867170535045,0.03207338637952947,0.41616669899696207,Canada,CAN,quebec,,,Linux-5.15.107+-x86_64-with-glibc2.31,3.10.12,2.2.4,12,Intel(R) Xeon(R) CPU @ 2.20GHz,1,1 x NVIDIA A100-SXM4-40GB,-71.2,46.8,83.48075866699219,machine,N,1.0
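The columns above are CodeCarbon's standard CSV schema (the tracker version is recorded in the `codecarbon_version` column). A minimal sketch of how a row like these is produced, assuming the `codecarbon` package:

```python
# Minimal sketch: wrap the fine-tuning run in CodeCarbon's EmissionsTracker so it
# appends one row (duration, energy, kgCO2, hardware info) to Aira_emissions.csv.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="Aira_emissions", output_file="Aira_emissions.csv")
tracker.start()
# ... the fine-tuning loop from AIRA_FineTuning.ipynb would run here ...
emissions_kg = tracker.stop()  # returns the estimated emissions in kg of CO2-eq
print(f"Estimated emissions: {emissions_kg} kgCO2eq")
```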
README.md CHANGED
@@ -44,7 +44,6 @@ inference:
 
 The dataset used to train this model combines the following sources of data: the [`synthetic-instruct-gptj-pairwise`](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise) dataset, the [`databricks_dolly_15k`](https://huggingface.co/datasets/HuggingFaceH4/databricks_dolly_15k) dataset, the [`instruction-dataset`](https://huggingface.co/datasets/HuggingFaceH4/instruction-dataset) dataset, and a subset of [Aira's](https://github.com/Nkluge-correa/Aira-EXPERT) fine-tuning dataset, focused on Q&A related to Ethics, AI, AI safety, and other related topics. The dataset is available in both Portuguese and English.
 
-
 Check our gradio-demo in [Spaces](https://huggingface.co/spaces/nicholasKluge/Aira-Demo).
 
 ## Details
@@ -56,22 +55,22 @@ Check our gradio-demo in [Spaces](https://huggingface.co/spaces/nicholasKluge/Ai
 - **Batch size:** 32
 - **Optimizer:** `torch.optim.AdamW` (warmup_steps = 1e2, learning_rate = 5e-4, epsilon = 1e-8)
 - **GPU:** 1 NVIDIA A100-SXM4-40GB
-- **Emissions:** 0.15 KgCO2 (Netherlands)
-- **Total Energy Consumption:** 0.45 kWh
+- **Emissions:** 0.0009 KgCO2 (Canada)
+- **Total Energy Consumption:** 0.41 kWh
 
 | Epoch/Loss|Training|Validation|
 |---|---|---|
-| 1 |0.932626|0.767844|
-| 2 |0.728739|0.723823|
-| 3 |0.649202|0.705316|
-| 4 |0.589048|0.698928|
-| 5 |0.542641|0.700216|
+| 1 |0.947100|0.774946|
+| 2 |0.737357|0.730962|
+| 3 |0.657410|0.710232|
+| 4 |0.597437|0.705064|
+| 5 |0.551684|0.704830|
 
 This repository has the notebook used to train this model.
 
 ## Usage
 
-Two special tokens are used to mark the user side of the interaction and the model's response:
+Two special tokens are used to mark the user side of the interaction and the model's response:
 
 `<|startoftext|>`What is a language model?`<|endoftext|>`A language model is a probability distribution over a vocabulary.`<|endoftext|>`
 
config.json CHANGED
@@ -33,7 +33,7 @@
     }
   },
   "torch_dtype": "float32",
-  "transformers_version": "4.30.1",
+  "transformers_version": "4.30.2",
   "use_cache": true,
   "vocab_size": 50259
 }
generation_config.json CHANGED
@@ -2,5 +2,5 @@
   "_from_model_config": true,
   "bos_token_id": 50256,
   "eos_token_id": 50256,
-  "transformers_version": "4.30.1"
+  "transformers_version": "4.30.2"
 }
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a26f52c1f1d8775168735421c8af722a3239d675cab65ad4be2e790c32e43ac5
+oid sha256:7f01da44af4eef5e609983099507e6c2e6c92bb149afa3723d555cdf3a32c4c5
 size 497813341
training_stats.parquet CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0fe12b4d63961cadd00a221e2e9b61bbda28bc7adce68cdea4ee6282117fbfee
+oid sha256:63cec774a93f84808183ddf0dacaca250ff645a5e6883cdfd4ea3f96a0cce3fa
 size 3108
vocab.json CHANGED
The diff for this file is too large to render. See raw diff