wissamantoun committed on
Commit
9a0e5ca
1 Parent(s): c71eba5

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,282 @@
+ ---
+ language: fr
+ license: mit
+ tags:
+ - deberta-v2
+ - text-classification
+ - review-classification
+ base_model: almanach/camembertav2-base
+ datasets:
+ - FLUE-CLS
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ library_name: transformers
+ widget:
+ # examples for the French review classification model
+ - text: "Le livre est très intéressant et j'ai appris beaucoup de choses."
+   example_title: Books Review
+ - text: "Le film était ennuyeux et je n'ai pas aimé les acteurs."
+   example_title: DVD Review
+ - text: "La musique était très bonne et j'ai adoré les paroles."
+   example_title: Music Review
+ model-index:
+ - name: almanach/camembertav2-base-cls
+   results:
+   - task:
+       type: text-classification
+       name: Amazon Review Classification
+     dataset:
+       type: flue-cls
+       name: FLUE-CLS
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 0.95849
+       verified: false
+ ---
+ 
+ # Model Card for almanach/camembertav2-base-cls
+ 
+ almanach/camembertav2-base-cls is a deberta-v2 model for text classification, fine-tuned on the FLUE-CLS dataset for Amazon review classification. It achieves an accuracy of 0.95849 on the FLUE-CLS test set.
+ 
+ The model is part of the almanach/camembertav2-base family of fine-tuned models.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ 
+ - **Developed by:** Wissam Antoun (PhD student at ALMAnaCH, Inria Paris)
+ - **Model type:** deberta-v2
+ - **Language(s) (NLP):** French
+ - **License:** MIT
+ - **Finetuned from model:** almanach/camembertav2-base
+ 
+ ### Model Sources
+ 
+ - **Repository:** https://github.com/WissamAntoun/camemberta
+ - **Paper:** https://arxiv.org/abs/2411.08868
+ 
+ ## Uses
+ 
+ The model can be used to classify French Amazon reviews of books, DVDs, and music.
+ 
+ ## Bias, Risks, and Limitations
+ 
+ The model may reflect biases present in its training data, and it may not generalize well to domains or tasks beyond French Amazon product reviews.
+ 
+ ## How to Get Started with the Model
+ 
+ Use the code below to get started with the model.
+ 
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
+ 
+ model = AutoModelForSequenceClassification.from_pretrained("almanach/camembertav2-base-cls")
+ tokenizer = AutoTokenizer.from_pretrained("almanach/camembertav2-base-cls")
+ 
+ classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
+ 
+ classifier("Le livre est très intéressant et j'ai appris beaucoup de choses.")
+ ```
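+ 
+ The pipeline returns one of two labels, `negative` or `positive` (following the `label2id` mapping in this repository's config.json, reproduced below), together with a confidence score, in the standard pipeline output shape `[{'label': ..., 'score': ...}]`.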
+ 
+ ## Training Details
+ 
+ ### Training Data
+ 
+ The model is fine-tuned on the FLUE-CLS dataset; a loading sketch follows the size summary below.
+ 
+ - Dataset Name: FLUE-CLS
+ - Dataset Size:
+   - Train: 5997
+   - Test: 5999
+ 
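+ A minimal loading sketch, assuming the `flue` dataset script on the Hugging Face Hub and its `CLS` configuration (the dataset identifier and configuration name are assumptions, not part of the original card; script-based datasets also require a `datasets` version that still supports `trust_remote_code`):
+ 
+ ```python
+ from datasets import load_dataset
+ 
+ # FLUE's CLS task: binary sentiment over French Amazon book/DVD/music reviews.
+ # trust_remote_code is required for script-based datasets in recent `datasets` releases.
+ cls = load_dataset("flue", "CLS", trust_remote_code=True)
+ print(cls["train"].num_rows, cls["test"].num_rows)  # expected: 5997 5999
+ ```
+ 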
+ ### Training Procedure
+ 
+ The model was fine-tuned with the run_classification.py example script from the Hugging Face transformers repository. The full hyperparameter dump follows, with a sketch of the corresponding TrainingArguments after it.
+ 
+ #### Training Hyperparameters
+ 
+ ```yml
+ accelerator_config: '{''split_batches'': False, ''dispatch_batches'': None, ''even_batches'':
+   True, ''use_seedable_sampler'': True, ''non_blocking'': False, ''gradient_accumulation_kwargs'':
+   None}'
+ adafactor: false
+ adam_beta1: 0.9
+ adam_beta2: 0.999
+ adam_epsilon: 1.0e-08
+ auto_find_batch_size: false
+ base_model: camembertv2
+ base_model_name: camembertav2-base-bf16-p2-17000
+ batch_eval_metrics: false
+ bf16: false
+ bf16_full_eval: false
+ data_seed: 1.0
+ dataloader_drop_last: false
+ dataloader_num_workers: 0
+ dataloader_persistent_workers: false
+ dataloader_pin_memory: true
+ dataloader_prefetch_factor: .nan
+ ddp_backend: .nan
+ ddp_broadcast_buffers: .nan
+ ddp_bucket_cap_mb: .nan
+ ddp_find_unused_parameters: .nan
+ ddp_timeout: 1800
+ debug: '[]'
+ deepspeed: .nan
+ disable_tqdm: false
+ dispatch_batches: .nan
+ do_eval: true
+ do_predict: false
+ do_train: true
+ epoch: 5.984
+ eval_accumulation_steps: 4
+ eval_accuracy: 0.9584930821803634
+ eval_delay: 0
+ eval_do_concat_batches: true
+ eval_loss: 0.1653172671794891
+ eval_on_start: false
+ eval_runtime: 85.3752
+ eval_samples: 5999
+ eval_samples_per_second: 70.266
+ eval_steps: .nan
+ eval_steps_per_second: 8.785
+ eval_strategy: epoch
+ eval_use_gather_object: false
+ evaluation_strategy: epoch
+ fp16: false
+ fp16_backend: auto
+ fp16_full_eval: false
+ fp16_opt_level: O1
+ fsdp: '[]'
+ fsdp_config: '{''min_num_params'': 0, ''xla'': False, ''xla_fsdp_v2'': False, ''xla_fsdp_grad_ckpt'':
+   False}'
+ fsdp_min_num_params: 0
+ fsdp_transformer_layer_cls_to_wrap: .nan
+ full_determinism: false
+ gradient_accumulation_steps: 4
+ gradient_checkpointing: false
+ gradient_checkpointing_kwargs: .nan
+ greater_is_better: true
+ group_by_length: false
+ half_precision_backend: auto
+ hub_always_push: false
+ hub_model_id: .nan
+ hub_private_repo: false
+ hub_strategy: every_save
+ hub_token: <HUB_TOKEN>
+ ignore_data_skip: false
+ include_inputs_for_metrics: false
+ include_num_input_tokens_seen: false
+ include_tokens_per_second: false
+ jit_mode_eval: false
+ label_names: .nan
+ label_smoothing_factor: 0.0
+ learning_rate: 3.0e-05
+ length_column_name: length
+ load_best_model_at_end: true
+ local_rank: 0
+ log_level: debug
+ log_level_replica: warning
+ log_on_each_node: true
+ logging_dir: /scratch/camembertv2/runs/results/flue-CLS/camembertav2-base-bf16-p2-17000/max_seq_length-1024-gradient_accumulation_steps-4-precision-fp32-learning_rate-3e-05-epochs-6-lr_scheduler-linear-warmup_steps-0/SEED-1/logs
+ logging_first_step: false
+ logging_nan_inf_filter: true
+ logging_steps: 100
+ logging_strategy: steps
+ lr_scheduler_kwargs: '{}'
+ lr_scheduler_type: linear
+ max_grad_norm: 1.0
+ max_steps: -1
+ metric_for_best_model: accuracy
+ mp_parameters: .nan
+ name: camembertv2/runs/results/flue-CLS/camembertav2-base-bf16-p2-17000/max_seq_length-1024-gradient_accumulation_steps-4-precision-fp32-learning_rate-3e-05-epochs-6-lr_scheduler-linear-warmup_steps-0
+ neftune_noise_alpha: .nan
+ no_cuda: false
+ num_train_epochs: 6.0
+ optim: adamw_torch
+ optim_args: .nan
+ optim_target_modules: .nan
+ output_dir: /scratch/camembertv2/runs/results/flue-CLS/camembertav2-base-bf16-p2-17000/max_seq_length-1024-gradient_accumulation_steps-4-precision-fp32-learning_rate-3e-05-epochs-6-lr_scheduler-linear-warmup_steps-0/SEED-1
+ overwrite_output_dir: false
+ past_index: -1
+ per_device_eval_batch_size: 8
+ per_device_train_batch_size: 8
+ per_gpu_eval_batch_size: .nan
+ per_gpu_train_batch_size: .nan
+ prediction_loss_only: false
+ push_to_hub: false
+ push_to_hub_model_id: .nan
+ push_to_hub_organization: .nan
+ push_to_hub_token: <PUSH_TO_HUB_TOKEN>
+ ray_scope: last
+ remove_unused_columns: true
+ report_to: '[''tensorboard'']'
+ restore_callback_states_from_checkpoint: false
+ resume_from_checkpoint: .nan
+ run_name: /scratch/camembertv2/runs/results/flue-CLS/camembertav2-base-bf16-p2-17000/max_seq_length-1024-gradient_accumulation_steps-4-precision-fp32-learning_rate-3e-05-epochs-6-lr_scheduler-linear-warmup_steps-0/SEED-1
+ save_on_each_node: false
+ save_only_model: false
+ save_safetensors: true
+ save_steps: 500
+ save_strategy: epoch
+ save_total_limit: .nan
+ seed: 1
+ skip_memory_metrics: true
+ split_batches: .nan
+ tf32: .nan
+ torch_compile: true
+ torch_compile_backend: inductor
+ torch_compile_mode: .nan
+ torch_empty_cache_steps: .nan
+ torchdynamo: .nan
+ total_flos: 6620583341429724.0
+ tpu_metrics_debug: false
+ tpu_num_cores: .nan
+ train_loss: 0.0933089647276091
+ train_runtime: 1923.7045
+ train_samples: 5997
+ train_samples_per_second: 18.705
+ train_steps_per_second: 0.583
+ use_cpu: false
+ use_ipex: false
+ use_legacy_prediction_loop: false
+ use_mps_device: false
+ warmup_ratio: 0.0
+ warmup_steps: 0
+ weight_decay: 0.0
+ ```
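+ 
+ The key values above map onto `transformers.TrainingArguments` roughly as sketched below. This is a hedged reconstruction, not the original launch command: `tokenized_train`/`tokenized_test` are placeholder names for the tokenized FLUE-CLS splits, and any argument not listed in the dump is left at its default.
+ 
+ ```python
+ import numpy as np
+ from transformers import (
+     AutoModelForSequenceClassification,
+     AutoTokenizer,
+     Trainer,
+     TrainingArguments,
+ )
+ 
+ # Start from the pretrained base model with a fresh 2-label classification head.
+ tokenizer = AutoTokenizer.from_pretrained("almanach/camembertav2-base")
+ model = AutoModelForSequenceClassification.from_pretrained(
+     "almanach/camembertav2-base", num_labels=2
+ )
+ 
+ def compute_metrics(eval_pred):
+     # Accuracy, matching metric_for_best_model in the dump above.
+     logits, labels = eval_pred
+     return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}
+ 
+ args = TrainingArguments(
+     output_dir="camembertav2-base-cls",  # placeholder output path
+     learning_rate=3e-05,
+     num_train_epochs=6,
+     per_device_train_batch_size=8,
+     per_device_eval_batch_size=8,
+     gradient_accumulation_steps=4,  # effective train batch size of 32
+     lr_scheduler_type="linear",
+     warmup_steps=0,
+     weight_decay=0.0,
+     eval_strategy="epoch",
+     save_strategy="epoch",
+     load_best_model_at_end=True,
+     metric_for_best_model="accuracy",
+     seed=1,
+ )
+ 
+ trainer = Trainer(
+     model=model,
+     args=args,
+     train_dataset=tokenized_train,  # placeholder: tokenized FLUE-CLS train split
+     eval_dataset=tokenized_test,    # placeholder: tokenized FLUE-CLS test split
+     tokenizer=tokenizer,
+     compute_metrics=compute_metrics,
+ )
+ trainer.train()
+ ```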
+ 
+ #### Results
+ 
+ **Accuracy:** 0.95849
+ 
+ ## Technical Specifications
+ 
+ ### Model Architecture and Objective
+ 
+ A deberta-v2 architecture with a sequence classification head.
+ 
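+ The architecture can be confirmed from the shipped configuration; a small check (expected values taken from the config.json reproduced below):
+ 
+ ```python
+ from transformers import AutoConfig
+ 
+ cfg = AutoConfig.from_pretrained("almanach/camembertav2-base-cls")
+ # per config.json: deberta-v2, 12 layers, hidden size 768, vocab 32768
+ print(cfg.model_type, cfg.num_hidden_layers, cfg.hidden_size, cfg.vocab_size)
+ ```
+ 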
+ ## Citation
+ 
+ **BibTeX:**
+ 
+ ```bibtex
+ @misc{antoun2024camembert20smarterfrench,
+       title={CamemBERT 2.0: A Smarter French Language Model Aged to Perfection},
+       author={Wissam Antoun and Francis Kulumba and Rian Touchent and Éric de la Clergerie and Benoît Sagot and Djamé Seddah},
+       year={2024},
+       eprint={2411.08868},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2411.08868},
+ }
+ ```
all_results.json ADDED
@@ -0,0 +1,15 @@
+ {
+     "epoch": 5.984,
+     "eval_accuracy": 0.9584930821803634,
+     "eval_loss": 0.16531726717948914,
+     "eval_runtime": 85.3752,
+     "eval_samples": 5999,
+     "eval_samples_per_second": 70.266,
+     "eval_steps_per_second": 8.785,
+     "total_flos": 6620583341429724.0,
+     "train_loss": 0.09330896472760913,
+     "train_runtime": 1923.7045,
+     "train_samples": 5997,
+     "train_samples_per_second": 18.705,
+     "train_steps_per_second": 0.583
+ }
config.json ADDED
@@ -0,0 +1,46 @@
+ {
+   "_name_or_path": "/scratch/camembertv2/runs/models/camembertav2-base-bf16/post/ckpt-p2-17000/pt/discriminator/",
+   "architectures": [
+     "DebertaV2ForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 1,
+   "conv_act": "gelu",
+   "conv_kernel_size": 0,
+   "embedding_size": 768,
+   "eos_token_id": 2,
+   "finetuning_task": "cls",
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "negative": 0,
+     "positive": 1
+   },
+   "layer_norm_eps": 1e-07,
+   "max_position_embeddings": 1024,
+   "max_relative_positions": -1,
+   "model_name": "camembertav2-base-bf16",
+   "model_type": "deberta-v2",
+   "norm_rel_ebd": "layer_norm",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "pooler_dropout": 0,
+   "pooler_hidden_act": "gelu",
+   "pooler_hidden_size": 768,
+   "pos_att_type": [
+     "p2c",
+     "c2p"
+   ],
+   "position_biased_input": false,
+   "position_buckets": 256,
+   "relative_attention": true,
+   "share_att_key": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.2",
+   "type_vocab_size": 0,
+   "vocab_size": 32768
+ }
eval_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 5.984,
+     "eval_accuracy": 0.9584930821803634,
+     "eval_loss": 0.16531726717948914,
+     "eval_runtime": 85.3752,
+     "eval_samples": 5999,
+     "eval_samples_per_second": 70.266,
+     "eval_steps_per_second": 8.785
+ }
logs/events.out.tfevents.1724566533.nefgpu39.130290.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ae5670b63a53e903afc0f718787b3193f4bcdab2cb7e5846296837027c7f8dd8
+ size 10545
logs/events.out.tfevents.1724568542.nefgpu39.130290.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:43ad1f5572b764b762de1773e2c2901bbc302f3486ef692e7f22d0a4bce93acd
+ size 363
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1e05161ff21802b1cdcaee50bda7652dea5df3565b816df098b248c4b8a13eb8
+ size 444859368
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "add_prefix_space": true,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "[CLS]",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "eos_token": "[SEP]",
+   "errors": "replace",
+   "mask_token": "[MASK]",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "tokenizer_class": "RobertaTokenizer",
+   "trim_offsets": true,
+   "unk_token": "[UNK]"
+ }
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 5.984,
+     "total_flos": 6620583341429724.0,
+     "train_loss": 0.09330896472760913,
+     "train_runtime": 1923.7045,
+     "train_samples": 5997,
+     "train_samples_per_second": 18.705,
+     "train_steps_per_second": 0.583
+ }
trainer_state.json ADDED
@@ -0,0 +1,173 @@
+ {
+   "best_metric": 0.9584930821803634,
+   "best_model_checkpoint": "/scratch/camembertv2/runs/results/flue-CLS/camembertav2-base-bf16-p2-17000/max_seq_length-1024-gradient_accumulation_steps-4-precision-fp32-learning_rate-3e-05-epochs-6-lr_scheduler-linear-warmup_steps-0/SEED-1/checkpoint-562",
+   "epoch": 5.984,
+   "eval_steps": 500,
+   "global_step": 1122,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.5333333333333333,
+       "grad_norm": 4.058077812194824,
+       "learning_rate": 2.732620320855615e-05,
+       "loss": 0.3373,
+       "step": 100
+     },
+     {
+       "epoch": 0.9973333333333333,
+       "eval_accuracy": 0.9546591098516419,
+       "eval_loss": 0.13871271908283234,
+       "eval_runtime": 86.0844,
+       "eval_samples_per_second": 69.687,
+       "eval_steps_per_second": 8.712,
+       "step": 187
+     },
+     {
+       "epoch": 1.0666666666666667,
+       "grad_norm": 14.420208930969238,
+       "learning_rate": 2.4652406417112303e-05,
+       "loss": 0.1886,
+       "step": 200
+     },
+     {
+       "epoch": 1.6,
+       "grad_norm": 5.297502517700195,
+       "learning_rate": 2.197860962566845e-05,
+       "loss": 0.1315,
+       "step": 300
+     },
+     {
+       "epoch": 2.0,
+       "eval_accuracy": 0.953492248708118,
+       "eval_loss": 0.15359961986541748,
+       "eval_runtime": 85.7484,
+       "eval_samples_per_second": 69.961,
+       "eval_steps_per_second": 8.747,
+       "step": 375
+     },
+     {
+       "epoch": 2.1333333333333333,
+       "grad_norm": 5.223259925842285,
+       "learning_rate": 1.93048128342246e-05,
+       "loss": 0.1186,
+       "step": 400
+     },
+     {
+       "epoch": 2.6666666666666665,
+       "grad_norm": 0.2902381420135498,
+       "learning_rate": 1.663101604278075e-05,
+       "loss": 0.0632,
+       "step": 500
+     },
+     {
+       "epoch": 2.997333333333333,
+       "eval_accuracy": 0.9584930821803634,
+       "eval_loss": 0.16531726717948914,
+       "eval_runtime": 85.8419,
+       "eval_samples_per_second": 69.884,
+       "eval_steps_per_second": 8.737,
+       "step": 562
+     },
+     {
+       "epoch": 3.2,
+       "grad_norm": 0.11738034337759018,
+       "learning_rate": 1.39572192513369e-05,
+       "loss": 0.0659,
+       "step": 600
+     },
+     {
+       "epoch": 3.7333333333333334,
+       "grad_norm": 1.2773685455322266,
+       "learning_rate": 1.1283422459893049e-05,
+       "loss": 0.0421,
+       "step": 700
+     },
+     {
+       "epoch": 4.0,
+       "eval_accuracy": 0.9574929154859143,
+       "eval_loss": 0.19776488840579987,
+       "eval_runtime": 86.0955,
+       "eval_samples_per_second": 69.678,
+       "eval_steps_per_second": 8.711,
+       "step": 750
+     },
+     {
+       "epoch": 4.266666666666667,
+       "grad_norm": 0.0813373252749443,
+       "learning_rate": 8.609625668449198e-06,
+       "loss": 0.0377,
+       "step": 800
+     },
+     {
+       "epoch": 4.8,
+       "grad_norm": 0.8819140791893005,
+       "learning_rate": 5.935828877005348e-06,
+       "loss": 0.0193,
+       "step": 900
+     },
+     {
+       "epoch": 4.997333333333334,
+       "eval_accuracy": 0.9553258876479414,
+       "eval_loss": 0.22088374197483063,
+       "eval_runtime": 85.8351,
+       "eval_samples_per_second": 69.89,
+       "eval_steps_per_second": 8.738,
+       "step": 937
+     },
+     {
+       "epoch": 5.333333333333333,
+       "grad_norm": 12.588311195373535,
+       "learning_rate": 3.2620320855614974e-06,
+       "loss": 0.0196,
+       "step": 1000
+     },
+     {
+       "epoch": 5.866666666666667,
+       "grad_norm": 0.04329814016819,
+       "learning_rate": 5.882352941176471e-07,
+       "loss": 0.0222,
+       "step": 1100
+     },
+     {
+       "epoch": 5.984,
+       "eval_accuracy": 0.9554925820970162,
+       "eval_loss": 0.2227182686328888,
+       "eval_runtime": 86.0509,
+       "eval_samples_per_second": 69.715,
+       "eval_steps_per_second": 8.716,
+       "step": 1122
+     },
+     {
+       "epoch": 5.984,
+       "step": 1122,
+       "total_flos": 6620583341429724.0,
+       "train_loss": 0.09330896472760913,
+       "train_runtime": 1923.7045,
+       "train_samples_per_second": 18.705,
+       "train_steps_per_second": 0.583
+     }
+   ],
+   "logging_steps": 100,
+   "max_steps": 1122,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 6,
+   "save_steps": 500,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": true
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 6620583341429724.0,
+   "train_batch_size": 8,
+   "trial_name": null,
+   "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0cbabdbec06e222418e949caaf387e771ff57de22ba0d18efc939af09fb1f22f
+ size 5560
vocab.txt ADDED
The diff for this file is too large to render. See raw diff