## Overview

A T5 v1.1 Base model trained to generate hypotheses given a premise and a label. The settings used to train it are reported below.

```yaml
Experiment configurations
β”œβ”€β”€ datasets
β”‚ └── snli_train:
β”‚ dataset_name: snli
β”‚ dataset_config_name: null
β”‚ cache_dir: null
β”‚ input_fields:
β”‚ - premise
β”‚ - hypothesis
β”‚ target_field: label
β”‚ train_subset_names: null
β”‚ val_subset_names: validation
β”‚ test_subset_names: none
β”‚ train_val_split: null
β”‚ limit_train_samples: null
β”‚ limit_val_samples: null
β”‚ limit_test_samples: null
β”‚ sampling_kwargs:
β”‚ sampling_strategy: random
β”‚ seed: 42
β”‚ replace: false
β”‚ align_labels_with_mapping: null
β”‚ avoid_consistency_check: false
β”‚ predict_label_mapping: null
β”‚ anli_train:
β”‚ dataset_name: anli
β”‚ dataset_config_name: null
β”‚ cache_dir: null
β”‚ input_fields:
β”‚ - premise
β”‚ - hypothesis
β”‚ target_field: label
β”‚ train_subset_names:
β”‚ - train_r1
β”‚ - train_r2
β”‚ - train_r3
β”‚ val_subset_names:
β”‚ - dev_r1
β”‚ - dev_r2
β”‚ - dev_r3
β”‚ test_subset_names: none
β”‚ train_val_split: null
β”‚ limit_train_samples: null
β”‚ limit_val_samples: null
β”‚ limit_test_samples: null
β”‚ sampling_kwargs:
β”‚ sampling_strategy: random
β”‚ seed: 42
β”‚ replace: false
β”‚ align_labels_with_mapping: null
β”‚ avoid_consistency_check: false
β”‚ predict_label_mapping: null
β”‚ mnli_train:
β”‚ dataset_name: multi_nli
β”‚ dataset_config_name: null
β”‚ cache_dir: null
β”‚ input_fields:
β”‚ - premise
β”‚ - hypothesis
β”‚ target_field: label
β”‚ train_subset_names: null
β”‚ val_subset_names: validation_matched
β”‚ test_subset_names: none
β”‚ train_val_split: null
β”‚ limit_train_samples: null
β”‚ limit_val_samples: null
β”‚ limit_test_samples: null
β”‚ sampling_kwargs:
β”‚ sampling_strategy: random
β”‚ seed: 42
β”‚ replace: false
β”‚ align_labels_with_mapping: null
β”‚ avoid_consistency_check: false
β”‚ predict_label_mapping: null
β”‚ snli:
β”‚ dataset_name: snli
β”‚ dataset_config_name: null
β”‚ cache_dir: null
β”‚ input_fields:
β”‚ - premise
β”‚ - hypothesis
β”‚ target_field: label
β”‚ train_subset_names: none
β”‚ val_subset_names: none
β”‚ test_subset_names: null
β”‚ train_val_split: null
β”‚ limit_train_samples: null
β”‚ limit_val_samples: null
β”‚ limit_test_samples: null
β”‚ sampling_kwargs:
β”‚ sampling_strategy: random
β”‚ seed: 42
β”‚ replace: false
β”‚ align_labels_with_mapping: null
β”‚ avoid_consistency_check: false
β”‚ predict_label_mapping: null
β”‚ anli:
β”‚ dataset_name: anli
β”‚ dataset_config_name: null
β”‚ cache_dir: null
β”‚ input_fields:
β”‚ - premise
β”‚ - hypothesis
β”‚ target_field: label
β”‚ train_subset_names: none
β”‚ val_subset_names: none
β”‚ test_subset_names:
β”‚ - test_r1
β”‚ - test_r2
β”‚ - test_r3
β”‚ train_val_split: null
β”‚ limit_train_samples: null
β”‚ limit_val_samples: null
β”‚ limit_test_samples: null
β”‚ sampling_kwargs:
β”‚ sampling_strategy: random
β”‚ seed: 42
β”‚ replace: false
β”‚ align_labels_with_mapping: null
β”‚ avoid_consistency_check: false
β”‚ predict_label_mapping: null
β”‚ mnli:
β”‚ dataset_name: multi_nli
β”‚ dataset_config_name: null
β”‚ cache_dir: null
β”‚ input_fields:
β”‚ - premise
β”‚ - hypothesis
β”‚ target_field: label
β”‚ train_subset_names: none
β”‚ val_subset_names: none
β”‚ test_subset_names: validation_mismatched
β”‚ train_val_split: null
β”‚ limit_train_samples: null
β”‚ limit_val_samples: null
β”‚ limit_test_samples: null
β”‚ sampling_kwargs:
β”‚ sampling_strategy: random
β”‚ seed: 42
β”‚ replace: false
β”‚ align_labels_with_mapping: null
β”‚ avoid_consistency_check: false
β”‚ predict_label_mapping: null
β”‚
β”œβ”€β”€ data
β”‚ └── _target_: src.task.nli.data.NLIGenerationData.from_config
β”‚ main_dataset_name: null
β”‚ use_additional_as_test: null
β”‚ dataloader:
β”‚ batch_size: 96
β”‚ eval_batch_size: 96
β”‚ num_workers: 8
β”‚ pin_memory: true
β”‚ drop_last: false
β”‚ persistent_workers: false
β”‚ shuffle: true
β”‚ seed_dataloader: 42
β”‚ replacement: false
β”‚ processing:
β”‚ preprocessing_num_workers: 8
β”‚ preprocessing_batch_size: 1000
β”‚ load_from_cache_file: true
β”‚ padding: longest
β”‚ truncation: longest_first
β”‚ max_source_length: 128
β”‚ max_target_length: 128
β”‚ template: 'premise: $premise $label hypothesis: '
β”‚ tokenizer:
β”‚ _target_: transformers.AutoTokenizer.from_pretrained
β”‚ pretrained_model_name_or_path: pietrolesci/t5-v1_1-base_nli_gen
β”‚ use_fast: true
β”‚
β”œβ”€β”€ task
β”‚ └── optimizer:
β”‚ name: Adafactor
β”‚ lr: 0.001
β”‚ weight_decay: 0.0
β”‚ no_decay:
β”‚ - bias
β”‚ - LayerNorm.weight
β”‚ decay_rate: -0.8
β”‚ clip_threshold: 1.0
β”‚ relative_step: false
β”‚ scale_parameter: false
β”‚ warmup_init: false
β”‚ scheduler:
β”‚ name: constant_schedule
β”‚ model:
β”‚ model_name_or_path: pietrolesci/t5-v1_1-base_nli_gen
β”‚ checkpoint_path: null
β”‚ freeze: false
β”‚ seed_init_weight: 42
β”‚ _target_: src.task.nli.NLIGenerationTask.from_config
β”‚ generation:
β”‚ generation_max_length: 128
β”‚ generation_min_length: 3
β”‚ do_sample: true
β”‚ early_stopping: false
β”‚ num_beams: 1
β”‚ temperature: 1.0
β”‚ top_k: 50
β”‚ top_p: 0.95
β”‚ repetition_penalty: null
β”‚ length_penalty: null
β”‚ no_repeat_ngram_size: null
β”‚ encoder_no_repeat_ngram_size: null
β”‚ num_return_sequences: 1
β”‚ max_time: null
β”‚ max_new_tokens: null
β”‚ decoder_start_token_id: null
β”‚ use_cache: null
β”‚ num_beam_groups: null
β”‚ diversity_penalty: null
β”‚
β”œβ”€β”€ trainer
β”‚ └── _target_: pytorch_lightning.Trainer
β”‚ callbacks:
β”‚ lr_monitor:
β”‚ _target_: pytorch_lightning.callbacks.LearningRateMonitor
β”‚ logging_interval: step
β”‚ log_momentum: false
β”‚ model_checkpoint:
β”‚ _target_: pytorch_lightning.callbacks.ModelCheckpoint
β”‚ dirpath: ./checkpoints/
β”‚ filename: nli_generator_sma-epoch={epoch:02d}-val_loss={val/aggregat
β”‚ monitor: val/aggregated_loss
β”‚ mode: min
β”‚ verbose: false
β”‚ save_last: true
β”‚ save_top_k: 1
β”‚ auto_insert_metric_name: false
β”‚ save_on_train_epoch_end: false
β”‚ rich_model_summary:
β”‚ _target_: pytorch_lightning.callbacks.RichModelSummary
β”‚ max_depth: 1
β”‚ log_grad_norm:
β”‚ _target_: src.core.callbacks.LogGradNorm
β”‚ norm_type: 2
β”‚ group_separator: /
β”‚ only_total: true
β”‚ on_step: true
β”‚ on_epoch: false
β”‚ prog_bar: true
β”‚ log_generated_text:
β”‚ _target_: src.core.callbacks.GenerateAndLogText
β”‚ dirpath: ./generated_text
β”‚ type: generated_text
β”‚ pop_keys_after_logging: true
β”‚ on_train: false
β”‚ on_validation: false
β”‚ on_test: true
β”‚ log_to_wandb: true
β”‚ wandb_log_dataset_sizes:
β”‚ _target_: src.core.callbacks.WandbLogDatasetSizes
β”‚ logger:
β”‚ wandb:
β”‚ _target_: pytorch_lightning.loggers.WandbLogger
β”‚ project: nli_debiasing
β”‚ entity: team_brushino
β”‚ name: nli_generator_sma
β”‚ save_dir: ./
β”‚ offline: false
β”‚ log_model: false
β”‚ group: generator
β”‚ job_type: genearator_training
β”‚ tags:
β”‚ - nli_generator_sma
β”‚ - seed=42
β”‚ - seed_dataloader=42
β”‚ notes: nli_generator_sma_time=01-37-04
β”‚ enable_checkpointing: true
β”‚ enable_progress_bar: true
β”‚ enable_model_summary: true
β”‚ gradient_clip_val: 6
β”‚ gradient_clip_algorithm: null
β”‚ accelerator: gpu
β”‚ devices: auto
β”‚ gpus: null
β”‚ auto_select_gpus: true
β”‚ accumulate_grad_batches: 1
β”‚ max_epochs: 2
β”‚ min_epochs: 1
β”‚ max_steps: -1
β”‚ min_steps: null
β”‚ max_time: null
β”‚ num_sanity_val_steps: 2
β”‚ overfit_batches: 0.0
β”‚ fast_dev_run: false
β”‚ limit_train_batches: 1.0
β”‚ limit_val_batches: 1.0
β”‚ limit_test_batches: 1.0
β”‚ profiler: null
β”‚ detect_anomaly: false
β”‚ deterministic: false
β”‚ check_val_every_n_epoch: 1
β”‚ val_check_interval: 0.5
β”‚ log_every_n_steps: 1
β”‚ move_metrics_to_cpu: false
β”‚
└── training
    └── run_val_before_fit: false
        run_val_after_fit: false
        run_test_before_fit: false
        run_test_after_fit: true
        lr: 0.001
        seed: 42
        show_batch: false
        batch_size: 96
        eval_batch_size: 96
        num_workers: 8
        pin_memory: true
        drop_last: false
        persistent_workers: false
        shuffle: true
        seed_dataloader: 42
        ignore_warnings: true
        experiment_name: nli_generator_sma
```
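
For illustration, the `template` entry in the `processing` block above can be instantiated as follows. This is a minimal sketch: the template string is taken verbatim from the config, but the exact label verbalization used during training (e.g. `"entailment"`) is an assumption, not something the config states.

```python
from string import Template

# Input template copied verbatim from the `processing` config above.
# Note it uses `$`-style placeholders, matching Python's string.Template.
TEMPLATE = Template("premise: $premise $label hypothesis: ")

def build_prompt(premise: str, label: str) -> str:
    """Format a (premise, label) pair into the model's source sequence.

    The label string (e.g. "entailment") is a hypothetical verbalization;
    the config does not specify how integer labels are rendered.
    """
    return TEMPLATE.substitute(premise=premise, label=label)

prompt = build_prompt("A man is playing a guitar on stage.", "entailment")
print(prompt)
# premise: A man is playing a guitar on stage. entailment hypothesis:
```

The resulting string can then be tokenized and passed to the checkpoint (`pietrolesci/t5-v1_1-base_nli_gen`, loadable via `transformers.AutoModelForSeq2SeqLM.from_pretrained`), decoding with the sampling settings from the `generation` block above (`do_sample: true`, `top_k: 50`, `top_p: 0.95`, `temperature: 1.0`).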