pietrolesci committed on
Commit 1cc5664 • 1 Parent(s): ab74fee

Create README.md
Files changed (1): README.md (+227 -0)

## Overview

A T5-Base v1.1 model trained to generate hypotheses given a premise and a label. The settings used to train it are reported below.
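
For illustration, here is a minimal inference sketch. It is not part of the original training code: it assumes this checkpoint loads as a standard `transformers` seq2seq model, that the label is verbalized as a plain word (e.g. `entailment`), and that the prompt follows the `template` and `generation` entries in the configuration below; the `model_id` value is a hypothetical placeholder for this repository's Hub id.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "<this-model-repo>"  # hypothetical placeholder: replace with this model's Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Build the input following the training template 'premise: $premise $label hypothesis: '.
# The label verbalization ("entailment") is an assumption, not taken from the config.
premise = "A soccer game with multiple males playing."
label = "entailment"
prompt = f"premise: {premise} {label} hypothesis: "

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,  # sampling settings mirror the `generation` block below
    top_k=50,
    top_p=0.95,
    max_length=128,
    min_length=3,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```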
```yaml
Experiment configurations
├── datasets
│ └── mnli_train:
│ dataset_name: multi_nli
│ dataset_config_name: null
│ cache_dir: null
│ input_fields:
│ - premise
│ - hypothesis
│ target_field: label
│ train_subset_names: null
│ val_subset_names: validation_matched
│ test_subset_names: none
│ train_val_split: null
│ limit_train_samples: null
│ limit_val_samples: null
│ limit_test_samples: null
│ sampling_kwargs:
│ sampling_strategy: random
│ seed: 42
│ replace: false
│ align_labels_with_mapping: null
│ avoid_consistency_check: false
│ predict_label_mapping: null
│ mnli:
│ dataset_name: multi_nli
│ dataset_config_name: null
│ cache_dir: null
│ input_fields:
│ - premise
│ - hypothesis
│ target_field: label
│ train_subset_names: none
│ val_subset_names: none
│ test_subset_names: validation_mismatched
│ train_val_split: null
│ limit_train_samples: null
│ limit_val_samples: null
│ limit_test_samples: null
│ sampling_kwargs:
│ sampling_strategy: random
│ seed: 42
│ replace: false
│ align_labels_with_mapping: null
│ avoid_consistency_check: false
│ predict_label_mapping: null
│
├── data
│ └── _target_: src.task.nli.data.NLIGenerationData.from_config
│ main_dataset_name: null
│ use_additional_as_test: null
│ dataloader:
│ batch_size: 64
│ eval_batch_size: 100
│ num_workers: 16
│ pin_memory: true
│ drop_last: false
│ persistent_workers: false
│ shuffle: true
│ seed_dataloader: 42
│ replacement: false
│ processing:
│ preprocessing_num_workers: 16
│ preprocessing_batch_size: 1000
│ load_from_cache_file: true
│ padding: longest
│ truncation: longest_first
│ max_source_length: 128
│ max_target_length: 128
│ template: 'premise: $premise $label hypothesis: '
│ tokenizer:
│ _target_: transformers.AutoTokenizer.from_pretrained
│ pretrained_model_name_or_path: google/t5-v1_1-base
│ use_fast: true
│
├── task
│ └── optimizer:
│ name: Adafactor
│ lr: 0.001
│ weight_decay: 0.0
│ no_decay:
│ - bias
│ - LayerNorm.weight
│ decay_rate: -0.8
│ clip_threshold: 1.0
│ relative_step: false
│ scale_parameter: false
│ warmup_init: false
│ scheduler:
│ name: constant_schedule
│ model:
│ model_name_or_path: google/t5-v1_1-base
│ checkpoint_path: null
│ freeze: false
│ seed_init_weight: 42
│ _target_: src.task.nli.NLIGenerationTask.from_config
│ generation:
│ max_length: 128
│ min_length: 3
│ do_sample: true
│ early_stopping: false
│ num_beams: 1
│ temperature: 1.0
│ top_k: 50
│ top_p: 0.95
│ repetition_penalty: null
│ length_penalty: null
│ no_repeat_ngram_size: null
│ encoder_no_repeat_ngram_size: null
│ num_return_sequences: 1
│ max_time: null
│ max_new_tokens: null
│ decoder_start_token_id: null
│ use_cache: null
│ num_beam_groups: null
│ diversity_penalty: null
│
├── trainer
│ └── _target_: pytorch_lightning.Trainer
│ callbacks:
│ lr_monitor:
│ _target_: pytorch_lightning.callbacks.LearningRateMonitor
│ logging_interval: step
│ log_momentum: false
│ model_checkpoint:
│ _target_: pytorch_lightning.callbacks.ModelCheckpoint
│ dirpath: ./checkpoints/
│ filename: nli_generator_mnli-epoch={epoch:02d}-val_loss={val/aggregated_loss:.2f}
│ monitor: val/aggregated_loss
│ mode: min
│ verbose: false
│ save_last: true
│ save_top_k: 1
│ auto_insert_metric_name: false
│ save_on_train_epoch_end: false
│ rich_model_summary:
│ _target_: pytorch_lightning.callbacks.RichModelSummary
│ max_depth: 1
│ log_grad_norm:
│ _target_: src.core.callbacks.LogGradNorm
│ norm_type: 2
│ group_separator: /
│ only_total: true
│ on_step: true
│ on_epoch: false
│ prog_bar: true
│ log_generated_text:
│ _target_: src.core.callbacks.GenerateAndLogText
│ dirpath: ./generated_text
│ type: generated_text
│ pop_keys_after_logging: true
│ on_train: false
│ on_validation: false
│ on_test: true
│ log_to_wandb: true
│ wandb_log_dataset_sizes:
│ _target_: src.core.callbacks.WandbLogDatasetSizes
│ logger:
│ wandb:
│ _target_: pytorch_lightning.loggers.WandbLogger
│ project: nli_debiasing
│ entity: team_brushino
│ name: nli_generator_mnli
│ save_dir: ./
│ offline: false
│ log_model: false
│ group: mnli
│ job_type: generator
│ tags:
│ - nli_generator_mnli
│ - seed=42
│ - seed_dataloader=42
│ notes: nli_generator_mnli_time=02-24-53
│ enable_checkpointing: true
│ enable_progress_bar: true
│ enable_model_summary: true
│ gradient_clip_val: 0.0
│ gradient_clip_algorithm: null
│ accelerator: gpu
│ devices: auto
│ gpus: null
│ auto_select_gpus: true
│ accumulate_grad_batches: 1
│ max_epochs: 3
│ min_epochs: 1
│ max_steps: -1
│ min_steps: null
│ max_time: null
│ num_sanity_val_steps: 2
│ overfit_batches: 0.0
│ fast_dev_run: false
│ limit_train_batches: 1.0
│ limit_val_batches: 1.0
│ limit_test_batches: 1.0
│ profiler: null
│ detect_anomaly: false
│ deterministic: false
│ check_val_every_n_epoch: 1
│ val_check_interval: 0.1
│ log_every_n_steps: 10
│ move_metrics_to_cpu: false
│
└── training
└── run_val_before_fit: false
run_val_after_fit: false
run_test_before_fit: false
run_test_after_fit: true
lr: 0.001
seed: 42
show_batch: false
batch_size: 64
eval_batch_size: 100
num_workers: 16
pin_memory: true
drop_last: false
persistent_workers: false
shuffle: true
seed_dataloader: 42
ignore_warnings: true
experiment_name: nli_generator_mnli
```
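
To show how the `datasets` entries and the `template` above fit together, here is a hedged preprocessing sketch (not the repository's actual code). It assumes MNLI's integer labels are verbalized as plain words before being substituted into the template.

```python
from datasets import load_dataset

# Assumed verbalization of multi_nli's integer labels (an assumption, not from the config).
LABEL_WORDS = {0: "entailment", 1: "neutral", 2: "contradiction"}

def to_seq2seq(example):
    # Source follows the template 'premise: $premise $label hypothesis: ';
    # the target is the hypothesis the model learns to generate.
    source = f"premise: {example['premise']} {LABEL_WORDS[example['label']]} hypothesis: "
    return {"source": source, "target": example["hypothesis"]}

mnli = load_dataset("multi_nli", split="train[:8]")
pairs = mnli.map(to_seq2seq, remove_columns=mnli.column_names)
print(pairs[0]["source"])
print(pairs[0]["target"])
```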