GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Downloading and preparing dataset json/default to /home/ace14459tv/t5maru/cache/json/default-76e405bf2a5f1b35/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4...

Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]
Downloading data files: 100%|██████████| 1/1 [00:00<00:00, 7410.43it/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00, 374.29it/s]

Generating train split: 0 examples [00:00, ? examples/s]
                                                        
Dataset json downloaded and prepared to /home/ace14459tv/t5maru/cache/json/default-76e405bf2a5f1b35/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4. Subsequent calls will reuse this data.

Map (num_proc=4):   0%|          | 0/2880 [00:00<?, ? examples/s]
Map (num_proc=4):   3%|▎         | 94/2880 [00:00<00:15, 175.45 examples/s]
Map (num_proc=4):  13%|█▎        | 384/2880 [00:00<00:03, 655.72 examples/s]
Map (num_proc=4):  32%|███▍      | 913/2880 [00:00<00:01, 1620.08 examples/s]
Map (num_proc=4):  58%|█████▊    | 1672/2880 [00:00<00:00, 2983.67 examples/s]
Map (num_proc=4):  86%|████████▌ | 2467/2880 [00:01<00:00, 4206.04 examples/s]
                                                                              
Downloading and preparing dataset json/default to /home/ace14459tv/t5maru/cache/json/default-8eccec914afb5393/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4...

Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]
Downloading data files: 100%|██████████| 1/1 [00:00<00:00, 8305.55it/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00, 426.86it/s]

Generating train split: 0 examples [00:00, ? examples/s]
                                                        
Dataset json downloaded and prepared to /home/ace14459tv/t5maru/cache/json/default-8eccec914afb5393/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4. Subsequent calls will reuse this data.

Map (num_proc=4):   0%|          | 0/618 [00:00<?, ? examples/s]
Map (num_proc=4):  16%|█▌        | 96/618 [00:01<00:05, 94.84 examples/s]
                                                                         
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type            | Params
------------------------------------------
0 | model | OptimizedModule | 300 M 
------------------------------------------
300 M     Trainable params
0         Non-trainable params
300 M     Total params
1,200.707 Total estimated model params size (MB)
[2023-06-30 20:54:37,264] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
Metric val_loss improved. New best score: 1.139
Metric val_loss improved by 0.743 >= min_delta = 0.0. New best score: 0.397
Metric val_loss improved by 0.178 >= min_delta = 0.0. New best score: 0.219
Metric val_loss improved by 0.058 >= min_delta = 0.0. New best score: 0.161
Metric val_loss improved by 0.027 >= min_delta = 0.0. New best score: 0.134
Metric val_loss improved by 0.020 >= min_delta = 0.0. New best score: 0.115
Metric val_loss improved by 0.012 >= min_delta = 0.0. New best score: 0.103
Metric val_loss improved by 0.005 >= min_delta = 0.0. New best score: 0.098
Metric val_loss improved by 0.011 >= min_delta = 0.0. New best score: 0.087
Metric val_loss improved by 0.000 >= min_delta = 0.0. New best score: 0.086
Metric val_loss improved by 0.004 >= min_delta = 0.0. New best score: 0.083
Metric val_loss improved by 0.006 >= min_delta = 0.0. New best score: 0.077
Metric val_loss improved by 0.002 >= min_delta = 0.0. New best score: 0.075
Monitored metric val_loss did not improve in the last 3 records. Best score: 0.075. Signaling Trainer to stop.
{"log": "trained", "date": "2023-06-30T20:53:45", "elapsed": "00:05:22", "model": "google/mt5-small", "max_length": 128, "target_max_length": 128, "batch_size": 32, "gradient_accumulation_steps": 1, "train_steps": 2700, "accelerator": "gpu", "devices": "auto", "precision": 32, "strategy": "auto", "gradient_clip_val": 1.0, "compile": true, "solver": "adamw", "lr": 0.0003, "warmup_steps": 1, "training_steps": 100000, "adam_epsilon": 1e-08, "weight_decay": 0.0, "epoch": 17, "step": 1530, "saved": "0630_mT5"}
😊 testing /home/ace14459tv/t5maru/error_data/0630/error_3_0630_test.jsonl on cuda
Downloading and preparing dataset generator/default to /home/ace14459tv/t5maru/cache/generator/default-f9a3d4be341e4e78/0.0.0...

Generating train split: 0 examples [00:00, ? examples/s]
                                                        
Dataset generator downloaded and prepared to /home/ace14459tv/t5maru/cache/generator/default-f9a3d4be341e4e78/0.0.0. Subsequent calls will reuse this data.

Map (num_proc=4):   0%|          | 0/617 [00:00<?, ? examples/s]
Map (num_proc=4):  25%|██▌       | 155/617 [00:00<00:01, 267.27 examples/s]
                                                                           
😊 Tested 617 items. See 0630_mT5/error_3_0630_tested.jsonl