almersawi committed · verified
Commit 713b969 · Parent(s): fcb51f0

Upload logs

Files changed (1)
  1. logs.log +118 -0
logs.log ADDED
@@ -0,0 +1,118 @@
+ 2024-04-21 14:34:38,729 - INFO: Training in distributed mode with multiple processes, 1 GPU per process. Process 0, total: 4 local rank: 0.
+ 2024-04-21 14:34:38,729 - INFO: Training in distributed mode with multiple processes, 1 GPU per process. Process 1, total: 4 local rank: 1.
+ 2024-04-21 14:34:38,729 - INFO: Training in distributed mode with multiple processes, 1 GPU per process. Process 3, total: 4 local rank: 3.
+ 2024-04-21 14:34:38,729 - INFO: Training in distributed mode with multiple processes, 1 GPU per process. Process 2, total: 4 local rank: 2.
+ 2024-04-21 14:34:39,411 - INFO: Problem Type: text_causal_language_modeling
+ 2024-04-21 14:34:39,411 - INFO: Global random seed: 291800
+ 2024-04-21 14:34:39,411 - INFO: Preparing the data...
+ 2024-04-21 14:34:39,411 - INFO: Setting up automatic validation split...
+ 2024-04-21 14:34:39,443 - INFO: Preparing train and validation data
+ 2024-04-21 14:34:39,443 - INFO: Loading train dataset...
+ 2024-04-21 14:34:40,411 - INFO: Stop token ids: []
+ 2024-04-21 14:34:40,417 - INFO: Loading validation dataset...
+ 2024-04-21 14:34:40,905 - INFO: Stop token ids: []
+ 2024-04-21 14:34:40,909 - INFO: Number of observations in train dataset: 495
+ 2024-04-21 14:34:40,909 - INFO: Number of observations in validation dataset: 5
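The 495/5 counts above are consistent with a 1% automatic holdout drawn from 500 records. A minimal stdlib sketch of such a split (the function name and sampling details are hypothetical, not the tool's actual implementation):

```python
import random

def train_val_split(records, val_fraction=0.01, seed=291800):
    # Hypothetical sketch of a seeded automatic holdout split; the real
    # trainer's sampling logic may differ.
    rng = random.Random(seed)
    indices = list(range(len(records)))
    rng.shuffle(indices)
    n_val = max(1, round(len(records) * val_fraction))
    val_idx = set(indices[:n_val])
    train = [r for i, r in enumerate(records) if i not in val_idx]
    val = [r for i, r in enumerate(records) if i in val_idx]
    return train, val

train, val = train_val_split(list(range(500)))
# 500 records with a 1% holdout -> 495 train / 5 validation,
# matching the observation counts in the log.
```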
+ 2024-04-21 14:34:41,280 - WARNING: PAD token id not matching between config and tokenizer. Overwriting with tokenizer id.
+ 2024-04-21 14:34:41,280 - INFO: Setting pretraining_tp of model config to 1.
+ 2024-04-21 14:34:41,283 - INFO: Using bfloat16 for backbone
+ 2024-04-21 14:34:41,307 - WARNING: PAD token id not matching between config and tokenizer. Overwriting with tokenizer id.
+ 2024-04-21 14:34:41,307 - INFO: Setting pretraining_tp of model config to 1.
+ 2024-04-21 14:34:41,310 - INFO: Using bfloat16 for backbone
+ 2024-04-21 14:34:41,314 - INFO: Stop token ids: []
+ 2024-04-21 14:34:41,316 - WARNING: PAD token id not matching between config and tokenizer. Overwriting with tokenizer id.
+ 2024-04-21 14:34:41,317 - INFO: Setting pretraining_tp of model config to 1.
+ 2024-04-21 14:34:41,319 - INFO: Using bfloat16 for backbone
+ 2024-04-21 14:34:41,319 - INFO: Loading meta-llama/Llama-2-13b-hf. This may take a while.
+ 2024-04-21 14:34:41,329 - WARNING: PAD token id not matching between config and tokenizer. Overwriting with tokenizer id.
+ 2024-04-21 14:34:41,330 - INFO: Setting pretraining_tp of model config to 1.
+ 2024-04-21 14:34:41,332 - INFO: Using bfloat16 for backbone
+ 2024-04-21 14:36:27,752 - INFO: Loaded meta-llama/Llama-2-13b-hf.
+ 2024-04-21 14:36:27,757 - INFO: Lora module names: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
+ 2024-04-21 14:36:29,940 - INFO: Enough space available for saving model weights.Required space: 25632.04MB, Available space: 973827.04MB.
+ 2024-04-21 14:36:29,945 - INFO: Enough space available for saving model weights.Required space: 25632.04MB, Available space: 973827.04MB.
+ 2024-04-21 14:36:29,949 - INFO: Enough space available for saving model weights.Required space: 25632.04MB, Available space: 973827.04MB.
+ 2024-04-21 14:36:29,949 - INFO: Optimizer AdamW has been provided with parameters {'eps': 1e-08, 'weight_decay': 0.0, 'betas': (0.8999999762, 0.9990000129), 'lr': 0.0001}
+ 2024-04-21 14:36:29,953 - INFO: Enough space available for saving model weights.Required space: 25632.04MB, Available space: 973827.04MB.
+ 2024-04-21 14:36:29,954 - INFO: Optimizer AdamW has been provided with parameters {'weight_decay': 0.0, 'eps': 1e-08, 'betas': (0.8999999762, 0.9990000129), 'lr': 0.0001}
+ 2024-04-21 14:36:29,959 - INFO: Optimizer AdamW has been provided with parameters {'weight_decay': 0.0, 'eps': 1e-08, 'betas': (0.8999999762, 0.9990000129), 'lr': 0.0001}
+ 2024-04-21 14:36:29,964 - INFO: Optimizer AdamW has been provided with parameters {'weight_decay': 0.0, 'eps': 1e-08, 'betas': (0.8999999762, 0.9990000129), 'lr': 0.0001}
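The odd-looking `betas` in the optimizer lines are most likely just the common AdamW defaults (0.9, 0.999) after a round-trip through 32-bit floats. A quick stdlib check reproduces the logged digits:

```python
import struct

def as_float32(x: float) -> float:
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

# The familiar AdamW beta defaults, once stored as float32, round to
# exactly the values printed in the log:
print(round(as_float32(0.9), 10))    # 0.8999999762
print(round(as_float32(0.999), 10))  # 0.9990000129
```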
+ 2024-04-21 14:36:32,979 - INFO: started process: 2, can_track: False, tracking_mode: TrackingMode.DURING_EPOCH
+ 2024-04-21 14:36:32,981 - INFO: started process: 3, can_track: False, tracking_mode: TrackingMode.DURING_EPOCH
+ 2024-04-21 14:36:32,982 - INFO: started process: 1, can_track: False, tracking_mode: TrackingMode.DURING_EPOCH
+ 2024-04-21 14:36:33,128 - INFO: Evaluation step: 61
+ 2024-04-21 14:36:33,132 - INFO: Evaluation step: 61
+ 2024-04-21 14:36:33,137 - INFO: Evaluation step: 61
+ 2024-04-21 14:36:34,155 - INFO: started process: 0, can_track: True, tracking_mode: TrackingMode.DURING_EPOCH
+ 2024-04-21 14:36:34,156 - INFO: Training Epoch: 1 / 1
+ 2024-04-21 14:36:34,157 - INFO: train loss: 0%| | 0/61 [00:00<?, ?it/s]
+ 2024-04-21 14:36:34,221 - INFO: Evaluation step: 61
+ 2024-04-21 14:36:51,365 - INFO: train loss: 1.35: 2%|1 | 1/61 [00:17<17:12, 17.21s/it]
+ 2024-04-21 14:37:00,312 - INFO: train loss: 1.40: 3%|3 | 2/61 [00:26<12:08, 12.35s/it]
+ 2024-04-21 14:37:09,287 - INFO: train loss: 1.40: 5%|4 | 3/61 [00:35<10:26, 10.81s/it]
+ 2024-04-21 14:37:18,309 - INFO: train loss: 1.42: 7%|6 | 4/61 [00:44<09:35, 10.10s/it]
+ 2024-04-21 14:37:27,240 - INFO: train loss: 1.39: 8%|8 | 5/61 [00:53<09:02, 9.68s/it]
+ 2024-04-21 14:37:36,272 - INFO: train loss: 1.40: 10%|9 | 6/61 [01:02<08:40, 9.46s/it]
+ 2024-04-21 14:37:45,212 - INFO: train loss: 1.38: 11%|#1 | 7/61 [01:11<08:21, 9.29s/it]
+ 2024-04-21 14:37:54,160 - INFO: train loss: 1.35: 13%|#3 | 8/61 [01:20<08:06, 9.18s/it]
+ 2024-04-21 14:38:03,126 - INFO: train loss: 1.31: 15%|#4 | 9/61 [01:28<07:53, 9.11s/it]
+ 2024-04-21 14:38:12,051 - INFO: train loss: 1.31: 16%|#6 | 10/61 [01:37<07:41, 9.06s/it]
+ 2024-04-21 14:38:21,037 - INFO: train loss: 1.31: 18%|#8 | 11/61 [01:46<07:31, 9.03s/it]
+ 2024-04-21 14:38:29,947 - INFO: train loss: 1.26: 20%|#9 | 12/61 [01:55<07:20, 9.00s/it]
+ 2024-04-21 14:38:38,903 - INFO: train loss: 1.21: 21%|##1 | 13/61 [02:04<07:11, 8.98s/it]
+ 2024-04-21 14:38:47,845 - INFO: train loss: 1.17: 23%|##2 | 14/61 [02:13<07:01, 8.97s/it]
+ 2024-04-21 14:38:56,780 - INFO: train loss: 1.15: 25%|##4 | 15/61 [02:22<06:52, 8.96s/it]
+ 2024-04-21 14:39:05,765 - INFO: train loss: 1.11: 26%|##6 | 16/61 [02:31<06:43, 8.97s/it]
+ 2024-04-21 14:39:14,666 - INFO: train loss: 1.11: 28%|##7 | 17/61 [02:40<06:33, 8.95s/it]
+ 2024-04-21 14:39:23,600 - INFO: train loss: 1.13: 30%|##9 | 18/61 [02:49<06:24, 8.94s/it]
+ 2024-04-21 14:39:32,533 - INFO: train loss: 1.12: 31%|###1 | 19/61 [02:58<06:15, 8.94s/it]
+ 2024-04-21 14:39:41,495 - INFO: train loss: 1.10: 33%|###2 | 20/61 [03:07<06:06, 8.95s/it]
+ 2024-04-21 14:39:50,450 - INFO: train loss: 1.06: 34%|###4 | 21/61 [03:16<05:57, 8.95s/it]
+ 2024-04-21 14:39:59,374 - INFO: train loss: 1.04: 36%|###6 | 22/61 [03:25<05:48, 8.94s/it]
+ 2024-04-21 14:40:08,308 - INFO: train loss: 1.05: 38%|###7 | 23/61 [03:34<05:39, 8.94s/it]
+ 2024-04-21 14:40:17,264 - INFO: train loss: 1.03: 39%|###9 | 24/61 [03:43<05:30, 8.94s/it]
+ 2024-04-21 14:40:26,187 - INFO: train loss: 1.02: 41%|#### | 25/61 [03:52<05:21, 8.94s/it]
+ 2024-04-21 14:40:35,142 - INFO: train loss: 0.98: 43%|####2 | 26/61 [04:00<05:13, 8.94s/it]
+ 2024-04-21 14:40:44,061 - INFO: train loss: 0.92: 44%|####4 | 27/61 [04:09<05:03, 8.94s/it]
+ 2024-04-21 14:40:52,992 - INFO: train loss: 0.85: 46%|####5 | 28/61 [04:18<04:54, 8.93s/it]
+ 2024-04-21 14:41:01,952 - INFO: train loss: 0.83: 48%|####7 | 29/61 [04:27<04:46, 8.94s/it]
+ 2024-04-21 14:41:10,919 - INFO: train loss: 0.85: 49%|####9 | 30/61 [04:36<04:37, 8.95s/it]
+ 2024-04-21 14:41:19,862 - INFO: train loss: 0.84: 51%|##### | 31/61 [04:45<04:28, 8.95s/it]
+ 2024-04-21 14:41:28,813 - INFO: train loss: 0.85: 52%|#####2 | 32/61 [04:54<04:19, 8.95s/it]
+ 2024-04-21 14:41:37,748 - INFO: train loss: 0.82: 54%|#####4 | 33/61 [05:03<04:10, 8.94s/it]
+ 2024-04-21 14:41:46,762 - INFO: train loss: 0.82: 56%|#####5 | 34/61 [05:12<04:02, 8.97s/it]
+ 2024-04-21 14:41:55,706 - INFO: train loss: 0.81: 57%|#####7 | 35/61 [05:21<03:52, 8.96s/it]
+ 2024-04-21 14:42:04,630 - INFO: train loss: 0.82: 59%|#####9 | 36/61 [05:30<03:43, 8.95s/it]
+ 2024-04-21 14:42:13,555 - INFO: train loss: 0.81: 61%|###### | 37/61 [05:39<03:34, 8.94s/it]
+ 2024-04-21 14:42:22,509 - INFO: train loss: 0.83: 62%|######2 | 38/61 [05:48<03:25, 8.95s/it]
+ 2024-04-21 14:42:31,454 - INFO: train loss: 0.82: 64%|######3 | 39/61 [05:57<03:16, 8.95s/it]
+ 2024-04-21 14:42:40,413 - INFO: train loss: 0.80: 66%|######5 | 40/61 [06:06<03:07, 8.95s/it]
+ 2024-04-21 14:42:49,327 - INFO: train loss: 0.80: 67%|######7 | 41/61 [06:15<02:58, 8.94s/it]
+ 2024-04-21 14:42:58,273 - INFO: train loss: 0.78: 69%|######8 | 42/61 [06:24<02:49, 8.94s/it]
+ 2024-04-21 14:43:07,232 - INFO: train loss: 0.80: 70%|####### | 43/61 [06:33<02:41, 8.95s/it]
+ 2024-04-21 14:43:16,167 - INFO: train loss: 0.79: 72%|#######2 | 44/61 [06:42<02:32, 8.94s/it]
+ 2024-04-21 14:43:25,110 - INFO: train loss: 0.76: 74%|#######3 | 45/61 [06:50<02:23, 8.94s/it]
+ 2024-04-21 14:43:34,053 - INFO: train loss: 0.77: 75%|#######5 | 46/61 [06:59<02:14, 8.94s/it]
+ 2024-04-21 14:43:42,995 - INFO: train loss: 0.78: 77%|#######7 | 47/61 [07:08<02:05, 8.94s/it]
+ 2024-04-21 14:43:51,975 - INFO: train loss: 0.81: 79%|#######8 | 48/61 [07:17<01:56, 8.95s/it]
+ 2024-04-21 14:44:00,910 - INFO: train loss: 0.83: 80%|######## | 49/61 [07:26<01:47, 8.95s/it]
+ 2024-04-21 14:44:09,869 - INFO: train loss: 0.80: 82%|########1 | 50/61 [07:35<01:38, 8.95s/it]
+ 2024-04-21 14:44:18,807 - INFO: train loss: 0.77: 84%|########3 | 51/61 [07:44<01:29, 8.95s/it]
+ 2024-04-21 14:44:27,727 - INFO: train loss: 0.78: 85%|########5 | 52/61 [07:53<01:20, 8.94s/it]
+ 2024-04-21 14:44:36,658 - INFO: train loss: 0.80: 87%|########6 | 53/61 [08:02<01:11, 8.94s/it]
+ 2024-04-21 14:44:45,580 - INFO: train loss: 0.81: 89%|########8 | 54/61 [08:11<01:02, 8.93s/it]
+ 2024-04-21 14:44:54,501 - INFO: train loss: 0.82: 90%|######### | 55/61 [08:20<00:53, 8.93s/it]
+ 2024-04-21 14:45:03,452 - INFO: train loss: 0.81: 92%|#########1| 56/61 [08:29<00:44, 8.94s/it]
+ 2024-04-21 14:45:12,387 - INFO: train loss: 0.83: 93%|#########3| 57/61 [08:38<00:35, 8.94s/it]
+ 2024-04-21 14:45:21,353 - INFO: train loss: 0.80: 95%|#########5| 58/61 [08:47<00:26, 8.94s/it]
+ 2024-04-21 14:45:30,309 - INFO: train loss: 0.79: 97%|#########6| 59/61 [08:56<00:17, 8.95s/it]
+ 2024-04-21 14:45:39,259 - INFO: train loss: 0.79: 98%|#########8| 60/61 [09:05<00:08, 8.95s/it]
+ 2024-04-21 14:45:48,169 - INFO: train loss: 0.80: 100%|##########| 61/61 [09:14<00:00, 8.94s/it]
+ 2024-04-21 14:45:48,169 - INFO: train loss: 0.80: 100%|##########| 61/61 [09:14<00:00, 9.08s/it]
+ 2024-04-21 14:45:48,170 - INFO: Saving last model checkpoint to /app/output
+ 2024-04-21 14:46:35,145 - INFO: Starting validation inference
+ 2024-04-21 14:46:35,145 - INFO: validation progress: 0%| | 0/1 [00:00<?, ?it/s]
+ 2024-04-21 14:46:40,403 - INFO: validation progress: 100%|##########| 1/1 [00:05<00:00, 5.26s/it]
+ 2024-04-21 14:46:40,406 - INFO: validation progress: 100%|##########| 1/1 [00:05<00:00, 5.26s/it]
+ 2024-04-21 14:46:40,421 - INFO: Validation Perplexity: 2.01606
+ 2024-04-21 14:46:40,421 - INFO: Mean validation loss: 0.70670
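As a sanity check on the final two lines: perplexity is conventionally the exponential of the mean token-level cross-entropy loss. Here exp(0.70670) ≈ 2.027, slightly above the reported 2.01606, which suggests the two metrics aggregate the loss differently (for example, token-weighted versus per-batch means):

```python
import math

mean_val_loss = 0.70670  # from the log
reported_ppl = 2.01606   # from the log

# Perplexity implied by the reported mean loss:
print(math.exp(mean_val_loss))  # ~2.0273
# Mean loss implied by the reported perplexity:
print(math.log(reported_ppl))   # ~0.7011
```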