Upload logs
logs.log
2024-04-21 14:34:38,729 - INFO: Training in distributed mode with multiple processes, 1 GPU per process. Process 0, total: 4 local rank: 0.
2024-04-21 14:34:38,729 - INFO: Training in distributed mode with multiple processes, 1 GPU per process. Process 1, total: 4 local rank: 1.
2024-04-21 14:34:38,729 - INFO: Training in distributed mode with multiple processes, 1 GPU per process. Process 3, total: 4 local rank: 3.
2024-04-21 14:34:38,729 - INFO: Training in distributed mode with multiple processes, 1 GPU per process. Process 2, total: 4 local rank: 2.
2024-04-21 14:34:39,411 - INFO: Problem Type: text_causal_language_modeling
2024-04-21 14:34:39,411 - INFO: Global random seed: 291800
2024-04-21 14:34:39,411 - INFO: Preparing the data...
2024-04-21 14:34:39,411 - INFO: Setting up automatic validation split...
2024-04-21 14:34:39,443 - INFO: Preparing train and validation data
2024-04-21 14:34:39,443 - INFO: Loading train dataset...
2024-04-21 14:34:40,411 - INFO: Stop token ids: []
2024-04-21 14:34:40,417 - INFO: Loading validation dataset...
2024-04-21 14:34:40,905 - INFO: Stop token ids: []
2024-04-21 14:34:40,909 - INFO: Number of observations in train dataset: 495
2024-04-21 14:34:40,909 - INFO: Number of observations in validation dataset: 5
2024-04-21 14:34:41,280 - WARNING: PAD token id not matching between config and tokenizer. Overwriting with tokenizer id.
2024-04-21 14:34:41,280 - INFO: Setting pretraining_tp of model config to 1.
2024-04-21 14:34:41,283 - INFO: Using bfloat16 for backbone
2024-04-21 14:34:41,307 - WARNING: PAD token id not matching between config and tokenizer. Overwriting with tokenizer id.
2024-04-21 14:34:41,307 - INFO: Setting pretraining_tp of model config to 1.
2024-04-21 14:34:41,310 - INFO: Using bfloat16 for backbone
2024-04-21 14:34:41,314 - INFO: Stop token ids: []
2024-04-21 14:34:41,316 - WARNING: PAD token id not matching between config and tokenizer. Overwriting with tokenizer id.
2024-04-21 14:34:41,317 - INFO: Setting pretraining_tp of model config to 1.
2024-04-21 14:34:41,319 - INFO: Using bfloat16 for backbone
2024-04-21 14:34:41,319 - INFO: Loading meta-llama/Llama-2-13b-hf. This may take a while.
2024-04-21 14:34:41,329 - WARNING: PAD token id not matching between config and tokenizer. Overwriting with tokenizer id.
2024-04-21 14:34:41,330 - INFO: Setting pretraining_tp of model config to 1.
2024-04-21 14:34:41,332 - INFO: Using bfloat16 for backbone
2024-04-21 14:36:27,752 - INFO: Loaded meta-llama/Llama-2-13b-hf.
2024-04-21 14:36:27,757 - INFO: Lora module names: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
2024-04-21 14:36:29,940 - INFO: Enough space available for saving model weights.Required space: 25632.04MB, Available space: 973827.04MB.
2024-04-21 14:36:29,945 - INFO: Enough space available for saving model weights.Required space: 25632.04MB, Available space: 973827.04MB.
2024-04-21 14:36:29,949 - INFO: Enough space available for saving model weights.Required space: 25632.04MB, Available space: 973827.04MB.
2024-04-21 14:36:29,949 - INFO: Optimizer AdamW has been provided with parameters {'eps': 1e-08, 'weight_decay': 0.0, 'betas': (0.8999999762, 0.9990000129), 'lr': 0.0001}
2024-04-21 14:36:29,953 - INFO: Enough space available for saving model weights.Required space: 25632.04MB, Available space: 973827.04MB.
2024-04-21 14:36:29,954 - INFO: Optimizer AdamW has been provided with parameters {'weight_decay': 0.0, 'eps': 1e-08, 'betas': (0.8999999762, 0.9990000129), 'lr': 0.0001}
2024-04-21 14:36:29,959 - INFO: Optimizer AdamW has been provided with parameters {'weight_decay': 0.0, 'eps': 1e-08, 'betas': (0.8999999762, 0.9990000129), 'lr': 0.0001}
2024-04-21 14:36:29,964 - INFO: Optimizer AdamW has been provided with parameters {'weight_decay': 0.0, 'eps': 1e-08, 'betas': (0.8999999762, 0.9990000129), 'lr': 0.0001}
2024-04-21 14:36:32,979 - INFO: started process: 2, can_track: False, tracking_mode: TrackingMode.DURING_EPOCH
2024-04-21 14:36:32,981 - INFO: started process: 3, can_track: False, tracking_mode: TrackingMode.DURING_EPOCH
2024-04-21 14:36:32,982 - INFO: started process: 1, can_track: False, tracking_mode: TrackingMode.DURING_EPOCH
2024-04-21 14:36:33,128 - INFO: Evaluation step: 61
2024-04-21 14:36:33,132 - INFO: Evaluation step: 61
2024-04-21 14:36:33,137 - INFO: Evaluation step: 61
2024-04-21 14:36:34,155 - INFO: started process: 0, can_track: True, tracking_mode: TrackingMode.DURING_EPOCH
2024-04-21 14:36:34,156 - INFO: Training Epoch: 1 / 1
2024-04-21 14:36:34,157 - INFO: train loss: 0%| | 0/61 [00:00<?, ?it/s]
2024-04-21 14:36:34,221 - INFO: Evaluation step: 61
2024-04-21 14:36:51,365 - INFO: train loss: 1.35: 2%|1 | 1/61 [00:17<17:12, 17.21s/it]
2024-04-21 14:37:00,312 - INFO: train loss: 1.40: 3%|3 | 2/61 [00:26<12:08, 12.35s/it]
2024-04-21 14:37:09,287 - INFO: train loss: 1.40: 5%|4 | 3/61 [00:35<10:26, 10.81s/it]
2024-04-21 14:37:18,309 - INFO: train loss: 1.42: 7%|6 | 4/61 [00:44<09:35, 10.10s/it]
2024-04-21 14:37:27,240 - INFO: train loss: 1.39: 8%|8 | 5/61 [00:53<09:02, 9.68s/it]
2024-04-21 14:37:36,272 - INFO: train loss: 1.40: 10%|9 | 6/61 [01:02<08:40, 9.46s/it]
2024-04-21 14:37:45,212 - INFO: train loss: 1.38: 11%|#1 | 7/61 [01:11<08:21, 9.29s/it]
2024-04-21 14:37:54,160 - INFO: train loss: 1.35: 13%|#3 | 8/61 [01:20<08:06, 9.18s/it]
2024-04-21 14:38:03,126 - INFO: train loss: 1.31: 15%|#4 | 9/61 [01:28<07:53, 9.11s/it]
2024-04-21 14:38:12,051 - INFO: train loss: 1.31: 16%|#6 | 10/61 [01:37<07:41, 9.06s/it]
2024-04-21 14:38:21,037 - INFO: train loss: 1.31: 18%|#8 | 11/61 [01:46<07:31, 9.03s/it]
2024-04-21 14:38:29,947 - INFO: train loss: 1.26: 20%|#9 | 12/61 [01:55<07:20, 9.00s/it]
2024-04-21 14:38:38,903 - INFO: train loss: 1.21: 21%|##1 | 13/61 [02:04<07:11, 8.98s/it]
2024-04-21 14:38:47,845 - INFO: train loss: 1.17: 23%|##2 | 14/61 [02:13<07:01, 8.97s/it]
2024-04-21 14:38:56,780 - INFO: train loss: 1.15: 25%|##4 | 15/61 [02:22<06:52, 8.96s/it]
2024-04-21 14:39:05,765 - INFO: train loss: 1.11: 26%|##6 | 16/61 [02:31<06:43, 8.97s/it]
2024-04-21 14:39:14,666 - INFO: train loss: 1.11: 28%|##7 | 17/61 [02:40<06:33, 8.95s/it]
2024-04-21 14:39:23,600 - INFO: train loss: 1.13: 30%|##9 | 18/61 [02:49<06:24, 8.94s/it]
2024-04-21 14:39:32,533 - INFO: train loss: 1.12: 31%|###1 | 19/61 [02:58<06:15, 8.94s/it]
2024-04-21 14:39:41,495 - INFO: train loss: 1.10: 33%|###2 | 20/61 [03:07<06:06, 8.95s/it]
2024-04-21 14:39:50,450 - INFO: train loss: 1.06: 34%|###4 | 21/61 [03:16<05:57, 8.95s/it]
2024-04-21 14:39:59,374 - INFO: train loss: 1.04: 36%|###6 | 22/61 [03:25<05:48, 8.94s/it]
2024-04-21 14:40:08,308 - INFO: train loss: 1.05: 38%|###7 | 23/61 [03:34<05:39, 8.94s/it]
2024-04-21 14:40:17,264 - INFO: train loss: 1.03: 39%|###9 | 24/61 [03:43<05:30, 8.94s/it]
2024-04-21 14:40:26,187 - INFO: train loss: 1.02: 41%|#### | 25/61 [03:52<05:21, 8.94s/it]
2024-04-21 14:40:35,142 - INFO: train loss: 0.98: 43%|####2 | 26/61 [04:00<05:13, 8.94s/it]
2024-04-21 14:40:44,061 - INFO: train loss: 0.92: 44%|####4 | 27/61 [04:09<05:03, 8.94s/it]
2024-04-21 14:40:52,992 - INFO: train loss: 0.85: 46%|####5 | 28/61 [04:18<04:54, 8.93s/it]
2024-04-21 14:41:01,952 - INFO: train loss: 0.83: 48%|####7 | 29/61 [04:27<04:46, 8.94s/it]
2024-04-21 14:41:10,919 - INFO: train loss: 0.85: 49%|####9 | 30/61 [04:36<04:37, 8.95s/it]
2024-04-21 14:41:19,862 - INFO: train loss: 0.84: 51%|##### | 31/61 [04:45<04:28, 8.95s/it]
2024-04-21 14:41:28,813 - INFO: train loss: 0.85: 52%|#####2 | 32/61 [04:54<04:19, 8.95s/it]
2024-04-21 14:41:37,748 - INFO: train loss: 0.82: 54%|#####4 | 33/61 [05:03<04:10, 8.94s/it]
2024-04-21 14:41:46,762 - INFO: train loss: 0.82: 56%|#####5 | 34/61 [05:12<04:02, 8.97s/it]
2024-04-21 14:41:55,706 - INFO: train loss: 0.81: 57%|#####7 | 35/61 [05:21<03:52, 8.96s/it]
2024-04-21 14:42:04,630 - INFO: train loss: 0.82: 59%|#####9 | 36/61 [05:30<03:43, 8.95s/it]
2024-04-21 14:42:13,555 - INFO: train loss: 0.81: 61%|###### | 37/61 [05:39<03:34, 8.94s/it]
2024-04-21 14:42:22,509 - INFO: train loss: 0.83: 62%|######2 | 38/61 [05:48<03:25, 8.95s/it]
2024-04-21 14:42:31,454 - INFO: train loss: 0.82: 64%|######3 | 39/61 [05:57<03:16, 8.95s/it]
2024-04-21 14:42:40,413 - INFO: train loss: 0.80: 66%|######5 | 40/61 [06:06<03:07, 8.95s/it]
2024-04-21 14:42:49,327 - INFO: train loss: 0.80: 67%|######7 | 41/61 [06:15<02:58, 8.94s/it]
2024-04-21 14:42:58,273 - INFO: train loss: 0.78: 69%|######8 | 42/61 [06:24<02:49, 8.94s/it]
2024-04-21 14:43:07,232 - INFO: train loss: 0.80: 70%|####### | 43/61 [06:33<02:41, 8.95s/it]
2024-04-21 14:43:16,167 - INFO: train loss: 0.79: 72%|#######2 | 44/61 [06:42<02:32, 8.94s/it]
2024-04-21 14:43:25,110 - INFO: train loss: 0.76: 74%|#######3 | 45/61 [06:50<02:23, 8.94s/it]
2024-04-21 14:43:34,053 - INFO: train loss: 0.77: 75%|#######5 | 46/61 [06:59<02:14, 8.94s/it]
2024-04-21 14:43:42,995 - INFO: train loss: 0.78: 77%|#######7 | 47/61 [07:08<02:05, 8.94s/it]
2024-04-21 14:43:51,975 - INFO: train loss: 0.81: 79%|#######8 | 48/61 [07:17<01:56, 8.95s/it]
2024-04-21 14:44:00,910 - INFO: train loss: 0.83: 80%|######## | 49/61 [07:26<01:47, 8.95s/it]
2024-04-21 14:44:09,869 - INFO: train loss: 0.80: 82%|########1 | 50/61 [07:35<01:38, 8.95s/it]
2024-04-21 14:44:18,807 - INFO: train loss: 0.77: 84%|########3 | 51/61 [07:44<01:29, 8.95s/it]
2024-04-21 14:44:27,727 - INFO: train loss: 0.78: 85%|########5 | 52/61 [07:53<01:20, 8.94s/it]
2024-04-21 14:44:36,658 - INFO: train loss: 0.80: 87%|########6 | 53/61 [08:02<01:11, 8.94s/it]
2024-04-21 14:44:45,580 - INFO: train loss: 0.81: 89%|########8 | 54/61 [08:11<01:02, 8.93s/it]
2024-04-21 14:44:54,501 - INFO: train loss: 0.82: 90%|######### | 55/61 [08:20<00:53, 8.93s/it]
2024-04-21 14:45:03,452 - INFO: train loss: 0.81: 92%|#########1| 56/61 [08:29<00:44, 8.94s/it]
2024-04-21 14:45:12,387 - INFO: train loss: 0.83: 93%|#########3| 57/61 [08:38<00:35, 8.94s/it]
2024-04-21 14:45:21,353 - INFO: train loss: 0.80: 95%|#########5| 58/61 [08:47<00:26, 8.94s/it]
2024-04-21 14:45:30,309 - INFO: train loss: 0.79: 97%|#########6| 59/61 [08:56<00:17, 8.95s/it]
2024-04-21 14:45:39,259 - INFO: train loss: 0.79: 98%|#########8| 60/61 [09:05<00:08, 8.95s/it]
2024-04-21 14:45:48,169 - INFO: train loss: 0.80: 100%|##########| 61/61 [09:14<00:00, 8.94s/it]
2024-04-21 14:45:48,169 - INFO: train loss: 0.80: 100%|##########| 61/61 [09:14<00:00, 9.08s/it]
2024-04-21 14:45:48,170 - INFO: Saving last model checkpoint to /app/output
2024-04-21 14:46:35,145 - INFO: Starting validation inference
2024-04-21 14:46:35,145 - INFO: validation progress: 0%| | 0/1 [00:00<?, ?it/s]
2024-04-21 14:46:40,403 - INFO: validation progress: 100%|##########| 1/1 [00:05<00:00, 5.26s/it]
2024-04-21 14:46:40,406 - INFO: validation progress: 100%|##########| 1/1 [00:05<00:00, 5.26s/it]
2024-04-21 14:46:40,421 - INFO: Validation Perplexity: 2.01606
2024-04-21 14:46:40,421 - INFO: Mean validation loss: 0.70670