diff --git "a/log/llama2_chat_13B_70W_CH-LAW-CR.nohup" "b/log/llama2_chat_13B_70W_CH-LAW-CR.nohup" new file mode 100644--- /dev/null +++ "b/log/llama2_chat_13B_70W_CH-LAW-CR.nohup" @@ -0,0 +1,3362 @@ +nohup: ignoring input +WARNING:torch.distributed.run: +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** +[2023-08-22 23:48:01,036] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2023-08-22 23:48:01,036] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2023-08-22 23:48:01,036] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2023-08-22 23:48:01,036] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2023-08-22 23:48:01,036] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2023-08-22 23:48:01,036] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2023-08-22 23:48:01,036] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2023-08-22 23:48:01,036] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect) +[2023-08-22 23:48:08,285] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented +[2023-08-22 23:48:08,285] [INFO] [comm.py:616:init_distributed] cdb=None +[2023-08-22 23:48:08,285] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented +[2023-08-22 23:48:08,285] [INFO] [comm.py:616:init_distributed] cdb=None +[2023-08-22 23:48:08,285] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed
not yet implemented +[2023-08-22 23:48:08,285] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented +[2023-08-22 23:48:08,285] [INFO] [comm.py:616:init_distributed] cdb=None +[2023-08-22 23:48:08,285] [INFO] [comm.py:616:init_distributed] cdb=None +[2023-08-22 23:48:08,285] [INFO] [comm.py:643:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl +[2023-08-22 23:48:08,285] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented +[2023-08-22 23:48:08,285] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented +[2023-08-22 23:48:08,287] [INFO] [comm.py:616:init_distributed] cdb=None +[2023-08-22 23:48:08,287] [INFO] [comm.py:616:init_distributed] cdb=None +[2023-08-22 23:48:08,296] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented +[2023-08-22 23:48:08,296] [INFO] [comm.py:616:init_distributed] cdb=None +[2023-08-22 23:48:08,296] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented +[2023-08-22 23:48:08,298] [INFO] [comm.py:616:init_distributed] cdb=None +08/22/2023 23:48:08 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: True +08/22/2023 23:48:08 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: True +[WARNING|logging.py:295] 2023-08-22 23:48:08,724 >> You are using the legacy behaviour of the . This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565 +[WARNING|logging.py:295] 2023-08-22 23:48:08,731 >> You are using the legacy behaviour of the . This means that tokens that come after special tokens will not be properly handled. 
We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565 +08/22/2023 23:48:08 - WARNING - __main__ - Process rank: 4, device: cuda:4, n_gpu: 1distributed training: True, 16-bits training: True +[WARNING|logging.py:295] 2023-08-22 23:48:08,774 >> You are using the legacy behaviour of the . This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565 +08/22/2023 23:48:08 - WARNING - __main__ - Process rank: 6, device: cuda:6, n_gpu: 1distributed training: True, 16-bits training: True +[WARNING|logging.py:295] 2023-08-22 23:48:08,793 >> You are using the legacy behaviour of the . This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565 +08/22/2023 23:48:09 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1distributed training: True, 16-bits training: True +[WARNING|logging.py:295] 2023-08-22 23:48:09,186 >> You are using the legacy behaviour of the . This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565 +08/22/2023 23:48:09 - WARNING - __main__ - Process rank: 7, device: cuda:7, n_gpu: 1distributed training: True, 16-bits training: True +[WARNING|logging.py:295] 2023-08-22 23:48:09,338 >> You are using the legacy behaviour of the . This means that tokens that come after special tokens will not be properly handled. 
We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565 +08/22/2023 23:48:09 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True +[INFO|configuration_utils.py:710] 2023-08-22 23:48:09,961 >> loading configuration file /data3/litian/Redemption/LLama-2/chat/13B_HF/config.json +[INFO|configuration_utils.py:768] 2023-08-22 23:48:09,963 >> Model config LlamaConfig { + "_name_or_path": "/data3/litian/Redemption/LLama-2/chat/13B_HF", + "architectures": [ + "LlamaForCausalLM" + ], + "bos_token_id": 1, + "eos_token_id": 2, + "hidden_act": "silu", + "hidden_size": 5120, + "initializer_range": 0.02, + "intermediate_size": 13824, + "max_position_embeddings": 2048, + "model_type": "llama", + "num_attention_heads": 40, + "num_hidden_layers": 40, + "num_key_value_heads": 40, + "pad_token_id": 0, + "pretraining_tp": 1, + "rms_norm_eps": 1e-05, + "rope_scaling": null, + "tie_word_embeddings": false, + "torch_dtype": "float16", + "transformers_version": "4.31.0", + "use_cache": true, + "vocab_size": 32000 +} + +[INFO|tokenization_utils_base.py:1837] 2023-08-22 23:48:09,964 >> loading file tokenizer.model +[INFO|tokenization_utils_base.py:1837] 2023-08-22 23:48:09,964 >> loading file added_tokens.json +[INFO|tokenization_utils_base.py:1837] 2023-08-22 23:48:09,964 >> loading file special_tokens_map.json +[INFO|tokenization_utils_base.py:1837] 2023-08-22 23:48:09,964 >> loading file tokenizer_config.json +[WARNING|logging.py:295] 2023-08-22 23:48:09,965 >> You are using the legacy behaviour of the . This means that tokens that come after special tokens will not be properly handled. 
We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565 +08/22/2023 23:48:09 - INFO - __main__ - training files: /data3/litian/Redemption/litian_data/Belle_alpaca_50W_LawQA_20W_CodeReviewPython_37421_737503/train.json +08/22/2023 23:48:09 - WARNING - root - building dataset... +08/22/2023 23:48:10 - INFO - __name__ - training datasets-/data3/litian/Redemption/litian_data/Belle_alpaca_50W_LawQA_20W_CodeReviewPython_37421_737503/train.json has been loaded from disk +08/22/2023 23:48:10 - WARNING - __main__ - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: True +[WARNING|logging.py:295] 2023-08-22 23:48:10,595 >> You are using the legacy behaviour of the . This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565 +08/22/2023 23:48:15 - INFO - __main__ - Num train_samples 737503 +08/22/2023 23:48:15 - INFO - __main__ - training example: +08/22/2023 23:48:15 - WARNING - root - building dataset... +08/22/2023 23:48:15 - WARNING - root - building dataset... +08/22/2023 23:48:15 - WARNING - root - building dataset... +08/22/2023 23:48:15 - WARNING - root - building dataset... +08/22/2023 23:48:15 - WARNING - root - building dataset... +08/22/2023 23:48:15 - WARNING - root - building dataset... +08/22/2023 23:48:15 - WARNING - root - building dataset... 
+08/22/2023 23:48:15 - INFO - __name__ - training datasets-/data3/litian/Redemption/litian_data/Belle_alpaca_50W_LawQA_20W_CodeReviewPython_37421_737503/train.json has been loaded from disk +08/22/2023 23:48:15 - INFO - __name__ - training datasets-/data3/litian/Redemption/litian_data/Belle_alpaca_50W_LawQA_20W_CodeReviewPython_37421_737503/train.json has been loaded from disk +08/22/2023 23:48:15 - INFO - __name__ - training datasets-/data3/litian/Redemption/litian_data/Belle_alpaca_50W_LawQA_20W_CodeReviewPython_37421_737503/train.json has been loaded from disk +08/22/2023 23:48:15 - INFO - __name__ - training datasets-/data3/litian/Redemption/litian_data/Belle_alpaca_50W_LawQA_20W_CodeReviewPython_37421_737503/train.json has been loaded from disk +08/22/2023 23:48:15 - INFO - __name__ - training datasets-/data3/litian/Redemption/litian_data/Belle_alpaca_50W_LawQA_20W_CodeReviewPython_37421_737503/train.json has been loaded from disk +08/22/2023 23:48:15 - INFO - __name__ - training datasets-/data3/litian/Redemption/litian_data/Belle_alpaca_50W_LawQA_20W_CodeReviewPython_37421_737503/train.json has been loaded from disk +08/22/2023 23:48:15 - INFO - __name__ - training datasets-/data3/litian/Redemption/litian_data/Belle_alpaca_50W_LawQA_20W_CodeReviewPython_37421_737503/train.json has been loaded from disk +08/22/2023 23:48:15 - INFO - __main__ - Below is an instruction that describes a task. Write a response that appropriately completes the request. + +### Instruction: +在给定的一段文本中找出所有的人名、地名和组织机构名。 +在上海举行的第二十二届国际计算机大会上,中国科技部部长李荣融发表了重要讲话。 + +### Response: 该文本中的人名、地名和组织机构名有: +1. 人名: 李荣融 +2. 地名: 上海 +3. 组织机构名: 国际计算机大会,中国科技部 +解释: +1. 人名: 文本中提到了“中国科技部部长李荣融发表了重要讲话”,因此,李荣融是该文本中的人名。 +2. 地名: 文本中提到了“在上海举行的第二十二届国际计算机大会上”,表明此事情发生的地点是上海,因此上海是该文本中的地名。 +3. 
组织机构名:文本中提到了“中国科技部部长李荣融发表了重要讲话”,表明李荣融所在的组织机构是中国科技部,而“在上海举行的第二十二届国际计算机大会”是指国际计算机大会,因此,国际计算机大会和中国科技部均为该文本中的组织机构名。 +08/22/2023 23:48:15 - INFO - __main__ - training files: /data3/litian/Redemption/litian_data/Validation_Belle_alpaca_100W_LawQA_100W_CodeReviewPython_324_524/validation.json +08/22/2023 23:48:15 - WARNING - root - building dataset... +08/22/2023 23:48:15 - INFO - __name__ - training datasets-/data3/litian/Redemption/litian_data/Validation_Belle_alpaca_100W_LawQA_100W_CodeReviewPython_324_524/validation.json has been loaded from disk +08/22/2023 23:48:15 - WARNING - root - building dataset... +08/22/2023 23:48:15 - WARNING - root - building dataset... +08/22/2023 23:48:15 - WARNING - root - building dataset... +08/22/2023 23:48:15 - WARNING - root - building dataset... +08/22/2023 23:48:15 - WARNING - root - building dataset... +08/22/2023 23:48:15 - WARNING - root - building dataset... +08/22/2023 23:48:15 - WARNING - root - building dataset... +08/22/2023 23:48:15 - INFO - __main__ - Num eval_samples 524 +08/22/2023 23:48:15 - INFO - __main__ - eval example: +08/22/2023 23:48:15 - INFO - __name__ - training datasets-/data3/litian/Redemption/litian_data/Validation_Belle_alpaca_100W_LawQA_100W_CodeReviewPython_324_524/validation.json has been loaded from disk +08/22/2023 23:48:15 - INFO - __name__ - training datasets-/data3/litian/Redemption/litian_data/Validation_Belle_alpaca_100W_LawQA_100W_CodeReviewPython_324_524/validation.json has been loaded from disk +08/22/2023 23:48:15 - INFO - __name__ - training datasets-/data3/litian/Redemption/litian_data/Validation_Belle_alpaca_100W_LawQA_100W_CodeReviewPython_324_524/validation.json has been loaded from disk +08/22/2023 23:48:15 - INFO - __name__ - training datasets-/data3/litian/Redemption/litian_data/Validation_Belle_alpaca_100W_LawQA_100W_CodeReviewPython_324_524/validation.json has been loaded from disk +08/22/2023 23:48:15 - INFO - __name__ - training 
datasets-/data3/litian/Redemption/litian_data/Validation_Belle_alpaca_100W_LawQA_100W_CodeReviewPython_324_524/validation.json has been loaded from disk +08/22/2023 23:48:15 - INFO - __name__ - training datasets-/data3/litian/Redemption/litian_data/Validation_Belle_alpaca_100W_LawQA_100W_CodeReviewPython_324_524/validation.json has been loaded from disk +08/22/2023 23:48:15 - INFO - __name__ - training datasets-/data3/litian/Redemption/litian_data/Validation_Belle_alpaca_100W_LawQA_100W_CodeReviewPython_324_524/validation.json has been loaded from disk +08/22/2023 23:48:15 - INFO - __main__ - Below is an instruction that describes a task. Write a response that appropriately completes the request. + +### Instruction: +Question:

I've been playing around with Python off and on for about the past year and recently came up with the following 68 (was 62) lines. I think I'll try making a calculator out of it. I'd really like to know what readers here think of its attributes such as coding style, readability, and feasible purposefulness.

+ +
# notes: separate addresses from data lest the loop of doom cometh
+
+class Interpreter:
+
+  def __init__(self):
+    self.memory = { }
+    self.dictionary = {"mov" : self.mov,
+                       "put" : self.put,
+                       "add" : self.add,
+                       "sub" : self.sub,
+                       "clr" : self.clr,
+                       "cpy" : self.cpy,
+                       "ref" : self.ref }
+    self.hooks = {self.val("0") : self.out }
+
+  def interpret(self, line):
+    x = line.split(" ")
+    vals = tuple(self.val(y) for y in x[1:])
+    dereferenced = []
+    keys_only = tuple(key for key in self.memory)
+    for val in vals:
+      while val in self.memory: val = self.memory[val]
+      dereferenced.append(val)
+    vals = tuple(y for y in dereferenced)
+    self.dictionary[x[0]](vals)
+
+  def val(self, x):
+    return tuple(int(y) for y in str(x).split("."))
+
+  def mov(self, value):
+    self.ptr = value[0]
+
+  def put(self, value):
+    self.memory[self.ptr] = value[0]
+
+  def clr(self, value):
+    if self.ptr in self.hooks and self.ptr in self.memory:
+      x = self.hooks[self.ptr]
+      y = self.memory[self.ptr]
+      for z in y: x(z)
+    del self.memory[self.ptr]
+
+  def add(self, values):
+    self.put(self.mat(values, lambda x, y: x + y))
+
+  def sub(self, values):
+    self.put(self.mat(values, lambda x, y: x - y))
+
+  def mat(self, values, op):
+    a, b = self.memory[values[0]], self.memory[values[1]]
+    if len(a) > len(b): a, b = b, a
+    c = [op(a[x], b[x]) for x in xrange(len(b))] + [x for x in a[len(a):]]
+    return [tuple(x for x in c)]
+
+  def cpy(self, value):
+    self.put(value)
+
+  def out(self, x):
+    print chr(x),
+
+  def ref(self, x):
+    self.put(x)
+
+interp = Interpreter()
+for x in file(__file__.split('/')[-1].split(".")[-2] + ".why"):
+  interp.interpret(x.strip())
+
+ + + Answer:

To make your module importable by other files, it's customary to end it with an if __name__ == '__main__': conditional, like so:

+ +
if __name__ == '__main__':
+    interp = Interpreter()
+    for x in file(__file__.split('/')[-1].split(".")[-2] + ".why"):
+        interp.interpret(x.strip())
+
+ +

Maybe I'm being picky (but you did ask for style input), but read PEP8 and try to follow it as best you can (stand). One thing that jumped out at me right away was your 2-space indentation vs. the PEP8 recommendation of 4. One-letter variable names are usually recommended only for loop variables. You could probably increase the readability of your code by renaming some of those x's, y's, a's, etc.

+ +

Another maxim of Python programming is to use the tools provided. I was pondering what you were doing with:

+ +
__file__.split('/')[-1].split(".")[-2] + ".why"
+
+ +

An alternative that uses existing Python modules (and is more portable across platforms) is:

+ +
os.path.splitext(os.path.basename(__file__))[0] + ".why"
+
+ +

It's about the same length and a good deal clearer about what you're doing, since the function names spell it out.
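
To make the os.path suggestion concrete, here is a minimal sketch. The helper name companion_path and its extension parameter are my own, purely for illustration; nothing in your code requires them. It wraps the two standard-library calls so the intent is visible at the call site:

```python
import os


def companion_path(script_path, extension=".why"):
    """Return the script's base name with its final extension replaced.

    `companion_path` and `extension` are hypothetical names used only
    for this illustration.
    """
    base_name = os.path.basename(script_path)  # drop any directory part
    stem, _ = os.path.splitext(base_name)      # split off the last extension only
    return stem + extension


# Inside the script itself you would call: companion_path(__file__)
```

One behavioural difference worth knowing: for a name with more than one dot, such as tool.v2.py, the split(".")[-2] version keeps only "v2", while os.path.splitext keeps "tool.v2", which is probably what you want.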

+ + + This is a question and answer from a forum where users review and improve the code of other users. Please output the original code, a summary of the critique, and the revised code using the format ORIGINAL: [write original code here] CRITIQUE: [write critique here] REVISED: [write revision code here]. + +### Response: ORIGINAL: +class Interpreter: + + def __init__(self): + self.memory = { } + self.dictionary = {"mov" : self.mov, + "put" : self.put, + "add" : self.add, + "sub" : self.sub, + "clr" : self.clr, + "cpy" : self.cpy, + "ref" : self.ref } + self.hooks = {self.val("0") : self.out } + + def interpret(self, line): + x = line.split(" ") + vals = tuple(self.val(y) for y in x[1:]) + dereferenced = [] + keys_only = tuple(key for key in self.memory) + for val in vals: + while val in self.memory: val = self.memory[val] + dereferenced.append(val) + vals = tuple(y for y in dereferenced) + self.dictionary[x[0]](vals) + + def val(self, x): + return tuple(int(y) for y in str(x).split(".")) + + def mov(self, value): + self.ptr = value[0] + + def put(self, value): + self.memory[self.ptr] = value[0] + + def clr(self, value): + if self.ptr in self.hooks and self.ptr in self.memory: + x = self.hooks[self.ptr] + y = self.memory[self.ptr] + for z in y: x(z) + del self.memory[self.ptr] + + def add(self, values): + self.put(self.mat(values, lambda x, y: x + y)) + + def sub(self, values): + self.put(self.mat(values, lambda x, y: x - y)) + + def mat(self, values, op): + a, b = self.memory[values[0]], self.memory[values[1]] + if len(a) > len(b): a, b = b, a + c = [op(a[x], b[x]) for x in xrange(len(b))] + [x for x in a[len(a):]] + return [tuple(x for x in c)] + + def cpy(self, value): + self.put(value) + + def out(self, x): + print chr(x), + + def ref(self, x): + self.put(x) + +interp = Interpreter() +for x in file(__file__.split('/')[-1].split(".")[-2] + ".why"): + interp.interpret(x.strip()) + +CRITIQUE: To allow your module to be loadable by other files, it's customary to 
write the end of it with a if __name__ == '__main__': conditional like so: Maybe I'm being picky (but +[INFO|modeling_utils.py:2600] 2023-08-22 23:48:15,543 >> loading weights file /data3/litian/Redemption/LLama-2/chat/13B_HF/pytorch_model.bin.index.json +[INFO|modeling_utils.py:1172] 2023-08-22 23:48:15,545 >> Instantiating LlamaForCausalLM model under default dtype torch.float16. +[INFO|modeling_utils.py:2694] 2023-08-22 23:48:15,547 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model +[INFO|configuration_utils.py:599] 2023-08-22 23:48:15,551 >> Generate config GenerationConfig { + "_from_model_config": true, + "bos_token_id": 1, + "eos_token_id": 2, + "pad_token_id": 0, + "transformers_version": "4.31.0" +} + +[2023-08-22 23:48:19,889] [INFO] [partition_parameters.py:326:__exit__] finished initializing model with 13.02B parameters + Loading checkpoint shards: 0%| | 0/3 [00:00> All model checkpoint weights were used when initializing LlamaForCausalLM. + + Loading checkpoint shards: 100%|██████████| 3/3 [02:26<00:00, 45.94s/it][INFO|modeling_utils.py:3337] 2023-08-22 23:50:46,277 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /data3/litian/Redemption/LLama-2/chat/13B_HF. +If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training. 
+ Loading checkpoint shards: 100%|██████████| 3/3 [02:26<00:00, 48.77s/it] + Loading checkpoint shards: 100%|██████████| 3/3 [02:26<00:00, 45.93s/it] Loading checkpoint shards: 100%|██████████| 3/3 [02:26<00:00, 48.77s/it] +[INFO|configuration_utils.py:559] 2023-08-22 23:50:46,281 >> loading configuration file /data3/litian/Redemption/LLama-2/chat/13B_HF/generation_config.json +[INFO|configuration_utils.py:599] 2023-08-22 23:50:46,282 >> Generate config GenerationConfig { + "_from_model_config": true, + "bos_token_id": 1, + "eos_token_id": 2, + "pad_token_id": 0, + "transformers_version": "4.31.0" +} + +08/22/2023 23:50:46 - INFO - __main__ - len(tokenizer):49954 +08/22/2023 23:50:46 - INFO - __main__ - resize the embedding size by the size of the tokenizer +08/22/2023 23:50:51 - INFO - __main__ - Init new peft model +08/22/2023 23:50:51 - INFO - __main__ - target_modules: ['q_proj', 'v_proj', 'k_proj', 'o_proj', 'gate_proj', 'down_proj', 'up_proj'] +08/22/2023 23:50:51 - INFO - __main__ - lora_rank: 8 +trainable params: 1,054,351,360 || all params: 13,742,535,680 || trainable%: 7.672174804933816 +trainable params: 1,054,351,360 || all params: 13,742,535,680 || trainable%: 7.672174804933816 +trainable params: 1,054,351,360 || all params: 13,742,535,680 || trainable%: 7.672174804933816 +trainable params: 1,054,351,360 || all params: 13,742,535,680 || trainable%: 7.672174804933816 +08/22/2023 23:53:48 - INFO - __main__ - model.modules_to_save: {'embed_tokens', 'lm_head'} +[INFO|trainer.py:565] 2023-08-22 23:53:48,457 >> max_steps is given, it will override any value given in num_train_epochs +/data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. 
Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning + warnings.warn( +/data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning + warnings.warn( +[INFO|deepspeed.py:291] 2023-08-22 23:53:48,619 >> Detected ZeRO Offload and non-DeepSpeed optimizers: This combination should work as long as the custom optimizer has both CPU and GPU implementation (except LAMB) +/data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning + warnings.warn( +/data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning + warnings.warn( +trainable params: 1,054,351,360 || all params: 13,742,535,680 || trainable%: 7.672174804933816 +trainable params: 1,054,351,360 || all params: 13,742,535,680 || trainable%: 7.672174804933816 +trainable params: 1,054,351,360 || all params: 13,742,535,680 || trainable%: 7.672174804933816 +trainable params: 1,054,351,360 || all params: 13,742,535,680 || trainable%: 7.672174804933816 +/data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. 
Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning + warnings.warn( +/data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning + warnings.warn( +/data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning + warnings.warn( +/data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning + warnings.warn( +Using /mnt/home/litian/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... +Using /mnt/home/litian/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... +Using /mnt/home/litian/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... +Using /mnt/home/litian/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... +Using /mnt/home/litian/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... +Using /mnt/home/litian/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... +Using /mnt/home/litian/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... +Using /mnt/home/litian/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... +Detected CUDA files, patching ldflags +Emitting ninja build file /mnt/home/litian/.cache/torch_extensions/py39_cu118/cpu_adam/build.ninja... 
+Building extension module cpu_adam... +Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) +[1/3] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/torch/include -isystem /data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/torch/include/TH -isystem /data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /data/project/anaconda3_test/envs/LLM/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o +[2/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/torch/include 
-isystem /data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/torch/include/TH -isystem /data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /data/project/anaconda3_test/envs/LLM/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX512__ -D__ENABLE_CUDA__ -c /data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o +[3/3] c++ cpu_adam.o custom_cuda_kernel.cuda.o -shared -lcurand -L/data/project/anaconda3_test/envs/LLM/lib/python3.9/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o cpu_adam.so +Loading extension module cpu_adam... +Time to load cpu_adam op: 39.578516721725464 seconds +Loading extension module cpu_adam... +Time to load cpu_adam op: 39.53711175918579 seconds +Loading extension module cpu_adam... +Time to load cpu_adam op: 39.52607488632202 seconds +Loading extension module cpu_adam... +Time to load cpu_adam op: 39.544535398483276 seconds +Loading extension module cpu_adam... +Time to load cpu_adam op: 39.53598952293396 seconds +Loading extension module cpu_adam... +Time to load cpu_adam op: 39.50935387611389 seconds +Loading extension module cpu_adam... +Time to load cpu_adam op: 39.56642532348633 seconds +Loading extension module cpu_adam... 
+Time to load cpu_adam op: 39.65215826034546 seconds +08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 1 +08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 2 +08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 5 +08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 7 +Adam Optimizer #0 is created with AVX512 arithmetic capability. +Config: alpha=0.000200, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1 +[2023-08-22 23:54:30,677] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.10.0, git-hash=unknown, git-branch=unknown +08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 4 +08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 6 +08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 0 +08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 3 +08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Rank 5: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes. +08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Rank 3: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes. +08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes. +08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Rank 2: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes. 
+08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Rank 4: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes. +08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Rank 6: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes. +08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Rank 7: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes. +08/22/2023 23:54:30 - INFO - torch.distributed.distributed_c10d - Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 8 nodes. +[2023-08-22 23:54:31,435] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False +[2023-08-22 23:54:31,442] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer +[2023-08-22 23:54:31,442] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer +[2023-08-22 23:54:31,500] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam +[2023-08-22 23:54:31,500] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type= +[2023-08-22 23:54:31,500] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False +[2023-08-22 23:54:31,500] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 3 optimizer +[2023-08-22 23:54:31,684] [INFO] [utils.py:785:see_memory_usage] Stage 3 initialize beginning +[2023-08-22 23:54:31,685] [INFO] [utils.py:786:see_memory_usage] MA 2.0 GB Max_MA 2.06 GB CA 2.07 GB Max_CA 3 GB +[2023-08-22 23:54:31,687] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 82.22 GB, percent = 26.1% +[2023-08-22 23:54:31,698] [INFO] [stage3.py:117:__init__] Reduce bucket size 26214400 +[2023-08-22 23:54:31,698] [INFO] [stage3.py:118:__init__] Prefetch bucket size 23592960 
+[2023-08-22 23:54:31,857] [INFO] [utils.py:785:see_memory_usage] DeepSpeedZeRoOffload initialize [begin] +[2023-08-22 23:54:31,858] [INFO] [utils.py:786:see_memory_usage] MA 2.0 GB Max_MA 2.0 GB CA 2.07 GB Max_CA 2 GB +[2023-08-22 23:54:31,860] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 83.16 GB, percent = 26.4% +Parameter Offload: Total persistent parameters: 18437120 in 521 params +[2023-08-22 23:54:32,594] [INFO] [utils.py:785:see_memory_usage] DeepSpeedZeRoOffload initialize [end] +[2023-08-22 23:54:32,595] [INFO] [utils.py:786:see_memory_usage] MA 0.04 GB Max_MA 2.0 GB CA 2.07 GB Max_CA 2 GB +[2023-08-22 23:54:32,597] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 83.49 GB, percent = 26.5% +[2023-08-22 23:54:32,749] [INFO] [utils.py:785:see_memory_usage] Before creating fp16 partitions +[2023-08-22 23:54:32,750] [INFO] [utils.py:786:see_memory_usage] MA 0.04 GB Max_MA 0.04 GB CA 2.07 GB Max_CA 2 GB +[2023-08-22 23:54:32,752] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 83.49 GB, percent = 26.5% +[2023-08-22 23:54:33,176] [INFO] [utils.py:785:see_memory_usage] After creating fp16 partitions: 1 +[2023-08-22 23:54:33,177] [INFO] [utils.py:786:see_memory_usage] MA 0.04 GB Max_MA 0.04 GB CA 2.07 GB Max_CA 2 GB +[2023-08-22 23:54:33,179] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 87.57 GB, percent = 27.8% +[2023-08-22 23:54:33,356] [INFO] [utils.py:785:see_memory_usage] Before creating fp32 partitions +[2023-08-22 23:54:33,357] [INFO] [utils.py:786:see_memory_usage] MA 0.04 GB Max_MA 0.04 GB CA 2.07 GB Max_CA 2 GB +[2023-08-22 23:54:33,359] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 89.39 GB, percent = 28.4% +[2023-08-22 23:54:34,039] [INFO] [utils.py:785:see_memory_usage] After creating fp32 partitions +[2023-08-22 23:54:34,040] [INFO] [utils.py:786:see_memory_usage] MA 0.04 GB Max_MA 0.04 GB CA 2.07 GB Max_CA 2 GB +[2023-08-22 23:54:34,043] [INFO] 
[utils.py:793:see_memory_usage] CPU Virtual Memory: used = 89.46 GB, percent = 28.4% +[2023-08-22 23:54:34,228] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states +[2023-08-22 23:54:34,229] [INFO] [utils.py:786:see_memory_usage] MA 0.04 GB Max_MA 0.04 GB CA 2.07 GB Max_CA 2 GB +[2023-08-22 23:54:34,232] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 91.65 GB, percent = 29.1% +[2023-08-22 23:54:36,198] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states +[2023-08-22 23:54:36,199] [INFO] [utils.py:786:see_memory_usage] MA 0.04 GB Max_MA 0.04 GB CA 2.07 GB Max_CA 2 GB +[2023-08-22 23:54:36,201] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 101.41 GB, percent = 32.2% +[2023-08-22 23:54:36,202] [INFO] [stage3.py:424:_setup_for_real_optimizer] optimizer state initialized +[2023-08-22 23:54:37,163] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer +[2023-08-22 23:54:37,164] [INFO] [utils.py:786:see_memory_usage] MA 0.09 GB Max_MA 1.04 GB CA 2.07 GB Max_CA 2 GB +[2023-08-22 23:54:37,166] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 103.45 GB, percent = 32.9% +[2023-08-22 23:54:37,166] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedCPUAdam +[2023-08-22 23:54:37,166] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler +[2023-08-22 23:54:37,168] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None +[2023-08-22 23:54:37,168] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0], mom=[(0.9, 0.999)] +[2023-08-22 23:54:37,173] [INFO] [config.py:960:print] DeepSpeedEngine configuration: +[2023-08-22 23:54:37,173] [INFO] [config.py:964:print] activation_checkpointing_config { + "partition_activations": false, + "contiguous_memory_optimization": false, + "cpu_checkpointing": false, + "number_checkpoints": null, + "synchronize_checkpoint_boundary": false, + "profile": 
false +} +[2023-08-22 23:54:37,175] [INFO] [config.py:964:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} +[2023-08-22 23:54:37,175] [INFO] [config.py:964:print] amp_enabled .................. False +[2023-08-22 23:54:37,175] [INFO] [config.py:964:print] amp_params ................... False +[2023-08-22 23:54:37,175] [INFO] [config.py:964:print] autotuning_config ............ { + "enabled": false, + "start_step": null, + "end_step": null, + "metric_path": null, + "arg_mappings": null, + "metric": "throughput", + "model_info": null, + "results_dir": "autotuning_results", + "exps_dir": "autotuning_exps", + "overwrite": true, + "fast": true, + "start_profile_step": 3, + "end_profile_step": 5, + "tuner_type": "gridsearch", + "tuner_early_stopping": 5, + "tuner_num_trials": 50, + "model_info_path": null, + "mp_size": 1, + "max_train_batch_size": null, + "min_train_batch_size": 1, + "max_train_micro_batch_size_per_gpu": 1.024000e+03, + "min_train_micro_batch_size_per_gpu": 1, + "num_tuning_micro_batch_sizes": 3 +} +[2023-08-22 23:54:37,175] [INFO] [config.py:964:print] bfloat16_enabled ............. False +[2023-08-22 23:54:37,175] [INFO] [config.py:964:print] checkpoint_parallel_write_pipeline False +[2023-08-22 23:54:37,175] [INFO] [config.py:964:print] checkpoint_tag_validation_enabled True +[2023-08-22 23:54:37,175] [INFO] [config.py:964:print] checkpoint_tag_validation_fail False +[2023-08-22 23:54:37,175] [INFO] [config.py:964:print] comms_config ................. +[2023-08-22 23:54:37,175] [INFO] [config.py:964:print] communication_data_type ...... None +[2023-08-22 23:54:37,175] [INFO] [config.py:964:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] curriculum_enabled_legacy .... False +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] curriculum_params_legacy ..... False +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] data_efficiency_enabled ...... False +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] dataloader_drop_last ......... False +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] disable_allgather ............ False +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] dump_state ................... 
False +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1e-10} +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] eigenvalue_enabled ........... False +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] eigenvalue_gas_boundary_resolution 1 +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] eigenvalue_layer_name ........ bert.encoder.layer +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] eigenvalue_layer_num ......... 0 +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] eigenvalue_max_iter .......... 100 +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] eigenvalue_stability ......... 1e-06 +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] eigenvalue_tol ............... 0.01 +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] eigenvalue_verbose ........... False +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] elasticity_enabled ........... False +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] flops_profiler_config ........ { + "enabled": false, + "recompute_fwd_factor": 0.0, + "profile_step": 1, + "module_depth": -1, + "top_modules": 1, + "detailed": true, + "output_file": null +} +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] fp16_auto_cast ............... False +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] fp16_enabled ................. True +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] fp16_master_weights_and_gradients False +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] global_rank .................. 0 +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] grad_accum_dtype ............. None +[2023-08-22 23:54:37,178] [INFO] [config.py:964:print] gradient_accumulation_steps .. 8 +[2023-08-22 23:54:37,180] [INFO] [config.py:964:print] gradient_clipping ............ 
1.0 +[2023-08-22 23:54:37,180] [INFO] [config.py:964:print] gradient_predivide_factor .... 1.0 +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] initial_dynamic_scale ........ 65536 +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] load_universal_checkpoint .... False +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] loss_scale ................... 0 +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] memory_breakdown ............. False +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] mics_hierarchial_params_gather False +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] mics_shard_size .............. -1 +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] nebula_config ................ { + "enabled": false, + "persistent_storage_path": null, + "persistent_time_interval": 100, + "num_of_version_in_retention": 2, + "enable_nebula_load": true, + "load_path": null +} +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] optimizer_legacy_fusion ...... False +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] optimizer_name ............... None +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] optimizer_params ............. None +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] pipeline ..................... 
{'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] pld_enabled .................. False +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] pld_params ................... False +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] prescale_gradients ........... False +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] scheduler_name ............... None +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] scheduler_params ............. None +[2023-08-22 23:54:37,181] [INFO] [config.py:964:print] sparse_attention ............. None +[2023-08-22 23:54:37,183] [INFO] [config.py:964:print] sparse_gradients_enabled ..... False +[2023-08-22 23:54:37,183] [INFO] [config.py:964:print] steps_per_print .............. inf +[2023-08-22 23:54:37,183] [INFO] [config.py:964:print] train_batch_size ............. 128 +[2023-08-22 23:54:37,183] [INFO] [config.py:964:print] train_micro_batch_size_per_gpu 2 +[2023-08-22 23:54:37,183] [INFO] [config.py:964:print] use_node_local_storage ....... False +[2023-08-22 23:54:37,183] [INFO] [config.py:964:print] wall_clock_breakdown ......... False +[2023-08-22 23:54:37,183] [INFO] [config.py:964:print] world_size ................... 8 +[2023-08-22 23:54:37,183] [INFO] [config.py:964:print] zero_allow_untested_optimizer True +[2023-08-22 23:54:37,183] [INFO] [config.py:964:print] zero_config .................. 
stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=26214400 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='cpu', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=True) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='cpu', nvme_path=None, buffer_count=4, pin_memory=True, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=23592960 param_persistence_threshold=51200 model_persistence_threshold=sys.maxsize max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True +[2023-08-22 23:54:37,183] [INFO] [config.py:964:print] zero_enabled ................. True +[2023-08-22 23:54:37,183] [INFO] [config.py:964:print] zero_force_ds_cpu_optimizer .. True +[2023-08-22 23:54:37,183] [INFO] [config.py:964:print] zero_optimization_stage ...... 
3 +[2023-08-22 23:54:37,183] [INFO] [config.py:950:print_user_config] json = { + "fp16": { + "enabled": true, + "loss_scale": 0, + "loss_scale_window": 100, + "initial_scale_power": 16, + "hysteresis": 2, + "min_loss_scale": 1e-10 + }, + "zero_optimization": { + "stage": 3, + "offload_optimizer": { + "device": "cpu", + "pin_memory": true + }, + "offload_param": { + "device": "cpu", + "pin_memory": true + }, + "overlap_comm": true, + "contiguous_gradients": true, + "sub_group_size": 1.000000e+09, + "reduce_bucket_size": 2.621440e+07, + "stage3_prefetch_bucket_size": 2.359296e+07, + "stage3_param_persistence_threshold": 5.120000e+04, + "stage3_max_live_parameters": 1.000000e+09, + "stage3_max_reuse_distance": 1.000000e+09, + "stage3_gather_16bit_weights_on_model_save": false + }, + "gradient_accumulation_steps": 8, + "gradient_clipping": 1.0, + "steps_per_print": inf, + "train_batch_size": 128, + "train_micro_batch_size_per_gpu": 2, + "wall_clock_breakdown": false, + "bf16": { + "enabled": false + }, + "zero_allow_untested_optimizer": true +} +[INFO|trainer.py:1686] 2023-08-22 23:54:37,183 >> ***** Running training ***** +[INFO|trainer.py:1687] 2023-08-22 23:54:37,185 >> Num examples = 737,503 +[INFO|trainer.py:1688] 2023-08-22 23:54:37,185 >> Num Epochs = 4 +[INFO|trainer.py:1689] 2023-08-22 23:54:37,185 >> Instantaneous batch size per device = 2 +[INFO|trainer.py:1692] 2023-08-22 23:54:37,185 >> Total train batch size (w. parallel, distributed & accumulation) = 128 +[INFO|trainer.py:1693] 2023-08-22 23:54:37,185 >> Gradient Accumulation steps = 8 +[INFO|trainer.py:1694] 2023-08-22 23:54:37,185 >> Total optimization steps = 17,285 +[INFO|trainer.py:1695] 2023-08-22 23:54:37,191 >> Number of trainable parameters = 1,054,351,360 + 0%| | 0/17285 [00:00> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|logging.py:295] 2023-08-22 23:54:37,393 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|logging.py:295] 2023-08-22 23:54:37,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|logging.py:295] 2023-08-22 23:54:37,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|logging.py:295] 2023-08-22 23:54:37,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|logging.py:295] 2023-08-22 23:54:37,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|logging.py:295] 2023-08-22 23:54:37,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|logging.py:295] 2023-08-22 23:54:37,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[2023-08-22 23:55:09,466] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 + 0%| | 1/17285 [00:32<154:59:23, 32.28s/it] {'loss': 8.2501, 'learning_rate': 0.0, 'epoch': 0.0} + 0%| | 1/17285 [00:32<154:59:23, 32.28s/it][2023-08-22 23:55:33,607] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 0%| | 2/17285 [00:56<131:59:16, 27.49s/it][2023-08-22 23:56:00,376] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384 + 0%| | 3/17285 [01:23<130:24:14, 27.16s/it][2023-08-22 23:56:40,198] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192 + 0%| | 4/17285 [02:03<154:22:52, 32.16s/it][2023-08-22 23:57:19,971] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 8192, reducing to 4096 + 0%| | 5/17285 [02:42<167:32:54, 34.91s/it][2023-08-22 23:57:49,992] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048 + 0%| | 6/17285 [03:12<159:34:06, 33.25s/it] 0%| | 7/17285 [03:43<155:22:12, 32.37s/it][2023-08-22 23:58:49,793] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024 + 0%| | 8/17285 [04:12<150:33:07, 31.37s/it] 0%| | 9/17285 [04:47<155:52:01, 32.48s/it] 0%| | 10/17285 [05:12<144:34:28, 30.13s/it] {'loss': 7.9965, 'learning_rate': 6.936416184971099e-07, 'epoch': 0.0} + 0%| | 10/17285 [05:12<144:34:28, 30.13s/it] 0%| | 11/17285 [05:37<137:39:57, 28.69s/it] 0%| | 12/17285 [06:02<132:02:18, 27.52s/it] 0%| | 13/17285 [06:28<129:37:36, 27.02s/it] 0%| | 14/17285 [06:59<135:25:38, 28.23s/it] 0%| | 15/17285 [07:33<143:41:33, 29.95s/it] 0%| | 16/17285 [08:09<152:39:24, 31.82s/it] 0%| | 17/17285 [08:37<146:34:02, 30.56s/it] 0%| | 18/17285 [09:06<144:28:37, 30.12s/it] 0%| | 19/17285 [09:41<151:33:48, 31.60s/it] 0%| | 20/17285 [10:12<150:18:09, 31.34s/it] {'loss': 8.0268, 'learning_rate': 3.0057803468208094e-06, 'epoch': 0.0} + 0%| | 20/17285 [10:12<150:18:09, 31.34s/it] 0%| | 21/17285 [10:50<160:56:01, 33.56s/it] 0%| | 22/17285 [11:26<163:37:03, 34.12s/it] 0%| | 23/17285 [12:01<165:12:09, 34.45s/it] 0%| | 24/17285 [12:30<157:34:19, 32.86s/it] 0%| | 25/17285 [13:00<153:43:00, 32.06s/it] 0%| | 26/17285 [13:35<157:13:43, 32.80s/it] 0%| | 27/17285 [14:01<148:10:45, 30.91s/it] 0%| | 28/17285 [14:33<149:44:41, 31.24s/it] 0%| | 29/17285 [15:17<167:30:38, 34.95s/it] 0%| | 30/17285 [15:47<160:19:24, 33.45s/it] {'loss': 7.9878, 'learning_rate': 5.317919075144509e-06, 'epoch': 0.01} + 0%| | 30/17285 [15:47<160:19:24, 33.45s/it][2023-08-23 00:11:00,392] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 1024, reducing to 512 + 0%| | 31/17285 [16:23<163:35:09, 34.13s/it] 0%| | 32/17285 [17:00<167:34:43, 34.97s/it] 0%| | 33/17285 [17:25<154:15:34, 32.19s/it] 0%| | 34/17285 [18:06<166:47:15, 34.81s/it] 0%| | 35/17285 [18:32<153:25:22, 32.02s/it] 0%| | 36/17285 [18:57<144:22:27, 30.13s/it] 0%| | 37/17285 [19:24<138:40:56, 28.95s/it] 0%| | 38/17285 [19:59<148:13:38, 30.94s/it] 0%| | 39/17285 [20:30<148:11:38, 30.93s/it] 0%| | 40/17285 [20:59<145:41:13, 30.41s/it] {'loss': 7.6933, 'learning_rate': 7.398843930635839e-06, 'epoch': 0.01} + 0%| | 40/17285 [20:59<145:41:13, 30.41s/it] 0%| | 41/17285 [21:31<146:42:42, 30.63s/it] 0%| | 42/17285 [22:09<158:03:42, 33.00s/it] 0%| | 43/17285 [22:47<164:31:13, 34.35s/it] 0%| | 44/17285 [23:25<169:48:10, 35.46s/it] 0%| | 45/17285 [23:56<164:05:50, 34.27s/it] 0%| | 46/17285 [24:29<161:29:29, 33.72s/it] 0%| | 47/17285 [25:05<164:43:31, 34.40s/it] 0%| | 48/17285 [25:34<157:56:58, 32.99s/it] 0%| | 49/17285 [26:17<172:19:56, 35.99s/it] 0%| | 50/17285 [26:51<168:28:21, 35.19s/it] {'loss': 7.4624, 'learning_rate': 9.710982658959537e-06, 'epoch': 0.01} + 0%| | 50/17285 [26:51<168:28:21, 35.19s/it] 0%| | 51/17285 [27:17<155:35:02, 32.50s/it] 0%| | 52/17285 [27:48<153:36:38, 32.09s/it] 0%| | 53/17285 [28:25<161:02:16, 33.64s/it] 0%| | 54/17285 [28:51<150:01:33, 31.34s/it] 0%| | 55/17285 [29:17<142:23:05, 29.75s/it] 0%| | 56/17285 [29:49<145:11:03, 30.34s/it] 0%| | 57/17285 [30:20<147:01:29, 30.72s/it] 0%| | 58/17285 [30:51<146:26:18, 30.60s/it] 0%| | 59/17285 [31:23<148:51:20, 31.11s/it] 0%| | 60/17285 [32:01<158:12:56, 33.07s/it] {'loss': 7.3907, 'learning_rate': 1.2023121387283238e-05, 'epoch': 0.01} + 0%| | 60/17285 [32:01<158:12:56, 33.07s/it] 0%| | 61/17285 [32:28<149:33:48, 31.26s/it] 0%| | 62/17285 [32:59<148:50:28, 31.11s/it] 0%| | 63/17285 [33:32<152:42:18, 31.92s/it] 0%| | 64/17285 [34:04<152:22:14, 31.85s/it] 0%| | 65/17285 [34:37<154:11:52, 32.24s/it] 0%| | 66/17285 [35:11<156:34:03, 32.73s/it] 0%| | 67/17285 
[35:45<158:10:28, 33.07s/it] 0%| | 68/17285 [36:17<156:51:21, 32.80s/it] 0%| | 69/17285 [36:48<154:26:15, 32.29s/it] 0%| | 70/17285 [37:18<151:21:37, 31.65s/it] {'loss': 7.0312, 'learning_rate': 1.4335260115606938e-05, 'epoch': 0.01} + 0%| | 70/17285 [37:18<151:21:37, 31.65s/it] 0%| | 71/17285 [37:45<144:27:25, 30.21s/it] 0%| | 72/17285 [38:11<137:47:40, 28.82s/it] 0%| | 73/17285 [38:46<147:39:25, 30.88s/it] 0%| | 74/17285 [39:20<150:45:43, 31.53s/it] 0%| | 75/17285 [39:53<153:41:08, 32.15s/it] 0%| | 76/17285 [40:24<151:39:41, 31.73s/it] 0%| | 77/17285 [40:54<148:57:29, 31.16s/it] 0%| | 78/17285 [41:27<152:35:26, 31.92s/it] 0%| | 79/17285 [42:08<165:39:15, 34.66s/it] 0%| | 80/17285 [42:37<156:41:16, 32.79s/it] {'loss': 6.6125, 'learning_rate': 1.6647398843930635e-05, 'epoch': 0.01} + 0%| | 80/17285 [42:37<156:41:16, 32.79s/it] 0%| | 81/17285 [43:11<158:57:42, 33.26s/it] 0%| | 82/17285 [43:41<154:24:48, 32.31s/it] 0%| | 83/17285 [44:12<151:46:10, 31.76s/it] 0%| | 84/17285 [44:52<163:23:38, 34.20s/it] 0%| | 85/17285 [45:32<172:42:02, 36.15s/it] 0%| | 86/17285 [46:03<164:46:48, 34.49s/it] 1%| | 87/17285 [46:36<162:38:55, 34.05s/it] 1%| | 88/17285 [47:02<150:54:53, 31.59s/it] 1%| | 89/17285 [47:35<152:52:38, 32.01s/it] 1%| | 90/17285 [48:07<153:35:52, 32.16s/it] {'loss': 6.3013, 'learning_rate': 1.8959537572254336e-05, 'epoch': 0.02} + 1%| | 90/17285 [48:07<153:35:52, 32.16s/it] 1%| | 91/17285 [48:43<158:53:05, 33.27s/it] 1%| | 92/17285 [49:28<175:13:10, 36.69s/it] 1%| | 93/17285 [50:01<170:25:38, 35.69s/it] 1%| | 94/17285 [50:26<155:27:16, 32.55s/it] 1%| | 95/17285 [50:55<149:01:19, 31.21s/it] 1%| | 96/17285 [51:18<138:30:44, 29.01s/it] 1%| | 97/17285 [51:50<142:12:26, 29.79s/it] 1%| | 98/17285 [52:17<137:38:23, 28.83s/it] 1%| | 99/17285 [52:48<141:43:20, 29.69s/it] 1%| | 100/17285 [53:15<138:02:29, 28.92s/it] {'loss': 6.068, 'learning_rate': 2.1271676300578036e-05, 'epoch': 0.02} + 1%| | 100/17285 [53:15<138:02:29, 28.92s/it] 1%| | 101/17285 [53:48<143:20:12, 
30.03s/it] 1%| | 102/17285 [54:25<153:54:29, 32.25s/it] 1%| | 103/17285 [54:51<144:28:37, 30.27s/it] 1%| | 104/17285 [55:27<151:53:53, 31.83s/it] 1%| | 105/17285 [56:01<156:12:03, 32.73s/it] 1%| | 106/17285 [56:29<148:16:13, 31.07s/it] 1%| | 107/17285 [56:58<145:47:19, 30.55s/it] 1%| | 108/17285 [57:37<158:33:47, 33.23s/it] 1%| | 109/17285 [58:07<153:52:57, 32.25s/it] 1%| | 110/17285 [58:48<165:40:22, 34.73s/it] {'loss': 5.8308, 'learning_rate': 2.3583815028901734e-05, 'epoch': 0.02} + 1%| | 110/17285 [58:48<165:40:22, 34.73s/it] 1%| | 111/17285 [59:27<172:15:10, 36.11s/it] 1%| | 112/17285 [1:00:02<170:24:26, 35.72s/it] 1%| | 113/17285 [1:00:37<169:05:27, 35.45s/it] 1%| | 114/17285 [1:01:09<163:55:12, 34.37s/it] 1%| | 115/17285 [1:01:44<165:02:30, 34.60s/it] 1%| | 116/17285 [1:02:17<163:18:24, 34.24s/it] 1%| | 117/17285 [1:02:56<169:21:52, 35.51s/it] 1%| | 118/17285 [1:03:28<165:15:49, 34.66s/it] 1%| | 119/17285 [1:04:08<171:48:14, 36.03s/it] 1%| | 120/17285 [1:04:40<167:11:23, 35.06s/it] {'loss': 5.7656, 'learning_rate': 2.5895953757225434e-05, 'epoch': 0.02} + 1%| | 120/17285 [1:04:40<167:11:23, 35.06s/it] 1%| | 121/17285 [1:05:11<160:53:47, 33.75s/it] 1%| | 122/17285 [1:05:46<162:23:32, 34.06s/it] 1%| | 123/17285 [1:06:25<168:57:34, 35.44s/it] 1%| | 124/17285 [1:06:54<160:53:01, 33.75s/it] 1%| | 125/17285 [1:07:25<155:50:36, 32.69s/it] 1%| | 126/17285 [1:07:54<151:41:01, 31.82s/it] 1%| | 127/17285 [1:08:32<160:33:32, 33.69s/it] 1%| | 128/17285 [1:09:04<157:36:12, 33.07s/it] 1%| | 129/17285 [1:09:36<155:19:05, 32.59s/it] 1%| | 130/17285 [1:10:07<152:57:40, 32.10s/it] {'loss': 5.5955, 'learning_rate': 2.8208092485549138e-05, 'epoch': 0.02} + 1%| | 130/17285 [1:10:07<152:57:40, 32.10s/it] 1%| | 131/17285 [1:10:42<157:47:16, 33.11s/it] 1%| | 132/17285 [1:11:12<153:36:04, 32.24s/it] 1%| | 133/17285 [1:11:44<153:06:34, 32.14s/it] 1%| | 134/17285 [1:12:14<150:24:47, 31.57s/it] 1%| | 135/17285 [1:12:49<154:38:53, 32.46s/it] 1%| | 136/17285 [1:13:15<146:06:20, 30.67s/it] 
1%| | 137/17285 [1:13:49<149:34:48, 31.40s/it] 1%| | 138/17285 [1:14:27<159:58:54, 33.59s/it] 1%| | 139/17285 [1:15:00<158:39:47, 33.31s/it] 1%| | 140/17285 [1:15:29<153:08:43, 32.16s/it] {'loss': 5.3842, 'learning_rate': 3.0520231213872835e-05, 'epoch': 0.02} + 1%| | 140/17285 [1:15:29<153:08:43, 32.16s/it] 1%| | 141/17285 [1:16:06<159:56:58, 33.59s/it] 1%| | 142/17285 [1:16:37<156:02:54, 32.77s/it] 1%| | 143/17285 [1:17:15<163:45:33, 34.39s/it] 1%| | 144/17285 [1:17:50<163:39:06, 34.37s/it] 1%| | 145/17285 [1:18:19<156:34:46, 32.89s/it] 1%| | 146/17285 [1:18:51<155:11:13, 32.60s/it] 1%| | 147/17285 [1:19:19<148:30:00, 31.19s/it] 1%| | 148/17285 [1:19:52<151:49:46, 31.90s/it] 1%| | 149/17285 [1:20:23<149:20:46, 31.38s/it] 1%| | 150/17285 [1:20:53<148:12:29, 31.14s/it] {'loss': 5.2866, 'learning_rate': 3.283236994219653e-05, 'epoch': 0.03} + 1%| | 150/17285 [1:20:53<148:12:29, 31.14s/it] 1%| | 151/17285 [1:21:33<160:18:17, 33.68s/it] 1%| | 152/17285 [1:22:05<158:03:02, 33.21s/it] 1%| | 153/17285 [1:22:38<158:07:37, 33.23s/it] 1%| | 154/17285 [1:23:12<158:40:04, 33.34s/it] 1%| | 155/17285 [1:23:45<159:08:14, 33.44s/it] 1%| | 156/17285 [1:24:12<148:38:53, 31.24s/it] 1%| | 157/17285 [1:24:46<152:49:22, 32.12s/it] 1%| | 158/17285 [1:25:19<153:48:22, 32.33s/it] 1%| | 159/17285 [1:25:54<158:26:30, 33.31s/it] 1%| | 160/17285 [1:26:20<147:25:26, 30.99s/it] {'loss': 5.0532, 'learning_rate': 3.514450867052023e-05, 'epoch': 0.03} + 1%| | 160/17285 [1:26:20<147:25:26, 30.99s/it] 1%| | 161/17285 [1:26:45<139:58:26, 29.43s/it] 1%| | 162/17285 [1:27:16<141:30:35, 29.75s/it] 1%| | 163/17285 [1:27:47<143:54:44, 30.26s/it] 1%| | 164/17285 [1:28:15<139:50:51, 29.41s/it] 1%| | 165/17285 [1:28:44<139:35:51, 29.35s/it] 1%| | 166/17285 [1:29:11<136:29:08, 28.70s/it] 1%| | 167/17285 [1:29:59<164:14:03, 34.54s/it] 1%| | 168/17285 [1:30:30<158:55:01, 33.42s/it] 1%| | 169/17285 [1:30:57<150:01:15, 31.55s/it] 1%| | 170/17285 [1:31:34<157:35:18, 33.15s/it] {'loss': 4.9624, 'learning_rate': 
3.7456647398843934e-05, 'epoch': 0.03} + 1%| | 170/17285 [1:31:34<157:35:18, 33.15s/it] 1%| | 171/17285 [1:32:04<152:12:47, 32.02s/it] 1%| | 172/17285 [1:32:30<144:44:49, 30.45s/it] 1%| | 173/17285 [1:33:02<145:49:15, 30.68s/it] 1%| | 174/17285 [1:33:27<138:07:35, 29.06s/it] 1%| | 175/17285 [1:34:00<143:21:13, 30.16s/it] 1%| | 176/17285 [1:34:26<137:55:20, 29.02s/it] 1%| | 177/17285 [1:35:02<148:17:15, 31.20s/it] 1%| | 178/17285 [1:35:28<140:59:42, 29.67s/it] 1%| | 179/17285 [1:35:56<137:28:21, 28.93s/it] 1%| | 180/17285 [1:36:25<137:53:28, 29.02s/it] {'loss': 4.8342, 'learning_rate': 3.976878612716764e-05, 'epoch': 0.03} + 1%| | 180/17285 [1:36:25<137:53:28, 29.02s/it] 1%| | 181/17285 [1:36:58<143:21:10, 30.17s/it] 1%| | 182/17285 [1:37:27<141:32:00, 29.79s/it] 1%| | 183/17285 [1:37:56<141:18:00, 29.74s/it] 1%| | 184/17285 [1:38:26<141:51:36, 29.86s/it] 1%| | 185/17285 [1:39:00<146:32:21, 30.85s/it] 1%| | 186/17285 [1:39:24<137:02:22, 28.85s/it] 1%| | 187/17285 [1:39:50<132:41:21, 27.94s/it] 1%| | 188/17285 [1:40:26<144:08:51, 30.35s/it] 1%| | 189/17285 [1:40:51<136:52:21, 28.82s/it] 1%| | 190/17285 [1:41:30<151:00:44, 31.80s/it] {'loss': 4.6055, 'learning_rate': 4.2080924855491335e-05, 'epoch': 0.03} + 1%| | 190/17285 [1:41:30<151:00:44, 31.80s/it] 1%| | 191/17285 [1:42:07<158:25:26, 33.36s/it] 1%| | 192/17285 [1:42:37<154:23:18, 32.52s/it] 1%| | 193/17285 [1:43:08<151:45:33, 31.96s/it] 1%| | 194/17285 [1:43:40<151:33:21, 31.92s/it] 1%| | 195/17285 [1:44:07<145:19:21, 30.61s/it] 1%| | 196/17285 [1:44:43<152:48:05, 32.19s/it] 1%| | 197/17285 [1:45:09<144:13:32, 30.38s/it] 1%| | 198/17285 [1:45:43<148:32:00, 31.29s/it] 1%| | 199/17285 [1:46:22<160:36:46, 33.84s/it] 1%| | 200/17285 [1:46:52<154:41:27, 32.60s/it] {'loss': 4.5276, 'learning_rate': 4.439306358381503e-05, 'epoch': 0.03} + 1%| | 200/17285 [1:46:52<154:41:27, 32.60s/it] 1%| | 201/17285 [1:47:32<164:56:49, 34.76s/it] 1%| | 202/17285 [1:48:06<163:22:45, 34.43s/it] 1%| | 203/17285 [1:48:39<162:37:47, 
34.27s/it] 1%| | 204/17285 [1:49:14<162:43:11, 34.29s/it] 1%| | 205/17285 [1:49:51<167:03:46, 35.21s/it] 1%| | 206/17285 [1:50:17<153:10:54, 32.29s/it] 1%| | 207/17285 [1:50:48<152:20:04, 32.11s/it] 1%| | 208/17285 [1:51:24<158:01:11, 33.31s/it] 1%| | 209/17285 [1:52:00<161:00:15, 33.94s/it] 1%| | 210/17285 [1:52:28<152:20:44, 32.12s/it] {'loss': 4.3676, 'learning_rate': 4.670520231213873e-05, 'epoch': 0.04} + 1%| | 210/17285 [1:52:28<152:20:44, 32.12s/it] 1%| | 211/17285 [1:53:11<168:31:52, 35.53s/it] 1%| | 212/17285 [1:53:38<155:49:44, 32.86s/it] 1%| | 213/17285 [1:54:05<148:17:29, 31.27s/it] 1%| | 214/17285 [1:54:35<146:27:12, 30.88s/it] 1%| | 215/17285 [1:55:00<137:40:28, 29.04s/it] 1%| | 216/17285 [1:55:27<134:23:47, 28.35s/it] 1%|▏ | 217/17285 [1:55:55<133:27:01, 28.15s/it] 1%|▏ | 218/17285 [1:56:24<134:44:00, 28.42s/it] 1%|▏ | 219/17285 [1:56:50<131:43:36, 27.79s/it] 1%|▏ | 220/17285 [1:57:25<142:18:06, 30.02s/it] {'loss': 4.2029, 'learning_rate': 4.9017341040462426e-05, 'epoch': 0.04} + 1%|▏ | 220/17285 [1:57:25<142:18:06, 30.02s/it] 1%|▏ | 221/17285 [1:58:02<151:48:05, 32.03s/it] 1%|▏ | 222/17285 [1:58:29<145:26:20, 30.69s/it] 1%|▏ | 223/17285 [1:59:01<146:55:12, 31.00s/it] 1%|▏ | 224/17285 [1:59:32<146:28:11, 30.91s/it] 1%|▏ | 225/17285 [2:00:08<153:41:02, 32.43s/it] 1%|▏ | 226/17285 [2:00:37<149:05:38, 31.46s/it] 1%|▏ | 227/17285 [2:01:17<160:59:21, 33.98s/it] 1%|▏ | 228/17285 [2:01:51<160:40:09, 33.91s/it] 1%|▏ | 229/17285 [2:02:28<166:08:27, 35.07s/it] 1%|▏ | 230/17285 [2:03:01<163:00:37, 34.41s/it] {'loss': 4.0336, 'learning_rate': 5.1329479768786124e-05, 'epoch': 0.04} + 1%|▏ | 230/17285 [2:03:01<163:00:37, 34.41s/it] 1%|▏ | 231/17285 [2:03:35<161:26:44, 34.08s/it] 1%|▏ | 232/17285 [2:04:04<154:47:43, 32.68s/it] 1%|▏ | 233/17285 [2:04:42<161:49:55, 34.17s/it] 1%|▏ | 234/17285 [2:05:08<150:27:17, 31.77s/it] 1%|▏ | 235/17285 [2:05:42<154:16:56, 32.58s/it] 1%|▏ | 236/17285 [2:06:12<150:30:28, 31.78s/it] 1%|▏ | 237/17285 [2:06:37<140:14:08, 29.61s/it] 
1%|▏ | 238/17285 [2:07:10<146:04:06, 30.85s/it] 1%|▏ | 239/17285 [2:07:48<155:51:39, 32.92s/it] 1%|▏ | 240/17285 [2:08:18<151:39:32, 32.03s/it] {'loss': 3.8993, 'learning_rate': 5.364161849710983e-05, 'epoch': 0.04} + 1%|▏ | 240/17285 [2:08:18<151:39:32, 32.03s/it] 1%|▏ | 241/17285 [2:08:44<142:57:41, 30.20s/it] 1%|▏ | 242/17285 [2:09:16<145:02:11, 30.64s/it] 1%|▏ | 243/17285 [2:09:47<146:20:32, 30.91s/it] 1%|▏ | 244/17285 [2:10:15<142:26:32, 30.09s/it] 1%|▏ | 245/17285 [2:10:48<146:22:10, 30.92s/it] 1%|▏ | 246/17285 [2:11:15<139:40:26, 29.51s/it] 1%|▏ | 247/17285 [2:11:49<146:41:00, 30.99s/it] 1%|▏ | 248/17285 [2:12:18<144:31:22, 30.54s/it] 1%|▏ | 249/17285 [2:12:48<142:26:14, 30.10s/it] 1%|▏ | 250/17285 [2:13:15<139:07:47, 29.40s/it] {'loss': 3.834, 'learning_rate': 5.595375722543353e-05, 'epoch': 0.04} + 1%|▏ | 250/17285 [2:13:15<139:07:47, 29.40s/it] 1%|▏ | 251/17285 [2:13:47<142:07:00, 30.04s/it] 1%|▏ | 252/17285 [2:14:14<137:24:32, 29.04s/it] 1%|▏ | 253/17285 [2:14:44<140:01:01, 29.59s/it] 1%|▏ | 254/17285 [2:15:30<162:17:19, 34.30s/it] 1%|▏ | 255/17285 [2:15:59<155:04:25, 32.78s/it] 1%|▏ | 256/17285 [2:16:33<156:54:50, 33.17s/it] 1%|▏ | 257/17285 [2:17:03<152:48:45, 32.31s/it] 1%|▏ | 258/17285 [2:17:32<147:19:17, 31.15s/it] 1%|▏ | 259/17285 [2:18:09<155:16:01, 32.83s/it] 2%|▏ | 260/17285 [2:18:39<151:48:34, 32.10s/it] {'loss': 3.7466, 'learning_rate': 5.8265895953757235e-05, 'epoch': 0.05} + 2%|▏ | 260/17285 [2:18:39<151:48:34, 32.10s/it] 2%|▏ | 261/17285 [2:19:12<152:28:57, 32.24s/it] 2%|▏ | 262/17285 [2:19:44<152:11:33, 32.19s/it] 2%|▏ | 263/17285 [2:20:19<156:37:15, 33.12s/it] 2%|▏ | 264/17285 [2:20:46<147:41:55, 31.24s/it] 2%|▏ | 265/17285 [2:21:16<146:31:05, 30.99s/it] 2%|▏ | 266/17285 [2:21:52<152:55:22, 32.35s/it] 2%|▏ | 267/17285 [2:22:25<154:13:18, 32.62s/it] 2%|▏ | 268/17285 [2:22:57<153:42:25, 32.52s/it] 2%|▏ | 269/17285 [2:23:31<156:12:34, 33.05s/it] 2%|▏ | 270/17285 [2:24:05<156:42:18, 33.16s/it] {'loss': 3.6144, 'learning_rate': 
6.057803468208093e-05, 'epoch': 0.05} + 2%|▏ | 270/17285 [2:24:05<156:42:18, 33.16s/it] 2%|▏ | 271/17285 [2:24:47<168:51:30, 35.73s/it] 2%|▏ | 272/17285 [2:25:22<168:38:31, 35.69s/it] 2%|▏ | 273/17285 [2:25:48<155:14:05, 32.85s/it] 2%|▏ | 274/17285 [2:26:20<152:48:57, 32.34s/it] 2%|▏ | 275/17285 [2:26:48<146:35:56, 31.03s/it] 2%|▏ | 276/17285 [2:27:21<150:03:26, 31.76s/it] 2%|▏ | 277/17285 [2:27:51<147:31:53, 31.23s/it] 2%|▏ | 278/17285 [2:28:18<141:55:34, 30.04s/it] 2%|▏ | 279/17285 [2:28:51<146:18:13, 30.97s/it] 2%|▏ | 280/17285 [2:29:16<136:35:24, 28.92s/it] {'loss': 3.4977, 'learning_rate': 6.289017341040462e-05, 'epoch': 0.05} + 2%|▏ | 280/17285 [2:29:16<136:35:24, 28.92s/it] 2%|▏ | 281/17285 [2:29:44<135:36:00, 28.71s/it] 2%|▏ | 282/17285 [2:30:15<138:36:41, 29.35s/it] 2%|▏ | 283/17285 [2:30:43<136:39:34, 28.94s/it] 2%|▏ | 284/17285 [2:31:11<135:57:42, 28.79s/it] 2%|▏ | 285/17285 [2:31:45<143:09:42, 30.32s/it] 2%|▏ | 286/17285 [2:32:15<143:14:42, 30.34s/it] 2%|▏ | 287/17285 [2:32:42<137:25:09, 29.10s/it] 2%|▏ | 288/17285 [2:33:08<134:11:00, 28.42s/it] 2%|▏ | 289/17285 [2:33:38<136:28:34, 28.91s/it] 2%|▏ | 290/17285 [2:34:12<142:30:54, 30.19s/it] {'loss': 3.4428, 'learning_rate': 6.520231213872833e-05, 'epoch': 0.05} + 2%|▏ | 290/17285 [2:34:12<142:30:54, 30.19s/it] 2%|▏ | 291/17285 [2:34:55<161:18:32, 34.17s/it] 2%|▏ | 292/17285 [2:35:26<156:59:07, 33.26s/it] 2%|▏ | 293/17285 [2:36:04<163:32:10, 34.65s/it] 2%|▏ | 294/17285 [2:36:39<164:10:11, 34.78s/it] 2%|▏ | 295/17285 [2:37:11<159:42:28, 33.84s/it] 2%|▏ | 296/17285 [2:37:41<155:03:10, 32.86s/it] 2%|▏ | 297/17285 [2:38:12<151:31:30, 32.11s/it] 2%|▏ | 298/17285 [2:38:46<154:51:39, 32.82s/it] 2%|▏ | 299/17285 [2:39:30<169:44:46, 35.98s/it] 2%|▏ | 300/17285 [2:40:05<169:12:41, 35.86s/it] {'loss': 3.2823, 'learning_rate': 6.751445086705203e-05, 'epoch': 0.05} + 2%|▏ | 300/17285 [2:40:05<169:12:41, 35.86s/it] 2%|▏ | 301/17285 [2:40:32<156:28:48, 33.17s/it] 2%|▏ | 302/17285 [2:41:05<156:54:42, 33.26s/it] 2%|▏ | 
303/17285 [2:41:37<154:16:39, 32.71s/it] 2%|▏ | 304/17285 [2:42:10<154:41:56, 32.80s/it] 2%|▏ | 305/17285 [2:42:49<163:07:56, 34.59s/it] 2%|▏ | 306/17285 [2:43:27<168:36:19, 35.75s/it] 2%|▏ | 307/17285 [2:43:52<153:41:08, 32.59s/it] 2%|▏ | 308/17285 [2:44:25<153:53:49, 32.63s/it] 2%|▏ | 309/17285 [2:44:57<153:12:15, 32.49s/it] 2%|▏ | 310/17285 [2:45:25<146:38:38, 31.10s/it] {'loss': 3.2296, 'learning_rate': 6.982658959537573e-05, 'epoch': 0.05} + 2%|▏ | 310/17285 [2:45:25<146:38:38, 31.10s/it] 2%|▏ | 311/17285 [2:45:59<150:56:55, 32.01s/it] 2%|▏ | 312/17285 [2:46:39<162:14:18, 34.41s/it] 2%|▏ | 313/17285 [2:47:10<156:26:43, 33.18s/it] 2%|▏ | 314/17285 [2:47:43<156:41:54, 33.24s/it] 2%|▏ | 315/17285 [2:48:13<152:48:59, 32.42s/it] 2%|▏ | 316/17285 [2:48:42<147:51:45, 31.37s/it] 2%|▏ | 317/17285 [2:49:09<140:49:51, 29.88s/it] 2%|▏ | 318/17285 [2:49:40<142:30:11, 30.24s/it] 2%|▏ | 319/17285 [2:50:18<154:02:36, 32.69s/it] 2%|▏ | 320/17285 [2:50:52<154:54:02, 32.87s/it] {'loss': 3.1029, 'learning_rate': 7.213872832369943e-05, 'epoch': 0.06} + 2%|▏ | 320/17285 [2:50:52<154:54:02, 32.87s/it] 2%|▏ | 321/17285 [2:51:17<145:00:18, 30.77s/it] 2%|▏ | 322/17285 [2:51:57<157:07:16, 33.35s/it] 2%|▏ | 323/17285 [2:52:27<152:53:39, 32.45s/it] 2%|▏ | 324/17285 [2:52:52<142:06:50, 30.16s/it] 2%|▏ | 325/17285 [2:53:22<142:17:23, 30.20s/it] 2%|▏ | 326/17285 [2:53:59<151:11:53, 32.10s/it] 2%|▏ | 327/17285 [2:54:32<152:59:55, 32.48s/it] 2%|▏ | 328/17285 [2:55:00<146:36:59, 31.13s/it] 2%|▏ | 329/17285 [2:55:34<150:46:38, 32.01s/it] 2%|▏ | 330/17285 [2:56:06<150:38:14, 31.98s/it] {'loss': 3.1717, 'learning_rate': 7.445086705202312e-05, 'epoch': 0.06} + 2%|▏ | 330/17285 [2:56:06<150:38:14, 31.98s/it] 2%|▏ | 331/17285 [2:56:35<146:06:10, 31.02s/it] 2%|▏ | 332/17285 [2:57:08<149:34:22, 31.76s/it] 2%|▏ | 333/17285 [2:57:39<147:57:01, 31.42s/it] 2%|▏ | 334/17285 [2:58:08<144:04:53, 30.60s/it] 2%|▏ | 335/17285 [2:58:40<147:11:43, 31.26s/it] 2%|▏ | 336/17285 [2:59:08<141:22:02, 30.03s/it] 2%|▏ | 
337/17285 [2:59:48<155:34:18, 33.05s/it] 2%|▏ | 338/17285 [3:00:13<144:50:46, 30.77s/it] 2%|▏ | 339/17285 [3:00:42<142:33:32, 30.29s/it] 2%|▏ | 340/17285 [3:01:12<141:31:12, 30.07s/it] {'loss': 3.0968, 'learning_rate': 7.676300578034682e-05, 'epoch': 0.06} + 2%|▏ | 340/17285 [3:01:12<141:31:12, 30.07s/it] 2%|▏ | 341/17285 [3:01:54<158:18:32, 33.64s/it] 2%|▏ | 342/17285 [3:02:24<152:42:21, 32.45s/it] 2%|▏ | 343/17285 [3:02:54<149:29:40, 31.77s/it] 2%|▏ | 344/17285 [3:03:28<152:38:28, 32.44s/it] 2%|▏ | 345/17285 [3:03:59<151:41:40, 32.24s/it] 2%|▏ | 346/17285 [3:04:25<142:34:13, 30.30s/it] 2%|▏ | 347/17285 [3:05:02<151:34:32, 32.22s/it] 2%|▏ | 348/17285 [3:05:38<157:09:46, 33.41s/it] 2%|▏ | 349/17285 [3:06:12<157:17:35, 33.44s/it] 2%|▏ | 350/17285 [3:06:39<148:08:48, 31.49s/it] {'loss': 2.9926, 'learning_rate': 7.907514450867053e-05, 'epoch': 0.06} + 2%|▏ | 350/17285 [3:06:39<148:08:48, 31.49s/it] 2%|▏ | 351/17285 [3:07:09<146:28:52, 31.14s/it] 2%|▏ | 352/17285 [3:07:40<146:15:39, 31.10s/it] 2%|▏ | 353/17285 [3:08:15<151:37:40, 32.24s/it] 2%|▏ | 354/17285 [3:08:47<151:50:24, 32.29s/it] 2%|▏ | 355/17285 [3:09:14<144:34:19, 30.74s/it] 2%|▏ | 356/17285 [3:09:39<135:45:05, 28.87s/it] 2%|▏ | 357/17285 [3:10:10<138:28:19, 29.45s/it] 2%|▏ | 358/17285 [3:10:39<137:59:01, 29.35s/it] 2%|▏ | 359/17285 [3:11:19<152:40:51, 32.47s/it] 2%|▏ | 360/17285 [3:11:50<151:36:46, 32.25s/it] {'loss': 3.0021, 'learning_rate': 8.138728323699423e-05, 'epoch': 0.06} + 2%|▏ | 360/17285 [3:11:50<151:36:46, 32.25s/it] 2%|▏ | 361/17285 [3:12:22<151:33:57, 32.24s/it] 2%|▏ | 362/17285 [3:12:58<155:32:21, 33.09s/it] 2%|▏ | 363/17285 [3:13:27<149:54:07, 31.89s/it] 2%|▏ | 364/17285 [3:14:02<154:54:28, 32.96s/it] 2%|▏ | 365/17285 [3:14:28<145:10:03, 30.89s/it] 2%|▏ | 366/17285 [3:15:03<150:27:11, 32.01s/it] 2%|▏ | 367/17285 [3:15:28<140:49:06, 29.96s/it] 2%|▏ | 368/17285 [3:15:59<142:06:18, 30.24s/it] 2%|▏ | 369/17285 [3:16:39<156:38:44, 33.34s/it] 2%|▏ | 370/17285 [3:17:12<156:07:47, 33.23s/it] {'loss': 
2.891, 'learning_rate': 8.369942196531792e-05, 'epoch': 0.06} + 2%|▏ | 370/17285 [3:17:12<156:07:47, 33.23s/it] 2%|▏ | 371/17285 [3:17:41<148:58:49, 31.71s/it] 2%|▏ | 372/17285 [3:18:09<143:50:02, 30.62s/it] 2%|▏ | 373/17285 [3:18:36<138:48:47, 29.55s/it] 2%|▏ | 374/17285 [3:19:08<143:02:39, 30.45s/it] 2%|▏ | 375/17285 [3:19:45<152:02:43, 32.37s/it] 2%|▏ | 376/17285 [3:20:19<154:11:11, 32.83s/it] 2%|▏ | 377/17285 [3:20:47<147:24:56, 31.39s/it] 2%|▏ | 378/17285 [3:21:32<166:10:28, 35.38s/it] 2%|▏ | 379/17285 [3:22:02<158:21:46, 33.72s/it] 2%|▏ | 380/17285 [3:22:35<158:11:50, 33.69s/it] {'loss': 2.8498, 'learning_rate': 8.601156069364162e-05, 'epoch': 0.07} + 2%|▏ | 380/17285 [3:22:35<158:11:50, 33.69s/it] 2%|▏ | 381/17285 [3:23:00<146:15:39, 31.15s/it] 2%|▏ | 382/17285 [3:23:33<148:52:17, 31.71s/it] 2%|▏ | 383/17285 [3:24:06<150:10:59, 31.99s/it] 2%|▏ | 384/17285 [3:24:46<161:48:00, 34.46s/it] 2%|▏ | 385/17285 [3:25:19<159:35:25, 34.00s/it] 2%|▏ | 386/17285 [3:25:49<154:09:29, 32.84s/it] 2%|▏ | 387/17285 [3:26:20<150:42:23, 32.11s/it] 2%|▏ | 388/17285 [3:26:55<155:05:50, 33.04s/it] 2%|▏ | 389/17285 [3:27:37<168:07:13, 35.82s/it] 2%|▏ | 390/17285 [3:28:10<164:26:21, 35.04s/it] {'loss': 2.8172, 'learning_rate': 8.832369942196532e-05, 'epoch': 0.07} + 2%|▏ | 390/17285 [3:28:10<164:26:21, 35.04s/it] 2%|▏ | 391/17285 [3:28:40<156:52:30, 33.43s/it] 2%|▏ | 392/17285 [3:29:13<155:41:43, 33.18s/it] 2%|▏ | 393/17285 [3:29:45<155:05:27, 33.05s/it] 2%|▏ | 394/17285 [3:30:19<155:20:27, 33.11s/it] 2%|▏ | 395/17285 [3:30:44<143:46:21, 30.64s/it] 2%|▏ | 396/17285 [3:31:18<149:14:20, 31.81s/it] 2%|▏ | 397/17285 [3:31:58<160:39:10, 34.25s/it] 2%|▏ | 398/17285 [3:32:30<157:27:53, 33.57s/it] 2%|▏ | 399/17285 [3:33:01<153:25:18, 32.71s/it] 2%|▏ | 400/17285 [3:33:30<148:35:21, 31.68s/it] {'loss': 2.8302, 'learning_rate': 9.063583815028902e-05, 'epoch': 0.07} + 2%|▏ | 400/17285 [3:33:30<148:35:21, 31.68s/it] 2%|▏ | 401/17285 [3:34:02<148:52:56, 31.74s/it] 2%|▏ | 402/17285 
[3:34:33<147:27:37, 31.44s/it] 2%|▏ | 403/17285 [3:35:00<142:01:28, 30.29s/it] 2%|▏ | 404/17285 [3:35:32<144:38:32, 30.85s/it] 2%|▏ | 405/17285 [3:36:02<142:53:49, 30.48s/it] 2%|▏ | 406/17285 [3:36:40<153:03:21, 32.64s/it] 2%|▏ | 407/17285 [3:37:10<149:18:22, 31.85s/it] 2%|▏ | 408/17285 [3:37:41<148:20:04, 31.64s/it] 2%|▏ | 409/17285 [3:38:15<151:16:24, 32.27s/it] 2%|▏ | 410/17285 [3:38:49<154:26:50, 32.95s/it] {'loss': 2.7333, 'learning_rate': 9.294797687861271e-05, 'epoch': 0.07} + 2%|▏ | 410/17285 [3:38:49<154:26:50, 32.95s/it] 2%|▏ | 411/17285 [3:39:16<146:15:36, 31.20s/it] 2%|▏ | 412/17285 [3:39:41<137:23:29, 29.31s/it] 2%|▏ | 413/17285 [3:40:13<141:03:12, 30.10s/it] 2%|▏ | 414/17285 [3:40:42<139:15:10, 29.71s/it] 2%|▏ | 415/17285 [3:41:18<148:19:47, 31.65s/it] 2%|▏ | 416/17285 [3:41:44<140:37:29, 30.01s/it] 2%|▏ | 417/17285 [3:42:13<138:17:21, 29.51s/it] 2%|▏ | 418/17285 [3:42:48<146:36:54, 31.29s/it] 2%|▏ | 419/17285 [3:43:19<146:23:32, 31.25s/it] 2%|▏ | 420/17285 [3:43:47<141:37:46, 30.23s/it] {'loss': 2.7135, 'learning_rate': 9.526011560693642e-05, 'epoch': 0.07} + 2%|▏ | 420/17285 [3:43:47<141:37:46, 30.23s/it] 2%|▏ | 421/17285 [3:44:16<139:20:56, 29.75s/it] 2%|▏ | 422/17285 [3:44:41<132:44:49, 28.34s/it] 2%|▏ | 423/17285 [3:45:23<151:52:20, 32.42s/it] 2%|▏ | 424/17285 [3:45:49<143:27:18, 30.63s/it] 2%|▏ | 425/17285 [3:46:19<141:41:48, 30.26s/it] 2%|▏ | 426/17285 [3:46:52<145:34:35, 31.09s/it] 2%|▏ | 427/17285 [3:47:17<137:58:50, 29.47s/it] 2%|▏ | 428/17285 [3:47:43<132:29:13, 28.29s/it] 2%|▏ | 429/17285 [3:48:08<127:34:00, 27.24s/it] 2%|▏ | 430/17285 [3:48:34<126:03:06, 26.92s/it] {'loss': 2.6811, 'learning_rate': 9.757225433526012e-05, 'epoch': 0.07} + 2%|▏ | 430/17285 [3:48:34<126:03:06, 26.92s/it] 2%|▏ | 431/17285 [3:49:08<135:53:11, 29.03s/it] 2%|▏ | 432/17285 [3:49:42<143:22:55, 30.63s/it] 3%|▎ | 433/17285 [3:50:11<141:38:10, 30.26s/it] 3%|▎ | 434/17285 [3:50:42<141:55:01, 30.32s/it] 3%|▎ | 435/17285 [3:51:17<149:13:00, 31.88s/it] 3%|▎ | 436/17285 
[3:51:46<144:13:04, 30.81s/it] 3%|▎ | 437/17285 [3:52:19<147:35:34, 31.54s/it] 3%|▎ | 438/17285 [3:52:57<157:13:08, 33.60s/it] 3%|▎ | 439/17285 [3:53:28<153:14:14, 32.75s/it] 3%|▎ | 440/17285 [3:54:05<159:33:34, 34.10s/it] {'loss': 2.6537, 'learning_rate': 9.988439306358382e-05, 'epoch': 0.08} + 3%|▎ | 440/17285 [3:54:05<159:33:34, 34.10s/it] 3%|▎ | 441/17285 [3:54:42<162:40:54, 34.77s/it] 3%|▎ | 442/17285 [3:55:14<159:17:19, 34.05s/it] 3%|▎ | 443/17285 [3:55:51<162:51:00, 34.81s/it] 3%|▎ | 444/17285 [3:56:27<164:52:11, 35.24s/it] 3%|▎ | 445/17285 [3:56:55<154:24:55, 33.01s/it] 3%|▎ | 446/17285 [3:57:25<150:07:19, 32.09s/it] 3%|▎ | 447/17285 [3:58:05<161:20:59, 34.50s/it] 3%|▎ | 448/17285 [3:58:37<158:38:19, 33.92s/it] 3%|▎ | 449/17285 [3:59:07<152:40:44, 32.65s/it] 3%|▎ | 450/17285 [3:59:40<153:37:08, 32.85s/it] {'loss': 2.6031, 'learning_rate': 0.00010219653179190752, 'epoch': 0.08} + 3%|▎ | 450/17285 [3:59:40<153:37:08, 32.85s/it] 3%|▎ | 451/17285 [4:00:08<145:36:01, 31.14s/it] 3%|▎ | 452/17285 [4:00:38<144:38:22, 30.93s/it] 3%|▎ | 453/17285 [4:01:06<140:21:17, 30.02s/it] 3%|▎ | 454/17285 [4:01:34<137:24:59, 29.39s/it] 3%|▎ | 455/17285 [4:02:07<142:06:40, 30.40s/it] 3%|▎ | 456/17285 [4:02:31<134:24:28, 28.75s/it] 3%|▎ | 457/17285 [4:03:02<136:40:34, 29.24s/it] 3%|▎ | 458/17285 [4:03:31<136:38:10, 29.23s/it] 3%|▎ | 459/17285 [4:04:11<151:52:45, 32.50s/it] 3%|▎ | 460/17285 [4:04:38<144:37:42, 30.95s/it] {'loss': 2.6037, 'learning_rate': 0.00010450867052023121, 'epoch': 0.08} + 3%|▎ | 460/17285 [4:04:38<144:37:42, 30.95s/it] 3%|▎ | 461/17285 [4:05:04<137:29:24, 29.42s/it] 3%|▎ | 462/17285 [4:05:34<138:00:43, 29.53s/it] 3%|▎ | 463/17285 [4:06:04<138:21:33, 29.61s/it] 3%|▎ | 464/17285 [4:06:33<137:23:47, 29.41s/it] 3%|▎ | 465/17285 [4:07:13<151:59:39, 32.53s/it] 3%|▎ | 466/17285 [4:07:53<163:11:34, 34.93s/it] 3%|▎ | 467/17285 [4:08:25<158:15:00, 33.87s/it] 3%|▎ | 468/17285 [4:08:59<158:32:14, 33.94s/it] 3%|▎ | 469/17285 [4:09:33<158:34:04, 33.95s/it] 3%|▎ | 470/17285 
[4:10:10<163:11:59, 34.94s/it] {'loss': 2.5387, 'learning_rate': 0.00010682080924855491, 'epoch': 0.08} + 3%|▎ | 470/17285 [4:10:10<163:11:59, 34.94s/it] 3%|▎ | 471/17285 [4:10:36<150:39:58, 32.26s/it] 3%|▎ | 472/17285 [4:11:11<154:54:44, 33.17s/it] 3%|▎ | 473/17285 [4:11:38<145:52:57, 31.24s/it] 3%|▎ | 474/17285 [4:12:14<152:51:14, 32.73s/it] 3%|▎ | 475/17285 [4:12:46<152:00:19, 32.55s/it] 3%|▎ | 476/17285 [4:13:12<142:24:48, 30.50s/it] 3%|▎ | 477/17285 [4:13:47<148:21:59, 31.78s/it] 3%|▎ | 478/17285 [4:14:18<147:54:34, 31.68s/it] 3%|▎ | 479/17285 [4:14:46<142:53:22, 30.61s/it] 3%|▎ | 480/17285 [4:15:14<138:46:22, 29.73s/it] {'loss': 2.5393, 'learning_rate': 0.00010913294797687861, 'epoch': 0.08} + 3%|▎ | 480/17285 [4:15:14<138:46:22, 29.73s/it] 3%|▎ | 481/17285 [4:15:48<144:36:02, 30.98s/it] 3%|▎ | 482/17285 [4:16:24<152:10:20, 32.60s/it] 3%|▎ | 483/17285 [4:16:57<152:21:02, 32.64s/it] 3%|▎ | 484/17285 [4:17:26<147:04:51, 31.52s/it] 3%|▎ | 485/17285 [4:18:00<150:48:03, 32.31s/it] 3%|▎ | 486/17285 [4:18:27<143:11:38, 30.69s/it] 3%|▎ | 487/17285 [4:18:53<137:09:39, 29.40s/it] 3%|▎ | 488/17285 [4:19:19<132:14:55, 28.34s/it] 3%|▎ | 489/17285 [4:19:53<140:13:18, 30.05s/it] 3%|▎ | 490/17285 [4:20:21<136:44:54, 29.31s/it] {'loss': 2.5387, 'learning_rate': 0.00011144508670520233, 'epoch': 0.09} + 3%|▎ | 490/17285 [4:20:21<136:44:54, 29.31s/it] 3%|▎ | 491/17285 [4:20:51<138:10:06, 29.62s/it] 3%|▎ | 492/17285 [4:21:24<143:13:16, 30.70s/it] 3%|▎ | 493/17285 [4:21:50<135:41:03, 29.09s/it] 3%|▎ | 494/17285 [4:22:19<135:33:45, 29.06s/it] 3%|▎ | 495/17285 [4:22:47<133:54:59, 28.71s/it] 3%|▎ | 496/17285 [4:23:24<146:13:23, 31.35s/it] 3%|▎ | 497/17285 [4:23:52<141:12:59, 30.28s/it] 3%|▎ | 498/17285 [4:24:26<146:52:25, 31.50s/it] 3%|▎ | 499/17285 [4:25:00<150:35:08, 32.30s/it] 3%|▎ | 500/17285 [4:25:34<152:09:03, 32.63s/it] {'loss': 2.4848, 'learning_rate': 0.00011375722543352603, 'epoch': 0.09} + 3%|▎ | 500/17285 [4:25:34<152:09:03, 32.63s/it] 3%|▎ | 501/17285 [4:26:04<148:43:35, 
31.90s/it] 3%|▎ | 502/17285 [4:26:41<155:51:23, 33.43s/it] 3%|▎ | 503/17285 [4:27:11<150:55:47, 32.38s/it] 3%|▎ | 504/17285 [4:27:43<149:42:28, 32.12s/it] 3%|▎ | 505/17285 [4:28:13<147:16:30, 31.60s/it] 3%|▎ | 506/17285 [4:28:38<137:50:07, 29.57s/it] 3%|▎ | 507/17285 [4:29:14<146:53:16, 31.52s/it] 3%|▎ | 508/17285 [4:29:50<153:16:52, 32.89s/it] 3%|▎ | 509/17285 [4:30:23<153:00:08, 32.83s/it] 3%|▎ | 510/17285 [4:31:00<159:25:42, 34.21s/it] {'loss': 2.4773, 'learning_rate': 0.00011606936416184973, 'epoch': 0.09} + 3%|▎ | 510/17285 [4:31:00<159:25:42, 34.21s/it] 3%|▎ | 511/17285 [4:31:29<152:13:17, 32.67s/it] 3%|▎ | 512/17285 [4:32:01<151:44:04, 32.57s/it] 3%|▎ | 513/17285 [4:32:29<144:41:41, 31.06s/it] 3%|▎ | 514/17285 [4:33:11<159:26:43, 34.23s/it] 3%|▎ | 515/17285 [4:33:36<147:34:05, 31.68s/it] 3%|▎ | 516/17285 [4:34:07<145:29:24, 31.23s/it] 3%|▎ | 517/17285 [4:34:38<145:11:48, 31.17s/it] 3%|▎ | 518/17285 [4:35:11<148:01:55, 31.78s/it] 3%|▎ | 519/17285 [4:35:44<149:28:42, 32.10s/it] 3%|▎ | 520/17285 [4:36:21<157:03:21, 33.73s/it] {'loss': 2.4453, 'learning_rate': 0.00011838150289017342, 'epoch': 0.09} + 3%|▎ | 520/17285 [4:36:21<157:03:21, 33.73s/it] 3%|▎ | 521/17285 [4:36:52<152:50:20, 32.82s/it] 3%|▎ | 522/17285 [4:37:21<147:46:23, 31.74s/it] 3%|▎ | 523/17285 [4:38:01<159:32:13, 34.26s/it] 3%|▎ | 524/17285 [4:38:34<157:34:30, 33.84s/it] 3%|▎ | 525/17285 [4:39:04<152:13:58, 32.70s/it] 3%|▎ | 526/17285 [4:39:38<153:46:45, 33.03s/it] 3%|▎ | 527/17285 [4:40:09<150:32:39, 32.34s/it] 3%|▎ | 528/17285 [4:40:43<153:09:15, 32.90s/it] 3%|▎ | 529/17285 [4:41:17<155:27:48, 33.40s/it] 3%|▎ | 530/17285 [4:41:44<145:54:39, 31.35s/it] {'loss': 2.3941, 'learning_rate': 0.00012069364161849712, 'epoch': 0.09} + 3%|▎ | 530/17285 [4:41:44<145:54:39, 31.35s/it] 3%|▎ | 531/17285 [4:42:18<149:38:33, 32.15s/it] 3%|▎ | 532/17285 [4:42:46<144:17:10, 31.01s/it] 3%|▎ | 533/17285 [4:43:24<153:54:47, 33.08s/it] 3%|▎ | 534/17285 [4:43:55<151:02:45, 32.46s/it] 3%|▎ | 535/17285 
[4:44:37<163:59:49, 35.25s/it] 3%|▎ | 536/17285 [4:45:06<154:44:03, 33.26s/it] 3%|▎ | 537/17285 [4:45:35<149:02:03, 32.04s/it] 3%|▎ | 538/17285 [4:46:06<147:32:46, 31.72s/it] 3%|▎ | 539/17285 [4:46:50<164:30:24, 35.37s/it] 3%|▎ | 540/17285 [4:47:19<155:29:57, 33.43s/it] {'loss': 2.431, 'learning_rate': 0.00012300578034682083, 'epoch': 0.09} + 3%|▎ | 540/17285 [4:47:19<155:29:57, 33.43s/it] 3%|▎ | 541/17285 [4:47:55<159:26:44, 34.28s/it] 3%|▎ | 542/17285 [4:48:26<155:11:59, 33.37s/it] 3%|▎ | 543/17285 [4:49:03<160:25:20, 34.50s/it] 3%|▎ | 544/17285 [4:49:38<160:26:12, 34.50s/it] 3%|▎ | 545/17285 [4:50:11<158:27:14, 34.08s/it] 3%|▎ | 546/17285 [4:50:45<158:15:20, 34.04s/it] 3%|▎ | 547/17285 [4:51:15<152:30:20, 32.80s/it] 3%|▎ | 548/17285 [4:51:45<149:11:12, 32.09s/it] 3%|▎ | 549/17285 [4:52:19<151:18:35, 32.55s/it] 3%|▎ | 550/17285 [4:52:45<142:52:48, 30.74s/it] {'loss': 2.4208, 'learning_rate': 0.00012531791907514453, 'epoch': 0.1} + 3%|▎ | 550/17285 [4:52:45<142:52:48, 30.74s/it] 3%|▎ | 551/17285 [4:53:14<140:48:32, 30.29s/it] 3%|▎ | 552/17285 [4:53:42<136:53:59, 29.45s/it] 3%|▎ | 553/17285 [4:54:13<139:05:00, 29.92s/it] 3%|▎ | 554/17285 [4:54:45<142:38:47, 30.69s/it] 3%|▎ | 555/17285 [4:55:22<150:27:36, 32.38s/it] 3%|▎ | 556/17285 [4:55:54<149:48:50, 32.24s/it] 3%|▎ | 557/17285 [4:56:32<157:43:20, 33.94s/it] 3%|▎ | 558/17285 [4:57:02<152:23:42, 32.80s/it] 3%|▎ | 559/17285 [4:57:29<145:15:24, 31.26s/it] 3%|▎ | 560/17285 [4:57:55<137:28:27, 29.59s/it] {'loss': 2.4313, 'learning_rate': 0.00012763005780346823, 'epoch': 0.1} + 3%|▎ | 560/17285 [4:57:55<137:28:27, 29.59s/it] 3%|▎ | 561/17285 [4:58:25<137:45:37, 29.65s/it] 3%|▎ | 562/17285 [4:59:08<156:04:40, 33.60s/it][2023-08-23 04:54:26,350] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, but hysteresis is 2. 
Reducing hysteresis to 1 + 3%|▎ | 563/17285 [4:59:49<166:18:20, 35.80s/it] 3%|▎ | 564/17285 [5:00:15<153:04:10, 32.96s/it] 3%|▎ | 565/17285 [5:00:56<164:04:22, 35.33s/it] 3%|▎ | 566/17285 [5:01:28<159:07:44, 34.26s/it] 3%|▎ | 567/17285 [5:02:00<156:39:03, 33.73s/it] 3%|▎ | 568/17285 [5:02:35<158:31:05, 34.14s/it] 3%|▎ | 569/17285 [5:03:05<152:38:48, 32.87s/it] 3%|▎ | 570/17285 [5:03:44<161:19:52, 34.75s/it] {'loss': 2.3427, 'learning_rate': 0.00012971098265895952, 'epoch': 0.1} + 3%|▎ | 570/17285 [5:03:44<161:19:52, 34.75s/it] 3%|▎ | 571/17285 [5:04:23<166:23:24, 35.84s/it] 3%|▎ | 572/17285 [5:04:54<159:29:44, 34.36s/it] 3%|▎ | 573/17285 [5:05:26<156:18:54, 33.67s/it] 3%|▎ | 574/17285 [5:06:07<166:59:00, 35.97s/it] 3%|▎ | 575/17285 [5:06:48<173:56:08, 37.47s/it] 3%|▎ | 576/17285 [5:07:22<168:44:19, 36.36s/it] 3%|▎ | 577/17285 [5:07:50<157:09:35, 33.86s/it] 3%|▎ | 578/17285 [5:08:24<157:33:26, 33.95s/it] 3%|▎ | 579/17285 [5:08:50<146:42:15, 31.61s/it] 3%|▎ | 580/17285 [5:09:21<145:33:08, 31.37s/it] {'loss': 2.3415, 'learning_rate': 0.00013202312138728322, 'epoch': 0.1} + 3%|▎ | 580/17285 [5:09:21<145:33:08, 31.37s/it] 3%|▎ | 581/17285 [5:09:51<143:20:30, 30.89s/it] 3%|▎ | 582/17285 [5:10:18<138:40:10, 29.89s/it] 3%|▎ | 583/17285 [5:10:49<140:10:56, 30.22s/it] 3%|▎ | 584/17285 [5:11:14<132:03:10, 28.46s/it] 3%|▎ | 585/17285 [5:11:45<135:58:43, 29.31s/it] 3%|▎ | 586/17285 [5:12:20<144:42:20, 31.20s/it] 3%|▎ | 587/17285 [5:13:02<159:42:40, 34.43s/it] 3%|▎ | 588/17285 [5:13:30<150:55:29, 32.54s/it] 3%|▎ | 589/17285 [5:14:09<158:45:00, 34.23s/it] 3%|▎ | 590/17285 [5:14:36<149:27:38, 32.23s/it] {'loss': 2.2621, 'learning_rate': 0.00013433526011560694, 'epoch': 0.1} + 3%|▎ | 590/17285 [5:14:36<149:27:38, 32.23s/it] 3%|▎ | 591/17285 [5:15:09<149:39:27, 32.27s/it] 3%|▎ | 592/17285 [5:15:39<147:26:11, 31.80s/it] 3%|▎ | 593/17285 [5:16:09<143:50:37, 31.02s/it] 3%|▎ | 594/17285 [5:16:53<162:31:37, 35.05s/it] 3%|▎ | 595/17285 [5:17:28<162:57:13, 35.15s/it] 3%|▎ | 596/17285 
[5:18:04<163:56:19, 35.36s/it] 3%|▎ | 597/17285 [5:18:31<152:40:08, 32.93s/it] 3%|▎ | 598/17285 [5:19:07<156:43:40, 33.81s/it] 3%|▎ | 599/17285 [5:19:43<159:03:25, 34.32s/it] 3%|▎ | 600/17285 [5:20:19<161:29:06, 34.84s/it] {'loss': 2.3606, 'learning_rate': 0.00013664739884393064, 'epoch': 0.1} + 3%|▎ | 600/17285 [5:20:19<161:29:06, 34.84s/it] 3%|▎ | 601/17285 [5:20:50<155:51:25, 33.63s/it] 3%|▎ | 602/17285 [5:21:21<152:20:28, 32.87s/it] 3%|▎ | 603/17285 [5:21:47<143:35:11, 30.99s/it] 3%|▎ | 604/17285 [5:22:20<146:30:19, 31.62s/it] 4%|▎ | 605/17285 [5:22:55<150:54:56, 32.57s/it] 4%|▎ | 606/17285 [5:23:27<150:06:55, 32.40s/it] 4%|▎ | 607/17285 [5:23:59<148:45:05, 32.11s/it] 4%|▎ | 608/17285 [5:24:25<141:08:33, 30.47s/it] 4%|▎ | 609/17285 [5:24:57<142:37:12, 30.79s/it] 4%|▎ | 610/17285 [5:25:26<139:50:29, 30.19s/it] {'loss': 2.3175, 'learning_rate': 0.00013895953757225434, 'epoch': 0.11} + 4%|▎ | 610/17285 [5:25:26<139:50:29, 30.19s/it] 4%|▎ | 611/17285 [5:25:57<142:01:54, 30.67s/it] 4%|▎ | 612/17285 [5:26:24<135:53:44, 29.34s/it] 4%|▎ | 613/17285 [5:26:54<137:23:44, 29.67s/it] 4%|▎ | 614/17285 [5:27:19<130:27:02, 28.17s/it] 4%|▎ | 615/17285 [5:27:46<129:05:35, 27.88s/it] 4%|▎ | 616/17285 [5:28:21<139:00:13, 30.02s/it] 4%|▎ | 617/17285 [5:28:50<137:32:29, 29.71s/it] 4%|▎ | 618/17285 [5:29:30<152:05:13, 32.85s/it] 4%|▎ | 619/17285 [5:30:05<155:02:34, 33.49s/it] 4%|▎ | 620/17285 [5:30:35<150:12:09, 32.45s/it] {'loss': 2.2297, 'learning_rate': 0.00014127167630057804, 'epoch': 0.11} + 4%|▎ | 620/17285 [5:30:35<150:12:09, 32.45s/it] 4%|▎ | 621/17285 [5:31:07<149:22:45, 32.27s/it] 4%|▎ | 622/17285 [5:31:38<147:42:37, 31.91s/it] 4%|▎ | 623/17285 [5:32:08<145:14:57, 31.38s/it] 4%|▎ | 624/17285 [5:32:49<158:05:42, 34.16s/it] 4%|▎ | 625/17285 [5:33:16<148:08:39, 32.01s/it] 4%|▎ | 626/17285 [5:33:58<161:55:11, 34.99s/it] 4%|▎ | 627/17285 [5:34:32<160:22:23, 34.66s/it] 4%|▎ | 628/17285 [5:35:14<171:06:35, 36.98s/it] 4%|▎ | 629/17285 [5:35:57<179:47:30, 38.86s/it] 4%|▎ | 630/17285 
[5:36:26<165:42:19, 35.82s/it] {'loss': 2.1856, 'learning_rate': 0.00014358381502890176, 'epoch': 0.11} + 4%|▎ | 630/17285 [5:36:26<165:42:19, 35.82s/it] 4%|▎ | 631/17285 [5:37:01<164:50:22, 35.63s/it] 4%|▎ | 632/17285 [5:37:39<168:15:34, 36.37s/it] 4%|▎ | 633/17285 [5:38:15<167:10:12, 36.14s/it] 4%|▎ | 634/17285 [5:38:53<170:22:22, 36.84s/it] 4%|▎ | 635/17285 [5:39:28<167:03:50, 36.12s/it] 4%|▎ | 636/17285 [5:39:58<158:56:40, 34.37s/it] 4%|▎ | 637/17285 [5:40:26<150:31:19, 32.55s/it] 4%|▎ | 638/17285 [5:41:05<158:18:02, 34.23s/it] 4%|▎ | 639/17285 [5:41:35<153:12:25, 33.13s/it] 4%|▎ | 640/17285 [5:42:12<158:36:19, 34.30s/it] {'loss': 2.2633, 'learning_rate': 0.00014589595375722546, 'epoch': 0.11} + 4%|▎ | 640/17285 [5:42:12<158:36:19, 34.30s/it] 4%|▎ | 641/17285 [5:42:45<156:49:09, 33.92s/it] 4%|▎ | 642/17285 [5:43:14<149:46:32, 32.40s/it] 4%|▎ | 643/17285 [5:43:44<146:35:20, 31.71s/it] 4%|▎ | 644/17285 [5:44:08<135:39:33, 29.35s/it] 4%|▎ | 645/17285 [5:44:45<146:45:21, 31.75s/it] 4%|▎ | 646/17285 [5:45:20<150:04:41, 32.47s/it] 4%|▎ | 647/17285 [5:45:57<157:19:16, 34.04s/it] 4%|▎ | 648/17285 [5:46:23<145:40:44, 31.52s/it] 4%|▍ | 649/17285 [5:46:50<138:57:22, 30.07s/it] 4%|▍ | 650/17285 [5:47:13<129:54:56, 28.12s/it] {'loss': 2.2474, 'learning_rate': 0.00014820809248554915, 'epoch': 0.11} + 4%|▍ | 650/17285 [5:47:13<129:54:56, 28.12s/it] 4%|▍ | 651/17285 [5:47:43<132:20:26, 28.64s/it] 4%|▍ | 652/17285 [5:48:20<143:25:56, 31.04s/it] 4%|▍ | 653/17285 [5:48:54<147:34:31, 31.94s/it] 4%|▍ | 654/17285 [5:49:19<138:36:23, 30.00s/it] 4%|▍ | 655/17285 [5:49:45<132:35:26, 28.70s/it] 4%|▍ | 656/17285 [5:50:14<132:47:33, 28.75s/it] 4%|▍ | 657/17285 [5:50:45<136:02:18, 29.45s/it] 4%|▍ | 658/17285 [5:51:29<155:51:53, 33.75s/it] 4%|▍ | 659/17285 [5:52:00<152:42:15, 33.06s/it] 4%|▍ | 660/17285 [5:52:32<150:46:47, 32.65s/it] {'loss': 2.2024, 'learning_rate': 0.00015052023121387285, 'epoch': 0.11} + 4%|▍ | 660/17285 [5:52:32<150:46:47, 32.65s/it] 4%|▍ | 661/17285 [5:52:59<143:45:45, 
31.13s/it] 4%|▍ | 662/17285 [5:53:34<149:04:46, 32.29s/it] 4%|▍ | 663/17285 [5:54:14<159:05:42, 34.46s/it] 4%|▍ | 664/17285 [5:54:45<154:49:44, 33.53s/it] 4%|▍ | 665/17285 [5:55:17<151:49:00, 32.88s/it] 4%|▍ | 666/17285 [5:55:49<151:08:12, 32.74s/it] 4%|▍ | 667/17285 [5:56:19<147:58:30, 32.06s/it] 4%|▍ | 668/17285 [5:56:50<146:28:56, 31.73s/it] 4%|▍ | 669/17285 [5:57:23<147:21:43, 31.93s/it][2023-08-23 05:52:26,965] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, but hysteresis is 2. Reducing hysteresis to 1 + 4%|▍ | 670/17285 [5:57:49<139:49:18, 30.30s/it] {'loss': 2.1947, 'learning_rate': 0.00015260115606936415, 'epoch': 0.12} + 4%|▍ | 670/17285 [5:57:49<139:49:18, 30.30s/it] 4%|▍ | 671/17285 [5:58:18<137:54:14, 29.88s/it] 4%|▍ | 672/17285 [5:58:47<136:22:44, 29.55s/it] 4%|▍ | 673/17285 [5:59:15<134:13:51, 29.09s/it] 4%|▍ | 674/17285 [5:59:46<137:05:49, 29.71s/it] 4%|▍ | 675/17285 [6:00:13<132:28:23, 28.71s/it] 4%|▍ | 676/17285 [6:00:52<146:52:50, 31.84s/it] 4%|▍ | 677/17285 [6:01:35<162:26:39, 35.21s/it] 4%|▍ | 678/17285 [6:02:07<158:46:03, 34.42s/it] 4%|▍ | 679/17285 [6:02:45<162:58:44, 35.33s/it] 4%|▍ | 680/17285 [6:03:13<153:25:21, 33.26s/it] {'loss': 2.234, 'learning_rate': 0.00015491329479768785, 'epoch': 0.12} + 4%|▍ | 680/17285 [6:03:13<153:25:21, 33.26s/it] 4%|▍ | 681/17285 [6:03:43<148:49:56, 32.27s/it] 4%|▍ | 682/17285 [6:04:13<145:14:18, 31.49s/it] 4%|▍ | 683/17285 [6:04:43<142:43:02, 30.95s/it] 4%|▍ | 684/17285 [6:05:22<154:01:33, 33.40s/it] 4%|▍ | 685/17285 [6:05:47<143:18:00, 31.08s/it] 4%|▍ | 686/17285 [6:06:17<141:14:34, 30.63s/it] 4%|▍ | 687/17285 [6:06:44<136:46:54, 29.67s/it] 4%|▍ | 688/17285 [6:07:21<145:53:07, 31.64s/it] 4%|▍ | 689/17285 [6:07:50<142:43:12, 30.96s/it] 4%|▍ | 690/17285 [6:08:27<150:56:31, 32.74s/it] {'loss': 2.2061, 'learning_rate': 0.00015722543352601157, 'epoch': 0.12} + 4%|▍ | 690/17285 [6:08:27<150:56:31, 32.74s/it] 4%|▍ | 691/17285 [6:08:59<150:08:36, 
32.57s/it] 4%|▍ | 692/17285 [6:09:40<161:31:23, 35.04s/it] 4%|▍ | 693/17285 [6:10:15<161:11:37, 34.97s/it] 4%|▍ | 694/17285 [6:10:48<158:59:32, 34.50s/it] 4%|▍ | 695/17285 [6:11:28<166:17:13, 36.08s/it] 4%|▍ | 696/17285 [6:12:01<162:41:52, 35.31s/it] 4%|▍ | 697/17285 [6:12:31<154:22:47, 33.50s/it] 4%|▍ | 698/17285 [6:13:02<150:59:24, 32.77s/it] 4%|▍ | 699/17285 [6:13:29<143:56:05, 31.24s/it][2023-08-23 06:08:48,100] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384 + 4%|▍ | 700/17285 [6:14:10<157:32:16, 34.20s/it] {'loss': 2.1553, 'learning_rate': 0.0001593063583815029, 'epoch': 0.12} + 4%|▍ | 700/17285 [6:14:10<157:32:16, 34.20s/it] 4%|▍ | 701/17285 [6:14:47<161:02:00, 34.96s/it] 4%|▍ | 702/17285 [6:15:23<161:38:41, 35.09s/it] 4%|▍ | 703/17285 [6:15:52<154:11:13, 33.47s/it] 4%|▍ | 704/17285 [6:16:25<153:23:52, 33.31s/it] 4%|▍ | 705/17285 [6:16:58<153:23:33, 33.31s/it] 4%|▍ | 706/17285 [6:17:31<151:53:25, 32.98s/it] 4%|▍ | 707/17285 [6:18:02<149:35:29, 32.48s/it] 4%|▍ | 708/17285 [6:18:32<146:23:43, 31.79s/it] 4%|▍ | 709/17285 [6:19:00<140:23:03, 30.49s/it] 4%|▍ | 710/17285 [6:19:31<141:28:56, 30.73s/it] {'loss': 2.2286, 'learning_rate': 0.0001616184971098266, 'epoch': 0.12} + 4%|▍ | 710/17285 [6:19:31<141:28:56, 30.73s/it] 4%|▍ | 711/17285 [6:19:57<134:54:56, 29.30s/it] 4%|▍ | 712/17285 [6:20:30<140:10:10, 30.45s/it] 4%|▍ | 713/17285 [6:21:02<141:47:11, 30.80s/it] 4%|▍ | 714/17285 [6:21:49<164:37:40, 35.76s/it] 4%|▍ | 715/17285 [6:22:24<164:06:47, 35.66s/it] 4%|▍ | 716/17285 [6:22:55<157:47:53, 34.29s/it] 4%|▍ | 717/17285 [6:23:22<147:27:54, 32.04s/it] 4%|▍ | 718/17285 [6:23:58<153:00:03, 33.25s/it] 4%|▍ | 719/17285 [6:24:27<146:01:51, 31.73s/it] 4%|▍ | 720/17285 [6:24:58<145:38:16, 31.65s/it] {'loss': 2.161, 'learning_rate': 0.0001639306358381503, 'epoch': 0.12} + 4%|▍ | 720/17285 [6:24:58<145:38:16, 31.65s/it] 4%|▍ | 721/17285 [6:25:23<137:05:16, 29.79s/it] 4%|▍ | 722/17285 
[6:25:55<139:15:01, 30.27s/it] 4%|▍ | 723/17285 [6:26:26<140:25:10, 30.52s/it] 4%|▍ | 724/17285 [6:26:55<138:27:27, 30.10s/it] 4%|▍ | 725/17285 [6:27:27<141:22:31, 30.73s/it] 4%|▍ | 726/17285 [6:27:54<136:14:00, 29.62s/it] 4%|▍ | 727/17285 [6:28:33<148:50:40, 32.36s/it] 4%|▍ | 728/17285 [6:29:02<144:37:51, 31.45s/it] 4%|▍ | 729/17285 [6:29:33<142:56:14, 31.08s/it] 4%|▍ | 730/17285 [6:30:04<143:10:04, 31.13s/it] {'loss': 2.1628, 'learning_rate': 0.000166242774566474, 'epoch': 0.13} + 4%|▍ | 730/17285 [6:30:04<143:10:04, 31.13s/it] 4%|▍ | 731/17285 [6:30:40<150:11:00, 32.66s/it] 4%|▍ | 732/17285 [6:31:07<142:23:36, 30.97s/it] 4%|▍ | 733/17285 [6:31:37<141:13:44, 30.72s/it] 4%|▍ | 734/17285 [6:32:12<146:09:49, 31.79s/it] 4%|▍ | 735/17285 [6:32:44<146:59:53, 31.98s/it] 4%|▍ | 736/17285 [6:33:15<145:31:39, 31.66s/it] 4%|▍ | 737/17285 [6:33:45<142:58:49, 31.11s/it] 4%|▍ | 738/17285 [6:34:22<151:41:54, 33.00s/it] 4%|▍ | 739/17285 [6:34:52<146:52:37, 31.96s/it] 4%|▍ | 740/17285 [6:35:24<147:29:30, 32.09s/it] {'loss': 2.1371, 'learning_rate': 0.00016855491329479768, 'epoch': 0.13} + 4%|▍ | 740/17285 [6:35:24<147:29:30, 32.09s/it] 4%|▍ | 741/17285 [6:35:55<145:35:59, 31.68s/it] 4%|▍ | 742/17285 [6:36:26<144:29:20, 31.44s/it] 4%|▍ | 743/17285 [6:37:00<148:18:56, 32.28s/it] 4%|▍ | 744/17285 [6:37:27<141:38:16, 30.83s/it] 4%|▍ | 745/17285 [6:37:55<137:24:12, 29.91s/it] 4%|▍ | 746/17285 [6:38:22<132:57:24, 28.94s/it] 4%|▍ | 747/17285 [6:38:52<135:12:23, 29.43s/it] 4%|▍ | 748/17285 [6:39:18<130:21:15, 28.38s/it][2023-08-23 06:34:26,789] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 16384, reducing to 8192 + 4%|▍ | 749/17285 [6:39:49<133:45:30, 29.12s/it] 4%|▍ | 750/17285 [6:40:18<132:56:35, 28.94s/it] {'loss': 2.2181, 'learning_rate': 0.00017063583815028904, 'epoch': 0.13} + 4%|▍ | 750/17285 [6:40:18<132:56:35, 28.94s/it] 4%|▍ | 751/17285 [6:40:53<142:09:20, 30.95s/it] 4%|▍ | 752/17285 [6:41:22<139:36:13, 30.40s/it] 4%|▍ | 753/17285 [6:41:57<145:12:07, 31.62s/it] 4%|▍ | 754/17285 [6:42:30<146:57:56, 32.01s/it] 4%|▍ | 755/17285 [6:43:05<151:50:16, 33.07s/it] 4%|▍ | 756/17285 [6:43:44<159:36:12, 34.76s/it] 4%|▍ | 757/17285 [6:44:11<148:58:29, 32.45s/it] 4%|▍ | 758/17285 [6:44:44<149:13:10, 32.50s/it] 4%|▍ | 759/17285 [6:45:12<142:58:25, 31.15s/it][2023-08-23 06:40:24,371] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096 + 4%|▍ | 760/17285 [6:45:47<148:17:01, 32.30s/it] {'loss': 2.154, 'learning_rate': 0.00017271676300578033, 'epoch': 0.13} + 4%|▍ | 760/17285 [6:45:47<148:17:01, 32.30s/it] 4%|▍ | 761/17285 [6:46:13<140:25:10, 30.59s/it] 4%|▍ | 762/17285 [6:46:38<131:48:24, 28.72s/it] 4%|▍ | 763/17285 [6:47:13<141:07:02, 30.75s/it] 4%|▍ | 764/17285 [6:47:43<139:56:58, 30.50s/it] 4%|▍ | 765/17285 [6:48:10<135:04:56, 29.44s/it] 4%|▍ | 766/17285 [6:48:43<140:34:47, 30.64s/it] 4%|▍ | 767/17285 [6:49:13<139:09:41, 30.33s/it] 4%|▍ | 768/17285 [6:49:53<152:02:52, 33.14s/it] 4%|▍ | 769/17285 [6:50:24<149:35:52, 32.61s/it] 4%|▍ | 770/17285 [6:50:55<147:47:15, 32.22s/it] {'loss': 2.1695, 'learning_rate': 0.00017502890173410406, 'epoch': 0.13} + 4%|▍ | 770/17285 [6:50:55<147:47:15, 32.22s/it] 4%|▍ | 771/17285 [6:51:24<142:30:12, 31.07s/it] 4%|▍ | 772/17285 [6:51:55<143:11:34, 31.22s/it] 4%|▍ | 773/17285 [6:52:35<154:54:35, 33.77s/it] 4%|▍ | 774/17285 [6:52:59<141:12:50, 30.79s/it] 4%|▍ | 775/17285 [6:53:31<143:23:38, 31.27s/it] 4%|▍ | 776/17285 [6:54:07<149:18:13, 32.56s/it] 4%|▍ | 777/17285 [6:54:36<144:38:40, 31.54s/it] 5%|▍ | 778/17285 
[6:55:09<146:37:57, 31.98s/it] 5%|▍ | 779/17285 [6:55:39<143:51:35, 31.38s/it] 5%|▍ | 780/17285 [6:56:15<150:05:00, 32.74s/it] {'loss': 2.1685, 'learning_rate': 0.00017734104046242776, 'epoch': 0.14} + 5%|▍ | 780/17285 [6:56:15<150:05:00, 32.74s/it] 5%|▍ | 781/17285 [6:56:45<146:33:15, 31.97s/it] 5%|▍ | 782/17285 [6:57:10<136:51:35, 29.85s/it] 5%|▍ | 783/17285 [6:57:50<150:07:30, 32.75s/it] 5%|▍ | 784/17285 [6:58:14<139:03:09, 30.34s/it] 5%|▍ | 785/17285 [6:58:43<136:20:31, 29.75s/it] 5%|▍ | 786/17285 [6:59:16<140:45:19, 30.71s/it] 5%|▍ | 787/17285 [6:59:44<137:35:54, 30.03s/it] 5%|▍ | 788/17285 [7:00:14<137:33:28, 30.02s/it] 5%|▍ | 789/17285 [7:00:44<137:03:25, 29.91s/it] 5%|▍ | 790/17285 [7:01:11<134:07:00, 29.27s/it] {'loss': 2.192, 'learning_rate': 0.00017965317919075145, 'epoch': 0.14} + 5%|▍ | 790/17285 [7:01:11<134:07:00, 29.27s/it] 5%|▍ | 791/17285 [7:01:39<131:49:49, 28.77s/it] 5%|▍ | 792/17285 [7:02:09<132:49:17, 28.99s/it] 5%|▍ | 793/17285 [7:02:41<137:04:24, 29.92s/it] 5%|▍ | 794/17285 [7:03:14<141:18:09, 30.85s/it] 5%|▍ | 795/17285 [7:03:43<138:59:25, 30.34s/it] 5%|▍ | 796/17285 [7:04:17<144:26:01, 31.53s/it] 5%|▍ | 797/17285 [7:04:45<139:57:59, 30.56s/it] 5%|▍ | 798/17285 [7:05:14<136:53:35, 29.89s/it] 5%|▍ | 799/17285 [7:05:43<135:40:25, 29.63s/it] 5%|▍ | 800/17285 [7:06:19<144:19:40, 31.52s/it] {'loss': 2.1567, 'learning_rate': 0.00018196531791907515, 'epoch': 0.14} + 5%|▍ | 800/17285 [7:06:19<144:19:40, 31.52s/it] 5%|▍ | 801/17285 [7:06:50<144:00:39, 31.45s/it] 5%|▍ | 802/17285 [7:07:20<141:30:45, 30.91s/it] 5%|▍ | 803/17285 [7:07:47<136:32:55, 29.82s/it] 5%|▍ | 804/17285 [7:08:13<131:09:12, 28.65s/it] 5%|▍ | 805/17285 [7:08:48<140:17:52, 30.65s/it] 5%|▍ | 806/17285 [7:09:25<148:10:22, 32.37s/it] 5%|▍ | 807/17285 [7:10:03<156:53:38, 34.28s/it] 5%|▍ | 808/17285 [7:10:38<157:12:39, 34.35s/it] 5%|▍ | 809/17285 [7:11:09<152:34:15, 33.34s/it] 5%|▍ | 810/17285 [7:11:40<149:08:27, 32.59s/it] {'loss': 2.0987, 'learning_rate': 0.00018427745664739887, 
'epoch': 0.14} + 5%|▍ | 810/17285 [7:11:40<149:08:27, 32.59s/it] 5%|▍ | 811/17285 [7:12:27<169:29:40, 37.04s/it] 5%|▍ | 812/17285 [7:12:58<161:46:58, 35.36s/it] 5%|▍ | 813/17285 [7:13:36<164:19:23, 35.91s/it] 5%|▍ | 814/17285 [7:14:11<163:23:21, 35.71s/it] 5%|▍ | 815/17285 [7:14:36<148:45:01, 32.51s/it] 5%|▍ | 816/17285 [7:15:02<139:47:22, 30.56s/it] 5%|▍ | 817/17285 [7:15:26<131:14:15, 28.69s/it] 5%|▍ | 818/17285 [7:15:55<130:37:07, 28.56s/it] 5%|▍ | 819/17285 [7:16:19<125:41:13, 27.48s/it] 5%|▍ | 820/17285 [7:16:51<131:47:54, 28.82s/it] {'loss': 2.1687, 'learning_rate': 0.00018658959537572257, 'epoch': 0.14} + 5%|▍ | 820/17285 [7:16:51<131:47:54, 28.82s/it] 5%|▍ | 821/17285 [7:17:20<132:01:21, 28.87s/it] 5%|▍ | 822/17285 [7:17:57<142:08:24, 31.08s/it] 5%|▍ | 823/17285 [7:18:25<138:51:27, 30.37s/it] 5%|▍ | 824/17285 [7:18:54<136:50:54, 29.93s/it] 5%|▍ | 825/17285 [7:19:25<137:52:18, 30.15s/it] 5%|▍ | 826/17285 [7:19:58<141:36:01, 30.97s/it] 5%|▍ | 827/17285 [7:20:28<139:50:46, 30.59s/it] 5%|▍ | 828/17285 [7:21:02<145:02:53, 31.73s/it] 5%|▍ | 829/17285 [7:21:30<140:22:42, 30.71s/it] 5%|▍ | 830/17285 [7:21:55<132:49:23, 29.06s/it] {'loss': 2.0736, 'learning_rate': 0.00018890173410404627, 'epoch': 0.14} + 5%|▍ | 830/17285 [7:21:55<132:49:23, 29.06s/it] 5%|▍ | 831/17285 [7:22:33<145:05:45, 31.75s/it] 5%|▍ | 832/17285 [7:23:01<139:44:27, 30.58s/it] 5%|▍ | 833/17285 [7:23:37<146:16:10, 32.01s/it] 5%|▍ | 834/17285 [7:24:12<151:18:35, 33.11s/it] 5%|▍ | 835/17285 [7:24:50<156:57:10, 34.35s/it] 5%|▍ | 836/17285 [7:25:19<149:56:50, 32.82s/it] 5%|▍ | 837/17285 [7:25:56<155:35:53, 34.06s/it] 5%|▍ | 838/17285 [7:26:23<146:43:26, 32.12s/it] 5%|▍ | 839/17285 [7:27:03<156:57:02, 34.36s/it] 5%|▍ | 840/17285 [7:27:40<160:38:54, 35.17s/it] {'loss': 2.0683, 'learning_rate': 0.00019121387283236997, 'epoch': 0.15} + 5%|▍ | 840/17285 [7:27:40<160:38:54, 35.17s/it] 5%|▍ | 841/17285 [7:28:07<148:49:45, 32.58s/it] 5%|▍ | 842/17285 [7:28:41<150:49:35, 33.02s/it] 5%|▍ | 843/17285 
[7:29:13<149:19:34, 32.70s/it] 5%|▍ | 844/17285 [7:29:44<148:16:32, 32.47s/it] 5%|▍ | 845/17285 [7:30:22<155:17:22, 34.01s/it] 5%|▍ | 846/17285 [7:30:56<154:40:45, 33.87s/it] 5%|▍ | 847/17285 [7:31:28<152:04:27, 33.30s/it] 5%|▍ | 848/17285 [7:31:59<149:26:45, 32.73s/it] 5%|▍ | 849/17285 [7:32:29<145:32:34, 31.88s/it] 5%|▍ | 850/17285 [7:33:04<149:52:55, 32.83s/it] {'loss': 2.0872, 'learning_rate': 0.00019352601156069366, 'epoch': 0.15} + 5%|▍ | 850/17285 [7:33:04<149:52:55, 32.83s/it] 5%|▍ | 851/17285 [7:33:36<149:12:46, 32.69s/it] 5%|▍ | 852/17285 [7:34:02<140:10:28, 30.71s/it] 5%|▍ | 853/17285 [7:34:30<136:17:37, 29.86s/it] 5%|▍ | 854/17285 [7:34:56<130:07:14, 28.51s/it] 5%|▍ | 855/17285 [7:35:29<137:17:26, 30.08s/it] 5%|▍ | 856/17285 [7:36:08<149:28:34, 32.75s/it] 5%|▍ | 857/17285 [7:36:44<152:49:09, 33.49s/it] 5%|▍ | 858/17285 [7:37:20<157:28:36, 34.51s/it] 5%|▍ | 859/17285 [7:37:52<153:06:46, 33.56s/it] 5%|▍ | 860/17285 [7:38:24<151:34:56, 33.22s/it] {'loss': 2.1029, 'learning_rate': 0.00019583815028901736, 'epoch': 0.15} + 5%|▍ | 860/17285 [7:38:24<151:34:56, 33.22s/it] 5%|▍ | 861/17285 [7:38:55<148:19:09, 32.51s/it] 5%|▍ | 862/17285 [7:39:30<151:45:56, 33.27s/it] 5%|▍ | 863/17285 [7:40:05<154:18:55, 33.83s/it] 5%|▍ | 864/17285 [7:40:38<152:22:50, 33.41s/it] 5%|▌ | 865/17285 [7:41:08<148:01:06, 32.45s/it] 5%|▌ | 866/17285 [7:41:38<144:37:05, 31.71s/it] 5%|▌ | 867/17285 [7:42:05<138:50:32, 30.44s/it] 5%|▌ | 868/17285 [7:42:31<132:01:45, 28.95s/it] 5%|▌ | 869/17285 [7:43:05<139:15:35, 30.54s/it] 5%|▌ | 870/17285 [7:43:31<132:49:44, 29.13s/it] {'loss': 2.0301, 'learning_rate': 0.00019815028901734106, 'epoch': 0.15} + 5%|▌ | 870/17285 [7:43:31<132:49:44, 29.13s/it] 5%|▌ | 871/17285 [7:44:12<149:03:48, 32.69s/it] 5%|▌ | 872/17285 [7:44:38<140:02:03, 30.71s/it] 5%|▌ | 873/17285 [7:45:08<139:06:04, 30.51s/it] 5%|▌ | 874/17285 [7:45:42<144:24:57, 31.68s/it] 5%|▌ | 875/17285 [7:46:09<137:14:55, 30.11s/it] 5%|▌ | 876/17285 [7:46:38<135:15:33, 29.67s/it] 5%|▌ | 
877/17285 [7:47:12<141:09:31, 30.97s/it] 5%|▌ | 878/17285 [7:47:39<135:54:08, 29.82s/it] 5%|▌ | 879/17285 [7:48:19<149:49:43, 32.88s/it] 5%|▌ | 880/17285 [7:48:53<151:43:40, 33.30s/it] {'loss': 2.0957, 'learning_rate': 0.00019999999267878048, 'epoch': 0.15} + 5%|▌ | 880/17285 [7:48:53<151:43:40, 33.30s/it] 5%|▌ | 881/17285 [7:49:21<143:52:09, 31.57s/it] 5%|▌ | 882/17285 [7:49:55<147:47:11, 32.43s/it] 5%|▌ | 883/17285 [7:50:30<151:16:53, 33.20s/it] 5%|▌ | 884/17285 [7:51:05<154:18:01, 33.87s/it] 5%|▌ | 885/17285 [7:51:33<146:00:47, 32.05s/it] 5%|▌ | 886/17285 [7:52:01<140:27:09, 30.83s/it] 5%|▌ | 887/17285 [7:52:33<141:42:53, 31.11s/it] 5%|▌ | 888/17285 [7:53:06<144:48:49, 31.79s/it] 5%|▌ | 889/17285 [7:53:45<153:32:16, 33.71s/it] 5%|▌ | 890/17285 [7:54:23<160:01:01, 35.14s/it] {'loss': 2.0484, 'learning_rate': 0.0001999997364362091, 'epoch': 0.15} + 5%|▌ | 890/17285 [7:54:23<160:01:01, 35.14s/it] 5%|▌ | 891/17285 [7:54:53<153:24:36, 33.69s/it] 5%|▌ | 892/17285 [7:55:19<143:05:36, 31.42s/it] 5%|▌ | 893/17285 [7:55:49<141:07:22, 30.99s/it] 5%|▌ | 894/17285 [7:56:24<145:50:54, 32.03s/it] 5%|▌ | 895/17285 [7:57:02<153:28:35, 33.71s/it] 5%|▌ | 896/17285 [7:57:36<154:54:16, 34.03s/it] 5%|▌ | 897/17285 [7:58:12<156:45:58, 34.44s/it] 5%|▌ | 898/17285 [7:58:47<157:46:49, 34.66s/it] 5%|▌ | 899/17285 [7:59:16<150:48:27, 33.13s/it] 5%|▌ | 900/17285 [7:59:43<142:15:01, 31.25s/it] {'loss': 2.0489, 'learning_rate': 0.00019999911413373273, 'epoch': 0.16} + 5%|▌ | 900/17285 [7:59:43<142:15:01, 31.25s/it] 5%|▌ | 901/17285 [8:00:19<148:14:29, 32.57s/it] 5%|▌ | 902/17285 [8:00:58<156:47:31, 34.45s/it] 5%|▌ | 903/17285 [8:01:29<151:43:24, 33.34s/it] 5%|▌ | 904/17285 [8:02:00<149:15:54, 32.80s/it] 5%|▌ | 905/17285 [8:02:30<145:02:55, 31.88s/it] 5%|▌ | 906/17285 [8:03:00<143:17:01, 31.49s/it] 5%|▌ | 907/17285 [8:03:39<153:04:50, 33.65s/it] 5%|▌ | 908/17285 [8:04:05<141:52:30, 31.19s/it] 5%|▌ | 909/17285 [8:04:41<149:31:27, 32.87s/it] 5%|▌ | 910/17285 [8:05:13<148:02:04, 32.55s/it] 
{'loss': 2.0073, 'learning_rate': 0.00019999812577362934, 'epoch': 0.16} + 5%|▌ | 910/17285 [8:05:13<148:02:04, 32.55s/it] 5%|▌ | 911/17285 [8:05:46<148:42:11, 32.69s/it] 5%|▌ | 912/17285 [8:06:16<145:20:41, 31.96s/it] 5%|▌ | 913/17285 [8:06:45<140:11:11, 30.83s/it] 5%|▌ | 914/17285 [8:07:15<139:57:02, 30.78s/it] 5%|▌ | 915/17285 [8:07:48<143:16:16, 31.51s/it] 5%|▌ | 916/17285 [8:08:24<149:08:58, 32.80s/it] 5%|▌ | 917/17285 [8:08:57<148:43:14, 32.71s/it] 5%|▌ | 918/17285 [8:09:35<156:30:36, 34.43s/it] 5%|▌ | 919/17285 [8:10:02<146:06:57, 32.14s/it] 5%|▌ | 920/17285 [8:10:38<151:03:44, 33.23s/it] {'loss': 2.0241, 'learning_rate': 0.0001999967713595169, 'epoch': 0.16} + 5%|▌ | 920/17285 [8:10:38<151:03:44, 33.23s/it] 5%|▌ | 921/17285 [8:11:03<140:49:09, 30.98s/it] 5%|▌ | 922/17285 [8:11:42<151:07:26, 33.25s/it] 5%|▌ | 923/17285 [8:12:08<141:18:34, 31.09s/it] 5%|▌ | 924/17285 [8:12:44<147:13:44, 32.40s/it] 5%|▌ | 925/17285 [8:13:25<160:06:44, 35.23s/it] 5%|▌ | 926/17285 [8:14:01<160:23:42, 35.30s/it] 5%|▌ | 927/17285 [8:14:28<149:46:05, 32.96s/it] 5%|▌ | 928/17285 [8:15:02<151:04:45, 33.25s/it] 5%|▌ | 929/17285 [8:15:38<153:45:42, 33.84s/it] 5%|▌ | 930/17285 [8:16:10<151:27:26, 33.34s/it] {'loss': 2.0097, 'learning_rate': 0.00019999505089635347, 'epoch': 0.16} + 5%|▌ | 930/17285 [8:16:10<151:27:26, 33.34s/it] 5%|▌ | 931/17285 [8:16:48<157:58:00, 34.77s/it] 5%|▌ | 932/17285 [8:17:17<150:26:49, 33.12s/it] 5%|▌ | 933/17285 [8:17:48<147:20:17, 32.44s/it] 5%|▌ | 934/17285 [8:18:21<148:26:26, 32.68s/it] 5%|▌ | 935/17285 [8:18:55<150:34:26, 33.15s/it] 5%|▌ | 936/17285 [8:19:27<149:02:42, 32.82s/it] 5%|▌ | 937/17285 [8:20:11<163:39:56, 36.04s/it] 5%|▌ | 938/17285 [8:20:42<157:23:34, 34.66s/it] 5%|▌ | 939/17285 [8:21:16<156:23:07, 34.44s/it] 5%|▌ | 940/17285 [8:21:51<156:19:45, 34.43s/it] {'loss': 2.0251, 'learning_rate': 0.0001999929643904369, 'epoch': 0.16} + 5%|▌ | 940/17285 [8:21:51<156:19:45, 34.43s/it] 5%|▌ | 941/17285 [8:22:32<165:41:34, 36.50s/it] 5%|▌ | 942/17285 
[8:23:02<156:52:16, 34.56s/it] 5%|▌ | 943/17285 [8:23:28<144:24:09, 31.81s/it] 5%|▌ | 944/17285 [8:24:00<145:57:48, 32.16s/it] 5%|▌ | 945/17285 [8:24:34<148:21:44, 32.69s/it] 5%|▌ | 946/17285 [8:24:59<137:16:46, 30.25s/it] 5%|▌ | 947/17285 [8:25:29<136:20:38, 30.04s/it] 5%|▌ | 948/17285 [8:25:55<131:36:31, 29.00s/it] 5%|▌ | 949/17285 [8:26:26<134:11:41, 29.57s/it] 5%|▌ | 950/17285 [8:27:02<143:34:46, 31.64s/it] {'loss': 1.9893, 'learning_rate': 0.00019999051184940516, 'epoch': 0.16} + 5%|▌ | 950/17285 [8:27:02<143:34:46, 31.64s/it] 6%|▌ | 951/17285 [8:27:34<143:36:50, 31.65s/it] 6%|▌ | 952/17285 [8:28:07<144:52:28, 31.93s/it] 6%|▌ | 953/17285 [8:28:36<141:42:37, 31.24s/it] 6%|▌ | 954/17285 [8:29:12<147:57:35, 32.62s/it] 6%|▌ | 955/17285 [8:29:45<147:42:20, 32.56s/it] 6%|▌ | 956/17285 [8:30:16<146:27:17, 32.29s/it] 6%|▌ | 957/17285 [8:30:53<151:51:40, 33.48s/it] 6%|▌ | 958/17285 [8:31:20<143:38:34, 31.67s/it] 6%|▌ | 959/17285 [8:31:57<150:35:20, 33.21s/it] 6%|▌ | 960/17285 [8:32:29<149:07:48, 32.89s/it] {'loss': 1.9893, 'learning_rate': 0.00019998769328223598, 'epoch': 0.17} + 6%|▌ | 960/17285 [8:32:29<149:07:48, 32.89s/it] 6%|▌ | 961/17285 [8:32:59<145:06:07, 32.00s/it] 6%|▌ | 962/17285 [8:33:32<146:01:53, 32.21s/it] 6%|▌ | 963/17285 [8:34:05<147:26:07, 32.52s/it] 6%|▌ | 964/17285 [8:34:32<140:13:05, 30.93s/it] 6%|▌ | 965/17285 [8:35:10<149:30:56, 32.98s/it] 6%|▌ | 966/17285 [8:35:45<152:20:41, 33.61s/it] 6%|▌ | 967/17285 [8:36:16<148:23:37, 32.74s/it] 6%|▌ | 968/17285 [8:36:48<147:22:51, 32.52s/it] 6%|▌ | 969/17285 [8:37:18<144:41:22, 31.92s/it] 6%|▌ | 970/17285 [8:37:49<143:54:50, 31.76s/it] {'loss': 1.9321, 'learning_rate': 0.00019998450869924703, 'epoch': 0.17} + 6%|▌ | 970/17285 [8:37:49<143:54:50, 31.76s/it] 6%|▌ | 971/17285 [8:38:20<141:39:53, 31.26s/it] 6%|▌ | 972/17285 [8:38:45<134:07:41, 29.60s/it] 6%|▌ | 973/17285 [8:39:19<140:11:00, 30.94s/it] 6%|▌ | 974/17285 [8:39:51<141:17:47, 31.19s/it] 6%|▌ | 975/17285 [8:40:21<139:21:45, 30.76s/it] 6%|▌ | 
976/17285 [8:40:54<142:02:27, 31.35s/it] 6%|▌ | 977/17285 [8:41:27<144:37:49, 31.93s/it] 6%|▌ | 978/17285 [8:41:55<139:17:07, 30.75s/it] 6%|▌ | 979/17285 [8:42:27<141:20:50, 31.21s/it] 6%|▌ | 980/17285 [8:42:57<139:14:56, 30.74s/it] {'loss': 2.0008, 'learning_rate': 0.00019998095811209587, 'epoch': 0.17} + 6%|▌ | 980/17285 [8:42:57<139:14:56, 30.74s/it] 6%|▌ | 981/17285 [8:43:25<135:16:37, 29.87s/it] 6%|▌ | 982/17285 [8:43:52<132:20:13, 29.22s/it] 6%|▌ | 983/17285 [8:44:19<128:13:06, 28.31s/it] 6%|▌ | 984/17285 [8:44:52<135:09:14, 29.85s/it] 6%|▌ | 985/17285 [8:45:23<137:10:42, 30.30s/it] 6%|▌ | 986/17285 [8:45:58<142:53:14, 31.56s/it] 6%|▌ | 987/17285 [8:46:33<147:42:53, 32.63s/it] 6%|▌ | 988/17285 [8:46:58<137:46:56, 30.44s/it] 6%|▌ | 989/17285 [8:47:29<138:23:38, 30.57s/it] 6%|▌ | 990/17285 [8:48:03<142:34:15, 31.50s/it] {'loss': 2.0254, 'learning_rate': 0.00019997704153377978, 'epoch': 0.17} + 6%|▌ | 990/17285 [8:48:03<142:34:15, 31.50s/it] 6%|▌ | 991/17285 [8:48:34<141:31:00, 31.27s/it] 6%|▌ | 992/17285 [8:49:10<148:34:01, 32.83s/it] 6%|▌ | 993/17285 [8:49:43<148:41:24, 32.86s/it] 6%|▌ | 994/17285 [8:50:19<152:46:23, 33.76s/it] 6%|▌ | 995/17285 [8:50:44<141:47:03, 31.33s/it] 6%|▌ | 996/17285 [8:51:23<151:13:10, 33.42s/it] 6%|▌ | 997/17285 [8:51:55<149:03:49, 32.95s/it] 6%|▌ | 998/17285 [8:52:43<170:12:02, 37.62s/it] 6%|▌ | 999/17285 [8:53:16<164:21:35, 36.33s/it] 6%|▌ | 1000/17285 [8:53:51<162:15:14, 35.87s/it] {'loss': 1.873, 'learning_rate': 0.0001999727589786358, 'epoch': 0.17} + 6%|▌ | 1000/17285 [8:53:51<162:15:14, 35.87s/it][INFO|trainer.py:3081] 2023-08-23 08:48:28,968 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-23 08:48:28,968 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-23 08:48:28,968 >> Batch size = 2 + + 0%| | 0/33 [00:00> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-1000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 
2023-08-23 08:49:54,129 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-1000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-1000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-1000 + 6%|▌ | 1001/17285 [8:55:59<286:29:21, 63.34s/it] 6%|▌ | 1002/17285 [8:56:32<245:43:30, 54.33s/it] 6%|▌ | 1003/17285 [8:57:02<213:19:09, 47.17s/it] 6%|▌ | 1004/17285 [8:57:38<197:03:47, 43.57s/it] 6%|▌ | 1005/17285 [8:58:08<178:35:28, 39.49s/it] 6%|▌ | 1006/17285 [8:58:44<174:09:47, 38.52s/it] 6%|▌ | 1007/17285 [8:59:15<164:47:38, 36.45s/it] 6%|▌ | 1008/17285 [8:59:47<158:30:43, 35.06s/it] 6%|▌ | 1009/17285 [9:00:20<155:53:43, 34.48s/it] 6%|▌ | 1010/17285 [9:00:54<154:38:35, 34.21s/it] {'loss': 1.9664, 'learning_rate': 0.00019996811046234077, 'epoch': 0.18} + 6%|▌ | 1010/17285 [9:00:54<154:38:35, 34.21s/it] 6%|▌ | 1011/17285 [9:01:29<155:29:00, 34.39s/it] 6%|▌ | 1012/17285 [9:02:00<151:41:47, 33.56s/it] 6%|▌ | 1013/17285 [9:02:33<149:55:05, 33.17s/it] 6%|▌ | 1014/17285 [9:03:07<151:01:26, 33.41s/it] 6%|▌ | 1015/17285 [9:03:35<143:58:20, 31.86s/it] 6%|▌ | 1016/17285 [9:04:01<136:19:14, 30.17s/it] 6%|▌ | 1017/17285 [9:04:28<131:43:55, 29.15s/it] 6%|▌ | 1018/17285 [9:04:53<125:43:50, 27.83s/it] 6%|▌ | 1019/17285 [9:05:21<125:58:03, 27.88s/it] 6%|▌ | 1020/17285 [9:05:48<125:18:41, 27.74s/it] {'loss': 1.9404, 'learning_rate': 0.00019996309600191098, 'epoch': 0.18} + 6%|▌ | 1020/17285 [9:05:48<125:18:41, 27.74s/it] 6%|▌ | 1021/17285 [9:06:17<126:56:36, 28.10s/it] 6%|▌ | 1022/17285 [9:06:52<136:47:38, 30.28s/it] 6%|▌ | 1023/17285 [9:07:24<138:09:16, 30.58s/it] 6%|▌ | 1024/17285 [9:07:53<136:43:01, 30.27s/it] 6%|▌ | 1025/17285 [9:08:31<147:36:44, 32.68s/it] 6%|▌ | 1026/17285 [9:09:00<142:29:31, 31.55s/it] 6%|▌ | 1027/17285 [9:09:41<154:26:32, 34.20s/it] 6%|▌ | 
1028/17285 [9:10:06<142:13:57, 31.50s/it] 6%|▌ | 1029/17285 [9:10:37<141:07:50, 31.25s/it] 6%|▌ | 1030/17285 [9:11:11<144:51:07, 32.08s/it] {'loss': 1.969, 'learning_rate': 0.00019995771561570248, 'epoch': 0.18} + 6%|▌ | 1030/17285 [9:11:11<144:51:07, 32.08s/it] 6%|▌ | 1031/17285 [9:11:48<152:18:18, 33.73s/it] 6%|▌ | 1032/17285 [9:12:20<150:08:22, 33.26s/it] 6%|▌ | 1033/17285 [9:12:53<149:29:50, 33.12s/it] 6%|▌ | 1034/17285 [9:13:24<146:03:03, 32.35s/it] 6%|▌ | 1035/17285 [9:13:49<136:43:29, 30.29s/it] 6%|▌ | 1036/17285 [9:14:18<134:52:35, 29.88s/it] 6%|▌ | 1037/17285 [9:14:47<134:05:58, 29.71s/it] 6%|▌ | 1038/17285 [9:15:23<141:25:55, 31.34s/it] 6%|▌ | 1039/17285 [9:15:54<141:01:09, 31.25s/it] 6%|▌ | 1040/17285 [9:16:22<137:14:01, 30.41s/it] {'loss': 1.9545, 'learning_rate': 0.00019995196932341073, 'epoch': 0.18} + 6%|▌ | 1040/17285 [9:16:22<137:14:01, 30.41s/it] 6%|▌ | 1041/17285 [9:16:57<143:03:11, 31.70s/it] 6%|▌ | 1042/17285 [9:17:28<142:30:30, 31.58s/it] 6%|▌ | 1043/17285 [9:17:58<140:06:56, 31.06s/it] 6%|▌ | 1044/17285 [9:18:35<148:32:58, 32.93s/it] 6%|▌ | 1045/17285 [9:19:10<151:25:11, 33.57s/it] 6%|▌ | 1046/17285 [9:19:48<157:01:28, 34.81s/it] 6%|▌ | 1047/17285 [9:20:18<150:05:46, 33.28s/it] 6%|▌ | 1048/17285 [9:20:49<147:15:32, 32.65s/it] 6%|▌ | 1049/17285 [9:21:25<152:16:01, 33.76s/it] 6%|▌ | 1050/17285 [9:22:01<154:40:07, 34.30s/it] {'loss': 1.9141, 'learning_rate': 0.00019994585714607066, 'epoch': 0.18} + 6%|▌ | 1050/17285 [9:22:01<154:40:07, 34.30s/it] 6%|▌ | 1051/17285 [9:22:38<158:20:55, 35.11s/it] 6%|▌ | 1052/17285 [9:23:16<162:40:12, 36.08s/it] 6%|▌ | 1053/17285 [9:23:46<154:54:12, 34.36s/it] 6%|▌ | 1054/17285 [9:24:17<150:15:26, 33.33s/it] 6%|▌ | 1055/17285 [9:24:50<148:55:31, 33.03s/it] 6%|▌ | 1056/17285 [9:25:18<142:35:34, 31.63s/it] 6%|▌ | 1057/17285 [9:25:51<144:24:05, 32.03s/it] 6%|▌ | 1058/17285 [9:26:31<154:34:36, 34.29s/it] 6%|▌ | 1059/17285 [9:27:06<155:52:28, 34.58s/it] 6%|▌ | 1060/17285 [9:27:36<149:13:54, 33.11s/it] {'loss': 1.9299, 
'learning_rate': 0.00019993937910605658, 'epoch': 0.18} + 6%|▌ | 1060/17285 [9:27:36<149:13:54, 33.11s/it] 6%|▌ | 1061/17285 [9:28:02<140:42:17, 31.22s/it] 6%|▌ | 1062/17285 [9:28:41<150:10:16, 33.32s/it] 6%|▌ | 1063/17285 [9:29:14<149:35:20, 33.20s/it] 6%|▌ | 1064/17285 [9:29:42<143:48:27, 31.92s/it] 6%|▌ | 1065/17285 [9:30:22<154:09:03, 34.21s/it] 6%|▌ | 1066/17285 [9:30:53<149:38:34, 33.22s/it] 6%|▌ | 1067/17285 [9:31:23<145:31:39, 32.30s/it] 6%|▌ | 1068/17285 [9:31:53<142:29:42, 31.63s/it] 6%|▌ | 1069/17285 [9:32:23<140:08:22, 31.11s/it] 6%|▌ | 1070/17285 [9:33:00<148:20:59, 32.94s/it] {'loss': 1.9305, 'learning_rate': 0.00019993253522708205, 'epoch': 0.19} + 6%|▌ | 1070/17285 [9:33:00<148:20:59, 32.94s/it] 6%|▌ | 1071/17285 [9:33:29<142:57:11, 31.74s/it] 6%|▌ | 1072/17285 [9:34:01<143:29:04, 31.86s/it] 6%|▌ | 1073/17285 [9:34:30<139:32:31, 30.99s/it] 6%|▌ | 1074/17285 [9:35:04<142:48:56, 31.72s/it] 6%|▌ | 1075/17285 [9:35:38<146:09:41, 32.46s/it] 6%|▌ | 1076/17285 [9:36:09<144:20:42, 32.06s/it] 6%|▌ | 1077/17285 [9:36:38<139:52:21, 31.07s/it] 6%|▌ | 1078/17285 [9:37:10<141:39:38, 31.47s/it] 6%|▌ | 1079/17285 [9:37:42<141:28:39, 31.43s/it] 6%|▌ | 1080/17285 [9:38:14<143:06:18, 31.79s/it] {'loss': 1.902, 'learning_rate': 0.0001999253255341998, 'epoch': 0.19} + 6%|▌ | 1080/17285 [9:38:14<143:06:18, 31.79s/it] 6%|▋ | 1081/17285 [9:38:57<158:04:14, 35.12s/it] 6%|▋ | 1082/17285 [9:39:27<151:04:43, 33.57s/it] 6%|▋ | 1083/17285 [9:39:53<140:50:11, 31.29s/it] 6%|▋ | 1084/17285 [9:40:26<142:42:50, 31.71s/it] 6%|▋ | 1085/17285 [9:40:57<142:00:46, 31.56s/it] 6%|▋ | 1086/17285 [9:41:36<152:10:14, 33.82s/it] 6%|▋ | 1087/17285 [9:42:03<143:25:41, 31.88s/it] 6%|▋ | 1088/17285 [9:42:34<142:28:54, 31.67s/it] 6%|▋ | 1089/17285 [9:43:01<135:27:38, 30.11s/it] 6%|▋ | 1090/17285 [9:43:34<139:40:38, 31.05s/it] {'loss': 1.9416, 'learning_rate': 0.00019991775005380173, 'epoch': 0.19} + 6%|▋ | 1090/17285 [9:43:34<139:40:38, 31.05s/it][2023-08-23 09:38:38,511] [INFO] 
[loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, but hysteresis is 2. Reducing hysteresis to 1 + 6%|▋ | 1091/17285 [9:44:01<133:42:22, 29.72s/it] 6%|▋ | 1092/17285 [9:44:36<140:47:45, 31.30s/it] 6%|▋ | 1093/17285 [9:45:07<140:23:00, 31.21s/it] 6%|▋ | 1094/17285 [9:45:42<145:28:40, 32.35s/it] 6%|▋ | 1095/17285 [9:46:08<136:55:08, 30.45s/it] 6%|▋ | 1096/17285 [9:46:41<140:50:16, 31.32s/it] 6%|▋ | 1097/17285 [9:47:13<142:03:47, 31.59s/it] 6%|▋ | 1098/17285 [9:47:51<149:31:14, 33.25s/it] 6%|▋ | 1099/17285 [9:48:33<161:58:02, 36.02s/it] 6%|▋ | 1100/17285 [9:49:02<152:49:05, 33.99s/it] {'loss': 1.9164, 'learning_rate': 0.00019991061939600934, 'epoch': 0.19} + 6%|▋ | 1100/17285 [9:49:02<152:49:05, 33.99s/it] 6%|▋ | 1101/17285 [9:49:39<157:03:14, 34.94s/it] 6%|▋ | 1102/17285 [9:50:11<153:03:13, 34.05s/it] 6%|▋ | 1103/17285 [9:50:42<148:15:11, 32.98s/it] 6%|▋ | 1104/17285 [9:51:20<154:32:25, 34.38s/it] 6%|▋ | 1105/17285 [9:51:54<154:35:55, 34.40s/it] 6%|▋ | 1106/17285 [9:52:22<146:17:58, 32.55s/it] 6%|▋ | 1107/17285 [9:52:56<147:28:38, 32.82s/it] 6%|▋ | 1108/17285 [9:53:28<147:10:25, 32.75s/it] 6%|▋ | 1109/17285 [9:53:56<140:56:29, 31.37s/it] 6%|▋ | 1110/17285 [9:54:25<137:39:49, 30.64s/it] {'loss': 1.947, 'learning_rate': 0.00019990234899683635, 'epoch': 0.19} + 6%|▋ | 1110/17285 [9:54:25<137:39:49, 30.64s/it] 6%|▋ | 1111/17285 [9:54:52<131:59:26, 29.38s/it] 6%|▋ | 1112/17285 [9:55:30<144:01:41, 32.06s/it] 6%|▋ | 1113/17285 [9:56:04<146:15:04, 32.56s/it] 6%|▋ | 1114/17285 [9:56:32<139:55:39, 31.15s/it] 6%|▋ | 1115/17285 [9:57:06<144:21:21, 32.14s/it] 6%|▋ | 1116/17285 [9:57:42<148:50:57, 33.14s/it] 6%|▋ | 1117/17285 [9:58:11<143:56:25, 32.05s/it] 6%|▋ | 1118/17285 [9:58:40<139:20:49, 31.03s/it] 6%|▋ | 1119/17285 [9:59:07<134:27:56, 29.94s/it] 6%|▋ | 1120/17285 [9:59:43<142:31:49, 31.74s/it] {'loss': 1.9242, 'learning_rate': 0.00019989371289425568, 'epoch': 0.19} + 6%|▋ | 1120/17285 [9:59:43<142:31:49, 31.74s/it] 
6%|▋ | 1121/17285 [10:00:16<144:06:09, 32.09s/it] 6%|▋ | 1122/17285 [10:00:52<149:37:42, 33.33s/it] 6%|▋ | 1123/17285 [10:01:24<147:31:26, 32.86s/it] 7%|▋ | 1124/17285 [10:02:00<151:10:49, 33.68s/it] 7%|▋ | 1125/17285 [10:02:44<166:05:26, 37.00s/it] 7%|▋ | 1126/17285 [10:03:14<155:42:49, 34.69s/it] 7%|▋ | 1127/17285 [10:03:42<147:03:49, 32.77s/it] 7%|▋ | 1128/17285 [10:04:12<142:52:15, 31.83s/it] 7%|▋ | 1129/17285 [10:04:46<146:09:00, 32.57s/it] 7%|▋ | 1130/17285 [10:05:20<148:51:26, 33.17s/it] {'loss': 1.9037, 'learning_rate': 0.00019988471111988062, 'epoch': 0.2} + 7%|▋ | 1130/17285 [10:05:20<148:51:26, 33.17s/it] 7%|▋ | 1131/17285 [10:05:55<151:10:05, 33.69s/it] 7%|▋ | 1132/17285 [10:06:23<142:32:43, 31.77s/it] 7%|▋ | 1133/17285 [10:07:00<149:59:38, 33.43s/it] 7%|▋ | 1134/17285 [10:07:33<149:08:58, 33.24s/it] 7%|▋ | 1135/17285 [10:08:10<154:03:09, 34.34s/it] 7%|▋ | 1136/17285 [10:08:42<150:45:00, 33.61s/it] 7%|▋ | 1137/17285 [10:09:09<141:57:16, 31.65s/it] 7%|▋ | 1138/17285 [10:09:39<139:50:01, 31.18s/it] 7%|▋ | 1139/17285 [10:10:06<134:34:11, 30.00s/it] 7%|▋ | 1140/17285 [10:10:35<133:35:16, 29.79s/it] {'loss': 1.915, 'learning_rate': 0.00019987534370666328, 'epoch': 0.2} + 7%|▋ | 1140/17285 [10:10:35<133:35:16, 29.79s/it] 7%|▋ | 1141/17285 [10:11:04<132:10:00, 29.47s/it] 7%|▋ | 1142/17285 [10:11:31<128:52:54, 28.74s/it] 7%|▋ | 1143/17285 [10:12:06<136:43:00, 30.49s/it] 7%|▋ | 1144/17285 [10:12:33<132:39:09, 29.59s/it] 7%|▋ | 1145/17285 [10:13:02<131:14:11, 29.27s/it] 7%|▋ | 1146/17285 [10:13:27<125:45:14, 28.05s/it] 7%|▋ | 1147/17285 [10:13:57<128:48:44, 28.73s/it] 7%|▋ | 1148/17285 [10:14:31<135:55:29, 30.32s/it] 7%|▋ | 1149/17285 [10:14:56<128:16:47, 28.62s/it] 7%|▋ | 1150/17285 [10:15:27<131:19:16, 29.30s/it] {'loss': 1.9268, 'learning_rate': 0.000199865610688894, 'epoch': 0.2} + 7%|▋ | 1150/17285 [10:15:27<131:19:16, 29.30s/it] 7%|▋ | 1151/17285 [10:16:02<138:56:31, 31.00s/it] 7%|▋ | 1152/17285 [10:16:35<142:03:38, 31.70s/it] 7%|▋ | 1153/17285 
[10:17:05<140:00:24, 31.24s/it] 7%|▋ | 1154/17285 [10:17:37<141:17:34, 31.53s/it] 7%|▋ | 1155/17285 [10:18:12<144:55:34, 32.35s/it] 7%|▋ | 1156/17285 [10:18:50<153:13:03, 34.20s/it] 7%|▋ | 1157/17285 [10:19:20<146:59:54, 32.81s/it] 7%|▋ | 1158/17285 [10:19:53<147:01:31, 32.82s/it] 7%|▋ | 1159/17285 [10:20:18<137:11:58, 30.63s/it] 7%|▋ | 1160/17285 [10:20:52<142:07:59, 31.73s/it] {'loss': 1.9268, 'learning_rate': 0.00019985551210220158, 'epoch': 0.2} + 7%|▋ | 1160/17285 [10:20:52<142:07:59, 31.73s/it] 7%|▋ | 1161/17285 [10:21:24<141:53:24, 31.68s/it] 7%|▋ | 1162/17285 [10:21:57<143:28:02, 32.03s/it] 7%|▋ | 1163/17285 [10:22:27<141:14:33, 31.54s/it] 7%|▋ | 1164/17285 [10:22:57<139:34:08, 31.17s/it] 7%|▋ | 1165/17285 [10:23:30<141:18:00, 31.56s/it] 7%|▋ | 1166/17285 [10:23:58<136:18:37, 30.44s/it] 7%|▋ | 1167/17285 [10:24:30<138:38:29, 30.97s/it] 7%|▋ | 1168/17285 [10:25:04<143:09:57, 31.98s/it][2023-08-23 10:20:14,364] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 32768, reducing to 16384 + 7%|▋ | 1169/17285 [10:25:37<143:42:29, 32.10s/it] 7%|▋ | 1170/17285 [10:26:10<145:00:11, 32.39s/it] {'loss': 1.9629, 'learning_rate': 0.00019984611084327463, 'epoch': 0.2} + 7%|▋ | 1170/17285 [10:26:10<145:00:11, 32.39s/it] 7%|▋ | 1171/17285 [10:26:40<141:29:18, 31.61s/it] 7%|▋ | 1172/17285 [10:27:13<144:09:17, 32.21s/it] 7%|▋ | 1173/17285 [10:27:42<140:11:27, 31.32s/it] 7%|▋ | 1174/17285 [10:28:10<135:24:05, 30.26s/it] 7%|▋ | 1175/17285 [10:28:47<143:43:07, 32.12s/it] 7%|▋ | 1176/17285 [10:29:23<149:53:11, 33.50s/it] 7%|▋ | 1177/17285 [10:30:00<153:30:31, 34.31s/it] 7%|▋ | 1178/17285 [10:30:38<159:29:41, 35.65s/it] 7%|▋ | 1179/17285 [10:31:12<156:34:47, 35.00s/it] 7%|▋ | 1180/17285 [10:31:45<153:46:24, 34.37s/it] {'loss': 1.854, 'learning_rate': 0.00019983531777857815, 'epoch': 0.2} + 7%|▋ | 1180/17285 [10:31:45<153:46:24, 34.37s/it] 7%|▋ | 1181/17285 [10:32:24<160:29:07, 35.88s/it] 7%|▋ | 1182/17285 [10:33:01<161:25:35, 36.09s/it] 7%|▋ | 1183/17285 [10:33:39<164:20:42, 36.74s/it] 7%|▋ | 1184/17285 [10:34:10<156:21:17, 34.96s/it] 7%|▋ | 1185/17285 [10:34:40<150:06:46, 33.57s/it] 7%|▋ | 1186/17285 [10:35:07<140:37:20, 31.45s/it] 7%|▋ | 1187/17285 [10:35:41<145:14:05, 32.48s/it] 7%|▋ | 1188/17285 [10:36:08<137:21:48, 30.72s/it] 7%|▋ | 1189/17285 [10:36:35<132:56:16, 29.73s/it] 7%|▋ | 1190/17285 [10:37:07<135:52:58, 30.39s/it] {'loss': 1.9051, 'learning_rate': 0.00019982415925584902, 'epoch': 0.21} + 7%|▋ | 1190/17285 [10:37:07<135:52:58, 30.39s/it] 7%|▋ | 1191/17285 [10:37:39<137:09:34, 30.68s/it] 7%|▋ | 1192/17285 [10:38:15<145:02:54, 32.45s/it] 7%|▋ | 1193/17285 [10:38:52<150:47:01, 33.73s/it] 7%|▋ | 1194/17285 [10:39:19<141:38:50, 31.69s/it] 7%|▋ | 1195/17285 [10:39:52<143:04:11, 32.01s/it] 7%|▋ | 1196/17285 [10:40:18<134:41:31, 30.14s/it] 7%|▋ | 1197/17285 [10:40:49<136:54:52, 30.64s/it] 7%|▋ | 1198/17285 [10:41:26<144:24:39, 32.32s/it] 7%|▋ | 1199/17285 [10:41:56<141:50:45, 31.74s/it] 7%|▋ | 1200/17285 
[10:42:33<149:23:48, 33.44s/it] {'loss': 1.8801, 'learning_rate': 0.00019981263531593422, 'epoch': 0.21} + 7%|▋ | 1200/17285 [10:42:33<149:23:48, 33.44s/it] 7%|▋ | 1201/17285 [10:43:12<155:50:13, 34.88s/it] 7%|▋ | 1202/17285 [10:43:40<147:43:03, 33.06s/it] 7%|▋ | 1203/17285 [10:44:12<145:30:42, 32.57s/it] 7%|▋ | 1204/17285 [10:44:40<138:58:29, 31.11s/it] 7%|▋ | 1205/17285 [10:45:11<139:22:35, 31.20s/it] 7%|▋ | 1206/17285 [10:45:37<132:28:31, 29.66s/it] 7%|▋ | 1207/17285 [10:46:09<135:43:47, 30.39s/it] 7%|▋ | 1208/17285 [10:46:38<133:29:33, 29.89s/it] 7%|▋ | 1209/17285 [10:47:10<136:45:24, 30.62s/it] 7%|▋ | 1210/17285 [10:47:39<134:25:31, 30.10s/it] {'loss': 1.8322, 'learning_rate': 0.00019980074600101837, 'epoch': 0.21} + 7%|▋ | 1210/17285 [10:47:39<134:25:31, 30.10s/it] 7%|▋ | 1211/17285 [10:48:13<139:56:34, 31.34s/it] 7%|▋ | 1212/17285 [10:48:44<139:11:59, 31.18s/it] 7%|▋ | 1213/17285 [10:49:13<136:38:38, 30.61s/it] 7%|▋ | 1214/17285 [10:49:51<145:56:03, 32.69s/it] 7%|▋ | 1215/17285 [10:50:17<136:59:10, 30.69s/it] 7%|▋ | 1216/17285 [10:50:53<144:06:24, 32.28s/it] 7%|▋ | 1217/17285 [10:51:25<143:50:41, 32.23s/it] 7%|▋ | 1218/17285 [10:51:53<138:33:15, 31.04s/it] 7%|▋ | 1219/17285 [10:52:24<137:32:36, 30.82s/it] 7%|▋ | 1220/17285 [10:52:50<131:06:48, 29.38s/it] {'loss': 1.8857, 'learning_rate': 0.00019978849135462366, 'epoch': 0.21} + 7%|▋ | 1220/17285 [10:52:50<131:06:48, 29.38s/it] 7%|▋ | 1221/17285 [10:53:19<130:57:14, 29.35s/it] 7%|▋ | 1222/17285 [10:53:54<139:09:57, 31.19s/it] 7%|▋ | 1223/17285 [10:54:29<143:15:47, 32.11s/it] 7%|▋ | 1224/17285 [10:55:02<144:20:32, 32.35s/it] 7%|▋ | 1225/17285 [10:55:38<149:36:32, 33.54s/it] 7%|▋ | 1226/17285 [10:56:18<158:28:26, 35.53s/it] 7%|▋ | 1227/17285 [10:56:45<147:27:51, 33.06s/it] 7%|▋ | 1228/17285 [10:57:26<157:26:35, 35.30s/it] 7%|▋ | 1229/17285 [10:58:01<156:53:57, 35.18s/it] 7%|▋ | 1230/17285 [10:58:32<152:11:54, 34.13s/it] {'loss': 1.8805, 'learning_rate': 0.00019977587142160945, 'epoch': 0.21} + 7%|▋ | 1230/17285 
[10:58:32<152:11:54, 34.13s/it] 7%|▋ | 1231/17285 [10:59:04<148:28:23, 33.29s/it] 7%|▋ | 1232/17285 [10:59:34<144:44:46, 32.46s/it] 7%|▋ | 1233/17285 [11:00:08<146:53:41, 32.94s/it] 7%|▋ | 1234/17285 [11:00:39<143:06:44, 32.10s/it] 7%|▋ | 1235/17285 [11:01:14<147:18:34, 33.04s/it] 7%|▋ | 1236/17285 [11:01:54<156:35:43, 35.13s/it] 7%|▋ | 1237/17285 [11:02:27<154:06:29, 34.57s/it] 7%|▋ | 1238/17285 [11:02:56<146:46:14, 32.93s/it] 7%|▋ | 1239/17285 [11:03:23<138:02:18, 30.97s/it] 7%|▋ | 1240/17285 [11:04:02<148:51:15, 33.40s/it] {'loss': 1.8511, 'learning_rate': 0.00019976288624817248, 'epoch': 0.22} + 7%|▋ | 1240/17285 [11:04:02<148:51:15, 33.40s/it] 7%|▋ | 1241/17285 [11:04:27<138:20:21, 31.04s/it] 7%|▋ | 1242/17285 [11:05:03<144:26:45, 32.41s/it] 7%|▋ | 1243/17285 [11:05:34<142:55:06, 32.07s/it] 7%|▋ | 1244/17285 [11:06:03<139:05:39, 31.22s/it] 7%|▋ | 1245/17285 [11:06:37<142:21:23, 31.95s/it] 7%|▋ | 1246/17285 [11:07:06<139:00:11, 31.20s/it] 7%|▋ | 1247/17285 [11:07:40<141:50:17, 31.84s/it] 7%|▋ | 1248/17285 [11:08:14<145:27:33, 32.65s/it] 7%|▋ | 1249/17285 [11:08:49<147:57:41, 33.22s/it] 7%|▋ | 1250/17285 [11:09:24<150:16:30, 33.74s/it] {'loss': 1.8872, 'learning_rate': 0.00019974953588184632, 'epoch': 0.22} + 7%|▋ | 1250/17285 [11:09:24<150:16:30, 33.74s/it] 7%|▋ | 1251/17285 [11:09:50<139:43:22, 31.37s/it] 7%|▋ | 1252/17285 [11:10:15<132:06:44, 29.66s/it] 7%|▋ | 1253/17285 [11:10:44<131:28:07, 29.52s/it] 7%|▋ | 1254/17285 [11:11:23<143:29:02, 32.22s/it] 7%|▋ | 1255/17285 [11:11:56<144:56:32, 32.55s/it] 7%|▋ | 1256/17285 [11:12:30<145:55:00, 32.77s/it] 7%|▋ | 1257/17285 [11:12:58<139:43:31, 31.38s/it] 7%|▋ | 1258/17285 [11:13:25<134:02:48, 30.11s/it] 7%|▋ | 1259/17285 [11:14:03<144:25:28, 32.44s/it] 7%|▋ | 1260/17285 [11:14:33<142:07:56, 31.93s/it] {'loss': 1.8636, 'learning_rate': 0.00019973582037150148, 'epoch': 0.22} + 7%|▋ | 1260/17285 [11:14:34<142:07:56, 31.93s/it] 7%|▋ | 1261/17285 [11:15:15<155:31:19, 34.94s/it] 7%|▋ | 1262/17285 [11:15:45<148:36:51, 
33.39s/it] 7%|▋ | 1263/17285 [11:16:20<150:18:10, 33.77s/it] 7%|▋ | 1264/17285 [11:16:58<156:31:47, 35.17s/it] 7%|▋ | 1265/17285 [11:17:24<144:07:52, 32.39s/it] 7%|▋ | 1266/17285 [11:17:59<147:48:20, 33.22s/it] 7%|▋ | 1267/17285 [11:18:34<150:14:48, 33.77s/it] 7%|▋ | 1268/17285 [11:19:06<147:20:03, 33.12s/it] 7%|▋ | 1269/17285 [11:19:33<139:41:57, 31.40s/it] 7%|▋ | 1270/17285 [11:20:06<140:50:02, 31.66s/it] {'loss': 1.8701, 'learning_rate': 0.00019972173976734507, 'epoch': 0.22} + 7%|▋ | 1270/17285 [11:20:06<140:50:02, 31.66s/it] 7%|▋ | 1271/17285 [11:20:38<141:53:45, 31.90s/it] 7%|▋ | 1272/17285 [11:21:16<149:12:09, 33.54s/it] 7%|▋ | 1273/17285 [11:21:50<150:20:16, 33.80s/it] 7%|▋ | 1274/17285 [11:22:22<147:54:24, 33.26s/it] 7%|▋ | 1275/17285 [11:22:50<141:27:18, 31.81s/it] 7%|▋ | 1276/17285 [11:23:29<150:25:09, 33.83s/it] 7%|▋ | 1277/17285 [11:24:00<146:33:30, 32.96s/it] 7%|▋ | 1278/17285 [11:24:33<146:21:57, 32.92s/it] 7%|▋ | 1279/17285 [11:24:59<138:00:20, 31.04s/it] 7%|▋ | 1280/17285 [11:25:34<142:16:09, 32.00s/it] {'loss': 1.8454, 'learning_rate': 0.00019970729412092063, 'epoch': 0.22} + 7%|▋ | 1280/17285 [11:25:34<142:16:09, 32.00s/it] 7%|▋ | 1281/17285 [11:26:06<142:21:57, 32.02s/it] 7%|▋ | 1282/17285 [11:26:38<142:23:40, 32.03s/it] 7%|▋ | 1283/17285 [11:27:09<141:36:11, 31.86s/it] 7%|▋ | 1284/17285 [11:27:36<135:37:28, 30.51s/it] 7%|▋ | 1285/17285 [11:28:03<130:14:33, 29.30s/it] 7%|▋ | 1286/17285 [11:28:32<129:20:08, 29.10s/it] 7%|▋ | 1287/17285 [11:28:59<126:45:04, 28.52s/it] 7%|▋ | 1288/17285 [11:29:28<127:41:29, 28.74s/it] 7%|▋ | 1289/17285 [11:30:06<140:02:22, 31.52s/it] 7%|▋ | 1290/17285 [11:30:44<148:53:46, 33.51s/it] {'loss': 1.8941, 'learning_rate': 0.00019969248348510808, 'epoch': 0.22} + 7%|▋ | 1290/17285 [11:30:44<148:53:46, 33.51s/it] 7%|▋ | 1291/17285 [11:31:21<153:57:28, 34.65s/it] 7%|▋ | 1292/17285 [11:31:56<153:20:19, 34.52s/it] 7%|▋ | 1293/17285 [11:32:25<146:45:20, 33.04s/it] 7%|▋ | 1294/17285 [11:32:57<144:51:29, 32.61s/it] 7%|▋ | 
1295/17285 [11:33:35<151:59:39, 34.22s/it] 7%|▋ | 1296/17285 [11:34:04<145:29:34, 32.76s/it] 8%|▊ | 1297/17285 [11:34:36<144:10:25, 32.46s/it] 8%|▊ | 1298/17285 [11:35:07<141:58:54, 31.97s/it] 8%|▊ | 1299/17285 [11:35:43<148:15:39, 33.39s/it] 8%|▊ | 1300/17285 [11:36:11<140:46:58, 31.71s/it] {'loss': 1.8561, 'learning_rate': 0.00019967730791412328, 'epoch': 0.23} + 8%|▊ | 1300/17285 [11:36:11<140:46:58, 31.71s/it] 8%|▊ | 1301/17285 [11:36:43<140:16:27, 31.59s/it] 8%|▊ | 1302/17285 [11:37:13<138:57:56, 31.30s/it] 8%|▊ | 1303/17285 [11:37:40<132:47:06, 29.91s/it] 8%|▊ | 1304/17285 [11:38:08<130:00:00, 29.28s/it] 8%|▊ | 1305/17285 [11:38:43<138:17:49, 31.16s/it] 8%|▊ | 1306/17285 [11:39:09<130:50:13, 29.48s/it] 8%|▊ | 1307/17285 [11:39:44<138:45:16, 31.26s/it] 8%|▊ | 1308/17285 [11:40:18<142:01:40, 32.00s/it] 8%|▊ | 1309/17285 [11:40:49<140:35:15, 31.68s/it] 8%|▊ | 1310/17285 [11:41:17<135:43:24, 30.59s/it] {'loss': 1.8992, 'learning_rate': 0.00019966176746351818, 'epoch': 0.23} + 8%|▊ | 1310/17285 [11:41:17<135:43:24, 30.59s/it] 8%|▊ | 1311/17285 [11:41:50<139:06:17, 31.35s/it] 8%|▊ | 1312/17285 [11:42:24<142:46:12, 32.18s/it] 8%|▊ | 1313/17285 [11:42:50<133:56:00, 30.19s/it] 8%|▊ | 1314/17285 [11:43:19<133:21:03, 30.06s/it] 8%|▊ | 1315/17285 [11:43:55<140:13:34, 31.61s/it] 8%|▊ | 1316/17285 [11:44:23<135:24:57, 30.53s/it] 8%|▊ | 1317/17285 [11:44:56<139:45:34, 31.51s/it] 8%|▊ | 1318/17285 [11:45:21<130:26:23, 29.41s/it] 8%|▊ | 1319/17285 [11:45:51<130:42:00, 29.47s/it] 8%|▊ | 1320/17285 [11:46:21<131:55:07, 29.75s/it] {'loss': 1.8372, 'learning_rate': 0.00019964586219018018, 'epoch': 0.23} + 8%|▊ | 1320/17285 [11:46:21<131:55:07, 29.75s/it] 8%|▊ | 1321/17285 [11:46:50<130:16:00, 29.38s/it] 8%|▊ | 1322/17285 [11:47:28<141:56:47, 32.01s/it] 8%|▊ | 1323/17285 [11:48:03<146:29:10, 33.04s/it] 8%|▊ | 1324/17285 [11:48:34<143:25:58, 32.35s/it] 8%|▊ | 1325/17285 [11:49:07<144:34:49, 32.61s/it] 8%|▊ | 1326/17285 [11:49:42<148:17:42, 33.45s/it] 8%|▊ | 1327/17285 
[11:50:17<150:16:37, 33.90s/it] 8%|▊ | 1328/17285 [11:50:50<148:21:35, 33.47s/it] 8%|▊ | 1329/17285 [11:51:21<144:33:56, 32.62s/it] 8%|▊ | 1330/17285 [11:51:53<143:42:57, 32.43s/it] {'loss': 1.8278, 'learning_rate': 0.0001996295921523323, 'epoch': 0.23} + 8%|▊ | 1330/17285 [11:51:53<143:42:57, 32.43s/it] 8%|▊ | 1331/17285 [11:52:32<152:42:20, 34.46s/it] 8%|▊ | 1332/17285 [11:53:02<147:09:46, 33.21s/it] 8%|▊ | 1333/17285 [11:53:38<150:40:01, 34.00s/it] 8%|▊ | 1334/17285 [11:54:08<145:12:48, 32.77s/it] 8%|▊ | 1335/17285 [11:54:41<146:13:05, 33.00s/it] 8%|▊ | 1336/17285 [11:55:13<144:12:30, 32.55s/it] 8%|▊ | 1337/17285 [11:55:40<137:40:10, 31.08s/it] 8%|▊ | 1338/17285 [11:56:12<138:22:04, 31.24s/it] 8%|▊ | 1339/17285 [11:56:37<130:18:33, 29.42s/it] 8%|▊ | 1340/17285 [11:57:07<130:06:52, 29.38s/it] {'loss': 1.8311, 'learning_rate': 0.00019961295740953278, 'epoch': 0.23} + 8%|▊ | 1340/17285 [11:57:07<130:06:52, 29.38s/it] 8%|▊ | 1341/17285 [11:57:36<129:44:35, 29.29s/it] 8%|▊ | 1342/17285 [11:58:08<134:16:25, 30.32s/it] 8%|▊ | 1343/17285 [11:58:39<134:42:42, 30.42s/it] 8%|▊ | 1344/17285 [11:59:05<128:13:32, 28.96s/it] 8%|▊ | 1345/17285 [11:59:33<127:07:46, 28.71s/it] 8%|▊ | 1346/17285 [11:59:59<123:36:27, 27.92s/it] 8%|▊ | 1347/17285 [12:00:25<121:27:08, 27.43s/it] 8%|▊ | 1348/17285 [12:01:03<135:33:13, 30.62s/it] 8%|▊ | 1349/17285 [12:01:43<147:56:39, 33.42s/it] 8%|▊ | 1350/17285 [12:02:11<141:17:25, 31.92s/it] {'loss': 1.8281, 'learning_rate': 0.00019959595802267492, 'epoch': 0.23} + 8%|▊ | 1350/17285 [12:02:11<141:17:25, 31.92s/it] 8%|▊ | 1351/17285 [12:02:42<138:58:25, 31.40s/it] 8%|▊ | 1352/17285 [12:03:12<136:59:50, 30.95s/it] 8%|▊ | 1353/17285 [12:03:44<139:28:31, 31.52s/it] 8%|▊ | 1354/17285 [12:04:24<150:15:09, 33.95s/it] 8%|▊ | 1355/17285 [12:05:01<153:57:26, 34.79s/it] 8%|▊ | 1356/17285 [12:05:35<153:34:47, 34.71s/it] 8%|▊ | 1357/17285 [12:06:01<142:09:17, 32.13s/it] 8%|▊ | 1358/17285 [12:06:26<132:19:28, 29.91s/it] 8%|▊ | 1359/17285 [12:06:57<133:19:49, 
30.14s/it] 8%|▊ | 1360/17285 [12:07:36<145:36:54, 32.92s/it] {'loss': 1.8188, 'learning_rate': 0.0001995785940539868, 'epoch': 0.24} + 8%|▊ | 1360/17285 [12:07:36<145:36:54, 32.92s/it] 8%|▊ | 1361/17285 [12:08:15<153:49:19, 34.78s/it] 8%|▊ | 1362/17285 [12:08:51<154:47:19, 35.00s/it] 8%|▊ | 1363/17285 [12:09:28<157:44:33, 35.67s/it] 8%|▊ | 1364/17285 [12:09:59<151:43:35, 34.31s/it] 8%|▊ | 1365/17285 [12:10:39<159:23:59, 36.05s/it] 8%|▊ | 1366/17285 [12:11:10<151:58:23, 34.37s/it] 8%|▊ | 1367/17285 [12:11:42<148:51:37, 33.67s/it] 8%|▊ | 1368/17285 [12:12:07<137:42:07, 31.14s/it] 8%|▊ | 1369/17285 [12:12:43<144:40:35, 32.72s/it] 8%|▊ | 1370/17285 [12:13:15<143:25:06, 32.44s/it] {'loss': 1.8156, 'learning_rate': 0.00019956086556703113, 'epoch': 0.24} + 8%|▊ | 1370/17285 [12:13:15<143:25:06, 32.44s/it] 8%|▊ | 1371/17285 [12:13:49<144:53:13, 32.78s/it] 8%|▊ | 1372/17285 [12:14:18<140:14:15, 31.73s/it] 8%|▊ | 1373/17285 [12:14:50<141:08:23, 31.93s/it] 8%|▊ | 1374/17285 [12:15:20<137:48:31, 31.18s/it] 8%|▊ | 1375/17285 [12:15:46<131:11:49, 29.69s/it] 8%|▊ | 1376/17285 [12:16:21<138:33:34, 31.35s/it] 8%|▊ | 1377/17285 [12:16:55<140:56:47, 31.90s/it] 8%|▊ | 1378/17285 [12:17:34<150:51:40, 34.14s/it] 8%|▊ | 1379/17285 [12:18:05<146:52:35, 33.24s/it] 8%|▊ | 1380/17285 [12:18:29<134:21:55, 30.41s/it] {'loss': 1.7751, 'learning_rate': 0.00019954277262670495, 'epoch': 0.24} + 8%|▊ | 1380/17285 [12:18:29<134:21:55, 30.41s/it] 8%|▊ | 1381/17285 [12:19:03<138:55:51, 31.45s/it] 8%|▊ | 1382/17285 [12:19:33<136:56:34, 31.00s/it] 8%|▊ | 1383/17285 [12:20:13<149:14:18, 33.79s/it] 8%|▊ | 1384/17285 [12:20:41<141:42:43, 32.08s/it] 8%|▊ | 1385/17285 [12:21:11<139:08:42, 31.50s/it] 8%|▊ | 1386/17285 [12:21:40<135:32:59, 30.69s/it] 8%|▊ | 1387/17285 [12:22:06<129:44:01, 29.38s/it] 8%|▊ | 1388/17285 [12:22:37<131:13:41, 29.72s/it] 8%|▊ | 1389/17285 [12:23:04<128:18:53, 29.06s/it] 8%|▊ | 1390/17285 [12:23:44<141:45:35, 32.11s/it] {'loss': 1.832, 'learning_rate': 0.00019952431529923949, 
'epoch': 0.24} + 8%|▊ | 1390/17285 [12:23:44<141:45:35, 32.11s/it] 8%|▊ | 1391/17285 [12:24:19<146:36:43, 33.21s/it] 8%|▊ | 1392/17285 [12:24:53<146:59:11, 33.29s/it] 8%|▊ | 1393/17285 [12:25:28<149:21:33, 33.83s/it] 8%|▊ | 1394/17285 [12:25:58<144:26:27, 32.72s/it] 8%|▊ | 1395/17285 [12:26:27<139:31:39, 31.61s/it] 8%|▊ | 1396/17285 [12:27:03<145:49:24, 33.04s/it] 8%|▊ | 1397/17285 [12:27:37<146:39:35, 33.23s/it] 8%|▊ | 1398/17285 [12:28:08<142:58:28, 32.40s/it] 8%|▊ | 1399/17285 [12:28:44<147:57:03, 33.53s/it] 8%|▊ | 1400/17285 [12:29:13<141:57:41, 32.17s/it] {'loss': 1.8475, 'learning_rate': 0.00019950549365219968, 'epoch': 0.24} + 8%|▊ | 1400/17285 [12:29:13<141:57:41, 32.17s/it] 8%|▊ | 1401/17285 [12:29:42<137:58:52, 31.27s/it] 8%|▊ | 1402/17285 [12:30:08<131:05:45, 29.71s/it] 8%|▊ | 1403/17285 [12:30:38<132:01:59, 29.93s/it] 8%|▊ | 1404/17285 [12:31:07<129:56:28, 29.46s/it] 8%|▊ | 1405/17285 [12:31:35<128:08:59, 29.05s/it] 8%|▊ | 1406/17285 [12:32:02<125:15:27, 28.40s/it] 8%|▊ | 1407/17285 [12:32:32<127:43:14, 28.96s/it] 8%|▊ | 1408/17285 [12:33:00<126:32:58, 28.69s/it] 8%|▊ | 1409/17285 [12:33:30<128:39:07, 29.17s/it] 8%|▊ | 1410/17285 [12:34:01<130:28:00, 29.59s/it] {'loss': 1.8329, 'learning_rate': 0.00019948630775448433, 'epoch': 0.24} + 8%|▊ | 1410/17285 [12:34:01<130:28:00, 29.59s/it] 8%|▊ | 1411/17285 [12:34:39<142:06:17, 32.23s/it] 8%|▊ | 1412/17285 [12:35:11<141:36:38, 32.12s/it] 8%|▊ | 1413/17285 [12:35:42<140:18:30, 31.82s/it] 8%|▊ | 1414/17285 [12:36:21<148:49:02, 33.76s/it] 8%|▊ | 1415/17285 [12:36:54<148:32:54, 33.70s/it] 8%|▊ | 1416/17285 [12:37:29<149:34:38, 33.93s/it] 8%|▊ | 1417/17285 [12:38:08<156:22:34, 35.48s/it] 8%|▊ | 1418/17285 [12:38:39<150:15:14, 34.09s/it] 8%|▊ | 1419/17285 [12:39:05<140:10:54, 31.81s/it] 8%|▊ | 1420/17285 [12:39:39<143:22:07, 32.53s/it] {'loss': 1.8352, 'learning_rate': 0.00019946675767632544, 'epoch': 0.25} + 8%|▊ | 1420/17285 [12:39:39<143:22:07, 32.53s/it] 8%|▊ | 1421/17285 [12:40:11<142:21:06, 32.30s/it] 8%|▊ | 
1422/17285 [12:40:42<141:01:00, 32.00s/it] 8%|▊ | 1423/17285 [12:41:15<142:20:03, 32.30s/it] 8%|▊ | 1424/17285 [12:41:49<144:03:59, 32.70s/it] 8%|▊ | 1425/17285 [12:42:15<135:37:36, 30.79s/it] 8%|▊ | 1426/17285 [12:42:44<133:08:14, 30.22s/it] 8%|▊ | 1427/17285 [12:43:13<131:25:12, 29.83s/it] 8%|▊ | 1428/17285 [12:43:49<139:36:29, 31.70s/it] 8%|▊ | 1429/17285 [12:44:19<136:45:27, 31.05s/it] 8%|▊ | 1430/17285 [12:45:05<156:37:30, 35.56s/it] {'loss': 1.8325, 'learning_rate': 0.00019944684348928822, 'epoch': 0.25} + 8%|▊ | 1430/17285 [12:45:05<156:37:30, 35.56s/it] 8%|▊ | 1431/17285 [12:45:43<159:39:53, 36.26s/it] 8%|▊ | 1432/17285 [12:46:15<154:08:03, 35.00s/it] 8%|▊ | 1433/17285 [12:46:46<149:26:59, 33.94s/it] 8%|▊ | 1434/17285 [12:47:15<143:10:05, 32.52s/it][2023-08-23 12:42:27,970] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 + 8%|▊ | 1435/17285 [12:47:50<146:15:00, 33.22s/it] 8%|▊ | 1436/17285 [12:48:19<140:15:58, 31.86s/it] 8%|▊ | 1437/17285 [12:48:49<137:25:59, 31.22s/it] 8%|▊ | 1438/17285 [12:49:20<137:58:00, 31.34s/it] 8%|▊ | 1439/17285 [12:49:53<139:20:12, 31.66s/it] 8%|▊ | 1440/17285 [12:50:25<139:53:31, 31.78s/it] {'loss': 1.8484, 'learning_rate': 0.00019942860946808643, 'epoch': 0.25} + 8%|▊ | 1440/17285 [12:50:25<139:53:31, 31.78s/it] 8%|▊ | 1441/17285 [12:50:54<136:11:05, 30.94s/it] 8%|▊ | 1442/17285 [12:51:25<135:55:51, 30.89s/it] 8%|▊ | 1443/17285 [12:51:57<137:22:20, 31.22s/it] 8%|▊ | 1444/17285 [12:52:34<145:03:06, 32.96s/it] 8%|▊ | 1445/17285 [12:53:10<149:28:53, 33.97s/it] 8%|▊ | 1446/17285 [12:53:43<148:54:03, 33.84s/it] 8%|▊ | 1447/17285 [12:54:17<147:56:49, 33.63s/it] 8%|▊ | 1448/17285 [12:54:47<143:10:26, 32.55s/it] 8%|▊ | 1449/17285 [12:55:17<139:49:09, 31.79s/it] 8%|▊ | 1450/17285 [12:55:47<138:39:04, 31.52s/it] {'loss': 1.837, 'learning_rate': 0.00019940800367611585, 'epoch': 0.25} + 8%|▊ | 1450/17285 [12:55:48<138:39:04, 
31.52s/it] 8%|▊ | 1451/17285 [12:56:23<144:21:11, 32.82s/it] 8%|▊ | 1452/17285 [12:56:57<145:28:08, 33.08s/it] 8%|▊ | 1453/17285 [12:57:36<152:53:05, 34.76s/it] 8%|▊ | 1454/17285 [12:58:18<162:10:16, 36.88s/it] 8%|▊ | 1455/17285 [12:58:45<149:27:16, 33.99s/it] 8%|▊ | 1456/17285 [12:59:31<165:17:43, 37.59s/it] 8%|▊ | 1457/17285 [12:59:57<150:26:37, 34.22s/it] 8%|▊ | 1458/17285 [13:00:27<145:05:42, 33.00s/it] 8%|▊ | 1459/17285 [13:00:56<139:40:12, 31.77s/it] 8%|▊ | 1460/17285 [13:01:26<136:35:34, 31.07s/it] {'loss': 1.8295, 'learning_rate': 0.00019938703399034234, 'epoch': 0.25} + 8%|▊ | 1460/17285 [13:01:26<136:35:34, 31.07s/it] 8%|▊ | 1461/17285 [13:02:07<150:42:19, 34.29s/it] 8%|▊ | 1462/17285 [13:02:44<154:07:00, 35.06s/it] 8%|▊ | 1463/17285 [13:03:18<152:38:26, 34.73s/it] 8%|▊ | 1464/17285 [13:03:53<152:35:53, 34.72s/it] 8%|▊ | 1465/17285 [13:04:23<146:10:45, 33.26s/it] 8%|▊ | 1466/17285 [13:04:57<147:02:47, 33.46s/it] 8%|▊ | 1467/17285 [13:05:30<146:18:48, 33.30s/it] 8%|▊ | 1468/17285 [13:06:10<155:39:46, 35.43s/it] 8%|▊ | 1469/17285 [13:06:44<153:49:44, 35.01s/it] 9%|▊ | 1470/17285 [13:07:11<143:06:37, 32.58s/it] {'loss': 1.8153, 'learning_rate': 0.00019936570048752775, 'epoch': 0.26} + 9%|▊ | 1470/17285 [13:07:11<143:06:37, 32.58s/it] 9%|▊ | 1471/17285 [13:07:43<142:58:50, 32.55s/it] 9%|▊ | 1472/17285 [13:08:18<145:44:43, 33.18s/it] 9%|▊ | 1473/17285 [13:08:58<154:16:22, 35.12s/it] 9%|▊ | 1474/17285 [13:09:36<158:33:16, 36.10s/it] 9%|▊ | 1475/17285 [13:10:08<153:14:50, 34.90s/it] 9%|▊ | 1476/17285 [13:10:34<141:39:17, 32.26s/it] 9%|▊ | 1477/17285 [13:11:04<138:51:02, 31.62s/it] 9%|▊ | 1478/17285 [13:11:37<140:09:53, 31.92s/it] 9%|▊ | 1479/17285 [13:12:15<147:44:28, 33.65s/it] 9%|▊ | 1480/17285 [13:12:41<137:34:14, 31.34s/it] {'loss': 1.7925, 'learning_rate': 0.00019934400324576564, 'epoch': 0.26} + 9%|▊ | 1480/17285 [13:12:41<137:34:14, 31.34s/it] 9%|▊ | 1481/17285 [13:13:12<137:19:13, 31.28s/it] 9%|▊ | 1482/17285 [13:13:57<156:00:14, 35.54s/it] 9%|▊ | 
1483/17285 [13:14:24<144:26:56, 32.91s/it] 9%|▊ | 1484/17285 [13:15:00<148:33:27, 33.85s/it] 9%|▊ | 1485/17285 [13:15:34<148:00:51, 33.72s/it] 9%|▊ | 1486/17285 [13:16:06<146:13:16, 33.32s/it] 9%|▊ | 1487/17285 [13:16:55<166:56:18, 38.04s/it] 9%|▊ | 1488/17285 [13:17:38<173:38:22, 39.57s/it] 9%|▊ | 1489/17285 [13:18:11<164:38:29, 37.52s/it] 9%|▊ | 1490/17285 [13:18:49<164:53:38, 37.58s/it] {'loss': 1.8383, 'learning_rate': 0.0001993219423444811, 'epoch': 0.26} + 9%|▊ | 1490/17285 [13:18:49<164:53:38, 37.58s/it] 9%|▊ | 1491/17285 [13:19:25<163:25:42, 37.25s/it] 9%|▊ | 1492/17285 [13:20:01<162:11:31, 36.97s/it] 9%|▊ | 1493/17285 [13:20:32<154:04:18, 35.12s/it] 9%|▊ | 1494/17285 [13:21:01<146:08:46, 33.32s/it] 9%|▊ | 1495/17285 [13:21:39<152:08:03, 34.69s/it] 9%|▊ | 1496/17285 [13:22:07<143:31:59, 32.73s/it] 9%|▊ | 1497/17285 [13:22:37<138:49:53, 31.66s/it] 9%|▊ | 1498/17285 [13:23:03<131:40:49, 30.03s/it] 9%|▊ | 1499/17285 [13:23:35<134:57:32, 30.78s/it] 9%|▊ | 1500/17285 [13:24:22<156:03:20, 35.59s/it] {'loss': 1.8135, 'learning_rate': 0.0001992995178644305, 'epoch': 0.26} + 9%|▊ | 1500/17285 [13:24:22<156:03:20, 35.59s/it] 9%|▊ | 1501/17285 [13:24:52<148:54:14, 33.96s/it] 9%|▊ | 1502/17285 [13:25:28<150:54:51, 34.42s/it] 9%|▊ | 1503/17285 [13:25:58<144:54:51, 33.06s/it] 9%|▊ | 1504/17285 [13:26:24<136:22:31, 31.11s/it] 9%|▊ | 1505/17285 [13:26:48<126:41:56, 28.90s/it] 9%|▊ | 1506/17285 [13:27:29<142:15:36, 32.46s/it] 9%|▊ | 1507/17285 [13:28:01<141:46:39, 32.35s/it] 9%|▊ | 1508/17285 [13:28:35<144:42:56, 33.02s/it] 9%|▊ | 1509/17285 [13:29:03<137:03:54, 31.28s/it] 9%|▊ | 1510/17285 [13:29:32<135:12:09, 30.85s/it] {'loss': 1.8036, 'learning_rate': 0.00019927672988770105, 'epoch': 0.26} + 9%|▊ | 1510/17285 [13:29:32<135:12:09, 30.85s/it] 9%|▊ | 1511/17285 [13:30:07<140:23:04, 32.04s/it] 9%|▊ | 1512/17285 [13:30:37<137:02:29, 31.28s/it] 9%|▉ | 1513/17285 [13:31:05<132:38:23, 30.28s/it] 9%|▉ | 1514/17285 [13:31:32<128:12:34, 29.27s/it] 9%|▉ | 1515/17285 
[13:32:00<126:29:45, 28.88s/it] 9%|▉ | 1516/17285 [13:32:32<130:39:05, 29.83s/it] 9%|▉ | 1517/17285 [13:33:15<148:47:37, 33.97s/it] 9%|▉ | 1518/17285 [13:33:46<144:08:39, 32.91s/it] 9%|▉ | 1519/17285 [13:34:25<152:11:19, 34.75s/it] 9%|▉ | 1520/17285 [13:34:55<146:23:38, 33.43s/it] {'loss': 1.8035, 'learning_rate': 0.00019925357849771066, 'epoch': 0.26} + 9%|▉ | 1520/17285 [13:34:55<146:23:38, 33.43s/it] 9%|▉ | 1521/17285 [13:35:34<153:13:15, 34.99s/it] 9%|▉ | 1522/17285 [13:36:04<146:51:50, 33.54s/it] 9%|▉ | 1523/17285 [13:36:30<136:54:55, 31.27s/it] 9%|▉ | 1524/17285 [13:37:15<155:12:29, 35.45s/it] 9%|▉ | 1525/17285 [13:37:43<145:36:44, 33.26s/it] 9%|▉ | 1526/17285 [13:38:17<146:31:10, 33.47s/it][2023-08-23 13:33:24,604] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 9%|▉ | 1527/17285 [13:38:47<141:34:14, 32.34s/it] 9%|▉ | 1528/17285 [13:39:16<137:41:17, 31.46s/it] 9%|▉ | 1529/17285 [13:39:48<137:20:05, 31.38s/it] 9%|▉ | 1530/17285 [13:40:18<135:34:53, 30.98s/it] {'loss': 1.8135, 'learning_rate': 0.00019923243159839809, 'epoch': 0.27} + 9%|▉ | 1530/17285 [13:40:18<135:34:53, 30.98s/it] 9%|▉ | 1531/17285 [13:40:52<140:27:20, 32.10s/it] 9%|▉ | 1532/17285 [13:41:26<142:32:58, 32.58s/it] 9%|▉ | 1533/17285 [13:41:56<139:08:46, 31.80s/it] 9%|▉ | 1534/17285 [13:42:33<145:27:03, 33.24s/it] 9%|▉ | 1535/17285 [13:43:03<141:42:43, 32.39s/it] 9%|▉ | 1536/17285 [13:43:38<144:44:40, 33.09s/it] 9%|▉ | 1537/17285 [13:44:09<142:57:56, 32.68s/it] 9%|▉ | 1538/17285 [13:44:42<142:52:54, 32.66s/it] 9%|▉ | 1539/17285 [13:45:11<138:37:21, 31.69s/it] 9%|▉ | 1540/17285 [13:45:50<147:36:33, 33.75s/it] {'loss': 1.7839, 'learning_rate': 0.00019920858995779232, 'epoch': 0.27} + 9%|▉ | 1540/17285 [13:45:50<147:36:33, 33.75s/it] 9%|▉ | 1541/17285 [13:46:18<139:43:35, 31.95s/it] 9%|▉ | 1542/17285 [13:46:53<143:44:51, 32.87s/it] 9%|▉ | 1543/17285 [13:47:23<140:30:47, 32.13s/it] 9%|▉ | 1544/17285 
[13:48:00<146:28:15, 33.50s/it] 9%|▉ | 1545/17285 [13:48:29<140:46:14, 32.20s/it] 9%|▉ | 1546/17285 [13:48:57<135:05:56, 30.90s/it] 9%|▉ | 1547/17285 [13:49:25<131:37:27, 30.11s/it] 9%|▉ | 1548/17285 [13:50:01<138:43:40, 31.74s/it] 9%|▉ | 1549/17285 [13:50:39<147:17:40, 33.70s/it] 9%|▉ | 1550/17285 [13:51:12<146:15:57, 33.46s/it] {'loss': 1.7759, 'learning_rate': 0.00019918438515335927, 'epoch': 0.27} + 9%|▉ | 1550/17285 [13:51:12<146:15:57, 33.46s/it] 9%|▉ | 1551/17285 [13:51:39<138:24:52, 31.67s/it] 9%|▉ | 1552/17285 [13:52:07<133:16:53, 30.50s/it] 9%|▉ | 1553/17285 [13:52:34<128:16:32, 29.35s/it] 9%|▉ | 1554/17285 [13:53:08<134:00:00, 30.67s/it] 9%|▉ | 1555/17285 [13:53:35<129:52:32, 29.72s/it] 9%|▉ | 1556/17285 [13:54:06<130:56:47, 29.97s/it] 9%|▉ | 1557/17285 [13:54:38<134:35:24, 30.81s/it] 9%|▉ | 1558/17285 [13:55:07<132:15:57, 30.28s/it] 9%|▉ | 1559/17285 [13:55:43<139:04:14, 31.84s/it] 9%|▉ | 1560/17285 [13:56:14<138:23:13, 31.68s/it] {'loss': 1.7933, 'learning_rate': 0.00019915981727370316, 'epoch': 0.27} + 9%|▉ | 1560/17285 [13:56:14<138:23:13, 31.68s/it] 9%|▉ | 1561/17285 [13:56:46<138:16:41, 31.66s/it] 9%|▉ | 1562/17285 [13:57:17<137:41:30, 31.53s/it] 9%|▉ | 1563/17285 [13:57:48<136:43:27, 31.31s/it] 9%|▉ | 1564/17285 [13:58:23<141:12:33, 32.34s/it] 9%|▉ | 1565/17285 [13:58:54<140:28:07, 32.17s/it] 9%|▉ | 1566/17285 [13:59:39<156:49:51, 35.92s/it] 9%|▉ | 1567/17285 [14:00:11<151:53:43, 34.79s/it] 9%|▉ | 1568/17285 [14:00:43<148:18:41, 33.97s/it] 9%|▉ | 1569/17285 [14:01:09<137:40:33, 31.54s/it] 9%|▉ | 1570/17285 [14:01:36<131:53:12, 30.21s/it] {'loss': 1.7977, 'learning_rate': 0.00019913488640875744, 'epoch': 0.27} + 9%|▉ | 1570/17285 [14:01:36<131:53:12, 30.21s/it] 9%|▉ | 1571/17285 [14:02:04<128:53:13, 29.53s/it] 9%|▉ | 1572/17285 [14:02:41<138:25:36, 31.71s/it] 9%|▉ | 1573/17285 [14:03:07<130:53:13, 29.99s/it] 9%|▉ | 1574/17285 [14:03:36<130:00:29, 29.79s/it] 9%|▉ | 1575/17285 [14:04:06<130:35:17, 29.92s/it] 9%|▉ | 1576/17285 [14:04:40<134:53:07, 
30.91s/it] 9%|▉ | 1577/17285 [14:05:10<134:20:46, 30.79s/it] 9%|▉ | 1578/17285 [14:05:45<140:11:59, 32.13s/it] 9%|▉ | 1579/17285 [14:06:16<137:36:32, 31.54s/it] 9%|▉ | 1580/17285 [14:06:42<131:06:54, 30.06s/it] {'loss': 1.7797, 'learning_rate': 0.00019910959264978422, 'epoch': 0.27} + 9%|▉ | 1580/17285 [14:06:42<131:06:54, 30.06s/it] 9%|▉ | 1581/17285 [14:07:16<135:46:12, 31.12s/it] 9%|▉ | 1582/17285 [14:07:42<129:51:17, 29.77s/it] 9%|▉ | 1583/17285 [14:08:12<129:52:46, 29.78s/it] 9%|▉ | 1584/17285 [14:08:43<131:25:44, 30.13s/it] 9%|▉ | 1585/17285 [14:09:12<130:02:20, 29.82s/it] 9%|▉ | 1586/17285 [14:09:47<136:12:10, 31.23s/it] 9%|▉ | 1587/17285 [14:10:23<142:41:33, 32.72s/it] 9%|▉ | 1588/17285 [14:10:53<138:37:15, 31.79s/it] 9%|▉ | 1589/17285 [14:11:21<133:35:02, 30.64s/it] 9%|▉ | 1590/17285 [14:11:50<131:20:03, 30.12s/it] {'loss': 1.7656, 'learning_rate': 0.00019908393608937406, 'epoch': 0.28} + 9%|▉ | 1590/17285 [14:11:50<131:20:03, 30.12s/it] 9%|▉ | 1591/17285 [14:12:30<145:27:56, 33.37s/it] 9%|▉ | 1592/17285 [14:13:01<142:16:00, 32.64s/it] 9%|▉ | 1593/17285 [14:13:32<139:29:08, 32.00s/it] 9%|▉ | 1594/17285 [14:14:07<142:52:57, 32.78s/it] 9%|▉ | 1595/17285 [14:14:37<139:20:12, 31.97s/it] 9%|▉ | 1596/17285 [14:15:12<143:50:46, 33.01s/it] 9%|▉ | 1597/17285 [14:15:41<138:33:43, 31.80s/it] 9%|▉ | 1598/17285 [14:16:14<140:01:55, 32.14s/it] 9%|▉ | 1599/17285 [14:16:46<139:41:02, 32.06s/it] 9%|▉ | 1600/17285 [14:17:17<138:21:05, 31.75s/it] {'loss': 1.798, 'learning_rate': 0.00019905791682144557, 'epoch': 0.28} + 9%|▉ | 1600/17285 [14:17:17<138:21:05, 31.75s/it] 9%|▉ | 1601/17285 [14:17:49<139:11:54, 31.95s/it] 9%|▉ | 1602/17285 [14:18:19<136:58:15, 31.44s/it] 9%|▉ | 1603/17285 [14:18:59<147:42:50, 33.91s/it] 9%|▉ | 1604/17285 [14:19:35<150:21:45, 34.52s/it] 9%|▉ | 1605/17285 [14:20:07<146:58:35, 33.74s/it] 9%|▉ | 1606/17285 [14:20:32<135:46:54, 31.18s/it] 9%|▉ | 1607/17285 [14:21:05<137:25:47, 31.56s/it] 9%|▉ | 1608/17285 [14:21:36<136:36:18, 31.37s/it] 9%|▉ | 
1609/17285 [14:22:08<137:57:39, 31.68s/it] 9%|▉ | 1610/17285 [14:22:43<142:11:05, 32.65s/it] {'loss': 1.7618, 'learning_rate': 0.00019903153494124518, 'epoch': 0.28} + 9%|▉ | 1610/17285 [14:22:43<142:11:05, 32.65s/it] 9%|▉ | 1611/17285 [14:23:24<152:37:21, 35.05s/it] 9%|▉ | 1612/17285 [14:24:03<157:47:12, 36.24s/it] 9%|▉ | 1613/17285 [14:24:32<148:12:55, 34.05s/it] 9%|▉ | 1614/17285 [14:25:08<151:25:10, 34.78s/it] 9%|▉ | 1615/17285 [14:25:47<156:44:01, 36.01s/it] 9%|▉ | 1616/17285 [14:26:15<146:51:47, 33.74s/it] 9%|▉ | 1617/17285 [14:26:52<150:04:20, 34.48s/it] 9%|▉ | 1618/17285 [14:27:19<141:02:22, 32.41s/it] 9%|▉ | 1619/17285 [14:27:49<137:57:12, 31.70s/it] 9%|▉ | 1620/17285 [14:28:21<138:27:05, 31.82s/it] {'loss': 1.7879, 'learning_rate': 0.00019900479054534652, 'epoch': 0.28} + 9%|▉ | 1620/17285 [14:28:21<138:27:05, 31.82s/it] 9%|▉ | 1621/17285 [14:28:53<138:19:04, 31.79s/it] 9%|▉ | 1622/17285 [14:29:18<128:50:23, 29.61s/it] 9%|▉ | 1623/17285 [14:29:44<124:47:27, 28.68s/it] 9%|▉ | 1624/17285 [14:30:19<133:14:17, 30.63s/it] 9%|▉ | 1625/17285 [14:30:54<139:13:48, 32.01s/it] 9%|▉ | 1626/17285 [14:31:21<131:32:42, 30.24s/it] 9%|▉ | 1627/17285 [14:31:51<132:06:21, 30.37s/it] 9%|▉ | 1628/17285 [14:32:26<138:20:29, 31.81s/it] 9%|▉ | 1629/17285 [14:32:56<135:26:26, 31.14s/it] 9%|▉ | 1630/17285 [14:33:28<136:49:28, 31.46s/it] {'loss': 1.7972, 'learning_rate': 0.00019897768373165046, 'epoch': 0.28} + 9%|▉ | 1630/17285 [14:33:28<136:49:28, 31.46s/it] 9%|▉ | 1631/17285 [14:33:55<130:29:58, 30.01s/it] 9%|▉ | 1632/17285 [14:34:22<126:31:01, 29.10s/it][2023-08-23 14:29:33,851] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. 
Reducing hysteresis to 1 + 9%|▉ | 1633/17285 [14:34:56<133:23:56, 30.68s/it] 9%|▉ | 1634/17285 [14:35:33<141:16:24, 32.50s/it] 9%|▉ | 1635/17285 [14:36:04<139:19:50, 32.05s/it] 9%|▉ | 1636/17285 [14:36:33<135:35:05, 31.19s/it] 9%|▉ | 1637/17285 [14:37:00<129:56:39, 29.90s/it] 9%|▉ | 1638/17285 [14:37:33<133:59:32, 30.83s/it] 9%|▉ | 1639/17285 [14:38:12<144:21:03, 33.21s/it] 9%|▉ | 1640/17285 [14:38:43<142:03:20, 32.69s/it] {'loss': 1.7738, 'learning_rate': 0.00019895297781409127, 'epoch': 0.28} + 9%|▉ | 1640/17285 [14:38:43<142:03:20, 32.69s/it] 9%|▉ | 1641/17285 [14:39:18<145:12:26, 33.42s/it] 9%|▉ | 1642/17285 [14:39:52<146:07:28, 33.63s/it] 10%|▉ | 1643/17285 [14:40:29<150:32:13, 34.65s/it] 10%|▉ | 1644/17285 [14:41:02<147:52:53, 34.04s/it] 10%|▉ | 1645/17285 [14:41:31<140:50:19, 32.42s/it] 10%|▉ | 1646/17285 [14:41:57<133:21:32, 30.70s/it] 10%|▉ | 1647/17285 [14:42:23<127:08:16, 29.27s/it] 10%|▉ | 1648/17285 [14:42:50<123:17:57, 28.39s/it] 10%|▉ | 1649/17285 [14:43:26<133:02:04, 30.63s/it] 10%|▉ | 1650/17285 [14:43:54<130:44:00, 30.10s/it] {'loss': 1.7901, 'learning_rate': 0.00019892518268104788, 'epoch': 0.29} + 10%|▉ | 1650/17285 [14:43:54<130:44:00, 30.10s/it] 10%|▉ | 1651/17285 [14:44:32<140:25:24, 32.33s/it] 10%|▉ | 1652/17285 [14:45:09<146:52:09, 33.82s/it] 10%|▉ | 1653/17285 [14:45:53<160:23:29, 36.94s/it] 10%|▉ | 1654/17285 [14:46:27<156:27:38, 36.03s/it] 10%|▉ | 1655/17285 [14:47:02<153:59:35, 35.47s/it] 10%|▉ | 1656/17285 [14:47:34<149:39:15, 34.47s/it] 10%|▉ | 1657/17285 [14:48:02<142:13:42, 32.76s/it] 10%|▉ | 1658/17285 [14:48:35<142:12:26, 32.76s/it] 10%|▉ | 1659/17285 [14:49:03<136:19:04, 31.41s/it] 10%|▉ | 1660/17285 [14:49:39<141:43:42, 32.65s/it] {'loss': 1.7489, 'learning_rate': 0.00019889702542162026, 'epoch': 0.29} + 10%|▉ | 1660/17285 [14:49:39<141:43:42, 32.65s/it] 10%|▉ | 1661/17285 [14:50:15<146:14:34, 33.70s/it] 10%|▉ | 1662/17285 [14:50:49<146:43:40, 33.81s/it] 10%|▉ | 1663/17285 [14:51:27<151:22:53, 34.88s/it][2023-08-23 14:46:40,226] 
[INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 10%|▉ | 1664/17285 [14:52:03<152:44:11, 35.20s/it] 10%|▉ | 1665/17285 [14:52:28<140:22:33, 32.35s/it] 10%|▉ | 1666/17285 [14:52:56<133:50:38, 30.85s/it] 10%|▉ | 1667/17285 [14:53:37<146:57:06, 33.87s/it] 10%|▉ | 1668/17285 [14:54:07<142:45:50, 32.91s/it] 10%|▉ | 1669/17285 [14:54:34<134:33:12, 31.02s/it] 10%|▉ | 1670/17285 [14:55:05<135:07:01, 31.15s/it] {'loss': 1.8051, 'learning_rate': 0.00019887137435523912, 'epoch': 0.29} + 10%|▉ | 1670/17285 [14:55:05<135:07:01, 31.15s/it] 10%|▉ | 1671/17285 [14:55:34<131:20:07, 30.28s/it] 10%|▉ | 1672/17285 [14:56:03<130:22:36, 30.06s/it] 10%|▉ | 1673/17285 [14:56:32<129:32:50, 29.87s/it] 10%|▉ | 1674/17285 [14:57:07<135:12:35, 31.18s/it] 10%|▉ | 1675/17285 [14:57:37<133:25:24, 30.77s/it] 10%|▉ | 1676/17285 [14:58:11<138:37:32, 31.97s/it] 10%|▉ | 1677/17285 [14:58:44<139:04:35, 32.08s/it] 10%|▉ | 1678/17285 [14:59:17<140:12:10, 32.34s/it] 10%|▉ | 1679/17285 [14:59:51<142:39:28, 32.91s/it] 10%|▉ | 1680/17285 [15:00:22<140:12:12, 32.34s/it] {'loss': 1.7428, 'learning_rate': 0.00019884252934074216, 'epoch': 0.29} + 10%|▉ | 1680/17285 [15:00:22<140:12:12, 32.34s/it] 10%|▉ | 1681/17285 [15:00:49<133:10:37, 30.73s/it] 10%|▉ | 1682/17285 [15:01:22<135:55:01, 31.36s/it] 10%|▉ | 1683/17285 [15:01:55<138:58:33, 32.07s/it] 10%|▉ | 1684/17285 [15:02:21<130:39:07, 30.15s/it] 10%|▉ | 1685/17285 [15:02:53<133:39:42, 30.85s/it] 10%|▉ | 1686/17285 [15:03:24<132:36:57, 30.61s/it] 10%|▉ | 1687/17285 [15:04:00<140:16:10, 32.37s/it] 10%|▉ | 1688/17285 [15:04:36<144:18:58, 33.31s/it] 10%|▉ | 1689/17285 [15:05:10<145:44:47, 33.64s/it] 10%|▉ | 1690/17285 [15:05:35<134:56:35, 31.15s/it] {'loss': 1.7961, 'learning_rate': 0.0001988133225024225, 'epoch': 0.29} + 10%|▉ | 1690/17285 [15:05:35<134:56:35, 31.15s/it] 10%|▉ | 1691/17285 [15:06:07<135:31:15, 31.29s/it] 10%|▉ | 1692/17285 [15:06:37<134:33:42, 31.07s/it] 10%|▉ | 
1693/17285 [15:07:04<129:12:07, 29.83s/it] 10%|▉ | 1694/17285 [15:07:35<130:32:10, 30.14s/it] 10%|▉ | 1695/17285 [15:08:16<144:43:31, 33.42s/it] 10%|▉ | 1696/17285 [15:08:53<149:02:20, 34.42s/it] 10%|▉ | 1697/17285 [15:09:25<145:15:03, 33.55s/it] 10%|▉ | 1698/17285 [15:09:55<141:45:19, 32.74s/it] 10%|▉ | 1699/17285 [15:10:39<155:31:27, 35.92s/it] 10%|▉ | 1700/17285 [15:11:10<149:41:47, 34.58s/it] {'loss': 1.7779, 'learning_rate': 0.00019878375394719502, 'epoch': 0.3} + 10%|▉ | 1700/17285 [15:11:10<149:41:47, 34.58s/it] 10%|▉ | 1701/17285 [15:11:40<143:18:08, 33.10s/it] 10%|▉ | 1702/17285 [15:12:10<139:16:39, 32.18s/it] 10%|▉ | 1703/17285 [15:12:49<147:57:55, 34.19s/it] 10%|▉ | 1704/17285 [15:13:15<138:03:27, 31.90s/it] 10%|▉ | 1705/17285 [15:13:47<137:15:55, 31.72s/it] 10%|▉ | 1706/17285 [15:14:19<138:32:18, 32.01s/it] 10%|▉ | 1707/17285 [15:14:51<138:19:57, 31.97s/it] 10%|▉ | 1708/17285 [15:15:25<140:11:40, 32.40s/it] 10%|▉ | 1709/17285 [15:15:53<134:46:47, 31.15s/it] 10%|▉ | 1710/17285 [15:16:25<136:18:43, 31.51s/it] {'loss': 1.8037, 'learning_rate': 0.00019875382378329857, 'epoch': 0.3} + 10%|▉ | 1710/17285 [15:16:25<136:18:43, 31.51s/it] 10%|▉ | 1711/17285 [15:17:00<140:34:21, 32.49s/it] 10%|▉ | 1712/17285 [15:17:34<142:22:45, 32.91s/it] 10%|▉ | 1713/17285 [15:18:03<136:55:08, 31.65s/it] 10%|▉ | 1714/17285 [15:18:38<141:33:46, 32.73s/it] 10%|▉ | 1715/17285 [15:19:18<150:53:07, 34.89s/it] 10%|▉ | 1716/17285 [15:19:49<145:41:33, 33.69s/it] 10%|▉ | 1717/17285 [15:20:22<145:13:27, 33.58s/it] 10%|▉ | 1718/17285 [15:20:48<135:03:39, 31.23s/it] 10%|▉ | 1719/17285 [15:21:21<137:20:07, 31.76s/it] 10%|▉ | 1720/17285 [15:21:50<133:40:49, 30.92s/it] {'loss': 1.7767, 'learning_rate': 0.0001987235321202958, 'epoch': 0.3} + 10%|▉ | 1720/17285 [15:21:50<133:40:49, 30.92s/it] 10%|▉ | 1721/17285 [15:22:20<132:51:45, 30.73s/it] 10%|▉ | 1722/17285 [15:22:55<137:51:09, 31.89s/it] 10%|▉ | 1723/17285 [15:23:22<132:40:43, 30.69s/it] 10%|▉ | 1724/17285 [15:24:03<145:07:56, 33.58s/it] 
10%|▉ | 1725/17285 [15:24:32<139:30:43, 32.28s/it] 10%|▉ | 1726/17285 [15:25:01<134:56:28, 31.22s/it] 10%|▉ | 1727/17285 [15:25:33<136:44:57, 31.64s/it] 10%|▉ | 1728/17285 [15:26:12<145:33:34, 33.68s/it] 10%|█ | 1729/17285 [15:26:37<134:43:26, 31.18s/it] 10%|█ | 1730/17285 [15:27:11<137:57:10, 31.93s/it] {'loss': 1.8044, 'learning_rate': 0.00019869287906907265, 'epoch': 0.3} + 10%|█ | 1730/17285 [15:27:11<137:57:10, 31.93s/it] 10%|█ | 1731/17285 [15:27:40<134:45:47, 31.19s/it] 10%|█ | 1732/17285 [15:28:12<135:59:50, 31.48s/it] 10%|█ | 1733/17285 [15:28:43<134:35:47, 31.16s/it] 10%|█ | 1734/17285 [15:29:17<138:40:05, 32.10s/it] 10%|█ | 1735/17285 [15:29:53<143:31:38, 33.23s/it] 10%|█ | 1736/17285 [15:30:24<140:55:35, 32.63s/it] 10%|█ | 1737/17285 [15:30:49<131:00:55, 30.34s/it] 10%|█ | 1738/17285 [15:31:22<133:28:52, 30.91s/it] 10%|█ | 1739/17285 [15:31:50<130:17:39, 30.17s/it] 10%|█ | 1740/17285 [15:32:26<138:07:28, 31.99s/it] {'loss': 1.7517, 'learning_rate': 0.0001986618647418379, 'epoch': 0.3} + 10%|█ | 1740/17285 [15:32:26<138:07:28, 31.99s/it] 10%|█ | 1741/17285 [15:32:58<137:18:26, 31.80s/it] 10%|█ | 1742/17285 [15:33:31<139:17:21, 32.26s/it] 10%|█ | 1743/17285 [15:34:06<142:51:11, 33.09s/it] 10%|█ | 1744/17285 [15:34:35<137:35:10, 31.87s/it] 10%|█ | 1745/17285 [15:35:08<139:31:56, 32.32s/it] 10%|█ | 1746/17285 [15:35:46<146:12:02, 33.87s/it] 10%|█ | 1747/17285 [15:36:17<142:41:22, 33.06s/it] 10%|█ | 1748/17285 [15:36:52<144:46:21, 33.54s/it] 10%|█ | 1749/17285 [15:37:18<135:01:29, 31.29s/it] 10%|█ | 1750/17285 [15:37:49<135:17:16, 31.35s/it] {'loss': 1.8253, 'learning_rate': 0.0001986304892521229, 'epoch': 0.3} + 10%|█ | 1750/17285 [15:37:49<135:17:16, 31.35s/it] 10%|█ | 1751/17285 [15:38:22<137:44:18, 31.92s/it] 10%|█ | 1752/17285 [15:39:00<144:36:32, 33.52s/it] 10%|█ | 1753/17285 [15:39:32<143:42:53, 33.31s/it] 10%|█ | 1754/17285 [15:40:06<144:15:45, 33.44s/it] 10%|█ | 1755/17285 [15:40:34<137:23:11, 31.85s/it] 10%|█ | 1756/17285 [15:41:14<147:43:28, 
34.25s/it] 10%|█ | 1757/17285 [15:41:45<143:50:40, 33.35s/it] 10%|█ | 1758/17285 [15:42:24<150:15:02, 34.84s/it] 10%|█ | 1759/17285 [15:42:59<150:52:24, 34.98s/it] 10%|█ | 1760/17285 [15:43:41<160:03:37, 37.12s/it] {'loss': 1.7588, 'learning_rate': 0.00019859875271478102, 'epoch': 0.31} + 10%|█ | 1760/17285 [15:43:41<160:03:37, 37.12s/it] 10%|█ | 1761/17285 [15:44:22<165:28:47, 38.37s/it] 10%|█ | 1762/17285 [15:45:06<171:41:39, 39.82s/it] 10%|█ | 1763/17285 [15:45:42<166:45:33, 38.68s/it] 10%|█ | 1764/17285 [15:46:24<171:48:58, 39.85s/it] 10%|█ | 1765/17285 [15:47:03<169:56:40, 39.42s/it] 10%|█ | 1766/17285 [15:47:48<177:16:24, 41.12s/it] 10%|█ | 1767/17285 [15:48:38<188:26:14, 43.72s/it] 10%|█ | 1768/17285 [15:49:17<182:39:03, 42.38s/it] 10%|█ | 1769/17285 [15:49:53<174:53:57, 40.58s/it] 10%|█ | 1770/17285 [15:50:43<186:55:50, 43.37s/it] {'loss': 1.7948, 'learning_rate': 0.00019856665524598733, 'epoch': 0.31} + 10%|█ | 1770/17285 [15:50:43<186:55:50, 43.37s/it] 10%|█ | 1771/17285 [15:51:22<181:30:49, 42.12s/it] 10%|█ | 1772/17285 [15:52:02<177:58:07, 41.30s/it] 10%|█ | 1773/17285 [15:52:37<170:15:21, 39.51s/it] 10%|█ | 1774/17285 [15:53:13<165:25:10, 38.39s/it] 10%|█ | 1775/17285 [15:53:52<165:58:38, 38.52s/it] 10%|█ | 1776/17285 [15:54:27<161:56:59, 37.59s/it] 10%|█ | 1777/17285 [15:55:05<161:51:30, 37.57s/it] 10%|█ | 1778/17285 [15:55:43<163:09:28, 37.88s/it] 10%|█ | 1779/17285 [15:56:20<162:01:12, 37.62s/it] 10%|█ | 1780/17285 [15:56:59<163:53:48, 38.05s/it] {'loss': 1.8023, 'learning_rate': 0.00019853419696323806, 'epoch': 0.31} + 10%|█ | 1780/17285 [15:56:59<163:53:48, 38.05s/it] 10%|█ | 1781/17285 [15:57:42<169:42:21, 39.41s/it] 10%|█ | 1782/17285 [15:58:22<170:49:18, 39.67s/it][2023-08-23 15:53:41,069] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. 
Reducing hysteresis to 1 + 10%|█ | 1783/17285 [15:59:03<172:58:06, 40.17s/it] 10%|█ | 1784/17285 [15:59:46<176:17:51, 40.94s/it] 10%|█ | 1785/17285 [16:00:27<176:23:53, 40.97s/it] 10%|█ | 1786/17285 [16:01:08<175:45:38, 40.82s/it] 10%|█ | 1787/17285 [16:01:50<177:48:30, 41.30s/it] 10%|█ | 1788/17285 [16:02:31<176:48:39, 41.07s/it] 10%|█ | 1789/17285 [16:03:10<174:26:30, 40.53s/it] 10%|█ | 1790/17285 [16:03:49<172:54:38, 40.17s/it] {'loss': 1.7663, 'learning_rate': 0.00019850467611100676, 'epoch': 0.31} + 10%|█ | 1790/17285 [16:03:49<172:54:38, 40.17s/it] 10%|█ | 1791/17285 [16:04:30<173:35:59, 40.34s/it] 10%|█ | 1792/17285 [16:05:18<184:00:51, 42.76s/it] 10%|█ | 1793/17285 [16:06:02<185:26:08, 43.09s/it] 10%|█ | 1794/17285 [16:06:41<180:03:29, 41.84s/it] 10%|█ | 1795/17285 [16:07:25<182:26:41, 42.40s/it] 10%|█ | 1796/17285 [16:08:04<178:07:28, 41.40s/it] 10%|█ | 1797/17285 [16:08:40<171:10:53, 39.79s/it] 10%|█ | 1798/17285 [16:09:16<166:28:22, 38.70s/it] 10%|█ | 1799/17285 [16:09:59<172:25:11, 40.08s/it] 10%|█ | 1800/17285 [16:10:39<172:11:20, 40.03s/it] {'loss': 1.7566, 'learning_rate': 0.00019847153261017426, 'epoch': 0.31} + 10%|█ | 1800/17285 [16:10:39<172:11:20, 40.03s/it] 10%|█ | 1801/17285 [16:11:18<170:43:20, 39.69s/it] 10%|█ | 1802/17285 [16:12:05<179:36:54, 41.76s/it] 10%|█ | 1803/17285 [16:12:42<174:01:04, 40.46s/it] 10%|█ | 1804/17285 [16:13:19<169:09:49, 39.34s/it] 10%|█ | 1805/17285 [16:13:55<164:38:54, 38.29s/it] 10%|█ | 1806/17285 [16:14:37<170:08:31, 39.57s/it] 10%|█ | 1807/17285 [16:15:16<168:35:54, 39.21s/it] 10%|█ | 1808/17285 [16:15:53<166:34:28, 38.75s/it] 10%|█ | 1809/17285 [16:16:35<169:51:00, 39.51s/it] 10%|█ | 1810/17285 [16:17:11<165:28:45, 38.50s/it] {'loss': 1.7882, 'learning_rate': 0.00019843802864359298, 'epoch': 0.31} + 10%|█ | 1810/17285 [16:17:11<165:28:45, 38.50s/it] 10%|█ | 1811/17285 [16:17:48<163:20:42, 38.00s/it] 10%|█ | 1812/17285 [16:18:26<164:08:42, 38.19s/it] 10%|█ | 1813/17285 [16:19:06<166:29:49, 38.74s/it] 10%|█ | 
1814/17285 [16:19:47<169:02:35, 39.34s/it] 11%|█ | 1815/17285 [16:20:27<170:02:28, 39.57s/it] 11%|█ | 1816/17285 [16:21:07<170:23:03, 39.65s/it] 11%|█ | 1817/17285 [16:21:44<167:05:38, 38.89s/it] 11%|█ | 1818/17285 [16:22:23<167:09:48, 38.91s/it] 11%|█ | 1819/17285 [16:22:59<163:54:09, 38.15s/it] 11%|█ | 1820/17285 [16:23:41<167:43:11, 39.04s/it] {'loss': 1.782, 'learning_rate': 0.00019840416433390782, 'epoch': 0.32} + 11%|█ | 1820/17285 [16:23:41<167:43:11, 39.04s/it] 11%|█ | 1821/17285 [16:24:17<163:49:25, 38.14s/it] 11%|█ | 1822/17285 [16:24:59<168:54:28, 39.32s/it] 11%|█ | 1823/17285 [16:25:39<170:43:16, 39.75s/it] 11%|█ | 1824/17285 [16:26:18<169:02:51, 39.36s/it] 11%|█ | 1825/17285 [16:26:55<166:24:32, 38.75s/it] 11%|█ | 1826/17285 [16:27:33<165:14:41, 38.48s/it] 11%|█ | 1827/17285 [16:28:10<163:07:26, 37.99s/it] 11%|█ | 1828/17285 [16:28:50<166:13:57, 38.72s/it] 11%|█ | 1829/17285 [16:29:34<172:01:07, 40.07s/it] 11%|█ | 1830/17285 [16:30:13<171:02:55, 39.84s/it] {'loss': 1.7849, 'learning_rate': 0.00019836993980508268, 'epoch': 0.32} + 11%|█ | 1830/17285 [16:30:13<171:02:55, 39.84s/it] 11%|█ | 1831/17285 [16:30:50<166:55:28, 38.88s/it] 11%|█ | 1832/17285 [16:31:30<169:33:48, 39.50s/it] 11%|█ | 1833/17285 [16:32:07<165:51:52, 38.64s/it] 11%|█ | 1834/17285 [16:32:43<162:33:07, 37.87s/it] 11%|█ | 1835/17285 [16:33:19<160:23:24, 37.37s/it] 11%|█ | 1836/17285 [16:34:11<179:14:39, 41.77s/it] 11%|█ | 1837/17285 [16:34:47<171:46:22, 40.03s/it] 11%|█ | 1838/17285 [16:35:26<170:10:48, 39.66s/it] 11%|█ | 1839/17285 [16:36:03<166:07:11, 38.72s/it] 11%|█ | 1840/17285 [16:36:41<165:51:51, 38.66s/it] {'loss': 1.7793, 'learning_rate': 0.00019833535518240031, 'epoch': 0.32} + 11%|█ | 1840/17285 [16:36:41<165:51:51, 38.66s/it] 11%|█ | 1841/17285 [16:37:20<165:48:38, 38.65s/it] 11%|█ | 1842/17285 [16:38:03<171:56:07, 40.08s/it] 11%|█ | 1843/17285 [16:38:42<169:57:03, 39.62s/it] 11%|█ | 1844/17285 [16:39:20<168:09:12, 39.20s/it] 11%|█ | 1845/17285 [16:39:59<167:24:51, 
39.03s/it] 11%|█ | 1846/17285 [16:40:41<171:01:17, 39.88s/it] 11%|█ | 1847/17285 [16:41:20<170:10:04, 39.68s/it] 11%|█ | 1848/17285 [16:41:56<165:37:43, 38.63s/it] 11%|█ | 1849/17285 [16:42:36<167:48:25, 39.14s/it] 11%|█ | 1850/17285 [16:43:23<177:57:52, 41.51s/it] {'loss': 1.7761, 'learning_rate': 0.0001983004105924614, 'epoch': 0.32} + 11%|█ | 1850/17285 [16:43:23<177:57:52, 41.51s/it] 11%|█ | 1851/17285 [16:43:59<170:51:01, 39.85s/it] 11%|█ | 1852/17285 [16:44:38<168:51:17, 39.39s/it] 11%|█ | 1853/17285 [16:45:15<166:38:23, 38.87s/it] 11%|█ | 1854/17285 [16:45:43<152:55:25, 35.68s/it] 11%|█ | 1855/17285 [16:46:16<149:15:46, 34.82s/it] 11%|█ | 1856/17285 [16:46:47<143:30:14, 33.48s/it] 11%|█ | 1857/17285 [16:47:17<139:45:41, 32.61s/it] 11%|█ | 1858/17285 [16:47:57<149:02:35, 34.78s/it][2023-08-23 16:43:03,690] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 11%|█ | 1859/17285 [16:48:26<141:30:06, 33.02s/it] 11%|█ | 1860/17285 [16:49:00<142:47:19, 33.33s/it] {'loss': 1.7676, 'learning_rate': 0.00019826865279520944, 'epoch': 0.32} + 11%|█ | 1860/17285 [16:49:00<142:47:19, 33.33s/it] 11%|█ | 1861/17285 [16:49:29<137:27:29, 32.08s/it] 11%|█ | 1862/17285 [16:50:01<136:34:25, 31.88s/it] 11%|█ | 1863/17285 [16:50:29<132:11:08, 30.86s/it] 11%|█ | 1864/17285 [16:51:06<139:38:12, 32.60s/it] 11%|█ | 1865/17285 [16:51:37<138:13:23, 32.27s/it] 11%|█ | 1866/17285 [16:52:17<147:10:23, 34.36s/it] 11%|█ | 1867/17285 [16:52:40<133:25:15, 31.15s/it] 11%|█ | 1868/17285 [16:53:11<132:39:46, 30.98s/it] 11%|█ | 1869/17285 [16:53:42<132:36:19, 30.97s/it] 11%|█ | 1870/17285 [16:54:14<134:49:08, 31.49s/it] {'loss': 1.7274, 'learning_rate': 0.0001982330246209872, 'epoch': 0.32} + 11%|█ | 1870/17285 [16:54:14<134:49:08, 31.49s/it] 11%|█ | 1871/17285 [16:54:39<126:09:09, 29.46s/it] 11%|█ | 1872/17285 [16:55:16<136:09:03, 31.80s/it] 11%|█ | 1873/17285 [16:55:49<137:50:14, 32.20s/it] 11%|█ | 1874/17285 
[16:56:20<135:19:28, 31.61s/it] 11%|█ | 1875/17285 [16:56:46<127:47:50, 29.86s/it] 11%|█ | 1876/17285 [16:57:16<129:01:34, 30.14s/it] 11%|█ | 1877/17285 [16:57:46<129:02:37, 30.15s/it] 11%|█ | 1878/17285 [16:58:11<121:57:39, 28.50s/it] 11%|█ | 1879/17285 [16:58:41<123:11:22, 28.79s/it] 11%|█ | 1880/17285 [16:59:14<129:21:17, 30.23s/it] {'loss': 1.7513, 'learning_rate': 0.00019819703685410058, 'epoch': 0.33} + 11%|█ | 1880/17285 [16:59:14<129:21:17, 30.23s/it] 11%|█ | 1881/17285 [16:59:44<128:24:22, 30.01s/it] 11%|█ | 1882/17285 [17:00:21<138:13:58, 32.31s/it] 11%|█ | 1883/17285 [17:00:51<134:39:55, 31.48s/it] 11%|█ | 1884/17285 [17:01:26<138:42:27, 32.42s/it] 11%|█ | 1885/17285 [17:01:55<134:27:18, 31.43s/it] 11%|█ | 1886/17285 [17:02:24<131:34:23, 30.76s/it] 11%|█ | 1887/17285 [17:03:00<138:49:11, 32.46s/it] 11%|█ | 1888/17285 [17:03:30<135:21:15, 31.65s/it] 11%|█ | 1889/17285 [17:04:00<132:59:51, 31.10s/it] 11%|█ | 1890/17285 [17:04:31<133:04:38, 31.12s/it] {'loss': 1.7478, 'learning_rate': 0.0001981606896262867, 'epoch': 0.33} + 11%|█ | 1890/17285 [17:04:31<133:04:38, 31.12s/it] 11%|█ | 1891/17285 [17:04:57<126:22:37, 29.55s/it] 11%|█ | 1892/17285 [17:05:26<125:31:51, 29.36s/it] 11%|█ | 1893/17285 [17:06:01<133:13:26, 31.16s/it] 11%|█ | 1894/17285 [17:06:34<135:28:53, 31.69s/it] 11%|█ | 1895/17285 [17:07:01<129:02:51, 30.19s/it] 11%|█ | 1896/17285 [17:07:37<136:56:37, 32.04s/it] 11%|█ | 1897/17285 [17:08:06<132:21:22, 30.96s/it] 11%|█ | 1898/17285 [17:08:45<143:43:14, 33.63s/it] 11%|█ | 1899/17285 [17:09:18<142:57:45, 33.45s/it] 11%|█ | 1900/17285 [17:09:56<148:44:49, 34.81s/it] {'loss': 1.781, 'learning_rate': 0.00019812398307059856, 'epoch': 0.33} + 11%|█ | 1900/17285 [17:09:56<148:44:49, 34.81s/it] 11%|█ | 1901/17285 [17:10:21<136:11:59, 31.87s/it] 11%|█ | 1902/17285 [17:10:51<133:14:36, 31.18s/it] 11%|█ | 1903/17285 [17:11:21<131:18:12, 30.73s/it] 11%|█ | 1904/17285 [17:11:51<131:03:45, 30.68s/it] 11%|█ | 1905/17285 [17:12:23<132:11:53, 30.94s/it] 11%|█ | 
1906/17285 [17:12:52<129:59:55, 30.43s/it] 11%|█ | 1907/17285 [17:13:27<135:25:53, 31.70s/it] 11%|█ | 1908/17285 [17:13:57<133:17:43, 31.21s/it] 11%|█ | 1909/17285 [17:14:30<136:24:48, 31.94s/it] 11%|█ | 1910/17285 [17:15:12<148:51:04, 34.85s/it] {'loss': 1.7504, 'learning_rate': 0.00019808691732140448, 'epoch': 0.33} + 11%|█ | 1910/17285 [17:15:12<148:51:04, 34.85s/it] 11%|█ | 1911/17285 [17:16:01<166:56:42, 39.09s/it] 11%|█ | 1912/17285 [17:16:27<150:01:24, 35.13s/it] 11%|█ | 1913/17285 [17:17:01<148:28:38, 34.77s/it] 11%|█ | 1914/17285 [17:17:32<143:36:11, 33.63s/it] 11%|█ | 1915/17285 [17:18:09<147:31:58, 34.56s/it] 11%|█ | 1916/17285 [17:18:42<146:39:54, 34.35s/it] 11%|█ | 1917/17285 [17:19:14<143:20:02, 33.58s/it] 11%|█ | 1918/17285 [17:20:04<163:43:30, 38.36s/it] 11%|█ | 1919/17285 [17:20:38<158:44:22, 37.19s/it] 11%|█ | 1920/17285 [17:21:25<171:28:23, 40.18s/it] {'loss': 1.7552, 'learning_rate': 0.00019804949251438767, 'epoch': 0.33} + 11%|█ | 1920/17285 [17:21:25<171:28:23, 40.18s/it] 11%|█ | 1921/17285 [17:22:00<164:18:45, 38.50s/it] 11%|█ | 1922/17285 [17:22:24<145:37:27, 34.12s/it] 11%|█ | 1923/17285 [17:22:50<135:49:55, 31.83s/it] 11%|█ | 1924/17285 [17:23:17<129:27:46, 30.34s/it] 11%|█ | 1925/17285 [17:23:43<123:40:57, 28.99s/it] 11%|█ | 1926/17285 [17:24:26<141:30:44, 33.17s/it] 11%|█ | 1927/17285 [17:25:02<145:41:10, 34.15s/it] 11%|█ | 1928/17285 [17:25:29<136:11:06, 31.92s/it] 11%|█ | 1929/17285 [17:26:08<145:15:51, 34.06s/it] 11%|█ | 1930/17285 [17:26:39<141:38:49, 33.21s/it] {'loss': 1.8154, 'learning_rate': 0.0001980117087865457, 'epoch': 0.33} + 11%|█ | 1930/17285 [17:26:39<141:38:49, 33.21s/it] 11%|█ | 1931/17285 [17:27:11<139:35:22, 32.73s/it] 11%|█ | 1932/17285 [17:27:41<136:16:25, 31.95s/it] 11%|█ | 1933/17285 [17:28:07<128:38:52, 30.17s/it] 11%|█ | 1934/17285 [17:28:46<139:20:25, 32.68s/it] 11%|█ | 1935/17285 [17:29:14<134:09:04, 31.46s/it] 11%|█ | 1936/17285 [17:29:41<127:42:52, 29.95s/it] 11%|█ | 1937/17285 [17:30:20<139:37:51, 
32.75s/it] 11%|█ | 1938/17285 [17:30:44<128:43:51, 30.20s/it] 11%|█ | 1939/17285 [17:31:15<129:48:01, 30.45s/it] 11%|█ | 1940/17285 [17:31:47<131:36:00, 30.87s/it] {'loss': 1.7762, 'learning_rate': 0.00019797356627619, 'epoch': 0.34} + 11%|█ | 1940/17285 [17:31:47<131:36:00, 30.87s/it] 11%|█ | 1941/17285 [17:32:21<135:12:46, 31.72s/it] 11%|█ | 1942/17285 [17:32:46<126:36:52, 29.71s/it] 11%|█ | 1943/17285 [17:33:12<122:26:07, 28.73s/it] 11%|█ | 1944/17285 [17:33:38<118:23:25, 27.78s/it] 11%|█▏ | 1945/17285 [17:34:16<130:59:25, 30.74s/it] 11%|█▏ | 1946/17285 [17:34:48<133:22:52, 31.30s/it] 11%|█▏ | 1947/17285 [17:35:17<129:43:25, 30.45s/it] 11%|█▏ | 1948/17285 [17:35:47<129:21:51, 30.37s/it] 11%|█▏ | 1949/17285 [17:36:14<125:20:13, 29.42s/it] 11%|█▏ | 1950/17285 [17:36:43<124:53:17, 29.32s/it] {'loss': 1.7263, 'learning_rate': 0.00019793506512294542, 'epoch': 0.34} + 11%|█▏ | 1950/17285 [17:36:43<124:53:17, 29.32s/it] 11%|█▏ | 1951/17285 [17:37:13<125:07:21, 29.38s/it] 11%|█▏ | 1952/17285 [17:37:55<141:19:21, 33.18s/it] 11%|█▏ | 1953/17285 [17:38:24<136:06:55, 31.96s/it] 11%|█▏ | 1954/17285 [17:38:57<138:24:45, 32.50s/it] 11%|█▏ | 1955/17285 [17:39:36<145:29:22, 34.17s/it] 11%|█▏ | 1956/17285 [17:40:10<145:28:28, 34.16s/it] 11%|█▏ | 1957/17285 [17:40:47<149:19:43, 35.07s/it] 11%|█▏ | 1958/17285 [17:41:17<143:22:17, 33.68s/it] 11%|█▏ | 1959/17285 [17:41:45<136:18:55, 32.02s/it] 11%|█▏ | 1960/17285 [17:42:11<127:35:17, 29.97s/it] {'loss': 1.7446, 'learning_rate': 0.00019789620546774956, 'epoch': 0.34} + 11%|█▏ | 1960/17285 [17:42:11<127:35:17, 29.97s/it] 11%|█▏ | 1961/17285 [17:42:42<128:52:23, 30.28s/it] 11%|█▏ | 1962/17285 [17:43:18<137:07:13, 32.22s/it] 11%|█▏ | 1963/17285 [17:43:51<137:15:57, 32.25s/it] 11%|█▏ | 1964/17285 [17:44:20<132:59:45, 31.25s/it] 11%|█▏ | 1965/17285 [17:44:51<133:02:13, 31.26s/it] 11%|█▏ | 1966/17285 [17:45:25<136:57:12, 32.18s/it] 11%|█▏ | 1967/17285 [17:45:52<130:37:49, 30.70s/it][2023-08-23 17:41:06,435] [INFO] 
[loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 + 11%|█▏ | 1968/17285 [17:46:29<137:43:10, 32.37s/it] 11%|█▏ | 1969/17285 [17:46:55<129:41:58, 30.49s/it][2023-08-23 17:41:57,143] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 11%|█▏ | 1970/17285 [17:47:19<122:12:05, 28.73s/it] {'loss': 1.694, 'learning_rate': 0.00019786485971773587, 'epoch': 0.34} + 11%|█▏ | 1970/17285 [17:47:19<122:12:05, 28.73s/it] 11%|█▏ | 1971/17285 [17:47:54<129:27:47, 30.43s/it] 11%|█▏ | 1972/17285 [17:48:36<144:18:34, 33.93s/it] 11%|█▏ | 1973/17285 [17:49:16<152:09:36, 35.77s/it] 11%|█▏ | 1974/17285 [17:49:53<153:03:40, 35.99s/it] 11%|█▏ | 1975/17285 [17:50:23<146:14:08, 34.39s/it] 11%|█▏ | 1976/17285 [17:50:51<138:17:50, 32.52s/it] 11%|█▏ | 1977/17285 [17:51:34<151:20:24, 35.59s/it] 11%|█▏ | 1978/17285 [17:52:08<148:38:53, 34.96s/it] 11%|█▏ | 1979/17285 [17:52:36<139:52:05, 32.90s/it] 11%|█▏ | 1980/17285 [17:53:10<141:14:12, 33.22s/it] {'loss': 1.7198, 'learning_rate': 0.0001978253551183793, 'epoch': 0.34} + 11%|█▏ | 1980/17285 [17:53:10<141:14:12, 33.22s/it] 11%|█▏ | 1981/17285 [17:53:53<154:35:33, 36.37s/it] 11%|█▏ | 1982/17285 [17:54:18<139:46:05, 32.88s/it] 11%|█▏ | 1983/17285 [17:54:45<132:25:35, 31.16s/it] 11%|█▏ | 1984/17285 [17:55:20<137:10:25, 32.27s/it] 11%|█▏ | 1985/17285 [17:55:50<133:53:38, 31.50s/it] 11%|█▏ | 1986/17285 [17:56:20<132:41:50, 31.22s/it] 11%|█▏ | 1987/17285 [17:56:52<133:11:57, 31.35s/it] 12%|█▏ | 1988/17285 [17:57:22<131:48:42, 31.02s/it] 12%|█▏ | 1989/17285 [17:57:55<133:47:54, 31.49s/it] 12%|█▏ | 1990/17285 [17:58:30<138:10:59, 32.52s/it] {'loss': 1.7423, 'learning_rate': 0.00019778549241867687, 'epoch': 0.35} + 12%|█▏ | 1990/17285 [17:58:30<138:10:59, 32.52s/it] 12%|█▏ | 1991/17285 [17:58:55<128:48:56, 30.32s/it] 12%|█▏ | 1992/17285 [17:59:26<129:20:31, 30.45s/it] 12%|█▏ 
| 1993/17285 [18:00:07<142:53:55, 33.64s/it] 12%|█▏ | 1994/17285 [18:00:39<141:06:13, 33.22s/it] 12%|█▏ | 1995/17285 [18:01:04<130:57:55, 30.84s/it] 12%|█▏ | 1996/17285 [18:01:29<123:30:15, 29.08s/it] 12%|█▏ | 1997/17285 [18:02:03<129:04:35, 30.39s/it] 12%|█▏ | 1998/17285 [18:02:33<129:15:01, 30.44s/it] 12%|█▏ | 1999/17285 [18:03:08<134:25:02, 31.66s/it] 12%|█▏ | 2000/17285 [18:03:37<131:20:15, 30.93s/it] {'loss': 1.7434, 'learning_rate': 0.0001977452717645503, 'epoch': 0.35} + 12%|█▏ | 2000/17285 [18:03:37<131:20:15, 30.93s/it][INFO|trainer.py:3081] 2023-08-23 17:58:14,777 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-23 17:58:14,778 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-23 17:58:14,778 >> Batch size = 2 + + 0%| | 0/33 [00:00> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-2000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-23 17:59:39,156 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-2000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-2000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-2000 + 12%|█▏ | 2001/17285 [18:05:39<247:18:22, 58.25s/it] 12%|█▏ | 2002/17285 [18:06:18<223:19:18, 52.60s/it] 12%|█▏ | 2003/17285 [18:06:52<199:11:39, 46.92s/it] 12%|█▏ | 2004/17285 [18:07:28<185:00:11, 43.58s/it] 12%|█▏ | 2005/17285 [18:07:54<163:15:51, 38.47s/it] 12%|█▏ | 2006/17285 [18:08:20<146:47:32, 34.59s/it] 12%|█▏ | 2007/17285 [18:09:00<153:11:56, 36.10s/it] 12%|█▏ | 2008/17285 [18:09:27<141:27:52, 33.34s/it] 12%|█▏ | 2009/17285 [18:09:59<139:45:30, 32.94s/it] 12%|█▏ | 2010/17285 [18:10:35<144:19:55, 34.02s/it] {'loss': 1.7791, 'learning_rate': 0.00019770469330323174, 'epoch': 0.35} + 12%|█▏ | 
2010/17285 [18:10:35<144:19:55, 34.02s/it] 12%|█▏ | 2011/17285 [18:11:05<139:28:32, 32.87s/it] 12%|█▏ | 2012/17285 [18:11:44<147:06:28, 34.67s/it] 12%|█▏ | 2013/17285 [18:12:11<137:31:25, 32.42s/it] 12%|█▏ | 2014/17285 [18:12:54<150:34:35, 35.50s/it] 12%|█▏ | 2015/17285 [18:13:29<150:23:58, 35.46s/it] 12%|█▏ | 2016/17285 [18:14:04<149:41:25, 35.29s/it] 12%|█▏ | 2017/17285 [18:14:34<142:41:01, 33.64s/it] 12%|█▏ | 2018/17285 [18:15:02<134:50:15, 31.80s/it] 12%|█▏ | 2019/17285 [18:15:35<136:59:33, 32.31s/it] 12%|█▏ | 2020/17285 [18:16:08<137:46:09, 32.49s/it] {'loss': 1.7459, 'learning_rate': 0.00019766375718326297, 'epoch': 0.35} + 12%|█▏ | 2020/17285 [18:16:08<137:46:09, 32.49s/it] 12%|█▏ | 2021/17285 [18:16:37<133:23:03, 31.46s/it] 12%|█▏ | 2022/17285 [18:17:09<133:25:32, 31.47s/it] 12%|█▏ | 2023/17285 [18:17:36<128:42:26, 30.36s/it] 12%|█▏ | 2024/17285 [18:18:07<129:00:47, 30.43s/it] 12%|█▏ | 2025/17285 [18:18:39<130:46:54, 30.85s/it] 12%|█▏ | 2026/17285 [18:19:08<129:23:34, 30.53s/it] 12%|█▏ | 2027/17285 [18:19:47<139:53:49, 33.01s/it] 12%|█▏ | 2028/17285 [18:20:23<143:20:37, 33.82s/it] 12%|█▏ | 2029/17285 [18:20:50<134:43:22, 31.79s/it] 12%|█▏ | 2030/17285 [18:21:22<135:21:55, 31.94s/it] {'loss': 1.7342, 'learning_rate': 0.00019762246355449516, 'epoch': 0.35} + 12%|█▏ | 2030/17285 [18:21:22<135:21:55, 31.94s/it] 12%|█▏ | 2031/17285 [18:22:00<142:10:32, 33.55s/it] 12%|█▏ | 2032/17285 [18:22:39<149:42:53, 35.34s/it] 12%|█▏ | 2033/17285 [18:23:19<155:14:05, 36.64s/it] 12%|█▏ | 2034/17285 [18:23:49<147:33:31, 34.83s/it] 12%|█▏ | 2035/17285 [18:24:19<140:37:08, 33.20s/it] 12%|█▏ | 2036/17285 [18:24:45<132:13:51, 31.22s/it] 12%|█▏ | 2037/17285 [18:25:20<136:35:20, 32.25s/it] 12%|█▏ | 2038/17285 [18:25:46<128:36:56, 30.37s/it] 12%|█▏ | 2039/17285 [18:26:20<132:36:56, 31.31s/it] 12%|█▏ | 2040/17285 [18:26:54<136:25:53, 32.22s/it] {'loss': 1.7564, 'learning_rate': 0.00019758081256808816, 'epoch': 0.35} + 12%|█▏ | 2040/17285 [18:26:54<136:25:53, 32.22s/it] 12%|█▏ | 
2041/17285 [18:27:19<127:56:00, 30.21s/it] 12%|█▏ | 2042/17285 [18:27:52<131:14:06, 30.99s/it] 12%|█▏ | 2043/17285 [18:28:29<138:30:59, 32.72s/it] 12%|█▏ | 2044/17285 [18:28:55<129:26:19, 30.57s/it] 12%|█▏ | 2045/17285 [18:29:23<126:32:12, 29.89s/it] 12%|█▏ | 2046/17285 [18:29:54<127:32:42, 30.13s/it] 12%|█▏ | 2047/17285 [18:30:37<145:01:21, 34.26s/it] 12%|█▏ | 2048/17285 [18:31:11<144:43:35, 34.19s/it] 12%|█▏ | 2049/17285 [18:31:41<138:08:49, 32.64s/it] 12%|█▏ | 2050/17285 [18:32:13<137:20:51, 32.45s/it] {'loss': 1.7394, 'learning_rate': 0.00019753880437650985, 'epoch': 0.36} + 12%|█▏ | 2050/17285 [18:32:13<137:20:51, 32.45s/it] 12%|█▏ | 2051/17285 [18:32:46<139:06:21, 32.87s/it] 12%|█▏ | 2052/17285 [18:33:19<139:15:22, 32.91s/it] 12%|█▏ | 2053/17285 [18:33:45<130:19:08, 30.80s/it] 12%|█▏ | 2054/17285 [18:34:15<129:35:43, 30.63s/it] 12%|█▏ | 2055/17285 [18:34:51<135:20:52, 31.99s/it] 12%|█▏ | 2056/17285 [18:35:23<135:47:02, 32.10s/it] 12%|█▏ | 2057/17285 [18:35:55<135:28:16, 32.03s/it] 12%|█▏ | 2058/17285 [18:36:27<135:55:53, 32.14s/it] 12%|█▏ | 2059/17285 [18:37:02<139:25:17, 32.96s/it] 12%|█▏ | 2060/17285 [18:37:42<148:09:26, 35.03s/it] {'loss': 1.7663, 'learning_rate': 0.00019749643913353582, 'epoch': 0.36} + 12%|█▏ | 2060/17285 [18:37:42<148:09:26, 35.03s/it] 12%|█▏ | 2061/17285 [18:38:15<145:54:57, 34.50s/it] 12%|█▏ | 2062/17285 [18:38:55<152:46:39, 36.13s/it] 12%|█▏ | 2063/17285 [18:39:29<149:46:29, 35.42s/it] 12%|█▏ | 2064/17285 [18:40:01<144:51:53, 34.26s/it] 12%|█▏ | 2065/17285 [18:40:25<132:28:25, 31.33s/it] 12%|█▏ | 2066/17285 [18:40:57<132:44:35, 31.40s/it] 12%|█▏ | 2067/17285 [18:41:31<136:08:01, 32.20s/it] 12%|█▏ | 2068/17285 [18:41:58<129:41:13, 30.68s/it] 12%|█▏ | 2069/17285 [18:42:34<137:05:14, 32.43s/it] 12%|█▏ | 2070/17285 [18:43:14<146:07:45, 34.58s/it] {'loss': 1.7222, 'learning_rate': 0.00019745371699424864, 'epoch': 0.36} + 12%|█▏ | 2070/17285 [18:43:14<146:07:45, 34.58s/it] 12%|█▏ | 2071/17285 [18:43:46<143:27:07, 33.94s/it] 12%|█▏ | 
2072/17285 [18:44:18<140:03:07, 33.14s/it] 12%|█▏ | 2073/17285 [18:44:43<129:45:51, 30.71s/it] 12%|█▏ | 2074/17285 [18:45:14<130:48:14, 30.96s/it] 12%|█▏ | 2075/17285 [18:45:55<143:19:39, 33.92s/it] 12%|█▏ | 2076/17285 [18:46:26<139:10:07, 32.94s/it] 12%|█▏ | 2077/17285 [18:46:58<137:57:56, 32.66s/it] 12%|█▏ | 2078/17285 [18:47:26<132:12:15, 31.30s/it] 12%|█▏ | 2079/17285 [18:47:56<130:55:01, 30.99s/it] 12%|█▏ | 2080/17285 [18:48:34<140:13:36, 33.20s/it] {'loss': 1.7046, 'learning_rate': 0.00019741063811503734, 'epoch': 0.36} + 12%|█▏ | 2080/17285 [18:48:34<140:13:36, 33.20s/it] 12%|█▏ | 2081/17285 [18:49:07<138:53:43, 32.89s/it] 12%|█▏ | 2082/17285 [18:49:36<134:20:03, 31.81s/it][2023-08-23 18:44:42,715] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 + 12%|█▏ | 2083/17285 [18:50:05<130:55:45, 31.01s/it] 12%|█▏ | 2084/17285 [18:50:35<130:12:49, 30.84s/it][2023-08-23 18:45:42,091] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 + 12%|█▏ | 2085/17285 [18:51:04<127:47:09, 30.27s/it] 12%|█▏ | 2086/17285 [18:51:38<132:28:28, 31.38s/it] 12%|█▏ | 2087/17285 [18:52:05<126:28:53, 29.96s/it] 12%|█▏ | 2088/17285 [18:52:34<124:37:42, 29.52s/it] 12%|█▏ | 2089/17285 [18:53:16<140:42:59, 33.34s/it] 12%|█▏ | 2090/17285 [18:53:54<146:36:33, 34.73s/it] {'loss': 1.6904, 'learning_rate': 0.0001973759182648501, 'epoch': 0.36} + 12%|█▏ | 2090/17285 [18:53:54<146:36:33, 34.73s/it] 12%|█▏ | 2091/17285 [18:54:30<147:55:10, 35.05s/it] 12%|█▏ | 2092/17285 [18:55:04<147:32:02, 34.96s/it] 12%|█▏ | 2093/17285 [18:55:45<155:11:48, 36.78s/it] 12%|█▏ | 2094/17285 [18:56:13<143:34:30, 34.02s/it] 12%|█▏ | 2095/17285 [18:56:42<137:16:00, 32.53s/it] 12%|█▏ | 2096/17285 [18:57:13<135:04:17, 32.01s/it] 12%|█▏ | 2097/17285 [18:57:47<138:20:21, 32.79s/it] 12%|█▏ | 2098/17285 [18:58:28<148:36:46, 35.23s/it] 12%|█▏ | 2099/17285 [18:59:05<150:16:20, 35.62s/it] 12%|█▏ | 2100/17285 [18:59:34<142:08:15, 33.70s/it] {'loss': 1.6956, 'learning_rate': 0.00019733219765204383, 'epoch': 0.36} + 12%|█▏ | 2100/17285 [18:59:34<142:08:15, 33.70s/it] 12%|█▏ | 2101/17285 [19:00:09<143:56:19, 34.13s/it] 12%|█▏ | 2102/17285 [19:00:36<134:48:36, 31.96s/it] 12%|█▏ | 2103/17285 [19:01:11<138:10:21, 32.76s/it] 12%|█▏ | 2104/17285 [19:01:36<128:47:21, 30.54s/it] 12%|█▏ | 2105/17285 [19:02:10<133:17:58, 31.61s/it] 12%|█▏ | 2106/17285 [19:02:45<137:47:10, 32.68s/it] 12%|█▏ | 2107/17285 [19:03:28<150:29:21, 35.69s/it] 12%|█▏ | 2108/17285 [19:04:00<145:47:54, 34.58s/it] 12%|█▏ | 2109/17285 [19:04:30<140:00:12, 33.21s/it] 12%|█▏ | 2110/17285 [19:05:02<138:25:44, 32.84s/it] {'loss': 1.7511, 'learning_rate': 0.00019728812074414819, 'epoch': 0.37} + 12%|█▏ | 2110/17285 [19:05:02<138:25:44, 32.84s/it] 12%|█▏ | 2111/17285 [19:05:28<129:39:03, 30.76s/it] 12%|█▏ | 2112/17285 [19:06:02<134:07:14, 31.82s/it] 12%|█▏ | 2113/17285 [19:06:28<126:03:23, 29.91s/it] 12%|█▏ | 2114/17285 [19:06:57<125:34:53, 29.80s/it] 12%|█▏ | 
2115/17285 [19:07:22<119:30:24, 28.36s/it] 12%|█▏ | 2116/17285 [19:07:55<124:45:24, 29.61s/it] 12%|█▏ | 2117/17285 [19:08:30<131:57:21, 31.32s/it] 12%|█▏ | 2118/17285 [19:08:58<127:16:35, 30.21s/it] 12%|█▏ | 2119/17285 [19:09:28<126:45:11, 30.09s/it] 12%|█▏ | 2120/17285 [19:10:02<132:49:41, 31.53s/it] {'loss': 1.7262, 'learning_rate': 0.00019724368770251155, 'epoch': 0.37} + 12%|█▏ | 2120/17285 [19:10:02<132:49:41, 31.53s/it] 12%|█▏ | 2121/17285 [19:10:36<134:47:27, 32.00s/it] 12%|█▏ | 2122/17285 [19:11:11<139:26:31, 33.11s/it] 12%|█▏ | 2123/17285 [19:11:40<133:44:25, 31.75s/it] 12%|█▏ | 2124/17285 [19:12:05<125:31:53, 29.81s/it] 12%|█▏ | 2125/17285 [19:12:32<121:25:18, 28.83s/it] 12%|█▏ | 2126/17285 [19:13:08<131:27:27, 31.22s/it] 12%|█▏ | 2127/17285 [19:13:38<129:55:45, 30.86s/it] 12%|█▏ | 2128/17285 [19:14:13<134:33:50, 31.96s/it] 12%|█▏ | 2129/17285 [19:14:47<137:43:43, 32.71s/it] 12%|█▏ | 2130/17285 [19:15:18<135:27:41, 32.18s/it] {'loss': 1.7114, 'learning_rate': 0.0001971988986897858, 'epoch': 0.37} + 12%|█▏ | 2130/17285 [19:15:18<135:27:41, 32.18s/it] 12%|█▏ | 2131/17285 [19:15:45<127:54:33, 30.39s/it] 12%|█▏ | 2132/17285 [19:16:18<131:58:05, 31.35s/it] 12%|█▏ | 2133/17285 [19:16:44<125:24:19, 29.80s/it] 12%|█▏ | 2134/17285 [19:17:23<136:20:51, 32.40s/it] 12%|█▏ | 2135/17285 [19:17:51<130:35:41, 31.03s/it] 12%|█▏ | 2136/17285 [19:18:25<134:48:10, 32.03s/it] 12%|█▏ | 2137/17285 [19:18:55<132:39:51, 31.53s/it] 12%|█▏ | 2138/17285 [19:19:31<137:29:04, 32.68s/it] 12%|█▏ | 2139/17285 [19:19:57<129:47:57, 30.85s/it] 12%|█▏ | 2140/17285 [19:20:26<126:40:42, 30.11s/it] {'loss': 1.7182, 'learning_rate': 0.00019715375386992608, 'epoch': 0.37} + 12%|█▏ | 2140/17285 [19:20:26<126:40:42, 30.11s/it] 12%|█▏ | 2141/17285 [19:20:51<120:10:40, 28.57s/it] 12%|█▏ | 2142/17285 [19:21:20<121:22:08, 28.85s/it] 12%|█▏ | 2143/17285 [19:21:49<121:51:10, 28.97s/it] 12%|█▏ | 2144/17285 [19:22:26<130:49:08, 31.10s/it] 12%|█▏ | 2145/17285 [19:22:59<133:13:11, 31.68s/it] 12%|█▏ | 
2146/17285 [19:23:28<129:49:26, 30.87s/it] 12%|█▏ | 2147/17285 [19:24:02<134:20:39, 31.95s/it] 12%|█▏ | 2148/17285 [19:24:35<136:05:37, 32.37s/it] 12%|█▏ | 2149/17285 [19:25:08<135:53:26, 32.32s/it] 12%|█▏ | 2150/17285 [19:25:41<136:51:37, 32.55s/it] {'loss': 1.7034, 'learning_rate': 0.00019710825340818987, 'epoch': 0.37} + 12%|█▏ | 2150/17285 [19:25:41<136:51:37, 32.55s/it] 12%|█▏ | 2151/17285 [19:26:08<130:23:56, 31.02s/it] 12%|█▏ | 2152/17285 [19:26:45<137:59:43, 32.83s/it] 12%|█▏ | 2153/17285 [19:27:17<137:18:45, 32.67s/it] 12%|█▏ | 2154/17285 [19:27:52<140:00:31, 33.31s/it] 12%|█▏ | 2155/17285 [19:28:22<134:59:12, 32.12s/it] 12%|█▏ | 2156/17285 [19:28:56<137:41:05, 32.76s/it] 12%|█▏ | 2157/17285 [19:29:23<130:37:19, 31.08s/it] 12%|█▏ | 2158/17285 [19:29:53<129:30:08, 30.82s/it] 12%|█▏ | 2159/17285 [19:30:21<125:07:57, 29.78s/it] 12%|█▏ | 2160/17285 [19:30:58<134:37:18, 32.04s/it] {'loss': 1.7282, 'learning_rate': 0.00019706239747113656, 'epoch': 0.37} + 12%|█▏ | 2160/17285 [19:30:58<134:37:18, 32.04s/it] 13%|█▎ | 2161/17285 [19:31:28<132:34:32, 31.56s/it] 13%|█▎ | 2162/17285 [19:31:55<126:30:41, 30.12s/it] 13%|█▎ | 2163/17285 [19:32:27<128:40:23, 30.63s/it] 13%|█▎ | 2164/17285 [19:33:05<138:07:09, 32.88s/it] 13%|█▎ | 2165/17285 [19:33:35<134:38:39, 32.06s/it] 13%|█▎ | 2166/17285 [19:34:03<129:20:46, 30.80s/it] 13%|█▎ | 2167/17285 [19:34:40<136:47:52, 32.58s/it] 13%|█▎ | 2168/17285 [19:35:06<129:15:33, 30.78s/it] 13%|█▎ | 2169/17285 [19:35:37<129:23:29, 30.82s/it] 13%|█▎ | 2170/17285 [19:36:08<129:50:26, 30.92s/it] {'loss': 1.74, 'learning_rate': 0.00019701618622662678, 'epoch': 0.38} + 13%|█▎ | 2170/17285 [19:36:08<129:50:26, 30.92s/it] 13%|█▎ | 2171/17285 [19:36:44<136:15:26, 32.46s/it] 13%|█▎ | 2172/17285 [19:37:24<145:12:14, 34.59s/it] 13%|█▎ | 2173/17285 [19:37:53<138:31:31, 33.00s/it] 13%|█▎ | 2174/17285 [19:38:27<139:33:32, 33.25s/it] 13%|█▎ | 2175/17285 [19:39:04<144:22:00, 34.40s/it] 13%|█▎ | 2176/17285 [19:39:42<148:17:15, 35.33s/it] 13%|█▎ | 
2177/17285 [19:40:16<147:10:57, 35.07s/it] 13%|█▎ | 2178/17285 [19:40:49<143:58:57, 34.31s/it] 13%|█▎ | 2179/17285 [19:41:26<147:05:54, 35.06s/it] 13%|█▎ | 2180/17285 [19:42:02<148:32:45, 35.40s/it] {'loss': 1.6854, 'learning_rate': 0.00019696961984382182, 'epoch': 0.38} + 13%|█▎ | 2180/17285 [19:42:02<148:32:45, 35.40s/it] 13%|█▎ | 2181/17285 [19:42:36<147:08:23, 35.07s/it] 13%|█▎ | 2182/17285 [19:43:06<141:15:25, 33.67s/it] 13%|█▎ | 2183/17285 [19:43:36<136:35:05, 32.56s/it] 13%|█▎ | 2184/17285 [19:44:12<141:01:59, 33.62s/it] 13%|█▎ | 2185/17285 [19:44:42<135:32:08, 32.31s/it] 13%|█▎ | 2186/17285 [19:45:13<134:32:09, 32.08s/it] 13%|█▎ | 2187/17285 [19:45:46<135:06:42, 32.22s/it] 13%|█▎ | 2188/17285 [19:46:17<134:08:50, 31.99s/it] 13%|█▎ | 2189/17285 [19:46:45<128:53:05, 30.74s/it] 13%|█▎ | 2190/17285 [19:47:15<127:51:04, 30.49s/it] {'loss': 1.756, 'learning_rate': 0.00019692269849318303, 'epoch': 0.38} + 13%|█▎ | 2190/17285 [19:47:15<127:51:04, 30.49s/it] 13%|█▎ | 2191/17285 [19:47:44<125:39:07, 29.97s/it] 13%|█▎ | 2192/17285 [19:48:12<123:39:17, 29.49s/it] 13%|█▎ | 2193/17285 [19:48:43<124:59:31, 29.82s/it] 13%|█▎ | 2194/17285 [19:49:18<132:06:59, 31.52s/it] 13%|█▎ | 2195/17285 [19:49:51<133:43:13, 31.90s/it] 13%|█▎ | 2196/17285 [19:50:23<133:41:01, 31.89s/it] 13%|█▎ | 2197/17285 [19:50:51<129:33:21, 30.91s/it] 13%|█▎ | 2198/17285 [19:51:19<125:23:36, 29.92s/it] 13%|█▎ | 2199/17285 [19:51:47<123:06:27, 29.38s/it] 13%|█▎ | 2200/17285 [19:52:17<123:29:56, 29.47s/it] {'loss': 1.7159, 'learning_rate': 0.00019687542234647106, 'epoch': 0.38} + 13%|█▎ | 2200/17285 [19:52:17<123:29:56, 29.47s/it] 13%|█▎ | 2201/17285 [19:52:48<125:52:42, 30.04s/it] 13%|█▎ | 2202/17285 [19:53:20<127:29:09, 30.43s/it] 13%|█▎ | 2203/17285 [19:53:49<126:47:11, 30.26s/it] 13%|█▎ | 2204/17285 [19:54:19<125:51:07, 30.04s/it] 13%|█▎ | 2205/17285 [19:54:54<132:05:20, 31.53s/it] 13%|█▎ | 2206/17285 [19:55:25<131:00:17, 31.28s/it] 13%|█▎ | 2207/17285 [19:55:50<123:13:40, 29.42s/it] 13%|█▎ | 
2208/17285 [19:56:20<123:54:41, 29.59s/it] 13%|█▎ | 2209/17285 [19:56:50<124:52:37, 29.82s/it] 13%|█▎ | 2210/17285 [19:57:19<123:45:42, 29.56s/it] {'loss': 1.7095, 'learning_rate': 0.00019682779157674537, 'epoch': 0.38} + 13%|█▎ | 2210/17285 [19:57:19<123:45:42, 29.56s/it] 13%|█▎ | 2211/17285 [19:57:43<117:17:39, 28.01s/it] 13%|█▎ | 2212/17285 [19:58:31<141:29:54, 33.80s/it] 13%|█▎ | 2213/17285 [19:59:01<136:48:14, 32.68s/it] 13%|█▎ | 2214/17285 [19:59:40<144:38:14, 34.55s/it] 13%|█▎ | 2215/17285 [20:00:15<145:41:10, 34.80s/it] 13%|█▎ | 2216/17285 [20:00:50<145:59:17, 34.88s/it] 13%|█▎ | 2217/17285 [20:01:30<152:23:42, 36.41s/it] 13%|█▎ | 2218/17285 [20:02:02<146:24:49, 34.98s/it] 13%|█▎ | 2219/17285 [20:02:37<146:50:53, 35.09s/it] 13%|█▎ | 2220/17285 [20:03:03<134:53:36, 32.23s/it] {'loss': 1.7071, 'learning_rate': 0.00019677980635836363, 'epoch': 0.39} + 13%|█▎ | 2220/17285 [20:03:03<134:53:36, 32.23s/it] 13%|█▎ | 2221/17285 [20:03:37<137:23:46, 32.83s/it] 13%|█▎ | 2222/17285 [20:04:03<128:39:32, 30.75s/it] 13%|█▎ | 2223/17285 [20:04:40<136:39:04, 32.66s/it] 13%|█▎ | 2224/17285 [20:05:25<152:42:14, 36.50s/it] 13%|█▎ | 2225/17285 [20:05:59<148:28:13, 35.49s/it] 13%|█▎ | 2226/17285 [20:06:34<148:14:34, 35.44s/it] 13%|█▎ | 2227/17285 [20:07:05<143:08:51, 34.22s/it] 13%|█▎ | 2228/17285 [20:07:35<137:53:49, 32.97s/it] 13%|█▎ | 2229/17285 [20:08:06<135:10:28, 32.32s/it] 13%|█▎ | 2230/17285 [20:08:42<139:03:29, 33.25s/it] {'loss': 1.7077, 'learning_rate': 0.00019673146686698093, 'epoch': 0.39} + 13%|█▎ | 2230/17285 [20:08:42<139:03:29, 33.25s/it] 13%|█▎ | 2231/17285 [20:09:15<139:29:01, 33.36s/it] 13%|█▎ | 2232/17285 [20:09:48<138:57:53, 33.23s/it] 13%|█▎ | 2233/17285 [20:10:19<135:33:53, 32.42s/it] 13%|█▎ | 2234/17285 [20:10:45<127:37:17, 30.53s/it] 13%|█▎ | 2235/17285 [20:11:18<131:01:32, 31.34s/it] 13%|█▎ | 2236/17285 [20:11:47<128:28:55, 30.74s/it] 13%|█▎ | 2237/17285 [20:12:26<138:05:58, 33.04s/it] 13%|█▎ | 2238/17285 [20:12:55<133:22:23, 31.91s/it] 13%|█▎ | 
2239/17285 [20:13:31<138:09:12, 33.06s/it] 13%|█▎ | 2240/17285 [20:14:07<141:59:36, 33.98s/it] {'loss': 1.7144, 'learning_rate': 0.00019668277327954917, 'epoch': 0.39} + 13%|█▎ | 2240/17285 [20:14:07<141:59:36, 33.98s/it] 13%|█▎ | 2241/17285 [20:14:34<133:54:28, 32.04s/it] 13%|█▎ | 2242/17285 [20:15:02<128:39:45, 30.79s/it] 13%|█▎ | 2243/17285 [20:15:42<139:57:37, 33.50s/it] 13%|█▎ | 2244/17285 [20:16:18<142:28:49, 34.10s/it] 13%|█▎ | 2245/17285 [20:16:55<146:20:30, 35.03s/it] 13%|█▎ | 2246/17285 [20:17:22<136:05:39, 32.58s/it] 13%|█▎ | 2247/17285 [20:18:00<143:23:28, 34.33s/it] 13%|█▎ | 2248/17285 [20:18:35<144:16:27, 34.54s/it] 13%|█▎ | 2249/17285 [20:19:10<145:13:57, 34.77s/it] 13%|█▎ | 2250/17285 [20:19:50<150:53:50, 36.13s/it] {'loss': 1.6873, 'learning_rate': 0.00019663372577431663, 'epoch': 0.39} + 13%|█▎ | 2250/17285 [20:19:50<150:53:50, 36.13s/it] 13%|█▎ | 2251/17285 [20:20:20<143:20:30, 34.32s/it] 13%|█▎ | 2252/17285 [20:20:57<146:40:53, 35.13s/it] 13%|█▎ | 2253/17285 [20:21:32<147:04:14, 35.22s/it] 13%|█▎ | 2254/17285 [20:21:59<135:54:44, 32.55s/it][2023-08-23 20:17:01,365] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. 
Reducing hysteresis to 1 + 13%|█▎ | 2255/17285 [20:22:24<126:36:28, 30.33s/it] 13%|█▎ | 2256/17285 [20:22:53<124:47:22, 29.89s/it] 13%|█▎ | 2257/17285 [20:23:25<127:25:46, 30.53s/it] 13%|█▎ | 2258/17285 [20:23:54<126:22:56, 30.28s/it] 13%|█▎ | 2259/17285 [20:24:34<138:29:01, 33.18s/it] 13%|█▎ | 2260/17285 [20:25:05<136:02:35, 32.60s/it] {'loss': 1.7081, 'learning_rate': 0.0001965892805682537, 'epoch': 0.39} + 13%|█▎ | 2260/17285 [20:25:05<136:02:35, 32.60s/it] 13%|█▎ | 2261/17285 [20:25:35<132:00:29, 31.63s/it] 13%|█▎ | 2262/17285 [20:26:06<132:00:02, 31.63s/it] 13%|█▎ | 2263/17285 [20:26:38<132:13:14, 31.69s/it] 13%|█▎ | 2264/17285 [20:27:10<132:51:12, 31.84s/it] 13%|█▎ | 2265/17285 [20:27:48<140:22:30, 33.65s/it] 13%|█▎ | 2266/17285 [20:28:23<141:54:44, 34.02s/it] 13%|█▎ | 2267/17285 [20:28:52<135:17:36, 32.43s/it] 13%|█▎ | 2268/17285 [20:29:22<132:51:00, 31.85s/it] 13%|█▎ | 2269/17285 [20:29:49<125:37:06, 30.12s/it] 13%|█▎ | 2270/17285 [20:30:20<126:49:39, 30.41s/it] {'loss': 1.6979, 'learning_rate': 0.00019653956111491275, 'epoch': 0.39} + 13%|█▎ | 2270/17285 [20:30:20<126:49:39, 30.41s/it] 13%|█▎ | 2271/17285 [20:30:50<127:08:38, 30.49s/it] 13%|█▎ | 2272/17285 [20:31:19<124:44:41, 29.91s/it] 13%|█▎ | 2273/17285 [20:31:44<119:23:42, 28.63s/it] 13%|█▎ | 2274/17285 [20:32:15<122:04:18, 29.28s/it] 13%|█▎ | 2275/17285 [20:32:48<126:15:01, 30.28s/it] 13%|█▎ | 2276/17285 [20:33:17<124:11:48, 29.79s/it] 13%|█▎ | 2277/17285 [20:33:48<126:19:18, 30.30s/it] 13%|█▎ | 2278/17285 [20:34:16<123:21:27, 29.59s/it] 13%|█▎ | 2279/17285 [20:34:54<133:20:08, 31.99s/it] 13%|█▎ | 2280/17285 [20:35:36<145:57:26, 35.02s/it] {'loss': 1.698, 'learning_rate': 0.00019648948826801467, 'epoch': 0.4} + 13%|█▎ | 2280/17285 [20:35:36<145:57:26, 35.02s/it] 13%|█▎ | 2281/17285 [20:36:08<142:27:49, 34.18s/it] 13%|█▎ | 2282/17285 [20:36:42<142:18:06, 34.15s/it] 13%|█▎ | 2283/17285 [20:37:23<151:15:49, 36.30s/it] 13%|█▎ | 2284/17285 [20:37:54<144:17:10, 34.63s/it] 13%|█▎ | 2285/17285 
[20:38:26<141:38:17, 33.99s/it][2023-08-23 20:33:37,696] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 13%|█▎ | 2286/17285 [20:39:00<141:02:42, 33.85s/it] 13%|█▎ | 2287/17285 [20:39:28<134:02:32, 32.17s/it] 13%|█▎ | 2288/17285 [20:39:57<130:20:17, 31.29s/it] 13%|█▎ | 2289/17285 [20:40:26<127:12:10, 30.54s/it] 13%|█▎ | 2290/17285 [20:41:01<132:23:44, 31.79s/it] {'loss': 1.692, 'learning_rate': 0.00019644412070578336, 'epoch': 0.4} + 13%|█▎ | 2290/17285 [20:41:01<132:23:44, 31.79s/it] 13%|█▎ | 2291/17285 [20:41:35<135:40:08, 32.57s/it] 13%|█▎ | 2292/17285 [20:42:12<140:14:51, 33.68s/it] 13%|█▎ | 2293/17285 [20:42:44<138:57:35, 33.37s/it] 13%|█▎ | 2294/17285 [20:43:19<140:21:16, 33.71s/it] 13%|█▎ | 2295/17285 [20:43:57<145:38:59, 34.98s/it] 13%|█▎ | 2296/17285 [20:44:22<133:05:52, 31.97s/it] 13%|█▎ | 2297/17285 [20:44:47<125:06:35, 30.05s/it] 13%|█▎ | 2298/17285 [20:45:19<127:10:06, 30.55s/it] 13%|█▎ | 2299/17285 [20:45:48<125:42:40, 30.20s/it] 13%|█▎ | 2300/17285 [20:46:22<129:31:16, 31.12s/it] {'loss': 1.6938, 'learning_rate': 0.00019639337691717884, 'epoch': 0.4} + 13%|█▎ | 2300/17285 [20:46:22<129:31:16, 31.12s/it] 13%|█▎ | 2301/17285 [20:46:49<125:21:37, 30.12s/it] 13%|█▎ | 2302/17285 [20:47:20<126:12:32, 30.32s/it] 13%|█▎ | 2303/17285 [20:47:51<126:34:47, 30.42s/it] 13%|█▎ | 2304/17285 [20:48:23<128:55:12, 30.98s/it] 13%|█▎ | 2305/17285 [20:48:51<124:25:53, 29.90s/it] 13%|█▎ | 2306/17285 [20:49:30<135:46:35, 32.63s/it] 13%|█▎ | 2307/17285 [20:50:08<143:42:29, 34.54s/it] 13%|█▎ | 2308/17285 [20:50:38<137:13:02, 32.98s/it] 13%|█▎ | 2309/17285 [20:51:07<132:09:33, 31.77s/it] 13%|█▎ | 2310/17285 [20:51:40<133:36:04, 32.12s/it] {'loss': 1.7152, 'learning_rate': 0.00019634228027014033, 'epoch': 0.4} + 13%|█▎ | 2310/17285 [20:51:40<133:36:04, 32.12s/it] 13%|█▎ | 2311/17285 [20:52:10<131:54:22, 31.71s/it] 13%|█▎ | 2312/17285 [20:52:45<135:09:48, 32.50s/it] 13%|█▎ | 2313/17285 
[20:53:14<130:46:24, 31.44s/it] 13%|█▎ | 2314/17285 [20:53:41<125:55:35, 30.28s/it] 13%|█▎ | 2315/17285 [20:54:11<125:31:06, 30.18s/it] 13%|█▎ | 2316/17285 [20:54:49<135:02:29, 32.48s/it] 13%|█▎ | 2317/17285 [20:55:25<138:45:09, 33.37s/it] 13%|█▎ | 2318/17285 [20:55:56<136:17:08, 32.78s/it] 13%|█▎ | 2319/17285 [20:56:28<135:47:35, 32.66s/it] 13%|█▎ | 2320/17285 [20:57:03<138:28:16, 33.31s/it] {'loss': 1.7155, 'learning_rate': 0.00019629083095171264, 'epoch': 0.4} + 13%|█▎ | 2320/17285 [20:57:03<138:28:16, 33.31s/it] 13%|█▎ | 2321/17285 [20:57:29<129:33:06, 31.17s/it] 13%|█▎ | 2322/17285 [20:58:06<136:11:26, 32.77s/it] 13%|█▎ | 2323/17285 [20:58:32<127:33:50, 30.69s/it] 13%|█▎ | 2324/17285 [20:59:05<130:41:53, 31.45s/it] 13%|█▎ | 2325/17285 [20:59:40<134:52:42, 32.46s/it] 13%|█▎ | 2326/17285 [21:00:11<132:49:10, 31.96s/it] 13%|█▎ | 2327/17285 [21:00:44<135:05:24, 32.51s/it] 13%|█▎ | 2328/17285 [21:01:15<132:33:57, 31.91s/it] 13%|█▎ | 2329/17285 [21:01:43<128:17:27, 30.88s/it] 13%|█▎ | 2330/17285 [21:02:23<138:40:12, 33.38s/it] {'loss': 1.6839, 'learning_rate': 0.0001962390291502316, 'epoch': 0.4} + 13%|█▎ | 2330/17285 [21:02:23<138:40:12, 33.38s/it] 13%|█▎ | 2331/17285 [21:02:56<138:25:13, 33.32s/it] 13%|█▎ | 2332/17285 [21:03:30<138:58:44, 33.46s/it] 13%|█▎ | 2333/17285 [21:03:56<130:13:35, 31.35s/it] 14%|█▎ | 2334/17285 [21:04:24<125:39:30, 30.26s/it] 14%|█▎ | 2335/17285 [21:04:57<129:22:27, 31.15s/it] 14%|█▎ | 2336/17285 [21:05:30<132:11:28, 31.83s/it] 14%|█▎ | 2337/17285 [21:06:12<144:11:33, 34.73s/it] 14%|█▎ | 2338/17285 [21:06:46<143:12:13, 34.49s/it] 14%|█▎ | 2339/17285 [21:07:13<134:22:15, 32.37s/it] 14%|█▎ | 2340/17285 [21:07:46<135:18:34, 32.59s/it] {'loss': 1.6888, 'learning_rate': 0.00019618687505532334, 'epoch': 0.41} + 14%|█▎ | 2340/17285 [21:07:46<135:18:34, 32.59s/it] 14%|█▎ | 2341/17285 [21:08:20<136:07:06, 32.79s/it] 14%|█▎ | 2342/17285 [21:08:50<132:45:59, 31.99s/it] 14%|█▎ | 2343/17285 [21:09:24<135:49:09, 32.72s/it] 14%|█▎ | 2344/17285 
[21:09:55<133:21:55, 32.13s/it] 14%|█▎ | 2345/17285 [21:10:25<130:21:31, 31.41s/it] 14%|█▎ | 2346/17285 [21:11:03<139:37:31, 33.65s/it] 14%|█▎ | 2347/17285 [21:11:31<131:28:54, 31.69s/it] 14%|█▎ | 2348/17285 [21:12:02<131:18:54, 31.65s/it] 14%|█▎ | 2349/17285 [21:12:44<143:34:09, 34.60s/it] 14%|█▎ | 2350/17285 [21:13:18<143:50:31, 34.67s/it] {'loss': 1.6962, 'learning_rate': 0.0001961343688579036, 'epoch': 0.41} + 14%|█▎ | 2350/17285 [21:13:18<143:50:31, 34.67s/it] 14%|█▎ | 2351/17285 [21:13:53<143:46:04, 34.66s/it] 14%|█▎ | 2352/17285 [21:14:18<131:23:25, 31.68s/it] 14%|█▎ | 2353/17285 [21:14:54<136:29:09, 32.91s/it] 14%|█▎ | 2354/17285 [21:15:28<137:55:12, 33.25s/it] 14%|█▎ | 2355/17285 [21:16:03<140:48:22, 33.95s/it] 14%|█▎ | 2356/17285 [21:16:33<135:01:01, 32.56s/it] 14%|█▎ | 2357/17285 [21:17:09<139:54:57, 33.74s/it] 14%|█▎ | 2358/17285 [21:17:40<136:55:00, 33.02s/it] 14%|█▎ | 2359/17285 [21:18:09<131:59:50, 31.84s/it] 14%|█▎ | 2360/17285 [21:18:43<134:03:30, 32.34s/it] {'loss': 1.6784, 'learning_rate': 0.000196081510750177, 'epoch': 0.41} + 14%|█▎ | 2360/17285 [21:18:43<134:03:30, 32.34s/it] 14%|█▎ | 2361/17285 [21:19:21<141:02:47, 34.02s/it] 14%|█▎ | 2362/17285 [21:19:56<142:53:18, 34.47s/it] 14%|█▎ | 2363/17285 [21:20:31<142:27:52, 34.37s/it] 14%|█▎ | 2364/17285 [21:21:01<137:26:34, 33.16s/it] 14%|█▎ | 2365/17285 [21:21:41<146:22:56, 35.32s/it] 14%|█▎ | 2366/17285 [21:22:13<141:43:33, 34.20s/it] 14%|█▎ | 2367/17285 [21:22:48<143:04:52, 34.53s/it] 14%|█▎ | 2368/17285 [21:23:17<136:00:40, 32.82s/it] 14%|█▎ | 2369/17285 [21:23:49<134:58:29, 32.58s/it] 14%|█▎ | 2370/17285 [21:24:24<138:33:16, 33.44s/it] {'loss': 1.672, 'learning_rate': 0.00019602830092563643, 'epoch': 0.41} + 14%|█▎ | 2370/17285 [21:24:24<138:33:16, 33.44s/it] 14%|█▎ | 2371/17285 [21:24:55<135:34:27, 32.73s/it] 14%|█▎ | 2372/17285 [21:25:21<126:11:23, 30.46s/it] 14%|█▎ | 2373/17285 [21:25:49<123:17:09, 29.76s/it] 14%|█▎ | 2374/17285 [21:26:34<142:22:35, 34.37s/it] 14%|█▎ | 2375/17285 
[21:27:11<145:06:52, 35.04s/it] 14%|█▎ | 2376/17285 [21:27:39<137:27:22, 33.19s/it] 14%|█▍ | 2377/17285 [21:28:15<140:15:20, 33.87s/it] 14%|█▍ | 2378/17285 [21:28:45<135:27:14, 32.71s/it] 14%|█▍ | 2379/17285 [21:29:18<136:01:33, 32.85s/it] 14%|█▍ | 2380/17285 [21:29:51<136:05:21, 32.87s/it] {'loss': 1.6769, 'learning_rate': 0.00019597473957906224, 'epoch': 0.41} + 14%|█▍ | 2380/17285 [21:29:51<136:05:21, 32.87s/it] 14%|█▍ | 2381/17285 [21:30:24<136:46:50, 33.04s/it] 14%|█▍ | 2382/17285 [21:30:54<132:05:06, 31.91s/it] 14%|█▍ | 2383/17285 [21:31:21<126:27:27, 30.55s/it] 14%|█▍ | 2384/17285 [21:31:49<123:13:05, 29.77s/it] 14%|█▍ | 2385/17285 [21:32:19<123:37:20, 29.87s/it] 14%|█▍ | 2386/17285 [21:32:46<120:28:25, 29.11s/it] 14%|█▍ | 2387/17285 [21:33:17<122:07:00, 29.51s/it] 14%|█▍ | 2388/17285 [21:33:52<128:38:48, 31.09s/it] 14%|█▍ | 2389/17285 [21:34:23<129:15:38, 31.24s/it] 14%|█▍ | 2390/17285 [21:35:00<136:15:21, 32.93s/it] {'loss': 1.6975, 'learning_rate': 0.00019592082690652148, 'epoch': 0.41} + 14%|█▍ | 2390/17285 [21:35:00<136:15:21, 32.93s/it] 14%|█▍ | 2391/17285 [21:35:26<128:06:13, 30.96s/it] 14%|█▍ | 2392/17285 [21:35:53<122:30:31, 29.61s/it] 14%|█▍ | 2393/17285 [21:36:30<131:38:36, 31.82s/it] 14%|█▍ | 2394/17285 [21:36:55<123:19:27, 29.81s/it] 14%|█▍ | 2395/17285 [21:37:35<136:05:44, 32.90s/it] 14%|█▍ | 2396/17285 [21:38:02<128:59:35, 31.19s/it] 14%|█▍ | 2397/17285 [21:38:31<125:27:32, 30.34s/it] 14%|█▍ | 2398/17285 [21:38:56<119:07:55, 28.81s/it] 14%|█▍ | 2399/17285 [21:39:32<128:31:25, 31.08s/it] 14%|█▍ | 2400/17285 [21:40:08<134:42:33, 32.58s/it] {'loss': 1.7687, 'learning_rate': 0.00019586656310536743, 'epoch': 0.42} + 14%|█▍ | 2400/17285 [21:40:08<134:42:33, 32.58s/it] 14%|█▍ | 2401/17285 [21:40:46<141:30:25, 34.23s/it] 14%|█▍ | 2402/17285 [21:41:22<143:07:21, 34.62s/it] 14%|█▍ | 2403/17285 [21:41:57<143:36:56, 34.74s/it] 14%|█▍ | 2404/17285 [21:42:27<137:57:27, 33.37s/it] 14%|█▍ | 2405/17285 [21:43:03<141:31:47, 34.24s/it] 14%|█▍ | 2406/17285 
[21:43:40<143:44:34, 34.78s/it] 14%|█▍ | 2407/17285 [21:44:07<135:00:08, 32.67s/it] 14%|█▍ | 2408/17285 [21:44:51<148:55:44, 36.04s/it] 14%|█▍ | 2409/17285 [21:45:19<139:02:52, 33.65s/it] 14%|█▍ | 2410/17285 [21:45:57<143:46:13, 34.79s/it] {'loss': 1.685, 'learning_rate': 0.00019581194837423857, 'epoch': 0.42} + 14%|█▍ | 2410/17285 [21:45:57<143:46:13, 34.79s/it] 14%|█▍ | 2411/17285 [21:46:37<150:31:29, 36.43s/it] 14%|█▍ | 2412/17285 [21:47:17<155:03:26, 37.53s/it] 14%|█▍ | 2413/17285 [21:47:51<150:24:25, 36.41s/it] 14%|█▍ | 2414/17285 [21:48:16<137:04:31, 33.18s/it] 14%|█▍ | 2415/17285 [21:48:56<144:28:09, 34.98s/it] 14%|█▍ | 2416/17285 [21:49:23<135:06:12, 32.71s/it] 14%|█▍ | 2417/17285 [21:49:51<129:16:13, 31.30s/it] 14%|█▍ | 2418/17285 [21:50:20<126:32:31, 30.64s/it] 14%|█▍ | 2419/17285 [21:50:57<133:43:56, 32.39s/it] 14%|█▍ | 2420/17285 [21:51:25<128:39:32, 31.16s/it] {'loss': 1.6858, 'learning_rate': 0.00019575698291305813, 'epoch': 0.42} + 14%|█▍ | 2420/17285 [21:51:25<128:39:32, 31.16s/it] 14%|█▍ | 2421/17285 [21:51:51<122:26:24, 29.65s/it] 14%|█▍ | 2422/17285 [21:52:17<118:17:05, 28.65s/it] 14%|█▍ | 2423/17285 [21:52:56<129:59:57, 31.49s/it] 14%|█▍ | 2424/17285 [21:53:26<129:00:08, 31.25s/it] 14%|█▍ | 2425/17285 [21:53:54<124:46:47, 30.23s/it] 14%|█▍ | 2426/17285 [21:54:28<129:14:02, 31.31s/it] 14%|█▍ | 2427/17285 [21:55:00<130:33:25, 31.63s/it] 14%|█▍ | 2428/17285 [21:55:30<128:21:43, 31.10s/it] 14%|█▍ | 2429/17285 [21:56:00<126:43:01, 30.71s/it] 14%|█▍ | 2430/17285 [21:56:28<123:47:56, 30.00s/it] {'loss': 1.6883, 'learning_rate': 0.0001957016669230331, 'epoch': 0.42} + 14%|█▍ | 2430/17285 [21:56:28<123:47:56, 30.00s/it][2023-08-23 21:51:31,264] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. 
Reducing hysteresis to 1 + 14%|█▍ | 2431/17285 [21:56:54<117:59:16, 28.60s/it] 14%|█▍ | 2432/17285 [21:57:32<130:29:37, 31.63s/it] 14%|█▍ | 2433/17285 [21:58:10<137:34:29, 33.35s/it] 14%|█▍ | 2434/17285 [21:58:45<139:28:39, 33.81s/it] 14%|█▍ | 2435/17285 [21:59:16<137:05:38, 33.23s/it] 14%|█▍ | 2436/17285 [21:59:53<140:51:37, 34.15s/it] 14%|█▍ | 2437/17285 [22:00:25<138:07:53, 33.49s/it] 14%|█▍ | 2438/17285 [22:01:03<143:33:54, 34.81s/it] 14%|█▍ | 2439/17285 [22:01:33<137:50:58, 33.43s/it] 14%|█▍ | 2440/17285 [22:01:59<129:27:24, 31.39s/it] {'loss': 1.6471, 'learning_rate': 0.00019565158299718013, 'epoch': 0.42} + 14%|█▍ | 2440/17285 [22:01:59<129:27:24, 31.39s/it] 14%|█▍ | 2441/17285 [22:02:27<124:13:43, 30.13s/it] 14%|█▍ | 2442/17285 [22:02:52<118:38:26, 28.77s/it] 14%|█▍ | 2443/17285 [22:03:36<136:46:56, 33.18s/it] 14%|█▍ | 2444/17285 [22:04:07<134:17:26, 32.58s/it] 14%|█▍ | 2445/17285 [22:04:39<133:35:28, 32.41s/it] 14%|█▍ | 2446/17285 [22:05:14<136:32:00, 33.12s/it] 14%|█▍ | 2447/17285 [22:05:50<140:56:19, 34.19s/it] 14%|█▍ | 2448/17285 [22:06:29<145:54:30, 35.40s/it] 14%|█▍ | 2449/17285 [22:07:00<141:12:38, 34.27s/it] 14%|█▍ | 2450/17285 [22:07:35<141:33:52, 34.35s/it] {'loss': 1.6831, 'learning_rate': 0.0001955956015612708, 'epoch': 0.43} + 14%|█▍ | 2450/17285 [22:07:35<141:33:52, 34.35s/it] 14%|█▍ | 2451/17285 [22:08:03<134:21:12, 32.61s/it] 14%|█▍ | 2452/17285 [22:08:33<130:13:14, 31.60s/it] 14%|█▍ | 2453/17285 [22:08:58<122:29:28, 29.73s/it] 14%|█▍ | 2454/17285 [22:09:25<118:57:18, 28.87s/it] 14%|█▍ | 2455/17285 [22:10:02<128:42:25, 31.24s/it] 14%|█▍ | 2456/17285 [22:10:38<134:57:39, 32.76s/it] 14%|█▍ | 2457/17285 [22:11:14<138:36:21, 33.65s/it] 14%|█▍ | 2458/17285 [22:11:39<128:44:11, 31.26s/it] 14%|█▍ | 2459/17285 [22:12:11<129:46:31, 31.51s/it] 14%|█▍ | 2460/17285 [22:12:45<132:12:10, 32.10s/it] {'loss': 1.6686, 'learning_rate': 0.0001955392701872709, 'epoch': 0.43} + 14%|█▍ | 2460/17285 [22:12:45<132:12:10, 32.10s/it] 14%|█▍ | 2461/17285 
[22:13:14<128:26:44, 31.19s/it] 14%|█▍ | 2462/17285 [22:13:44<126:53:38, 30.82s/it] 14%|█▍ | 2463/17285 [22:14:17<130:09:50, 31.61s/it] 14%|█▍ | 2464/17285 [22:14:41<120:26:34, 29.26s/it] 14%|█▍ | 2465/17285 [22:15:15<126:27:36, 30.72s/it] 14%|█▍ | 2466/17285 [22:15:57<140:13:16, 34.06s/it] 14%|█▍ | 2467/17285 [22:16:29<137:41:29, 33.45s/it] 14%|█▍ | 2468/17285 [22:16:58<132:20:11, 32.15s/it] 14%|█▍ | 2469/17285 [22:17:33<135:28:29, 32.92s/it] 14%|█▍ | 2470/17285 [22:18:05<134:15:25, 32.62s/it] {'loss': 1.7006, 'learning_rate': 0.00019548258908138753, 'epoch': 0.43} + 14%|█▍ | 2470/17285 [22:18:05<134:15:25, 32.62s/it] 14%|█▍ | 2471/17285 [22:18:39<135:39:35, 32.97s/it] 14%|█▍ | 2472/17285 [22:19:06<128:38:59, 31.27s/it] 14%|█▍ | 2473/17285 [22:19:38<129:16:22, 31.42s/it] 14%|█▍ | 2474/17285 [22:20:03<122:17:46, 29.73s/it] 14%|█▍ | 2475/17285 [22:20:33<122:22:05, 29.75s/it] 14%|█▍ | 2476/17285 [22:21:10<130:31:04, 31.73s/it] 14%|█▍ | 2477/17285 [22:21:37<125:33:36, 30.53s/it] 14%|█▍ | 2478/17285 [22:22:08<125:54:12, 30.61s/it] 14%|█▍ | 2479/17285 [22:22:37<124:09:18, 30.19s/it] 14%|█▍ | 2480/17285 [22:23:05<120:50:15, 29.38s/it] {'loss': 1.7317, 'learning_rate': 0.00019542555845110805, 'epoch': 0.43} + 14%|█▍ | 2480/17285 [22:23:05<120:50:15, 29.38s/it] 14%|█▍ | 2481/17285 [22:23:39<126:15:59, 30.71s/it] 14%|█▍ | 2482/17285 [22:24:14<132:36:58, 32.25s/it] 14%|█▍ | 2483/17285 [22:24:41<125:23:42, 30.50s/it] 14%|█▍ | 2484/17285 [22:25:11<125:09:35, 30.44s/it] 14%|█▍ | 2485/17285 [22:25:37<119:32:10, 29.08s/it] 14%|█▍ | 2486/17285 [22:26:05<117:37:05, 28.61s/it] 14%|█▍ | 2487/17285 [22:26:47<134:44:44, 32.78s/it] 14%|█▍ | 2488/17285 [22:27:27<143:31:48, 34.92s/it] 14%|█▍ | 2489/17285 [22:28:03<144:29:29, 35.16s/it] 14%|█▍ | 2490/17285 [22:28:30<134:37:47, 32.76s/it] {'loss': 1.6572, 'learning_rate': 0.00019536817850519927, 'epoch': 0.43} + 14%|█▍ | 2490/17285 [22:28:30<134:37:47, 32.76s/it] 14%|█▍ | 2491/17285 [22:29:05<138:06:36, 33.61s/it] 14%|█▍ | 2492/17285 
[22:29:35<132:40:24, 32.29s/it] 14%|█▍ | 2493/17285 [22:30:04<129:32:48, 31.53s/it] 14%|█▍ | 2494/17285 [22:30:36<129:41:28, 31.57s/it] 14%|█▍ | 2495/17285 [22:31:12<135:09:24, 32.90s/it] 14%|█▍ | 2496/17285 [22:31:47<137:29:42, 33.47s/it] 14%|█▍ | 2497/17285 [22:32:20<136:36:39, 33.26s/it] 14%|█▍ | 2498/17285 [22:32:55<139:14:46, 33.90s/it] 14%|█▍ | 2499/17285 [22:33:28<138:11:45, 33.65s/it] 14%|█▍ | 2500/17285 [22:34:03<140:14:31, 34.15s/it] {'loss': 1.6916, 'learning_rate': 0.0001953104494537067, 'epoch': 0.43} + 14%|█▍ | 2500/17285 [22:34:03<140:14:31, 34.15s/it] 14%|█▍ | 2501/17285 [22:34:41<144:05:22, 35.09s/it] 14%|█▍ | 2502/17285 [22:35:21<150:44:28, 36.71s/it] 14%|█▍ | 2503/17285 [22:35:51<141:48:27, 34.54s/it] 14%|█▍ | 2504/17285 [22:36:23<138:28:25, 33.73s/it] 14%|█▍ | 2505/17285 [22:36:49<129:16:11, 31.49s/it] 14%|█▍ | 2506/17285 [22:37:24<133:33:50, 32.53s/it] 15%|█▍ | 2507/17285 [22:37:51<126:47:01, 30.89s/it] 15%|█▍ | 2508/17285 [22:38:21<125:21:49, 30.54s/it] 15%|█▍ | 2509/17285 [22:38:48<121:24:33, 29.58s/it] 15%|█▍ | 2510/17285 [22:39:22<127:16:53, 31.01s/it] {'loss': 1.6533, 'learning_rate': 0.0001952523715079538, 'epoch': 0.44} + 15%|█▍ | 2510/17285 [22:39:22<127:16:53, 31.01s/it] 15%|█▍ | 2511/17285 [22:40:08<144:48:44, 35.29s/it] 15%|█▍ | 2512/17285 [22:40:49<152:41:13, 37.21s/it] 15%|█▍ | 2513/17285 [22:41:29<155:37:08, 37.93s/it] 15%|█▍ | 2514/17285 [22:42:00<146:43:37, 35.76s/it] 15%|█▍ | 2515/17285 [22:42:33<143:34:39, 35.00s/it] 15%|█▍ | 2516/17285 [22:43:06<141:04:05, 34.39s/it] 15%|█▍ | 2517/17285 [22:43:47<149:44:58, 36.50s/it] 15%|█▍ | 2518/17285 [22:44:18<142:18:26, 34.69s/it] 15%|█▍ | 2519/17285 [22:44:47<135:53:50, 33.13s/it] 15%|█▍ | 2520/17285 [22:45:15<129:08:32, 31.49s/it] {'loss': 1.6463, 'learning_rate': 0.00019519394488054127, 'epoch': 0.44} + 15%|█▍ | 2520/17285 [22:45:15<129:08:32, 31.49s/it] 15%|█▍ | 2521/17285 [22:45:47<129:57:35, 31.69s/it] 15%|█▍ | 2522/17285 [22:46:25<137:40:03, 33.57s/it] 15%|█▍ | 2523/17285 
[22:46:59<138:45:17, 33.84s/it] 15%|█▍ | 2524/17285 [22:47:25<128:54:11, 31.44s/it] 15%|█▍ | 2525/17285 [22:47:50<121:22:16, 29.60s/it] 15%|█▍ | 2526/17285 [22:48:24<126:20:39, 30.82s/it] 15%|█▍ | 2527/17285 [22:48:56<127:44:56, 31.16s/it] 15%|█▍ | 2528/17285 [22:49:26<126:41:51, 30.91s/it] 15%|█▍ | 2529/17285 [22:50:02<132:31:31, 32.33s/it] 15%|█▍ | 2530/17285 [22:50:34<132:36:37, 32.35s/it] {'loss': 1.6984, 'learning_rate': 0.00019513516978534608, 'epoch': 0.44} + 15%|█▍ | 2530/17285 [22:50:34<132:36:37, 32.35s/it] 15%|█▍ | 2531/17285 [22:51:06<131:51:28, 32.17s/it] 15%|█▍ | 2532/17285 [22:51:42<135:43:49, 33.12s/it] 15%|█▍ | 2533/17285 [22:52:06<124:31:59, 30.39s/it] 15%|█▍ | 2534/17285 [22:52:34<122:25:04, 29.88s/it] 15%|█▍ | 2535/17285 [22:53:00<117:47:44, 28.75s/it] 15%|█▍ | 2536/17285 [22:53:44<136:28:45, 33.31s/it] 15%|█▍ | 2537/17285 [22:54:26<146:23:00, 35.73s/it][2023-08-23 22:49:30,249] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 15%|█▍ | 2538/17285 [22:54:53<135:28:38, 33.07s/it] 15%|█▍ | 2539/17285 [22:55:38<150:52:37, 36.83s/it] 15%|█▍ | 2540/17285 [22:56:05<138:44:50, 33.88s/it] {'loss': 1.6643, 'learning_rate': 0.00019508197443751353, 'epoch': 0.44} + 15%|█▍ | 2540/17285 [22:56:05<138:44:50, 33.88s/it][2023-08-23 22:51:16,163] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 15%|█▍ | 2541/17285 [22:56:38<138:04:13, 33.71s/it] 15%|█▍ | 2542/17285 [22:57:09<134:40:48, 32.89s/it] 15%|█▍ | 2543/17285 [22:57:44<136:57:29, 33.45s/it] 15%|█▍ | 2544/17285 [22:58:20<140:01:34, 34.20s/it] 15%|█▍ | 2545/17285 [22:58:51<136:00:00, 33.22s/it] 15%|█▍ | 2546/17285 [22:59:25<137:04:34, 33.48s/it] 15%|█▍ | 2547/17285 [22:59:57<134:26:20, 32.84s/it] 15%|█▍ | 2548/17285 [23:00:29<133:37:07, 32.64s/it] 15%|█▍ | 2549/17285 [23:00:56<126:33:05, 30.92s/it] 15%|█▍ | 2550/17285 [23:01:37<139:20:36, 34.04s/it] {'loss': 1.6853, 'learning_rate': 0.0001950284971627635, 'epoch': 0.44} + 15%|█▍ | 2550/17285 [23:01:37<139:20:36, 34.04s/it] 15%|█▍ | 2551/17285 [23:02:14<142:49:22, 34.90s/it] 15%|█▍ | 2552/17285 [23:02:53<148:19:52, 36.24s/it] 15%|█▍ | 2553/17285 [23:03:19<135:35:06, 33.13s/it] 15%|█▍ | 2554/17285 [23:03:47<129:43:20, 31.70s/it] 15%|█▍ | 2555/17285 [23:04:14<123:12:10, 30.11s/it] 15%|█▍ | 2556/17285 [23:04:40<118:55:46, 29.07s/it] 15%|█▍ | 2557/17285 [23:05:05<113:46:17, 27.81s/it] 15%|█▍ | 2558/17285 [23:05:32<112:39:41, 27.54s/it] 15%|█▍ | 2559/17285 [23:06:05<119:04:53, 29.11s/it] 15%|█▍ | 2560/17285 [23:06:33<117:16:56, 28.67s/it] {'loss': 1.6864, 'learning_rate': 0.00019496874750645754, 'epoch': 0.44} + 15%|█▍ | 2560/17285 [23:06:33<117:16:56, 28.67s/it] 15%|█▍ | 2561/17285 [23:07:03<119:27:53, 29.21s/it] 15%|█▍ | 2562/17285 [23:07:34<121:04:22, 29.60s/it] 15%|█▍ | 2563/17285 [23:08:11<129:58:25, 31.78s/it] 15%|█▍ | 2564/17285 [23:08:47<135:39:04, 33.17s/it] 15%|█▍ | 2565/17285 [23:09:13<126:37:41, 30.97s/it] 15%|█▍ | 2566/17285 [23:09:43<125:56:15, 30.80s/it] 15%|█▍ | 2567/17285 [23:10:20<132:42:51, 32.46s/it] 15%|█▍ | 2568/17285 [23:10:49<128:36:24, 31.46s/it] 15%|█▍ | 2569/17285 [23:11:27<136:45:28, 33.46s/it] 15%|█▍ | 2570/17285 [23:12:02<138:21:04, 33.85s/it] {'loss': 1.6562, 'learning_rate': 0.00019490865020672837, 'epoch': 0.45} + 15%|█▍ | 2570/17285 [23:12:02<138:21:04, 33.85s/it] 15%|█▍ | 
2571/17285 [23:12:35<137:27:10, 33.63s/it] 15%|█▍ | 2572/17285 [23:13:04<131:50:03, 32.26s/it] 15%|█▍ | 2573/17285 [23:13:37<133:32:33, 32.68s/it] 15%|█▍ | 2574/17285 [23:14:10<133:49:36, 32.75s/it] 15%|█▍ | 2575/17285 [23:14:40<129:55:41, 31.80s/it] 15%|█▍ | 2576/17285 [23:15:19<138:57:16, 34.01s/it] 15%|█▍ | 2577/17285 [23:15:53<138:33:49, 33.92s/it] 15%|█▍ | 2578/17285 [23:16:36<150:15:13, 36.78s/it] 15%|█▍ | 2579/17285 [23:17:04<139:22:18, 34.12s/it] 15%|█▍ | 2580/17285 [23:17:44<146:17:18, 35.81s/it] {'loss': 1.6825, 'learning_rate': 0.00019484820548356873, 'epoch': 0.45} + 15%|█▍ | 2580/17285 [23:17:44<146:17:18, 35.81s/it] 15%|█▍ | 2581/17285 [23:18:21<148:30:13, 36.36s/it] 15%|█▍ | 2582/17285 [23:18:56<145:39:28, 35.66s/it] 15%|█▍ | 2583/17285 [23:19:27<140:47:35, 34.48s/it] 15%|█▍ | 2584/17285 [23:19:55<132:57:34, 32.56s/it] 15%|█▍ | 2585/17285 [23:20:27<131:29:35, 32.20s/it] 15%|█▍ | 2586/17285 [23:20:57<128:57:02, 31.58s/it] 15%|█▍ | 2587/17285 [23:21:28<127:53:51, 31.33s/it] 15%|█▍ | 2588/17285 [23:21:57<125:54:20, 30.84s/it] 15%|█▍ | 2589/17285 [23:22:32<130:44:55, 32.03s/it] 15%|█▍ | 2590/17285 [23:23:03<129:54:57, 31.83s/it] {'loss': 1.7296, 'learning_rate': 0.00019478741355824313, 'epoch': 0.45} + 15%|█▍ | 2590/17285 [23:23:03<129:54:57, 31.83s/it] 15%|█▍ | 2591/17285 [23:23:30<123:30:18, 30.26s/it] 15%|█▍ | 2592/17285 [23:24:06<130:24:55, 31.95s/it] 15%|█▌ | 2593/17285 [23:24:32<122:59:43, 30.14s/it] 15%|█▌ | 2594/17285 [23:25:04<125:49:08, 30.83s/it] 15%|█▌ | 2595/17285 [23:25:40<131:52:53, 32.32s/it] 15%|█▌ | 2596/17285 [23:26:09<127:36:57, 31.28s/it] 15%|█▌ | 2597/17285 [23:26:47<135:46:00, 33.28s/it] 15%|█▌ | 2598/17285 [23:27:17<132:01:21, 32.36s/it] 15%|█▌ | 2599/17285 [23:27:49<131:15:41, 32.18s/it] 15%|█▌ | 2600/17285 [23:28:19<128:36:18, 31.53s/it] {'loss': 1.7077, 'learning_rate': 0.00019472627465328692, 'epoch': 0.45} + 15%|█▌ | 2600/17285 [23:28:19<128:36:18, 31.53s/it] 15%|█▌ | 2601/17285 [23:28:55<134:43:29, 33.03s/it] 15%|█▌ | 
2602/17285 [23:29:22<126:51:40, 31.10s/it] 15%|█▌ | 2603/17285 [23:29:53<127:10:36, 31.18s/it] 15%|█▌ | 2604/17285 [23:30:21<122:21:01, 30.00s/it] 15%|█▌ | 2605/17285 [23:30:58<131:34:38, 32.27s/it] 15%|█▌ | 2606/17285 [23:31:30<131:15:02, 32.19s/it] 15%|█▌ | 2607/17285 [23:32:00<128:12:09, 31.44s/it] 15%|█▌ | 2608/17285 [23:32:27<123:01:59, 30.18s/it] 15%|█▌ | 2609/17285 [23:33:03<129:54:12, 31.87s/it] 15%|█▌ | 2610/17285 [23:33:35<129:46:48, 31.84s/it] {'loss': 1.7098, 'learning_rate': 0.0001946647889925058, 'epoch': 0.45} + 15%|█▌ | 2610/17285 [23:33:35<129:46:48, 31.84s/it] 15%|█▌ | 2611/17285 [23:34:01<123:34:52, 30.32s/it] 15%|█▌ | 2612/17285 [23:34:30<121:49:38, 29.89s/it] 15%|█▌ | 2613/17285 [23:35:02<123:55:42, 30.41s/it][2023-08-23 23:30:09,122] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 15%|█▌ | 2614/17285 [23:35:31<122:49:06, 30.14s/it] 15%|█▌ | 2615/17285 [23:36:02<123:46:05, 30.37s/it] 15%|█▌ | 2616/17285 [23:36:40<132:25:11, 32.50s/it] 15%|█▌ | 2617/17285 [23:37:16<136:52:05, 33.59s/it] 15%|█▌ | 2618/17285 [23:37:49<136:05:21, 33.40s/it] 15%|█▌ | 2619/17285 [23:38:19<132:24:19, 32.50s/it] 15%|█▌ | 2620/17285 [23:38:49<128:41:46, 31.59s/it] {'loss': 1.6647, 'learning_rate': 0.00019460915560757066, 'epoch': 0.45} + 15%|█▌ | 2620/17285 [23:38:49<128:41:46, 31.59s/it] 15%|█▌ | 2621/17285 [23:39:17<124:50:06, 30.65s/it] 15%|█▌ | 2622/17285 [23:39:49<125:59:58, 30.93s/it] 15%|█▌ | 2623/17285 [23:40:20<126:14:49, 31.00s/it] 15%|█▌ | 2624/17285 [23:41:00<137:38:59, 33.80s/it] 15%|█▌ | 2625/17285 [23:41:30<132:37:11, 32.57s/it] 15%|█▌ | 2626/17285 [23:41:58<127:24:21, 31.29s/it] 15%|█▌ | 2627/17285 [23:42:22<118:34:18, 29.12s/it] 15%|█▌ | 2628/17285 [23:42:49<115:06:13, 28.27s/it] 15%|█▌ | 2629/17285 [23:43:23<123:04:52, 30.23s/it] 15%|█▌ | 2630/17285 [23:43:52<120:51:57, 29.69s/it] {'loss': 1.6919, 'learning_rate': 0.000194547011731852, 'epoch': 0.46} + 15%|█▌ | 
2630/17285 [23:43:52<120:51:57, 29.69s/it] 15%|█▌ | 2631/17285 [23:44:26<126:39:23, 31.12s/it] 15%|█▌ | 2632/17285 [23:44:52<120:15:28, 29.55s/it] 15%|█▌ | 2633/17285 [23:45:28<127:15:15, 31.27s/it] 15%|█▌ | 2634/17285 [23:46:04<133:37:19, 32.83s/it] 15%|█▌ | 2635/17285 [23:46:35<131:23:20, 32.29s/it] 15%|█▌ | 2636/17285 [23:47:08<132:10:50, 32.48s/it] 15%|█▌ | 2637/17285 [23:47:37<128:18:19, 31.53s/it] 15%|█▌ | 2638/17285 [23:48:05<124:15:41, 30.54s/it] 15%|█▌ | 2639/17285 [23:48:48<139:26:53, 34.28s/it] 15%|█▌ | 2640/17285 [23:49:19<134:19:37, 33.02s/it] {'loss': 1.6805, 'learning_rate': 0.00019448452175651983, 'epoch': 0.46} + 15%|█▌ | 2640/17285 [23:49:19<134:19:37, 33.02s/it] 15%|█▌ | 2641/17285 [23:50:01<146:06:12, 35.92s/it] 15%|█▌ | 2642/17285 [23:50:29<135:57:59, 33.43s/it] 15%|█▌ | 2643/17285 [23:51:00<132:46:53, 32.65s/it] 15%|█▌ | 2644/17285 [23:51:32<132:01:26, 32.46s/it] 15%|█▌ | 2645/17285 [23:52:03<130:48:58, 32.17s/it] 15%|█▌ | 2646/17285 [23:52:34<128:53:07, 31.70s/it] 15%|█▌ | 2647/17285 [23:52:59<120:53:04, 29.73s/it] 15%|█▌ | 2648/17285 [23:53:31<124:14:12, 30.56s/it] 15%|█▌ | 2649/17285 [23:54:04<126:28:13, 31.11s/it] 15%|█▌ | 2650/17285 [23:54:36<127:16:27, 31.31s/it] {'loss': 1.7276, 'learning_rate': 0.0001944216859103255, 'epoch': 0.46} + 15%|█▌ | 2650/17285 [23:54:36<127:16:27, 31.31s/it] 15%|█▌ | 2651/17285 [23:55:04<123:47:13, 30.45s/it] 15%|█▌ | 2652/17285 [23:55:42<133:15:16, 32.78s/it] 15%|█▌ | 2653/17285 [23:56:16<133:57:05, 32.96s/it] 15%|█▌ | 2654/17285 [23:56:49<134:04:45, 32.99s/it] 15%|█▌ | 2655/17285 [23:57:22<134:46:06, 33.16s/it] 15%|█▌ | 2656/17285 [23:57:55<133:55:07, 32.96s/it] 15%|█▌ | 2657/17285 [23:58:24<129:14:57, 31.81s/it] 15%|█▌ | 2658/17285 [23:58:52<124:27:00, 30.63s/it] 15%|█▌ | 2659/17285 [23:59:30<133:20:57, 32.82s/it] 15%|█▌ | 2660/17285 [23:59:57<126:21:12, 31.10s/it] {'loss': 1.6987, 'learning_rate': 0.00019435850442328637, 'epoch': 0.46} + 15%|█▌ | 2660/17285 [23:59:57<126:21:12, 31.10s/it] 15%|█▌ | 
2661/17285 [24:00:27<124:55:47, 30.75s/it] 15%|█▌ | 2662/17285 [24:01:02<130:56:10, 32.23s/it] 15%|█▌ | 2663/17285 [24:01:36<133:02:24, 32.76s/it] 15%|█▌ | 2664/17285 [24:02:12<136:57:10, 33.72s/it] 15%|█▌ | 2665/17285 [24:02:38<127:24:50, 31.37s/it] 15%|█▌ | 2666/17285 [24:03:04<120:54:52, 29.78s/it] 15%|█▌ | 2667/17285 [24:03:33<119:53:29, 29.53s/it] 15%|█▌ | 2668/17285 [24:04:11<129:29:52, 31.89s/it] 15%|█▌ | 2669/17285 [24:04:44<131:03:56, 32.28s/it] 15%|█▌ | 2670/17285 [24:05:17<132:00:45, 32.52s/it] {'loss': 1.6923, 'learning_rate': 0.00019429497752668516, 'epoch': 0.46} + 15%|█▌ | 2670/17285 [24:05:17<132:00:45, 32.52s/it] 15%|█▌ | 2671/17285 [24:05:50<133:02:43, 32.77s/it] 15%|█▌ | 2672/17285 [24:06:26<136:54:20, 33.73s/it] 15%|█▌ | 2673/17285 [24:06:51<126:22:42, 31.14s/it] 15%|█▌ | 2674/17285 [24:07:20<123:19:57, 30.39s/it] 15%|█▌ | 2675/17285 [24:07:52<125:19:47, 30.88s/it] 15%|█▌ | 2676/17285 [24:08:22<124:41:20, 30.73s/it] 15%|█▌ | 2677/17285 [24:08:50<121:06:43, 29.85s/it] 15%|█▌ | 2678/17285 [24:09:25<126:41:46, 31.23s/it] 15%|█▌ | 2679/17285 [24:09:57<127:50:46, 31.51s/it] 16%|█▌ | 2680/17285 [24:10:26<124:28:17, 30.68s/it] {'loss': 1.6908, 'learning_rate': 0.00019423110545306908, 'epoch': 0.47} + 16%|█▌ | 2680/17285 [24:10:26<124:28:17, 30.68s/it] 16%|█▌ | 2681/17285 [24:10:55<123:06:16, 30.35s/it] 16%|█▌ | 2682/17285 [24:11:34<133:14:53, 32.85s/it] 16%|█▌ | 2683/17285 [24:12:08<134:59:04, 33.28s/it] 16%|█▌ | 2684/17285 [24:12:43<137:02:57, 33.79s/it] 16%|█▌ | 2685/17285 [24:13:21<141:58:53, 35.01s/it] 16%|█▌ | 2686/17285 [24:13:52<136:57:01, 33.77s/it] 16%|█▌ | 2687/17285 [24:14:18<127:11:05, 31.36s/it] 16%|█▌ | 2688/17285 [24:14:48<125:48:44, 31.03s/it] 16%|█▌ | 2689/17285 [24:15:18<124:14:14, 30.64s/it] 16%|█▌ | 2690/17285 [24:15:54<131:42:54, 32.49s/it] {'loss': 1.6799, 'learning_rate': 0.00019416688843624873, 'epoch': 0.47} + 16%|█▌ | 2690/17285 [24:15:54<131:42:54, 32.49s/it] 16%|█▌ | 2691/17285 [24:16:22<125:51:24, 31.05s/it] 16%|█▌ | 
2692/17285 [24:16:47<118:29:11, 29.23s/it] 16%|█▌ | 2693/17285 [24:17:14<116:12:49, 28.67s/it] 16%|█▌ | 2694/17285 [24:17:50<124:13:00, 30.65s/it] 16%|█▌ | 2695/17285 [24:18:17<120:48:16, 29.81s/it] 16%|█▌ | 2696/17285 [24:18:50<124:01:06, 30.60s/it] 16%|█▌ | 2697/17285 [24:19:16<118:14:20, 29.18s/it] 16%|█▌ | 2698/17285 [24:19:44<117:13:42, 28.93s/it] 16%|█▌ | 2699/17285 [24:20:10<113:04:42, 27.91s/it] 16%|█▌ | 2700/17285 [24:20:37<112:29:51, 27.77s/it] {'loss': 1.7065, 'learning_rate': 0.00019410232671129745, 'epoch': 0.47} + 16%|█▌ | 2700/17285 [24:20:37<112:29:51, 27.77s/it] 16%|█▌ | 2701/17285 [24:21:06<113:24:46, 28.00s/it] 16%|█▌ | 2702/17285 [24:21:39<120:28:30, 29.74s/it] 16%|█▌ | 2703/17285 [24:22:08<119:09:44, 29.42s/it] 16%|█▌ | 2704/17285 [24:22:35<116:39:16, 28.80s/it] 16%|█▌ | 2705/17285 [24:23:10<123:10:32, 30.41s/it] 16%|█▌ | 2706/17285 [24:23:44<127:46:08, 31.55s/it] 16%|█▌ | 2707/17285 [24:24:18<131:12:53, 32.40s/it] 16%|█▌ | 2708/17285 [24:24:45<124:46:07, 30.81s/it] 16%|█▌ | 2709/17285 [24:25:25<135:03:57, 33.36s/it] 16%|█▌ | 2710/17285 [24:25:54<130:40:41, 32.28s/it] {'loss': 1.682, 'learning_rate': 0.0001940374205145505, 'epoch': 0.47} + 16%|█▌ | 2710/17285 [24:25:54<130:40:41, 32.28s/it] 16%|█▌ | 2711/17285 [24:26:32<136:41:24, 33.76s/it] 16%|█▌ | 2712/17285 [24:27:08<139:26:57, 34.45s/it] 16%|█▌ | 2713/17285 [24:27:39<135:14:28, 33.41s/it] 16%|█▌ | 2714/17285 [24:28:09<131:56:49, 32.60s/it] 16%|█▌ | 2715/17285 [24:28:36<124:31:46, 30.77s/it] 16%|█▌ | 2716/17285 [24:29:04<121:50:03, 30.11s/it] 16%|█▌ | 2717/17285 [24:29:30<116:37:41, 28.82s/it] 16%|█▌ | 2718/17285 [24:29:59<116:59:36, 28.91s/it] 16%|█▌ | 2719/17285 [24:30:42<134:07:16, 33.15s/it] 16%|█▌ | 2720/17285 [24:31:13<131:20:44, 32.46s/it] {'loss': 1.654, 'learning_rate': 0.00019397217008360404, 'epoch': 0.47} + 16%|█▌ | 2720/17285 [24:31:13<131:20:44, 32.46s/it] 16%|█▌ | 2721/17285 [24:31:42<126:25:02, 31.25s/it] 16%|█▌ | 2722/17285 [24:32:09<121:56:42, 30.15s/it] 16%|█▌ | 
2723/17285 [24:32:36<117:45:48, 29.11s/it] 16%|█▌ | 2724/17285 [24:33:07<120:18:48, 29.75s/it] 16%|█▌ | 2725/17285 [24:33:38<121:05:14, 29.94s/it] 16%|█▌ | 2726/17285 [24:34:09<122:31:41, 30.30s/it] 16%|█▌ | 2727/17285 [24:34:43<127:07:19, 31.44s/it] 16%|█▌ | 2728/17285 [24:35:19<132:49:41, 32.85s/it] 16%|█▌ | 2729/17285 [24:35:51<131:16:38, 32.47s/it] 16%|█▌ | 2730/17285 [24:36:24<132:55:37, 32.88s/it] {'loss': 1.6809, 'learning_rate': 0.0001939065756573144, 'epoch': 0.47} + 16%|█▌ | 2730/17285 [24:36:24<132:55:37, 32.88s/it] 16%|█▌ | 2731/17285 [24:36:56<130:56:07, 32.39s/it] 16%|█▌ | 2732/17285 [24:37:21<122:01:36, 30.19s/it] 16%|█▌ | 2733/17285 [24:37:50<120:33:23, 29.82s/it] 16%|█▌ | 2734/17285 [24:38:25<127:33:17, 31.56s/it] 16%|█▌ | 2735/17285 [24:38:53<122:27:17, 30.30s/it] 16%|█▌ | 2736/17285 [24:39:29<129:29:10, 32.04s/it] 16%|█▌ | 2737/17285 [24:39:59<127:22:40, 31.52s/it] 16%|█▌ | 2738/17285 [24:40:33<130:48:28, 32.37s/it] 16%|█▌ | 2739/17285 [24:41:05<129:42:15, 32.10s/it] 16%|█▌ | 2740/17285 [24:41:36<128:04:33, 31.70s/it] {'loss': 1.6426, 'learning_rate': 0.00019384063747579706, 'epoch': 0.48} + 16%|█▌ | 2740/17285 [24:41:36<128:04:33, 31.70s/it] 16%|█▌ | 2741/17285 [24:42:11<132:01:35, 32.68s/it] 16%|█▌ | 2742/17285 [24:42:44<133:06:47, 32.95s/it] 16%|█▌ | 2743/17285 [24:43:11<125:19:34, 31.03s/it] 16%|█▌ | 2744/17285 [24:43:43<127:25:11, 31.55s/it] 16%|█▌ | 2745/17285 [24:44:17<129:17:33, 32.01s/it] 16%|█▌ | 2746/17285 [24:44:48<128:07:14, 31.72s/it] 16%|█▌ | 2747/17285 [24:45:16<124:28:22, 30.82s/it] 16%|█▌ | 2748/17285 [24:45:49<126:17:01, 31.27s/it] 16%|█▌ | 2749/17285 [24:46:13<117:47:09, 29.17s/it] 16%|█▌ | 2750/17285 [24:46:43<119:05:48, 29.50s/it] {'loss': 1.6453, 'learning_rate': 0.00019377435578042592, 'epoch': 0.48} + 16%|█▌ | 2750/17285 [24:46:43<119:05:48, 29.50s/it] 16%|█▌ | 2751/17285 [24:47:12<118:23:39, 29.33s/it] 16%|█▌ | 2752/17285 [24:47:46<124:19:17, 30.80s/it] 16%|█▌ | 2753/17285 [24:48:17<123:56:31, 30.70s/it] 16%|█▌ | 
2754/17285 [24:48:47<122:55:35, 30.45s/it] 16%|█▌ | 2755/17285 [24:49:17<122:40:43, 30.40s/it] 16%|█▌ | 2756/17285 [24:49:55<132:32:27, 32.84s/it] 16%|█▌ | 2757/17285 [24:50:27<130:31:34, 32.34s/it] 16%|█▌ | 2758/17285 [24:51:00<132:01:09, 32.72s/it] 16%|█▌ | 2759/17285 [24:51:35<134:37:29, 33.36s/it] 16%|█▌ | 2760/17285 [24:52:08<133:52:15, 33.18s/it] {'loss': 1.6419, 'learning_rate': 0.00019370773081383235, 'epoch': 0.48} + 16%|█▌ | 2760/17285 [24:52:08<133:52:15, 33.18s/it] 16%|█▌ | 2761/17285 [24:52:47<141:18:52, 35.03s/it] 16%|█▌ | 2762/17285 [24:53:20<138:36:05, 34.36s/it] 16%|█▌ | 2763/17285 [24:53:56<140:44:52, 34.89s/it] 16%|█▌ | 2764/17285 [24:54:27<135:55:37, 33.70s/it] 16%|█▌ | 2765/17285 [24:54:55<129:02:52, 32.00s/it] 16%|█▌ | 2766/17285 [24:55:22<122:37:21, 30.40s/it] 16%|█▌ | 2767/17285 [24:56:06<139:05:33, 34.49s/it] 16%|█▌ | 2768/17285 [24:56:32<128:52:12, 31.96s/it] 16%|█▌ | 2769/17285 [24:57:04<129:10:26, 32.04s/it] 16%|█▌ | 2770/17285 [24:57:35<127:17:28, 31.57s/it] {'loss': 1.7025, 'learning_rate': 0.00019364076281990427, 'epoch': 0.48} + 16%|█▌ | 2770/17285 [24:57:35<127:17:28, 31.57s/it] 16%|█▌ | 2771/17285 [24:58:04<124:59:45, 31.00s/it] 16%|█▌ | 2772/17285 [24:58:33<122:19:18, 30.34s/it] 16%|█▌ | 2773/17285 [24:59:04<123:15:38, 30.58s/it] 16%|█▌ | 2774/17285 [24:59:34<121:59:11, 30.26s/it] 16%|█▌ | 2775/17285 [25:00:05<123:22:39, 30.61s/it] 16%|█▌ | 2776/17285 [25:00:36<124:16:26, 30.84s/it] 16%|█▌ | 2777/17285 [25:01:13<130:39:31, 32.42s/it] 16%|█▌ | 2778/17285 [25:01:45<130:12:12, 32.31s/it] 16%|█▌ | 2779/17285 [25:02:13<125:01:44, 31.03s/it] 16%|█▌ | 2780/17285 [25:02:45<126:09:05, 31.31s/it] {'loss': 1.6897, 'learning_rate': 0.0001935734520437853, 'epoch': 0.48} + 16%|█▌ | 2780/17285 [25:02:45<126:09:05, 31.31s/it] 16%|█▌ | 2781/17285 [25:03:22<133:52:38, 33.23s/it] 16%|█▌ | 2782/17285 [25:03:50<126:33:38, 31.42s/it] 16%|█▌ | 2783/17285 [25:04:23<129:35:29, 32.17s/it] 16%|█▌ | 2784/17285 [25:04:49<122:09:02, 30.32s/it] 16%|█▌ | 
2785/17285 [25:05:19<120:55:36, 30.02s/it] 16%|█▌ | 2786/17285 [25:05:47<118:54:38, 29.52s/it] 16%|█▌ | 2787/17285 [25:06:15<116:22:57, 28.90s/it] 16%|█▌ | 2788/17285 [25:06:46<119:57:05, 29.79s/it] 16%|█▌ | 2789/17285 [25:07:17<120:29:38, 29.92s/it] 16%|█▌ | 2790/17285 [25:07:54<129:39:35, 32.20s/it] {'loss': 1.652, 'learning_rate': 0.00019350579873187384, 'epoch': 0.48} + 16%|█▌ | 2790/17285 [25:07:54<129:39:35, 32.20s/it] 16%|█▌ | 2791/17285 [25:08:20<121:37:07, 30.21s/it] 16%|█▌ | 2792/17285 [25:08:54<126:10:17, 31.34s/it] 16%|█▌ | 2793/17285 [25:09:34<137:15:56, 34.10s/it] 16%|█▌ | 2794/17285 [25:10:10<139:17:40, 34.60s/it] 16%|█▌ | 2795/17285 [25:10:36<129:17:37, 32.12s/it] 16%|█▌ | 2796/17285 [25:11:04<124:23:35, 30.91s/it] 16%|█▌ | 2797/17285 [25:11:33<120:58:23, 30.06s/it] 16%|█▌ | 2798/17285 [25:12:01<119:15:48, 29.64s/it] 16%|█▌ | 2799/17285 [25:12:36<124:53:35, 31.04s/it] 16%|█▌ | 2800/17285 [25:13:07<125:18:33, 31.14s/it] {'loss': 1.6461, 'learning_rate': 0.0001934378031318222, 'epoch': 0.49} + 16%|█▌ | 2800/17285 [25:13:07<125:18:33, 31.14s/it] 16%|█▌ | 2801/17285 [25:13:37<124:19:16, 30.90s/it] 16%|█▌ | 2802/17285 [25:14:11<127:36:12, 31.72s/it] 16%|█▌ | 2803/17285 [25:14:41<126:03:56, 31.34s/it] 16%|█▌ | 2804/17285 [25:15:15<129:20:20, 32.15s/it] 16%|█▌ | 2805/17285 [25:15:50<131:57:17, 32.81s/it] 16%|█▌ | 2806/17285 [25:16:21<130:34:42, 32.47s/it] 16%|█▌ | 2807/17285 [25:16:57<134:28:47, 33.44s/it] 16%|█▌ | 2808/17285 [25:17:37<142:12:01, 35.36s/it] 16%|█▋ | 2809/17285 [25:18:15<145:32:37, 36.19s/it] 16%|█▋ | 2810/17285 [25:18:44<136:32:21, 33.96s/it] {'loss': 1.6482, 'learning_rate': 0.00019336946549253567, 'epoch': 0.49} + 16%|█▋ | 2810/17285 [25:18:44<136:32:21, 33.96s/it] 16%|█▋ | 2811/17285 [25:19:14<131:31:41, 32.71s/it] 16%|█▋ | 2812/17285 [25:19:44<128:42:48, 32.02s/it] 16%|█▋ | 2813/17285 [25:20:16<128:23:53, 31.94s/it] 16%|█▋ | 2814/17285 [25:20:47<127:36:42, 31.75s/it] 16%|█▋ | 2815/17285 [25:21:17<125:54:22, 31.32s/it] 16%|█▋ | 
2816/17285 [25:21:44<120:20:07, 29.94s/it] 16%|█▋ | 2817/17285 [25:22:22<130:00:43, 32.35s/it] 16%|█▋ | 2818/17285 [25:22:54<129:20:31, 32.19s/it] 16%|█▋ | 2819/17285 [25:23:21<123:09:53, 30.65s/it] 16%|█▋ | 2820/17285 [25:23:57<129:39:18, 32.27s/it] {'loss': 1.684, 'learning_rate': 0.00019330078606417164, 'epoch': 0.49} + 16%|█▋ | 2820/17285 [25:23:57<129:39:18, 32.27s/it] 16%|█▋ | 2821/17285 [25:24:26<125:36:34, 31.26s/it] 16%|█▋ | 2822/17285 [25:24:57<125:51:11, 31.33s/it] 16%|█▋ | 2823/17285 [25:25:30<127:05:40, 31.64s/it] 16%|█▋ | 2824/17285 [25:25:59<124:32:32, 31.00s/it] 16%|█▋ | 2825/17285 [25:26:29<122:53:06, 30.59s/it] 16%|█▋ | 2826/17285 [25:27:00<122:51:44, 30.59s/it] 16%|█▋ | 2827/17285 [25:27:34<127:05:43, 31.65s/it] 16%|█▋ | 2828/17285 [25:28:05<126:29:56, 31.50s/it] 16%|█▋ | 2829/17285 [25:28:38<129:07:43, 32.16s/it] 16%|█▋ | 2830/17285 [25:29:15<133:56:46, 33.36s/it] {'loss': 1.7247, 'learning_rate': 0.00019323176509813855, 'epoch': 0.49} + 16%|█▋ | 2830/17285 [25:29:15<133:56:46, 33.36s/it] 16%|█▋ | 2831/17285 [25:29:46<131:26:47, 32.74s/it] 16%|█▋ | 2832/17285 [25:30:18<131:03:55, 32.65s/it] 16%|█▋ | 2833/17285 [25:30:56<137:30:24, 34.25s/it] 16%|█▋ | 2834/17285 [25:31:34<141:19:29, 35.21s/it] 16%|█▋ | 2835/17285 [25:31:59<129:47:08, 32.33s/it] 16%|█▋ | 2836/17285 [25:32:29<126:44:41, 31.58s/it] 16%|█▋ | 2837/17285 [25:33:00<125:41:55, 31.32s/it] 16%|█▋ | 2838/17285 [25:33:27<120:19:09, 29.98s/it] 16%|█▋ | 2839/17285 [25:33:56<118:59:41, 29.65s/it] 16%|█▋ | 2840/17285 [25:34:34<129:36:13, 32.30s/it] {'loss': 1.6417, 'learning_rate': 0.0001931624028470952, 'epoch': 0.49} + 16%|█▋ | 2840/17285 [25:34:34<129:36:13, 32.30s/it] 16%|█▋ | 2841/17285 [25:35:07<129:49:47, 32.36s/it] 16%|█▋ | 2842/17285 [25:35:36<125:42:45, 31.33s/it] 16%|█▋ | 2843/17285 [25:36:14<134:21:42, 33.49s/it] 16%|█▋ | 2844/17285 [25:36:46<132:22:07, 33.00s/it] 16%|█▋ | 2845/17285 [25:37:21<134:21:27, 33.50s/it] 16%|█▋ | 2846/17285 [25:37:52<131:46:36, 32.86s/it] 16%|█▋ | 
2847/17285 [25:38:19<124:35:37, 31.07s/it] 16%|█▋ | 2848/17285 [25:38:50<124:51:13, 31.13s/it] 16%|█▋ | 2849/17285 [25:39:15<117:28:31, 29.30s/it] 16%|█▋ | 2850/17285 [25:39:44<116:34:34, 29.07s/it] {'loss': 1.688, 'learning_rate': 0.00019309269956494963, 'epoch': 0.49} + 16%|█▋ | 2850/17285 [25:39:44<116:34:34, 29.07s/it] 16%|█▋ | 2851/17285 [25:40:14<118:27:55, 29.55s/it] 16%|█▋ | 2852/17285 [25:40:53<129:51:20, 32.39s/it] 17%|█▋ | 2853/17285 [25:41:25<128:18:07, 32.00s/it] 17%|█▋ | 2854/17285 [25:41:55<126:58:41, 31.68s/it] 17%|█▋ | 2855/17285 [25:42:30<130:21:20, 32.52s/it] 17%|█▋ | 2856/17285 [25:42:58<125:30:38, 31.31s/it] 17%|█▋ | 2857/17285 [25:43:33<129:21:23, 32.28s/it][2023-08-24 01:38:46,189] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 17%|█▋ | 2858/17285 [25:44:09<133:16:58, 33.26s/it] 17%|█▋ | 2859/17285 [25:44:40<131:42:16, 32.87s/it] 17%|█▋ | 2860/17285 [25:45:11<129:20:04, 32.28s/it] {'loss': 1.6498, 'learning_rate': 0.00019302967524028727, 'epoch': 0.5} + 17%|█▋ | 2860/17285 [25:45:11<129:20:04, 32.28s/it] 17%|█▋ | 2861/17285 [25:45:44<129:17:23, 32.27s/it] 17%|█▋ | 2862/17285 [25:46:17<130:50:38, 32.66s/it] 17%|█▋ | 2863/17285 [25:46:48<128:18:38, 32.03s/it] 17%|█▋ | 2864/17285 [25:47:26<135:16:56, 33.77s/it] 17%|█▋ | 2865/17285 [25:47:52<126:35:39, 31.60s/it] 17%|█▋ | 2866/17285 [25:48:18<119:53:03, 29.93s/it] 17%|█▋ | 2867/17285 [25:48:48<120:05:07, 29.98s/it] 17%|█▋ | 2868/17285 [25:49:21<123:50:59, 30.93s/it] 17%|█▋ | 2869/17285 [25:49:57<128:54:07, 32.19s/it] 17%|█▋ | 2870/17285 [25:50:27<126:37:25, 31.62s/it] {'loss': 1.6872, 'learning_rate': 0.00019295932470303454, 'epoch': 0.5} + 17%|█▋ | 2870/17285 [25:50:27<126:37:25, 31.62s/it] 17%|█▋ | 2871/17285 [25:51:00<128:12:52, 32.02s/it][2023-08-24 01:46:05,350] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 17%|█▋ | 2872/17285 [25:51:28<123:14:36, 30.78s/it] 17%|█▋ | 2873/17285 [25:51:58<122:30:10, 30.60s/it] 17%|█▋ | 2874/17285 [25:52:26<119:10:57, 29.77s/it] 17%|█▋ | 2875/17285 [25:52:51<114:11:28, 28.53s/it] 17%|█▋ | 2876/17285 [25:53:16<109:49:49, 27.44s/it] 17%|█▋ | 2877/17285 [25:53:47<113:47:02, 28.43s/it] 17%|█▋ | 2878/17285 [25:54:17<115:32:43, 28.87s/it] 17%|█▋ | 2879/17285 [25:54:46<116:14:10, 29.05s/it] 17%|█▋ | 2880/17285 [25:55:19<120:09:44, 30.03s/it] {'loss': 1.6668, 'learning_rate': 0.00019289571826614754, 'epoch': 0.5} + 17%|█▋ | 2880/17285 [25:55:19<120:09:44, 30.03s/it][2023-08-24 01:50:23,306] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 17%|█▋ | 2881/17285 [25:55:46<116:30:21, 29.12s/it] 17%|█▋ | 2882/17285 [25:56:11<112:01:52, 28.00s/it] 17%|█▋ | 2883/17285 [25:56:43<116:57:35, 29.24s/it] 17%|█▋ | 2884/17285 [25:57:15<120:41:54, 30.17s/it] 17%|█▋ | 2885/17285 [25:57:52<128:46:40, 32.19s/it] 17%|█▋ | 2886/17285 [25:58:31<136:06:13, 34.03s/it] 17%|█▋ | 2887/17285 [25:59:10<141:58:26, 35.50s/it] 17%|█▋ | 2888/17285 [25:59:46<142:31:04, 35.64s/it] 17%|█▋ | 2889/17285 [26:00:19<139:14:19, 34.82s/it] 17%|█▋ | 2890/17285 [26:00:44<128:02:44, 32.02s/it] {'loss': 1.652, 'learning_rate': 0.00019283183638479643, 'epoch': 0.5} + 17%|█▋ | 2890/17285 [26:00:44<128:02:44, 32.02s/it] 17%|█▋ | 2891/17285 [26:01:14<125:32:46, 31.40s/it] 17%|█▋ | 2892/17285 [26:01:50<130:40:36, 32.69s/it] 17%|█▋ | 2893/17285 [26:02:19<127:02:12, 31.78s/it] 17%|█▋ | 2894/17285 [26:02:50<125:59:47, 31.52s/it] 17%|█▋ | 2895/17285 [26:03:24<128:12:17, 32.07s/it] 17%|█▋ | 2896/17285 [26:03:54<125:46:28, 31.47s/it] 17%|█▋ | 2897/17285 [26:04:22<121:50:06, 30.48s/it] 17%|█▋ | 2898/17285 [26:04:48<116:09:37, 29.07s/it] 17%|█▋ | 2899/17285 [26:05:13<112:04:36, 28.05s/it] 17%|█▋ | 2900/17285 [26:05:49<121:27:01, 30.39s/it] {'loss': 1.6777, 
'learning_rate': 0.00019276053369488895, 'epoch': 0.5} + 17%|█▋ | 2900/17285 [26:05:49<121:27:01, 30.39s/it] 17%|█▋ | 2901/17285 [26:06:20<121:58:56, 30.53s/it] 17%|█▋ | 2902/17285 [26:06:50<121:37:59, 30.44s/it] 17%|█▋ | 2903/17285 [26:07:17<117:11:20, 29.33s/it] 17%|█▋ | 2904/17285 [26:07:45<115:08:57, 28.83s/it] 17%|█▋ | 2905/17285 [26:08:15<117:33:37, 29.43s/it] 17%|█▋ | 2906/17285 [26:08:46<119:01:56, 29.80s/it] 17%|█▋ | 2907/17285 [26:09:30<136:24:16, 34.15s/it] 17%|█▋ | 2908/17285 [26:10:07<139:18:52, 34.88s/it] 17%|█▋ | 2909/17285 [26:10:45<143:07:07, 35.84s/it] 17%|█▋ | 2910/17285 [26:11:15<136:10:35, 34.10s/it] {'loss': 1.6466, 'learning_rate': 0.000192688891444965, 'epoch': 0.51} + 17%|█▋ | 2910/17285 [26:11:15<136:10:35, 34.10s/it] 17%|█▋ | 2911/17285 [26:11:52<138:53:52, 34.79s/it] 17%|█▋ | 2912/17285 [26:12:36<150:41:16, 37.74s/it] 17%|█▋ | 2913/17285 [26:13:11<147:41:54, 37.00s/it] 17%|█▋ | 2914/17285 [26:13:49<148:51:59, 37.29s/it] 17%|█▋ | 2915/17285 [26:14:23<144:36:12, 36.23s/it] 17%|█▋ | 2916/17285 [26:14:53<136:40:58, 34.24s/it] 17%|█▋ | 2917/17285 [26:15:22<130:33:01, 32.71s/it] 17%|█▋ | 2918/17285 [26:15:51<126:15:34, 31.64s/it] 17%|█▋ | 2919/17285 [26:16:20<123:26:20, 30.93s/it] 17%|█▋ | 2920/17285 [26:16:59<132:41:27, 33.25s/it] {'loss': 1.6432, 'learning_rate': 0.00019261690989727875, 'epoch': 0.51} + 17%|█▋ | 2920/17285 [26:16:59<132:41:27, 33.25s/it] 17%|█▋ | 2921/17285 [26:17:35<136:00:20, 34.09s/it] 17%|█▋ | 2922/17285 [26:18:05<131:20:06, 32.92s/it] 17%|█▋ | 2923/17285 [26:18:32<124:04:48, 31.10s/it] 17%|█▋ | 2924/17285 [26:19:12<135:11:08, 33.89s/it] 17%|█▋ | 2925/17285 [26:19:45<133:10:58, 33.39s/it] 17%|█▋ | 2926/17285 [26:20:22<137:43:49, 34.53s/it] 17%|█▋ | 2927/17285 [26:20:57<138:02:39, 34.61s/it] 17%|█▋ | 2928/17285 [26:21:30<136:34:33, 34.25s/it] 17%|█▋ | 2929/17285 [26:22:05<137:37:33, 34.51s/it] 17%|█▋ | 2930/17285 [26:22:39<136:28:22, 34.23s/it] {'loss': 1.6499, 'learning_rate': 0.00019254458931532655, 'epoch': 0.51} + 
17%|█▋ | 2930/17285 [26:22:39<136:28:22, 34.23s/it] 17%|█▋ | 2931/17285 [26:23:10<133:17:12, 33.43s/it] 17%|█▋ | 2932/17285 [26:23:48<138:01:09, 34.62s/it] 17%|█▋ | 2933/17285 [26:24:17<132:12:23, 33.16s/it] 17%|█▋ | 2934/17285 [26:24:46<127:00:12, 31.86s/it] 17%|█▋ | 2935/17285 [26:25:24<134:17:45, 33.69s/it] 17%|█▋ | 2936/17285 [26:25:56<132:14:52, 33.18s/it] 17%|█▋ | 2937/17285 [26:26:28<131:00:24, 32.87s/it] 17%|█▋ | 2938/17285 [26:26:59<127:45:54, 32.06s/it] 17%|█▋ | 2939/17285 [26:27:39<138:14:34, 34.69s/it] 17%|█▋ | 2940/17285 [26:28:07<129:20:40, 32.46s/it] {'loss': 1.6599, 'learning_rate': 0.00019247192996384572, 'epoch': 0.51} + 17%|█▋ | 2940/17285 [26:28:07<129:20:40, 32.46s/it] 17%|█▋ | 2941/17285 [26:28:43<133:32:09, 33.51s/it] 17%|█▋ | 2942/17285 [26:29:18<135:36:56, 34.04s/it] 17%|█▋ | 2943/17285 [26:29:52<135:29:37, 34.01s/it] 17%|█▋ | 2944/17285 [26:30:18<125:37:52, 31.54s/it] 17%|█▋ | 2945/17285 [26:30:47<122:44:47, 30.82s/it] 17%|█▋ | 2946/17285 [26:31:14<118:28:31, 29.74s/it] 17%|█▋ | 2947/17285 [26:31:51<126:45:14, 31.83s/it] 17%|█▋ | 2948/17285 [26:32:25<129:34:01, 32.53s/it] 17%|█▋ | 2949/17285 [26:33:01<134:19:08, 33.73s/it] 17%|█▋ | 2950/17285 [26:33:40<139:54:22, 35.14s/it] {'loss': 1.6458, 'learning_rate': 0.00019239893210881373, 'epoch': 0.51} + 17%|█▋ | 2950/17285 [26:33:40<139:54:22, 35.14s/it] 17%|█▋ | 2951/17285 [26:34:24<151:13:20, 37.98s/it] 17%|█▋ | 2952/17285 [26:34:52<138:59:45, 34.91s/it] 17%|█▋ | 2953/17285 [26:35:22<132:45:41, 33.35s/it] 17%|█▋ | 2954/17285 [26:35:47<122:56:03, 30.88s/it] 17%|█▋ | 2955/17285 [26:36:13<116:51:34, 29.36s/it] 17%|█▋ | 2956/17285 [26:36:56<133:37:29, 33.57s/it] 17%|█▋ | 2957/17285 [26:37:27<130:48:19, 32.87s/it] 17%|█▋ | 2958/17285 [26:38:02<132:48:23, 33.37s/it] 17%|█▋ | 2959/17285 [26:38:36<133:37:49, 33.58s/it] 17%|█▋ | 2960/17285 [26:39:05<128:11:48, 32.22s/it] {'loss': 1.69, 'learning_rate': 0.00019232559601744712, 'epoch': 0.51} + 17%|█▋ | 2960/17285 [26:39:05<128:11:48, 32.22s/it] 17%|█▋ | 
2961/17285 [26:39:43<135:30:52, 34.06s/it] 17%|█▋ | 2962/17285 [26:40:15<132:31:15, 33.31s/it] 17%|█▋ | 2963/17285 [26:40:45<128:05:27, 32.20s/it] 17%|█▋ | 2964/17285 [26:41:15<126:09:09, 31.71s/it] 17%|█▋ | 2965/17285 [26:41:55<136:19:31, 34.27s/it] 17%|█▋ | 2966/17285 [26:42:28<134:10:02, 33.73s/it] 17%|█▋ | 2967/17285 [26:43:02<134:21:52, 33.78s/it] 17%|█▋ | 2968/17285 [26:43:35<133:17:05, 33.51s/it] 17%|█▋ | 2969/17285 [26:44:04<128:43:25, 32.37s/it] 17%|█▋ | 2970/17285 [26:44:38<129:48:40, 32.65s/it] {'loss': 1.6294, 'learning_rate': 0.00019225192195820067, 'epoch': 0.52} + 17%|█▋ | 2970/17285 [26:44:38<129:48:40, 32.65s/it] 17%|█▋ | 2971/17285 [26:45:05<123:18:32, 31.01s/it] 17%|█▋ | 2972/17285 [26:45:45<134:14:31, 33.76s/it] 17%|█▋ | 2973/17285 [26:46:15<129:30:52, 32.58s/it] 17%|█▋ | 2974/17285 [26:46:51<133:12:35, 33.51s/it] 17%|█▋ | 2975/17285 [26:47:21<130:02:23, 32.71s/it] 17%|█▋ | 2976/17285 [26:47:56<131:50:40, 33.17s/it] 17%|█▋ | 2977/17285 [26:48:24<126:23:09, 31.80s/it] 17%|█▋ | 2978/17285 [26:48:52<122:08:13, 30.73s/it] 17%|█▋ | 2979/17285 [26:49:26<125:02:33, 31.47s/it] 17%|█▋ | 2980/17285 [26:49:51<118:05:22, 29.72s/it] {'loss': 1.7088, 'learning_rate': 0.00019217791020076627, 'epoch': 0.52} + 17%|█▋ | 2980/17285 [26:49:51<118:05:22, 29.72s/it] 17%|█▋ | 2981/17285 [26:50:27<125:33:51, 31.60s/it] 17%|█▋ | 2982/17285 [26:50:58<125:01:47, 31.47s/it] 17%|█▋ | 2983/17285 [26:51:30<124:56:52, 31.45s/it] 17%|█▋ | 2984/17285 [26:52:05<129:14:30, 32.53s/it] 17%|█▋ | 2985/17285 [26:52:35<125:54:38, 31.70s/it] 17%|█▋ | 2986/17285 [26:53:15<136:23:54, 34.34s/it] 17%|█▋ | 2987/17285 [26:53:43<128:26:13, 32.34s/it] 17%|█▋ | 2988/17285 [26:54:09<120:35:13, 30.36s/it] 17%|█▋ | 2989/17285 [26:54:38<118:58:51, 29.96s/it] 17%|█▋ | 2990/17285 [26:55:10<121:54:14, 30.70s/it] {'loss': 1.6531, 'learning_rate': 0.000192103561016072, 'epoch': 0.52} + 17%|█▋ | 2990/17285 [26:55:10<121:54:14, 30.70s/it] 17%|█▋ | 2991/17285 [26:55:40<121:37:09, 30.63s/it] 17%|█▋ | 
2992/17285 [26:56:16<127:58:13, 32.23s/it] 17%|█▋ | 2993/17285 [26:56:43<121:19:37, 30.56s/it] 17%|█▋ | 2994/17285 [26:57:22<130:46:54, 32.94s/it] 17%|█▋ | 2995/17285 [26:57:51<127:05:28, 32.02s/it] 17%|█▋ | 2996/17285 [26:58:24<128:00:29, 32.25s/it] 17%|█▋ | 2997/17285 [26:58:57<127:59:44, 32.25s/it] 17%|█▋ | 2998/17285 [26:59:37<138:15:47, 34.84s/it] 17%|█▋ | 2999/17285 [27:00:16<142:36:34, 35.94s/it] 17%|█▋ | 3000/17285 [27:00:53<143:27:03, 36.15s/it] {'loss': 1.6708, 'learning_rate': 0.00019202887467628115, 'epoch': 0.52} + 17%|█▋ | 3000/17285 [27:00:53<143:27:03, 36.15s/it][INFO|trainer.py:3081] 2023-08-24 02:55:30,265 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-24 02:55:30,265 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-24 02:55:30,265 >> Batch size = 2 + + 0%| | 0/33 [00:00> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-3000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-24 02:56:54,661 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-3000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-3000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-3000 + 17%|█▋ | 3001/17285 [27:02:51<241:21:51, 60.83s/it] 17%|█▋ | 3002/17285 [27:03:19<201:52:36, 50.88s/it] 17%|█▋ | 3003/17285 [27:03:50<179:06:12, 45.15s/it] 17%|█▋ | 3004/17285 [27:04:25<165:57:54, 41.84s/it] 17%|█▋ | 3005/17285 [27:05:07<166:08:14, 41.88s/it] 17%|█▋ | 3006/17285 [27:05:31<145:23:56, 36.66s/it] 17%|█▋ | 3007/17285 [27:06:03<139:28:58, 35.17s/it] 17%|█▋ | 3008/17285 [27:06:35<136:15:20, 34.36s/it] 17%|█▋ | 3009/17285 [27:07:07<133:07:05, 33.57s/it] 17%|█▋ | 3010/17285 [27:07:35<126:59:16, 32.03s/it] {'loss': 1.6732, 'learning_rate': 
0.00019195385145479116, 'epoch': 0.52} + 17%|█▋ | 3010/17285 [27:07:35<126:59:16, 32.03s/it] 17%|█▋ | 3011/17285 [27:08:06<125:26:04, 31.64s/it] 17%|█▋ | 3012/17285 [27:08:42<130:15:42, 32.86s/it] 17%|█▋ | 3013/17285 [27:09:17<132:56:03, 33.53s/it] 17%|█▋ | 3014/17285 [27:09:47<128:36:37, 32.44s/it] 17%|█▋ | 3015/17285 [27:10:23<133:14:59, 33.62s/it] 17%|█▋ | 3016/17285 [27:10:50<124:43:12, 31.47s/it] 17%|█▋ | 3017/17285 [27:11:15<117:16:53, 29.59s/it] 17%|█▋ | 3018/17285 [27:11:42<114:04:15, 28.78s/it] 17%|█▋ | 3019/17285 [27:12:11<115:03:51, 29.04s/it] 17%|█▋ | 3020/17285 [27:12:44<120:01:31, 30.29s/it] {'loss': 1.6961, 'learning_rate': 0.0001918784916262327, 'epoch': 0.52} + 17%|█▋ | 3020/17285 [27:12:44<120:01:31, 30.29s/it] 17%|█▋ | 3021/17285 [27:13:15<119:59:03, 30.28s/it] 17%|█▋ | 3022/17285 [27:13:45<119:34:56, 30.18s/it] 17%|█▋ | 3023/17285 [27:14:14<119:03:52, 30.05s/it] 17%|█▋ | 3024/17285 [27:14:58<135:26:01, 34.19s/it] 18%|█▊ | 3025/17285 [27:15:24<124:53:04, 31.53s/it] 18%|█▊ | 3026/17285 [27:15:54<123:42:58, 31.23s/it] 18%|█▊ | 3027/17285 [27:16:28<127:09:37, 32.11s/it] 18%|█▊ | 3028/17285 [27:17:09<137:20:09, 34.68s/it] 18%|█▊ | 3029/17285 [27:17:37<129:46:18, 32.77s/it] 18%|█▊ | 3030/17285 [27:18:11<130:48:51, 33.04s/it] {'loss': 1.6361, 'learning_rate': 0.0001918027954664686, 'epoch': 0.53} + 18%|█▊ | 3030/17285 [27:18:11<130:48:51, 33.04s/it] 18%|█▊ | 3031/17285 [27:18:40<126:09:36, 31.86s/it] 18%|█▊ | 3032/17285 [27:19:09<122:49:08, 31.02s/it] 18%|█▊ | 3033/17285 [27:19:40<122:32:20, 30.95s/it] 18%|█▊ | 3034/17285 [27:20:13<124:54:11, 31.55s/it] 18%|█▊ | 3035/17285 [27:20:45<125:41:20, 31.75s/it] 18%|█▊ | 3036/17285 [27:21:15<123:02:37, 31.09s/it] 18%|█▊ | 3037/17285 [27:21:48<125:30:57, 31.71s/it] 18%|█▊ | 3038/17285 [27:22:27<134:46:01, 34.05s/it] 18%|█▊ | 3039/17285 [27:23:02<134:56:26, 34.10s/it] 18%|█▊ | 3040/17285 [27:23:32<130:56:08, 33.09s/it] {'loss': 1.708, 'learning_rate': 0.00019172676325259288, 'epoch': 0.53} + 18%|█▊ | 3040/17285 
[27:23:32<130:56:08, 33.09s/it] 18%|█▊ | 3041/17285 [27:24:01<125:41:49, 31.77s/it] 18%|█▊ | 3042/17285 [27:24:31<124:08:17, 31.38s/it] 18%|█▊ | 3043/17285 [27:24:59<119:08:00, 30.11s/it] 18%|█▊ | 3044/17285 [27:25:26<115:24:22, 29.17s/it] 18%|█▊ | 3045/17285 [27:25:58<119:41:06, 30.26s/it] 18%|█▊ | 3046/17285 [27:26:29<119:49:15, 30.29s/it] 18%|█▊ | 3047/17285 [27:27:08<130:11:03, 32.92s/it] 18%|█▊ | 3048/17285 [27:27:45<135:03:27, 34.15s/it] 18%|█▊ | 3049/17285 [27:28:22<138:55:04, 35.13s/it] 18%|█▊ | 3050/17285 [27:28:55<135:41:06, 34.31s/it] {'loss': 1.6377, 'learning_rate': 0.00019165039526292975, 'epoch': 0.53} + 18%|█▊ | 3050/17285 [27:28:55<135:41:06, 34.31s/it] 18%|█▊ | 3051/17285 [27:29:23<128:04:57, 32.39s/it] 18%|█▊ | 3052/17285 [27:29:59<132:47:01, 33.59s/it] 18%|█▊ | 3053/17285 [27:30:32<132:33:01, 33.53s/it] 18%|█▊ | 3054/17285 [27:31:03<128:53:41, 32.61s/it] 18%|█▊ | 3055/17285 [27:31:34<126:56:20, 32.11s/it] 18%|█▊ | 3056/17285 [27:32:06<127:09:30, 32.17s/it] 18%|█▊ | 3057/17285 [27:32:43<132:40:52, 33.57s/it] 18%|█▊ | 3058/17285 [27:33:19<136:10:22, 34.46s/it] 18%|█▊ | 3059/17285 [27:33:52<133:40:54, 33.83s/it] 18%|█▊ | 3060/17285 [27:34:21<127:56:53, 32.38s/it] {'loss': 1.667, 'learning_rate': 0.0001915736917770325, 'epoch': 0.53} + 18%|█▊ | 3060/17285 [27:34:21<127:56:53, 32.38s/it] 18%|█▊ | 3061/17285 [27:34:57<132:34:33, 33.55s/it] 18%|█▊ | 3062/17285 [27:35:24<124:55:06, 31.62s/it] 18%|█▊ | 3063/17285 [27:35:57<125:54:21, 31.87s/it] 18%|█▊ | 3064/17285 [27:36:30<128:00:48, 32.41s/it] 18%|█▊ | 3065/17285 [27:37:01<125:43:51, 31.83s/it] 18%|█▊ | 3066/17285 [27:37:31<124:11:05, 31.44s/it] 18%|█▊ | 3067/17285 [27:38:04<125:09:59, 31.69s/it] 18%|█▊ | 3068/17285 [27:38:31<120:29:04, 30.51s/it] 18%|█▊ | 3069/17285 [27:39:07<126:49:43, 32.12s/it] 18%|█▊ | 3070/17285 [27:39:37<124:08:57, 31.44s/it] {'loss': 1.6649, 'learning_rate': 0.00019149665307568263, 'epoch': 0.53} + 18%|█▊ | 3070/17285 [27:39:37<124:08:57, 31.44s/it] 18%|█▊ | 3071/17285 
[27:40:11<127:32:11, 32.30s/it] 18%|█▊ | 3072/17285 [27:40:37<119:05:07, 30.16s/it] 18%|█▊ | 3073/17285 [27:41:02<113:40:39, 28.80s/it] 18%|█▊ | 3074/17285 [27:41:36<119:56:01, 30.38s/it] 18%|█▊ | 3075/17285 [27:42:07<120:37:31, 30.56s/it] 18%|█▊ | 3076/17285 [27:42:34<115:39:29, 29.30s/it] 18%|█▊ | 3077/17285 [27:43:10<124:18:43, 31.50s/it] 18%|█▊ | 3078/17285 [27:43:42<125:10:11, 31.72s/it] 18%|█▊ | 3079/17285 [27:44:11<120:56:36, 30.65s/it] 18%|█▊ | 3080/17285 [27:44:37<116:14:12, 29.46s/it] {'loss': 1.6981, 'learning_rate': 0.00019141927944088863, 'epoch': 0.53} + 18%|█▊ | 3080/17285 [27:44:37<116:14:12, 29.46s/it] 18%|█▊ | 3081/17285 [27:45:09<118:42:14, 30.09s/it] 18%|█▊ | 3082/17285 [27:45:39<118:42:28, 30.09s/it] 18%|█▊ | 3083/17285 [27:46:20<132:15:56, 33.53s/it] 18%|█▊ | 3084/17285 [27:46:49<125:52:38, 31.91s/it] 18%|█▊ | 3085/17285 [27:47:20<125:10:12, 31.73s/it] 18%|█▊ | 3086/17285 [27:47:56<130:25:49, 33.07s/it] 18%|█▊ | 3087/17285 [27:48:35<137:21:16, 34.83s/it] 18%|█▊ | 3088/17285 [27:49:15<142:54:59, 36.24s/it] 18%|█▊ | 3089/17285 [27:49:41<131:37:20, 33.38s/it] 18%|█▊ | 3090/17285 [27:50:10<126:04:02, 31.97s/it] {'loss': 1.6095, 'learning_rate': 0.0001913415711558851, 'epoch': 0.54} + 18%|█▊ | 3090/17285 [27:50:10<126:04:02, 31.97s/it] 18%|█▊ | 3091/17285 [27:50:48<133:35:50, 33.88s/it] 18%|█▊ | 3092/17285 [27:51:18<128:11:11, 32.51s/it] 18%|█▊ | 3093/17285 [27:51:47<124:49:44, 31.66s/it][2023-08-24 03:46:59,868] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 18%|█▊ | 3094/17285 [27:52:22<128:39:20, 32.64s/it] 18%|█▊ | 3095/17285 [27:52:58<131:56:10, 33.47s/it][2023-08-24 03:48:05,522] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 18%|█▊ | 3096/17285 [27:53:28<128:05:55, 32.50s/it] 18%|█▊ | 3097/17285 [27:53:53<119:53:43, 30.42s/it] 18%|█▊ | 3098/17285 [27:54:28<124:38:11, 31.63s/it] 18%|█▊ | 3099/17285 [27:54:56<120:11:49, 30.50s/it] 18%|█▊ | 3100/17285 [27:55:27<120:45:27, 30.65s/it] {'loss': 1.6629, 'learning_rate': 0.00019127916377084718, 'epoch': 0.54} + 18%|█▊ | 3100/17285 [27:55:27<120:45:27, 30.65s/it] 18%|█▊ | 3101/17285 [27:56:02<126:46:54, 32.18s/it] 18%|█▊ | 3102/17285 [27:56:38<130:24:12, 33.10s/it] 18%|█▊ | 3103/17285 [27:57:10<129:42:35, 32.93s/it] 18%|█▊ | 3104/17285 [27:57:37<121:49:51, 30.93s/it] 18%|█▊ | 3105/17285 [27:58:08<122:31:37, 31.11s/it] 18%|█▊ | 3106/17285 [27:58:38<121:06:16, 30.75s/it] 18%|█▊ | 3107/17285 [27:59:09<121:09:17, 30.76s/it] 18%|█▊ | 3108/17285 [27:59:45<127:06:39, 32.28s/it] 18%|█▊ | 3109/17285 [28:00:12<121:03:02, 30.74s/it] 18%|█▊ | 3110/17285 [28:00:46<125:40:11, 31.92s/it] {'loss': 1.6908, 'learning_rate': 0.00019120085383312737, 'epoch': 0.54} + 18%|█▊ | 3110/17285 [28:00:46<125:40:11, 31.92s/it] 18%|█▊ | 3111/17285 [28:01:19<126:56:18, 32.24s/it] 18%|█▊ | 3112/17285 [28:01:45<119:07:26, 30.26s/it] 18%|█▊ | 3113/17285 [28:02:13<117:00:42, 29.72s/it] 18%|█▊ | 3114/17285 [28:02:42<115:47:37, 29.42s/it] 18%|█▊ | 3115/17285 [28:03:09<112:29:14, 28.58s/it] 18%|█▊ | 3116/17285 [28:03:47<124:08:20, 31.54s/it] 18%|█▊ | 3117/17285 [28:04:17<122:34:20, 31.14s/it] 18%|█▊ | 3118/17285 [28:04:50<124:17:23, 31.58s/it] 18%|█▊ | 3119/17285 [28:05:21<122:56:10, 31.24s/it] 18%|█▊ | 3120/17285 [28:05:49<119:56:55, 30.48s/it] {'loss': 1.6602, 'learning_rate': 0.00019112221004476872, 'epoch': 0.54} + 18%|█▊ | 3120/17285 [28:05:49<119:56:55, 30.48s/it] 18%|█▊ | 3121/17285 [28:06:23<123:52:41, 31.49s/it] 18%|█▊ | 3122/17285 [28:06:55<124:08:08, 31.55s/it] 18%|█▊ | 3123/17285 [28:07:30<128:52:02, 32.76s/it] 18%|█▊ | 3124/17285 [28:08:09<135:53:27, 34.55s/it] 18%|█▊ | 3125/17285 [28:08:43<135:14:03, 34.38s/it] 18%|█▊ | 
3126/17285 [28:09:14<131:37:44, 33.47s/it] 18%|█▊ | 3127/17285 [28:09:44<127:10:25, 32.34s/it] 18%|█▊ | 3128/17285 [28:10:23<135:18:07, 34.41s/it] 18%|█▊ | 3129/17285 [28:10:56<133:22:42, 33.92s/it] 18%|█▊ | 3130/17285 [28:11:36<140:26:29, 35.72s/it] {'loss': 1.6378, 'learning_rate': 0.00019104323269365537, 'epoch': 0.54} + 18%|█▊ | 3130/17285 [28:11:36<140:26:29, 35.72s/it] 18%|█▊ | 3131/17285 [28:12:04<131:22:18, 33.41s/it] 18%|█▊ | 3132/17285 [28:12:36<129:27:33, 32.93s/it] 18%|█▊ | 3133/17285 [28:13:08<128:13:10, 32.62s/it] 18%|█▊ | 3134/17285 [28:13:50<139:43:27, 35.55s/it] 18%|█▊ | 3135/17285 [28:14:16<127:57:18, 32.55s/it] 18%|█▊ | 3136/17285 [28:14:50<129:52:18, 33.04s/it] 18%|█▊ | 3137/17285 [28:15:15<120:49:00, 30.74s/it] 18%|█▊ | 3138/17285 [28:15:46<120:22:44, 30.63s/it] 18%|█▊ | 3139/17285 [28:16:22<126:45:37, 32.26s/it] 18%|█▊ | 3140/17285 [28:16:51<123:17:56, 31.38s/it] {'loss': 1.642, 'learning_rate': 0.00019096392206889248, 'epoch': 0.54} + 18%|█▊ | 3140/17285 [28:16:51<123:17:56, 31.38s/it] 18%|█▊ | 3141/17285 [28:17:28<130:16:15, 33.16s/it] 18%|█▊ | 3142/17285 [28:18:03<132:20:34, 33.69s/it] 18%|█▊ | 3143/17285 [28:18:31<125:54:00, 32.05s/it] 18%|█▊ | 3144/17285 [28:19:01<123:24:36, 31.42s/it] 18%|█▊ | 3145/17285 [28:19:37<128:12:22, 32.64s/it] 18%|█▊ | 3146/17285 [28:20:06<123:29:11, 31.44s/it] 18%|█▊ | 3147/17285 [28:20:31<116:08:22, 29.57s/it] 18%|█▊ | 3148/17285 [28:20:58<113:31:16, 28.91s/it] 18%|█▊ | 3149/17285 [28:21:30<117:14:30, 29.86s/it] 18%|█▊ | 3150/17285 [28:22:00<117:33:54, 29.94s/it] {'loss': 1.6605, 'learning_rate': 0.00019088427846080527, 'epoch': 0.55} + 18%|█▊ | 3150/17285 [28:22:00<117:33:54, 29.94s/it] 18%|█▊ | 3151/17285 [28:22:35<122:45:52, 31.27s/it] 18%|█▊ | 3152/17285 [28:23:10<127:08:51, 32.39s/it] 18%|█▊ | 3153/17285 [28:23:39<123:09:24, 31.37s/it] 18%|█▊ | 3154/17285 [28:24:11<124:03:22, 31.60s/it] 18%|█▊ | 3155/17285 [28:24:45<127:09:16, 32.40s/it] 18%|█▊ | 3156/17285 [28:25:20<129:58:25, 33.12s/it] 18%|█▊ | 
3157/17285 [28:25:57<134:06:17, 34.17s/it] 18%|█▊ | 3158/17285 [28:26:26<128:36:58, 32.78s/it] 18%|█▊ | 3159/17285 [28:27:00<130:08:12, 33.17s/it] 18%|█▊ | 3160/17285 [28:27:29<124:39:06, 31.77s/it] {'loss': 1.6055, 'learning_rate': 0.00019080430216093778, 'epoch': 0.55} + 18%|█▊ | 3160/17285 [28:27:29<124:39:06, 31.77s/it] 18%|█▊ | 3161/17285 [28:28:02<126:31:26, 32.25s/it] 18%|█▊ | 3162/17285 [28:28:33<124:47:18, 31.81s/it] 18%|█▊ | 3163/17285 [28:28:58<117:02:09, 29.84s/it] 18%|█▊ | 3164/17285 [28:29:34<124:37:42, 31.77s/it] 18%|█▊ | 3165/17285 [28:30:10<128:59:53, 32.89s/it] 18%|█▊ | 3166/17285 [28:30:47<133:56:44, 34.15s/it] 18%|█▊ | 3167/17285 [28:31:16<127:45:05, 32.58s/it] 18%|█▊ | 3168/17285 [28:31:41<119:38:54, 30.51s/it] 18%|█▊ | 3169/17285 [28:32:15<123:08:59, 31.41s/it] 18%|█▊ | 3170/17285 [28:32:52<129:49:56, 33.11s/it] {'loss': 1.6423, 'learning_rate': 0.00019072399346205197, 'epoch': 0.55} + 18%|█▊ | 3170/17285 [28:32:52<129:49:56, 33.11s/it] 18%|█▊ | 3171/17285 [28:33:34<140:20:54, 35.80s/it] 18%|█▊ | 3172/17285 [28:34:07<136:59:12, 34.94s/it] 18%|█▊ | 3173/17285 [28:34:43<137:55:23, 35.18s/it] 18%|█▊ | 3174/17285 [28:35:18<137:42:39, 35.13s/it] 18%|█▊ | 3175/17285 [28:35:51<134:48:16, 34.39s/it] 18%|█▊ | 3176/17285 [28:36:25<135:26:32, 34.56s/it] 18%|█▊ | 3177/17285 [28:36:52<125:31:48, 32.03s/it] 18%|█▊ | 3178/17285 [28:37:27<129:04:31, 32.94s/it] 18%|█▊ | 3179/17285 [28:38:05<135:48:44, 34.66s/it] 18%|█▊ | 3180/17285 [28:38:45<141:06:50, 36.02s/it] {'loss': 1.6856, 'learning_rate': 0.00019064335265812652, 'epoch': 0.55} + 18%|█▊ | 3180/17285 [28:38:45<141:06:50, 36.02s/it] 18%|█▊ | 3181/17285 [28:39:27<148:39:27, 37.94s/it] 18%|█▊ | 3182/17285 [28:40:11<155:48:00, 39.77s/it] 18%|█▊ | 3183/17285 [28:40:46<150:23:18, 38.39s/it] 18%|█▊ | 3184/17285 [28:41:16<140:08:44, 35.78s/it] 18%|█▊ | 3185/17285 [28:41:45<132:16:58, 33.77s/it] 18%|█▊ | 3186/17285 [28:42:11<123:09:24, 31.45s/it] 18%|█▊ | 3187/17285 [28:42:43<123:25:03, 31.52s/it] 18%|█▊ | 
3188/17285 [28:43:11<119:42:07, 30.57s/it] 18%|█▊ | 3189/17285 [28:43:40<118:16:44, 30.21s/it] 18%|█▊ | 3190/17285 [28:44:06<113:29:53, 28.99s/it] {'loss': 1.6115, 'learning_rate': 0.00019056238004435593, 'epoch': 0.55} + 18%|█▊ | 3190/17285 [28:44:07<113:29:53, 28.99s/it] 18%|█▊ | 3191/17285 [28:44:33<110:36:42, 28.25s/it] 18%|█▊ | 3192/17285 [28:45:06<115:45:39, 29.57s/it] 18%|█▊ | 3193/17285 [28:45:35<115:16:37, 29.45s/it] 18%|█▊ | 3194/17285 [28:46:03<113:32:31, 29.01s/it] 18%|█▊ | 3195/17285 [28:46:36<118:51:55, 30.37s/it] 18%|█▊ | 3196/17285 [28:47:09<122:01:28, 31.18s/it] 18%|█▊ | 3197/17285 [28:47:46<128:32:20, 32.85s/it] 19%|█▊ | 3198/17285 [28:48:20<129:51:58, 33.19s/it] 19%|█▊ | 3199/17285 [28:48:51<126:31:34, 32.34s/it] 19%|█▊ | 3200/17285 [28:49:18<120:22:22, 30.77s/it] {'loss': 1.6145, 'learning_rate': 0.0001904810759171492, 'epoch': 0.56} + 19%|█▊ | 3200/17285 [28:49:18<120:22:22, 30.77s/it] 19%|█▊ | 3201/17285 [28:49:51<123:04:31, 31.46s/it] 19%|█▊ | 3202/17285 [28:50:24<125:12:58, 32.01s/it] 19%|█▊ | 3203/17285 [28:50:54<122:43:42, 31.37s/it] 19%|█▊ | 3204/17285 [28:51:28<126:03:40, 32.23s/it] 19%|█▊ | 3205/17285 [28:52:04<130:26:30, 33.35s/it][2023-08-24 04:47:09,001] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 19%|█▊ | 3206/17285 [28:52:31<123:16:27, 31.52s/it] 19%|█▊ | 3207/17285 [28:53:00<120:15:45, 30.75s/it][2023-08-24 04:48:03,367] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 19%|█▊ | 3208/17285 [28:53:26<113:58:46, 29.15s/it] 19%|█▊ | 3209/17285 [28:53:52<110:06:57, 28.16s/it][2023-08-24 04:49:01,052] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 + 19%|█▊ | 3210/17285 [28:54:23<114:24:05, 29.26s/it] {'loss': 1.6301, 'learning_rate': 0.00019042396593693816, 'epoch': 0.56} + 19%|█▊ | 3210/17285 [28:54:23<114:24:05, 29.26s/it] 19%|█▊ | 3211/17285 [28:54:59<122:07:52, 31.24s/it] 19%|█▊ | 3212/17285 [28:55:31<122:58:52, 31.46s/it] 19%|█▊ | 3213/17285 [28:56:01<121:34:21, 31.10s/it] 19%|█▊ | 3214/17285 [28:56:35<124:19:16, 31.81s/it] 19%|█▊ | 3215/17285 [28:57:03<119:32:05, 30.58s/it] 19%|█▊ | 3216/17285 [28:57:35<121:33:44, 31.11s/it] 19%|█▊ | 3217/17285 [28:58:07<122:59:46, 31.47s/it] 19%|█▊ | 3218/17285 [28:58:32<115:23:08, 29.53s/it] 19%|█▊ | 3219/17285 [28:59:06<120:05:20, 30.74s/it] 19%|█▊ | 3220/17285 [28:59:45<130:17:06, 33.35s/it] {'loss': 1.6615, 'learning_rate': 0.00019034209892058318, 'epoch': 0.56} + 19%|█▊ | 3220/17285 [28:59:45<130:17:06, 33.35s/it] 19%|█▊ | 3221/17285 [29:00:21<133:07:29, 34.08s/it] 19%|█▊ | 3222/17285 [29:00:53<131:00:26, 33.54s/it] 19%|█▊ | 3223/17285 [29:01:29<132:56:54, 34.04s/it] 19%|█▊ | 3224/17285 [29:02:12<143:42:38, 36.79s/it] 19%|█▊ | 3225/17285 [29:02:39<132:55:54, 34.04s/it] 19%|█▊ | 3226/17285 [29:03:13<132:02:17, 33.81s/it] 19%|█▊ | 3227/17285 [29:03:47<132:28:36, 33.92s/it] 19%|█▊ | 3228/17285 [29:04:21<132:40:57, 33.98s/it] 19%|█▊ | 3229/17285 [29:04:53<130:14:32, 33.36s/it] 19%|█▊ | 3230/17285 [29:05:29<133:29:37, 34.19s/it] {'loss': 1.6515, 'learning_rate': 0.00019025990119715506, 'epoch': 0.56} + 19%|█▊ | 3230/17285 [29:05:29<133:29:37, 34.19s/it] 19%|█▊ | 3231/17285 [29:06:00<129:55:08, 33.28s/it] 19%|█▊ | 3232/17285 [29:06:26<121:16:42, 31.07s/it] 19%|█▊ | 3233/17285 [29:07:05<130:19:46, 33.39s/it] 19%|█▊ | 3234/17285 [29:07:33<124:42:25, 31.95s/it] 19%|█▊ | 3235/17285 [29:08:05<124:39:53, 31.94s/it] 19%|█▊ | 3236/17285 [29:08:39<127:05:21, 32.57s/it] 19%|█▊ | 3237/17285 [29:09:09<123:22:05, 31.61s/it] 19%|█▊ | 3238/17285 [29:09:43<126:06:32, 32.32s/it] 19%|█▊ | 3239/17285 [29:10:12<122:26:43, 31.38s/it] 19%|█▊ | 
3240/17285 [29:10:42<120:16:27, 30.83s/it] {'loss': 1.7024, 'learning_rate': 0.00019017737306754754, 'epoch': 0.56} + 19%|█▊ | 3240/17285 [29:10:42<120:16:27, 30.83s/it] 19%|█▉ | 3241/17285 [29:11:07<114:01:06, 29.23s/it] 19%|█▉ | 3242/17285 [29:11:34<111:23:54, 28.56s/it] 19%|█▉ | 3243/17285 [29:12:05<114:09:34, 29.27s/it] 19%|█▉ | 3244/17285 [29:12:34<114:14:40, 29.29s/it] 19%|█▉ | 3245/17285 [29:13:01<111:36:01, 28.62s/it] 19%|█▉ | 3246/17285 [29:13:38<121:23:48, 31.13s/it] 19%|█▉ | 3247/17285 [29:14:12<123:49:56, 31.76s/it] 19%|█▉ | 3248/17285 [29:14:41<121:39:32, 31.20s/it] 19%|█▉ | 3249/17285 [29:15:14<123:33:09, 31.69s/it] 19%|█▉ | 3250/17285 [29:15:46<124:06:42, 31.83s/it] {'loss': 1.6598, 'learning_rate': 0.00019009451483386375, 'epoch': 0.56} + 19%|█▉ | 3250/17285 [29:15:46<124:06:42, 31.83s/it] 19%|█▉ | 3251/17285 [29:16:17<122:46:09, 31.49s/it] 19%|█▉ | 3252/17285 [29:16:47<120:27:08, 30.90s/it] 19%|█▉ | 3253/17285 [29:17:21<123:57:07, 31.80s/it] 19%|█▉ | 3254/17285 [29:17:51<122:10:40, 31.35s/it] 19%|█▉ | 3255/17285 [29:18:28<128:56:30, 33.09s/it] 19%|█▉ | 3256/17285 [29:18:55<121:25:09, 31.16s/it] 19%|█▉ | 3257/17285 [29:19:30<125:58:55, 32.33s/it] 19%|█▉ | 3258/17285 [29:20:04<127:59:40, 32.85s/it] 19%|█▉ | 3259/17285 [29:20:36<127:21:32, 32.69s/it] 19%|█▉ | 3260/17285 [29:21:07<124:49:22, 32.04s/it] {'loss': 1.6995, 'learning_rate': 0.0001900113267994153, 'epoch': 0.57} + 19%|█▉ | 3260/17285 [29:21:07<124:49:22, 32.04s/it] 19%|█▉ | 3261/17285 [29:21:33<118:23:52, 30.39s/it] 19%|█▉ | 3262/17285 [29:22:03<117:11:15, 30.08s/it] 19%|█▉ | 3263/17285 [29:22:29<112:45:29, 28.95s/it] 19%|█▉ | 3264/17285 [29:22:58<113:28:42, 29.14s/it] 19%|█▉ | 3265/17285 [29:23:31<118:03:27, 30.31s/it] 19%|█▉ | 3266/17285 [29:24:04<120:24:16, 30.92s/it] 19%|█▉ | 3267/17285 [29:24:31<115:34:32, 29.68s/it] 19%|█▉ | 3268/17285 [29:24:59<113:33:24, 29.16s/it] 19%|█▉ | 3269/17285 [29:25:28<113:31:22, 29.16s/it] 19%|█▉ | 3270/17285 [29:25:59<115:37:59, 29.70s/it] {'loss': 1.684, 
'learning_rate': 0.00018992780926872102, 'epoch': 0.57} + 19%|█▉ | 3270/17285 [29:25:59<115:37:59, 29.70s/it] 19%|█▉ | 3271/17285 [29:26:40<129:24:50, 33.24s/it] 19%|█▉ | 3272/17285 [29:27:12<127:58:10, 32.88s/it] 19%|█▉ | 3273/17285 [29:27:38<120:17:50, 30.91s/it] 19%|█▉ | 3274/17285 [29:28:13<124:02:09, 31.87s/it] 19%|█▉ | 3275/17285 [29:28:48<128:01:43, 32.90s/it] 19%|█▉ | 3276/17285 [29:29:20<126:55:34, 32.62s/it] 19%|█▉ | 3277/17285 [29:29:50<124:16:21, 31.94s/it] 19%|█▉ | 3278/17285 [29:30:22<124:17:29, 31.94s/it] 19%|█▉ | 3279/17285 [29:30:53<123:30:46, 31.75s/it] 19%|█▉ | 3280/17285 [29:31:24<122:20:26, 31.45s/it] {'loss': 1.6553, 'learning_rate': 0.00018984396254750593, 'epoch': 0.57} + 19%|█▉ | 3280/17285 [29:31:24<122:20:26, 31.45s/it] 19%|█▉ | 3281/17285 [29:32:08<136:25:20, 35.07s/it] 19%|█▉ | 3282/17285 [29:32:33<125:30:30, 32.27s/it] 19%|█▉ | 3283/17285 [29:33:04<123:11:12, 31.67s/it] 19%|█▉ | 3284/17285 [29:33:33<120:21:31, 30.95s/it] 19%|█▉ | 3285/17285 [29:34:03<119:28:12, 30.72s/it] 19%|█▉ | 3286/17285 [29:34:45<132:01:24, 33.95s/it] 19%|█▉ | 3287/17285 [29:35:21<134:13:22, 34.52s/it] 19%|█▉ | 3288/17285 [29:36:00<139:37:28, 35.91s/it] 19%|█▉ | 3289/17285 [29:36:27<129:14:36, 33.24s/it] 19%|█▉ | 3290/17285 [29:37:03<133:10:47, 34.26s/it] {'loss': 1.6515, 'learning_rate': 0.00018975978694270003, 'epoch': 0.57} + 19%|█▉ | 3290/17285 [29:37:03<133:10:47, 34.26s/it] 19%|█▉ | 3291/17285 [29:37:44<140:11:41, 36.07s/it] 19%|█▉ | 3292/17285 [29:38:19<139:06:22, 35.79s/it] 19%|█▉ | 3293/17285 [29:38:53<136:59:29, 35.25s/it] 19%|█▉ | 3294/17285 [29:39:20<127:43:26, 32.86s/it] 19%|█▉ | 3295/17285 [29:39:55<129:50:44, 33.41s/it] 19%|█▉ | 3296/17285 [29:40:22<122:25:56, 31.51s/it] 19%|█▉ | 3297/17285 [29:40:48<116:37:06, 30.01s/it] 19%|█▉ | 3298/17285 [29:41:19<117:29:36, 30.24s/it] 19%|█▉ | 3299/17285 [29:41:51<119:37:02, 30.79s/it] 19%|█▉ | 3300/17285 [29:42:24<122:23:47, 31.51s/it] {'loss': 1.6754, 'learning_rate': 0.00018967528276243734, 'epoch': 0.57} + 
19%|█▉ | 3300/17285 [29:42:24<122:23:47, 31.51s/it] 19%|█▉ | 3301/17285 [29:43:01<127:59:24, 32.95s/it] 19%|█▉ | 3302/17285 [29:43:32<125:41:57, 32.36s/it] 19%|█▉ | 3303/17285 [29:44:11<134:04:46, 34.52s/it] 19%|█▉ | 3304/17285 [29:44:48<136:56:07, 35.26s/it] 19%|█▉ | 3305/17285 [29:45:15<127:12:56, 32.76s/it] 19%|█▉ | 3306/17285 [29:45:49<128:38:05, 33.13s/it] 19%|█▉ | 3307/17285 [29:46:19<125:24:38, 32.30s/it] 19%|█▉ | 3308/17285 [29:46:50<123:20:22, 31.77s/it] 19%|█▉ | 3309/17285 [29:47:14<114:37:27, 29.53s/it] 19%|█▉ | 3310/17285 [29:47:44<114:52:05, 29.59s/it] {'loss': 1.6483, 'learning_rate': 0.00018959045031605453, 'epoch': 0.57} + 19%|█▉ | 3310/17285 [29:47:44<114:52:05, 29.59s/it] 19%|█▉ | 3311/17285 [29:48:15<115:57:19, 29.87s/it] 19%|█▉ | 3312/17285 [29:48:46<117:18:18, 30.22s/it] 19%|█▉ | 3313/17285 [29:49:24<126:25:23, 32.57s/it] 19%|█▉ | 3314/17285 [29:49:56<126:14:42, 32.53s/it] 19%|█▉ | 3315/17285 [29:50:21<117:37:41, 30.31s/it] 19%|█▉ | 3316/17285 [29:50:50<115:47:28, 29.84s/it] 19%|█▉ | 3317/17285 [29:51:23<119:01:22, 30.68s/it] 19%|█▉ | 3318/17285 [29:51:52<117:17:02, 30.23s/it] 19%|█▉ | 3319/17285 [29:52:22<117:38:47, 30.33s/it] 19%|█▉ | 3320/17285 [29:52:55<120:42:26, 31.12s/it] {'loss': 1.6569, 'learning_rate': 0.00018950528991409, 'epoch': 0.58} + 19%|█▉ | 3320/17285 [29:52:55<120:42:26, 31.12s/it] 19%|█▉ | 3321/17285 [29:53:38<134:24:11, 34.65s/it] 19%|█▉ | 3322/17285 [29:54:10<131:36:16, 33.93s/it] 19%|█▉ | 3323/17285 [29:54:44<130:42:44, 33.70s/it] 19%|█▉ | 3324/17285 [29:55:13<125:55:12, 32.47s/it] 19%|█▉ | 3325/17285 [29:55:43<122:53:45, 31.69s/it] 19%|█▉ | 3326/17285 [29:56:23<132:25:21, 34.15s/it] 19%|█▉ | 3327/17285 [29:56:53<127:34:39, 32.90s/it] 19%|█▉ | 3328/17285 [29:57:31<133:16:09, 34.37s/it] 19%|█▉ | 3329/17285 [29:58:04<131:35:47, 33.95s/it] 19%|█▉ | 3330/17285 [29:58:35<128:48:50, 33.23s/it] {'loss': 1.6626, 'learning_rate': 0.00018941980186828263, 'epoch': 0.58} + 19%|█▉ | 3330/17285 [29:58:35<128:48:50, 33.23s/it] 19%|█▉ | 
3331/17285 [29:59:10<131:03:26, 33.81s/it] 19%|█▉ | 3332/17285 [29:59:51<138:30:26, 35.74s/it] 19%|█▉ | 3333/17285 [30:00:26<138:26:40, 35.72s/it] 19%|█▉ | 3334/17285 [30:00:55<130:42:42, 33.73s/it] 19%|█▉ | 3335/17285 [30:01:25<126:25:36, 32.63s/it] 19%|█▉ | 3336/17285 [30:01:54<121:16:02, 31.30s/it] 19%|█▉ | 3337/17285 [30:02:18<113:37:59, 29.33s/it] 19%|█▉ | 3338/17285 [30:03:01<129:24:16, 33.40s/it] 19%|█▉ | 3339/17285 [30:03:28<121:03:38, 31.25s/it] 19%|█▉ | 3340/17285 [30:04:00<121:58:37, 31.49s/it] {'loss': 1.6508, 'learning_rate': 0.0001893339864915708, 'epoch': 0.58} + 19%|█▉ | 3340/17285 [30:04:00<121:58:37, 31.49s/it] 19%|█▉ | 3341/17285 [30:04:28<118:36:54, 30.62s/it] 19%|█▉ | 3342/17285 [30:05:01<120:49:27, 31.20s/it] 19%|█▉ | 3343/17285 [30:05:30<118:43:07, 30.65s/it] 19%|█▉ | 3344/17285 [30:06:06<125:04:56, 32.30s/it] 19%|█▉ | 3345/17285 [30:06:37<123:27:41, 31.88s/it] 19%|█▉ | 3346/17285 [30:07:05<118:47:57, 30.68s/it] 19%|█▉ | 3347/17285 [30:07:31<113:22:35, 29.28s/it] 19%|█▉ | 3348/17285 [30:08:06<119:20:07, 30.82s/it] 19%|█▉ | 3349/17285 [30:08:39<122:43:36, 31.70s/it] 19%|█▉ | 3350/17285 [30:09:04<114:45:34, 29.65s/it] {'loss': 1.6662, 'learning_rate': 0.00018924784409809093, 'epoch': 0.58} + 19%|█▉ | 3350/17285 [30:09:04<114:45:34, 29.65s/it] 19%|█▉ | 3351/17285 [30:09:29<108:43:25, 28.09s/it] 19%|█▉ | 3352/17285 [30:10:02<114:46:30, 29.66s/it] 19%|█▉ | 3353/17285 [30:10:31<113:34:59, 29.35s/it] 19%|█▉ | 3354/17285 [30:11:00<113:41:41, 29.38s/it] 19%|█▉ | 3355/17285 [30:11:28<111:40:27, 28.86s/it] 19%|█▉ | 3356/17285 [30:11:57<112:41:42, 29.13s/it] 19%|█▉ | 3357/17285 [30:12:29<115:52:54, 29.95s/it] 19%|█▉ | 3358/17285 [30:12:58<114:18:51, 29.55s/it] 19%|█▉ | 3359/17285 [30:13:32<119:46:55, 30.96s/it] 19%|█▉ | 3360/17285 [30:14:11<128:39:42, 33.26s/it] {'loss': 1.6426, 'learning_rate': 0.0001891613750031767, 'epoch': 0.58} + 19%|█▉ | 3360/17285 [30:14:11<128:39:42, 33.26s/it] 19%|█▉ | 3361/17285 [30:14:39<122:35:48, 31.70s/it] 19%|█▉ | 
3362/17285 [30:15:08<119:36:41, 30.93s/it] 19%|█▉ | 3363/17285 [30:15:37<117:32:42, 30.40s/it] 19%|█▉ | 3364/17285 [30:16:07<117:27:36, 30.38s/it] 19%|█▉ | 3365/17285 [30:16:46<126:59:50, 32.84s/it] 19%|█▉ | 3366/17285 [30:17:25<133:45:57, 34.60s/it] 19%|█▉ | 3367/17285 [30:17:50<123:25:06, 31.92s/it] 19%|█▉ | 3368/17285 [30:18:25<126:21:13, 32.68s/it] 19%|█▉ | 3369/17285 [30:18:57<125:44:21, 32.53s/it] 19%|█▉ | 3370/17285 [30:19:35<132:37:40, 34.31s/it] {'loss': 1.6468, 'learning_rate': 0.00018907457952335754, 'epoch': 0.58} + 19%|█▉ | 3370/17285 [30:19:35<132:37:40, 34.31s/it] 20%|█▉ | 3371/17285 [30:20:10<133:06:41, 34.44s/it] 20%|█▉ | 3372/17285 [30:20:43<130:53:49, 33.87s/it] 20%|█▉ | 3373/17285 [30:21:08<121:18:48, 31.39s/it] 20%|█▉ | 3374/17285 [30:21:44<126:23:15, 32.71s/it] 20%|█▉ | 3375/17285 [30:22:08<116:39:18, 30.19s/it] 20%|█▉ | 3376/17285 [30:22:46<124:50:10, 32.31s/it] 20%|█▉ | 3377/17285 [30:23:15<121:51:30, 31.54s/it] 20%|█▉ | 3378/17285 [30:23:47<121:55:28, 31.56s/it] 20%|█▉ | 3379/17285 [30:24:18<120:40:04, 31.24s/it] 20%|█▉ | 3380/17285 [30:24:42<113:19:32, 29.34s/it] {'loss': 1.6326, 'learning_rate': 0.0001889874579763578, 'epoch': 0.59} + 20%|█▉ | 3380/17285 [30:24:42<113:19:32, 29.34s/it] 20%|█▉ | 3381/17285 [30:25:14<115:23:11, 29.88s/it] 20%|█▉ | 3382/17285 [30:25:42<113:46:48, 29.46s/it] 20%|█▉ | 3383/17285 [30:26:10<112:00:02, 29.00s/it] 20%|█▉ | 3384/17285 [30:26:35<107:37:10, 27.87s/it] 20%|█▉ | 3385/17285 [30:27:05<109:50:37, 28.45s/it] 20%|█▉ | 3386/17285 [30:27:37<114:19:13, 29.61s/it] 20%|█▉ | 3387/17285 [30:28:04<110:57:46, 28.74s/it] 20%|█▉ | 3388/17285 [30:28:32<109:45:39, 28.43s/it] 20%|█▉ | 3389/17285 [30:29:05<114:56:46, 29.78s/it] 20%|█▉ | 3390/17285 [30:29:40<121:37:18, 31.51s/it] {'loss': 1.6034, 'learning_rate': 0.00018890001068109534, 'epoch': 0.59} + 20%|█▉ | 3390/17285 [30:29:40<121:37:18, 31.51s/it] 20%|█▉ | 3391/17285 [30:30:08<117:00:38, 30.32s/it] 20%|█▉ | 3392/17285 [30:30:40<118:53:56, 30.81s/it] 20%|█▉ | 
3393/17285 [30:31:11<119:35:07, 30.99s/it] 20%|█▉ | 3394/17285 [30:31:41<118:28:38, 30.70s/it] 20%|█▉ | 3395/17285 [30:32:11<117:31:18, 30.46s/it] 20%|█▉ | 3396/17285 [30:32:46<122:45:13, 31.82s/it] 20%|█▉ | 3397/17285 [30:33:13<116:38:17, 30.23s/it] 20%|█▉ | 3398/17285 [30:33:49<123:19:45, 31.97s/it] 20%|█▉ | 3399/17285 [30:34:22<124:45:17, 32.34s/it] 20%|█▉ | 3400/17285 [30:34:55<125:44:13, 32.60s/it] {'loss': 1.6951, 'learning_rate': 0.00018881223795768053, 'epoch': 0.59} + 20%|█▉ | 3400/17285 [30:34:55<125:44:13, 32.60s/it] 20%|█▉ | 3401/17285 [30:35:25<122:31:18, 31.77s/it] 20%|█▉ | 3402/17285 [30:35:59<125:02:45, 32.43s/it] 20%|█▉ | 3403/17285 [30:36:24<116:15:55, 30.15s/it] 20%|█▉ | 3404/17285 [30:36:53<115:34:03, 29.97s/it] 20%|█▉ | 3405/17285 [30:37:22<114:14:38, 29.63s/it] 20%|█▉ | 3406/17285 [30:37:57<120:37:53, 31.29s/it] 20%|█▉ | 3407/17285 [30:38:35<127:58:07, 33.20s/it] 20%|█▉ | 3408/17285 [30:39:12<132:32:47, 34.39s/it] 20%|█▉ | 3409/17285 [30:39:43<128:40:19, 33.38s/it] 20%|█▉ | 3410/17285 [30:40:17<128:49:35, 33.43s/it] {'loss': 1.5691, 'learning_rate': 0.00018872414012741494, 'epoch': 0.59} + 20%|█▉ | 3410/17285 [30:40:17<128:49:35, 33.43s/it] 20%|█▉ | 3411/17285 [30:40:51<130:23:34, 33.83s/it] 20%|█▉ | 3412/17285 [30:41:24<128:40:59, 33.39s/it] 20%|█▉ | 3413/17285 [30:41:53<124:10:30, 32.23s/it] 20%|█▉ | 3414/17285 [30:42:24<122:53:28, 31.89s/it] 20%|█▉ | 3415/17285 [30:43:06<134:10:25, 34.83s/it][2023-08-24 06:38:19,096] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 20%|█▉ | 3416/17285 [30:43:41<134:47:14, 34.99s/it] 20%|█▉ | 3417/17285 [30:44:06<122:38:29, 31.84s/it] 20%|█▉ | 3418/17285 [30:44:36<120:31:48, 31.29s/it] 20%|█▉ | 3419/17285 [30:45:04<116:51:54, 30.34s/it] 20%|█▉ | 3420/17285 [30:45:39<122:17:02, 31.75s/it] {'loss': 1.6343, 'learning_rate': 0.0001886445743803333, 'epoch': 0.59} + 20%|█▉ | 3420/17285 [30:45:39<122:17:02, 31.75s/it] 20%|█▉ | 3421/17285 [30:46:11<122:09:11, 31.72s/it][2023-08-24 06:41:22,824] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 20%|█▉ | 3422/17285 [30:46:45<125:15:35, 32.53s/it] 20%|█▉ | 3423/17285 [30:47:18<125:04:50, 32.48s/it] 20%|█▉ | 3424/17285 [30:47:54<129:58:42, 33.76s/it] 20%|█▉ | 3425/17285 [30:48:38<140:57:08, 36.61s/it] 20%|█▉ | 3426/17285 [30:49:09<135:04:54, 35.09s/it] 20%|█▉ | 3427/17285 [30:49:41<131:35:27, 34.18s/it] 20%|█▉ | 3428/17285 [30:50:12<128:07:15, 33.29s/it][2023-08-24 06:45:23,230] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 + 20%|█▉ | 3429/17285 [30:50:46<128:02:50, 33.27s/it] 20%|█▉ | 3430/17285 [30:51:15<123:07:15, 31.99s/it] {'loss': 1.6143, 'learning_rate': 0.00018857362860914253, 'epoch': 0.6} + 20%|█▉ | 3430/17285 [30:51:15<123:07:15, 31.99s/it] 20%|█▉ | 3431/17285 [30:51:46<122:17:36, 31.78s/it] 20%|█▉ | 3432/17285 [30:52:14<118:01:12, 30.67s/it] 20%|█▉ | 3433/17285 [30:52:40<112:15:02, 29.17s/it] 20%|█▉ | 3434/17285 [30:53:07<110:24:24, 28.70s/it] 20%|█▉ | 3435/17285 [30:53:48<124:50:00, 32.45s/it] 20%|█▉ | 3436/17285 [30:54:28<132:39:10, 34.48s/it] 20%|█▉ | 3437/17285 [30:54:59<128:48:10, 33.48s/it] 20%|█▉ | 3438/17285 [30:55:25<119:53:45, 31.17s/it] 20%|█▉ | 3439/17285 [30:55:56<120:42:57, 31.39s/it] 20%|█▉ | 3440/17285 [30:56:23<114:34:05, 29.79s/it] {'loss': 1.663, 'learning_rate': 0.00018848465460459042, 'epoch': 0.6} + 20%|█▉ | 3440/17285 [30:56:23<114:34:05, 29.79s/it] 20%|█▉ | 3441/17285 [30:57:02<125:21:56, 32.60s/it] 20%|█▉ | 3442/17285 [30:57:26<116:00:13, 30.17s/it] 20%|█▉ | 3443/17285 [30:57:53<112:20:15, 29.22s/it] 20%|█▉ | 3444/17285 [30:58:33<124:33:10, 32.40s/it] 20%|█▉ | 3445/17285 [30:59:00<118:02:22, 30.70s/it] 20%|█▉ | 3446/17285 [30:59:28<115:16:30, 29.99s/it] 20%|█▉ | 3447/17285 [30:59:53<109:44:16, 28.55s/it] 20%|█▉ | 3448/17285 [31:00:34<123:59:49, 32.26s/it] 20%|█▉ | 3449/17285 [31:01:04<121:01:13, 31.49s/it] 20%|█▉ | 3450/17285 [31:01:45<131:43:46, 34.28s/it] {'loss': 1.617, 'learning_rate': 0.00018839535669234195, 'epoch': 0.6} + 20%|█▉ | 3450/17285 [31:01:45<131:43:46, 34.28s/it] 20%|█▉ | 3451/17285 [31:02:21<134:39:54, 35.04s/it] 20%|█▉ | 3452/17285 [31:02:48<124:39:13, 32.44s/it] 20%|█▉ | 3453/17285 [31:03:27<132:50:09, 34.57s/it] 20%|█▉ | 3454/17285 [31:03:59<129:36:55, 33.74s/it] 20%|█▉ | 3455/17285 [31:04:27<122:57:13, 32.01s/it] 20%|█▉ | 3456/17285 [31:04:54<116:45:36, 30.40s/it] 20%|██ | 3457/17285 [31:05:28<121:10:40, 31.55s/it] 20%|██ | 3458/17285 [31:05:54<114:58:03, 29.93s/it] 20%|██ | 
3459/17285 [31:06:31<123:23:37, 32.13s/it] 20%|██ | 3460/17285 [31:07:01<119:58:56, 31.24s/it] {'loss': 1.6374, 'learning_rate': 0.00018830573519928195, 'epoch': 0.6} + 20%|██ | 3460/17285 [31:07:01<119:58:56, 31.24s/it] 20%|██ | 3461/17285 [31:07:34<122:57:47, 32.02s/it] 20%|██ | 3462/17285 [31:08:09<125:47:26, 32.76s/it] 20%|██ | 3463/17285 [31:08:39<123:05:25, 32.06s/it] 20%|██ | 3464/17285 [31:09:19<132:06:25, 34.41s/it] 20%|██ | 3465/17285 [31:09:51<128:36:26, 33.50s/it] 20%|██ | 3466/17285 [31:10:24<128:17:28, 33.42s/it] 20%|██ | 3467/17285 [31:11:03<134:39:03, 35.08s/it] 20%|██ | 3468/17285 [31:11:29<124:21:00, 32.40s/it] 20%|██ | 3469/17285 [31:12:04<127:09:35, 33.13s/it] 20%|██ | 3470/17285 [31:12:40<130:43:07, 34.06s/it] {'loss': 1.6472, 'learning_rate': 0.0001882157904534795, 'epoch': 0.6} + 20%|██ | 3470/17285 [31:12:40<130:43:07, 34.06s/it] 20%|██ | 3471/17285 [31:13:11<127:43:31, 33.29s/it] 20%|██ | 3472/17285 [31:13:40<121:50:45, 31.76s/it] 20%|██ | 3473/17285 [31:14:21<132:43:26, 34.59s/it] 20%|██ | 3474/17285 [31:14:46<121:19:19, 31.62s/it] 20%|██ | 3475/17285 [31:15:17<120:35:55, 31.44s/it] 20%|██ | 3476/17285 [31:15:55<128:51:19, 33.59s/it] 20%|██ | 3477/17285 [31:16:26<125:18:48, 32.67s/it] 20%|██ | 3478/17285 [31:17:05<132:47:40, 34.62s/it] 20%|██ | 3479/17285 [31:17:32<124:24:09, 32.44s/it] 20%|██ | 3480/17285 [31:18:09<129:39:07, 33.81s/it] {'loss': 1.6326, 'learning_rate': 0.00018812552278418726, 'epoch': 0.6} + 20%|██ | 3480/17285 [31:18:09<129:39:07, 33.81s/it] 20%|██ | 3481/17285 [31:18:42<128:04:06, 33.40s/it] 20%|██ | 3482/17285 [31:19:12<124:17:59, 32.42s/it] 20%|██ | 3483/17285 [31:19:42<121:10:13, 31.61s/it] 20%|██ | 3484/17285 [31:20:15<123:13:25, 32.14s/it] 20%|██ | 3485/17285 [31:20:56<133:03:21, 34.71s/it] 20%|██ | 3486/17285 [31:21:40<144:09:13, 37.61s/it] 20%|██ | 3487/17285 [31:22:12<137:21:46, 35.84s/it] 20%|██ | 3488/17285 [31:22:45<134:49:50, 35.18s/it] 20%|██ | 3489/17285 [31:23:14<126:53:55, 33.11s/it] 20%|██ | 3490/17285 
[31:23:50<130:09:36, 33.97s/it] {'loss': 1.6444, 'learning_rate': 0.00018803493252183976, 'epoch': 0.61} + 20%|██ | 3490/17285 [31:23:50<130:09:36, 33.97s/it] 20%|██ | 3491/17285 [31:24:21<127:17:19, 33.22s/it] 20%|██ | 3492/17285 [31:24:49<121:32:01, 31.72s/it] 20%|██ | 3493/17285 [31:25:19<119:04:50, 31.08s/it] 20%|██ | 3494/17285 [31:25:55<124:25:11, 32.48s/it] 20%|██ | 3495/17285 [31:26:30<127:33:26, 33.30s/it] 20%|██ | 3496/17285 [31:27:03<127:52:28, 33.39s/it] 20%|██ | 3497/17285 [31:27:32<122:49:08, 32.07s/it] 20%|██ | 3498/17285 [31:28:08<127:09:30, 33.20s/it] 20%|██ | 3499/17285 [31:28:40<125:29:32, 32.77s/it] 20%|██ | 3500/17285 [31:29:18<131:36:20, 34.37s/it] {'loss': 1.6167, 'learning_rate': 0.00018794401999805248, 'epoch': 0.61} + 20%|██ | 3500/17285 [31:29:18<131:36:20, 34.37s/it] 20%|██ | 3501/17285 [31:29:49<127:46:45, 33.37s/it] 20%|██ | 3502/17285 [31:30:18<122:46:08, 32.07s/it] 20%|██ | 3503/17285 [31:30:45<117:06:16, 30.59s/it] 20%|██ | 3504/17285 [31:31:19<120:33:40, 31.49s/it] 20%|██ | 3505/17285 [31:31:59<130:03:34, 33.98s/it] 20%|██ | 3506/17285 [31:32:26<122:49:40, 32.09s/it] 20%|██ | 3507/17285 [31:32:55<119:03:21, 31.11s/it] 20%|██ | 3508/17285 [31:33:28<120:43:56, 31.55s/it] 20%|██ | 3509/17285 [31:33:56<116:35:33, 30.47s/it] 20%|██ | 3510/17285 [31:34:24<114:08:06, 29.83s/it] {'loss': 1.6498, 'learning_rate': 0.00018785278554562065, 'epoch': 0.61} + 20%|██ | 3510/17285 [31:34:24<114:08:06, 29.83s/it] 20%|██ | 3511/17285 [31:34:52<112:01:11, 29.28s/it] 20%|██ | 3512/17285 [31:35:18<108:31:06, 28.36s/it] 20%|██ | 3513/17285 [31:35:55<117:38:44, 30.75s/it] 20%|██ | 3514/17285 [31:36:28<120:30:01, 31.50s/it] 20%|██ | 3515/17285 [31:37:04<125:43:09, 32.87s/it] 20%|██ | 3516/17285 [31:37:30<117:55:00, 30.83s/it] 20%|██ | 3517/17285 [31:38:06<123:47:53, 32.37s/it] 20%|██ | 3518/17285 [31:38:35<120:09:19, 31.42s/it] 20%|██ | 3519/17285 [31:39:05<118:12:07, 30.91s/it] 20%|██ | 3520/17285 [31:39:33<114:29:39, 29.94s/it] {'loss': 1.6605, 
'learning_rate': 0.00018776122949851792, 'epoch': 0.61} + 20%|██ | 3520/17285 [31:39:33<114:29:39, 29.94s/it] 20%|██ | 3521/17285 [31:40:07<119:51:54, 31.35s/it] 20%|██ | 3522/17285 [31:40:39<120:05:46, 31.41s/it] 20%|██ | 3523/17285 [31:41:08<117:37:18, 30.77s/it] 20%|██ | 3524/17285 [31:41:39<118:18:40, 30.95s/it] 20%|██ | 3525/17285 [31:42:12<119:58:10, 31.39s/it] 20%|██ | 3526/17285 [31:42:45<121:44:42, 31.85s/it] 20%|██ | 3527/17285 [31:43:12<115:55:27, 30.33s/it] 20%|██ | 3528/17285 [31:43:42<116:17:06, 30.43s/it] 20%|██ | 3529/17285 [31:44:17<120:57:55, 31.66s/it] 20%|██ | 3530/17285 [31:44:53<126:07:37, 33.01s/it] {'loss': 1.6455, 'learning_rate': 0.00018766935219189507, 'epoch': 0.61} + 20%|██ | 3530/17285 [31:44:53<126:07:37, 33.01s/it] 20%|██ | 3531/17285 [31:45:28<128:14:45, 33.57s/it] 20%|██ | 3532/17285 [31:45:58<123:49:38, 32.41s/it] 20%|██ | 3533/17285 [31:46:29<122:52:38, 32.17s/it] 20%|██ | 3534/17285 [31:47:01<123:07:54, 32.24s/it] 20%|██ | 3535/17285 [31:47:39<129:21:48, 33.87s/it] 20%|██ | 3536/17285 [31:48:16<132:39:53, 34.74s/it] 20%|██ | 3537/17285 [31:48:41<121:46:33, 31.89s/it] 20%|██ | 3538/17285 [31:49:13<122:12:40, 32.00s/it] 20%|██ | 3539/17285 [31:49:40<115:38:22, 30.29s/it] 20%|██ | 3540/17285 [31:50:20<127:01:14, 33.27s/it] {'loss': 1.671, 'learning_rate': 0.00018757715396207903, 'epoch': 0.61} + 20%|██ | 3540/17285 [31:50:20<127:01:14, 33.27s/it] 20%|██ | 3541/17285 [31:50:47<119:19:07, 31.25s/it] 20%|██ | 3542/17285 [31:51:15<116:16:17, 30.46s/it] 20%|██ | 3543/17285 [31:51:44<114:14:23, 29.93s/it] 21%|██ | 3544/17285 [31:52:17<117:32:33, 30.79s/it] 21%|██ | 3545/17285 [31:52:43<112:33:44, 29.49s/it] 21%|██ | 3546/17285 [31:53:17<117:23:28, 30.76s/it] 21%|██ | 3547/17285 [31:53:48<117:46:19, 30.86s/it] 21%|██ | 3548/17285 [31:54:13<110:42:13, 29.01s/it] 21%|██ | 3549/17285 [31:54:39<108:17:36, 28.38s/it] 21%|██ | 3550/17285 [31:55:16<117:39:43, 30.84s/it] {'loss': 1.6176, 'learning_rate': 0.00018748463514657146, 'epoch': 0.62} + 
21%|██ | 3550/17285 [31:55:16<117:39:43, 30.84s/it] 21%|██ | 3551/17285 [31:55:46<117:07:30, 30.70s/it] 21%|██ | 3552/17285 [31:56:11<109:31:28, 28.71s/it] 21%|██ | 3553/17285 [31:56:37<106:46:11, 27.99s/it] 21%|██ | 3554/17285 [31:57:18<121:25:53, 31.84s/it] 21%|██ | 3555/17285 [31:57:43<113:49:40, 29.85s/it] 21%|██ | 3556/17285 [31:58:13<114:00:43, 29.90s/it] 21%|██ | 3557/17285 [31:58:43<114:24:28, 30.00s/it] 21%|██ | 3558/17285 [31:59:28<131:21:26, 34.45s/it] 21%|██ | 3559/17285 [31:59:59<127:25:04, 33.42s/it] 21%|██ | 3560/17285 [32:00:40<136:15:00, 35.74s/it] {'loss': 1.6459, 'learning_rate': 0.00018739179608404747, 'epoch': 0.62} + 21%|██ | 3560/17285 [32:00:40<136:15:00, 35.74s/it] 21%|██ | 3561/17285 [32:01:20<141:27:57, 37.11s/it] 21%|██ | 3562/17285 [32:01:50<132:50:07, 34.85s/it] 21%|██ | 3563/17285 [32:02:33<141:46:50, 37.20s/it] 21%|██ | 3564/17285 [32:02:58<128:20:35, 33.67s/it] 21%|██ | 3565/17285 [32:03:33<129:22:17, 33.95s/it] 21%|██ | 3566/17285 [32:04:11<134:16:56, 35.24s/it] 21%|██ | 3567/17285 [32:04:38<125:04:23, 32.82s/it] 21%|██ | 3568/17285 [32:05:05<118:35:50, 31.13s/it] 21%|██ | 3569/17285 [32:05:37<119:02:09, 31.24s/it] 21%|██ | 3570/17285 [32:06:07<117:20:16, 30.80s/it] {'loss': 1.6481, 'learning_rate': 0.00018729863711435457, 'epoch': 0.62} + 21%|██ | 3570/17285 [32:06:07<117:20:16, 30.80s/it] 21%|██ | 3571/17285 [32:06:40<120:22:14, 31.60s/it] 21%|██ | 3572/17285 [32:07:15<123:43:35, 32.48s/it] 21%|██ | 3573/17285 [32:07:44<119:54:51, 31.48s/it] 21%|██ | 3574/17285 [32:08:12<115:42:43, 30.38s/it] 21%|██ | 3575/17285 [32:08:43<116:42:04, 30.64s/it] 21%|██ | 3576/17285 [32:09:08<110:37:12, 29.05s/it] 21%|██ | 3577/17285 [32:09:42<115:49:08, 30.42s/it] 21%|██ | 3578/17285 [32:10:17<121:11:49, 31.83s/it] 21%|██ | 3579/17285 [32:10:44<115:50:27, 30.43s/it] 21%|██ | 3580/17285 [32:11:22<124:15:14, 32.64s/it] {'loss': 1.6823, 'learning_rate': 0.00018720515857851132, 'epoch': 0.62} + 21%|██ | 3580/17285 [32:11:22<124:15:14, 32.64s/it] 21%|██ 
| 3581/17285 [32:11:49<117:38:56, 30.91s/it] 21%|██ | 3582/17285 [32:12:23<121:50:21, 32.01s/it] 21%|██ | 3583/17285 [32:13:02<130:02:12, 34.17s/it] 21%|██ | 3584/17285 [32:13:40<133:57:41, 35.20s/it] 21%|██ | 3585/17285 [32:14:06<123:07:45, 32.36s/it] 21%|██ | 3586/17285 [32:14:40<125:36:53, 33.01s/it] 21%|██ | 3587/17285 [32:15:10<122:06:21, 32.09s/it] 21%|██ | 3588/17285 [32:15:42<121:45:09, 32.00s/it] 21%|██ | 3589/17285 [32:16:18<125:44:18, 33.05s/it] 21%|██ | 3590/17285 [32:16:45<119:37:27, 31.45s/it] {'loss': 1.6239, 'learning_rate': 0.00018711136081870605, 'epoch': 0.62} + 21%|██ | 3590/17285 [32:16:45<119:37:27, 31.45s/it] 21%|██ | 3591/17285 [32:17:20<123:44:25, 32.53s/it] 21%|██ | 3592/17285 [32:18:01<133:30:19, 35.10s/it] 21%|██ | 3593/17285 [32:18:36<132:49:29, 34.92s/it] 21%|██ | 3594/17285 [32:19:10<132:02:17, 34.72s/it] 21%|██ | 3595/17285 [32:19:45<132:38:51, 34.88s/it] 21%|██ | 3596/17285 [32:20:11<121:41:11, 32.00s/it] 21%|██ | 3597/17285 [32:20:42<120:21:18, 31.65s/it] 21%|██ | 3598/17285 [32:21:09<115:57:41, 30.50s/it] 21%|██ | 3599/17285 [32:21:50<127:27:46, 33.53s/it] 21%|██ | 3600/17285 [32:22:32<137:29:11, 36.17s/it] {'loss': 1.6209, 'learning_rate': 0.00018701724417829565, 'epoch': 0.62} + 21%|██ | 3600/17285 [32:22:32<137:29:11, 36.17s/it] 21%|██ | 3601/17285 [32:23:08<137:19:01, 36.13s/it] 21%|██ | 3602/17285 [32:23:36<127:50:24, 33.63s/it] 21%|██ | 3603/17285 [32:24:08<126:19:22, 33.24s/it] 21%|██ | 3604/17285 [32:24:44<128:30:50, 33.82s/it] 21%|██ | 3605/17285 [32:25:13<123:34:06, 32.52s/it] 21%|██ | 3606/17285 [32:25:39<116:05:22, 30.55s/it] 21%|██ | 3607/17285 [32:26:13<120:19:24, 31.67s/it] 21%|██ | 3608/17285 [32:26:50<125:36:31, 33.06s/it] 21%|██ | 3609/17285 [32:27:21<123:29:06, 32.51s/it] 21%|██ | 3610/17285 [32:27:59<130:20:51, 34.31s/it] {'loss': 1.6381, 'learning_rate': 0.0001869228090018043, 'epoch': 0.63} + 21%|██ | 3610/17285 [32:27:59<130:20:51, 34.31s/it] 21%|██ | 3611/17285 [32:28:29<124:31:47, 32.79s/it] 21%|██ | 
3612/17285 [32:28:59<121:26:39, 31.98s/it] 21%|██ | 3613/17285 [32:29:29<119:38:38, 31.50s/it] 21%|██ | 3614/17285 [32:29:56<114:26:47, 30.14s/it] 21%|██ | 3615/17285 [32:30:26<113:52:37, 29.99s/it] 21%|██ | 3616/17285 [32:30:58<116:50:18, 30.77s/it] 21%|██ | 3617/17285 [32:31:33<120:58:13, 31.86s/it] 21%|██ | 3618/17285 [32:32:10<126:49:15, 33.41s/it] 21%|██ | 3619/17285 [32:32:43<127:03:31, 33.47s/it] 21%|██ | 3620/17285 [32:33:18<128:00:01, 33.72s/it] {'loss': 1.6498, 'learning_rate': 0.00018682805563492225, 'epoch': 0.63} + 21%|██ | 3620/17285 [32:33:18<128:00:01, 33.72s/it] 21%|██ | 3621/17285 [32:34:06<144:23:22, 38.04s/it] 21%|██ | 3622/17285 [32:34:33<131:49:06, 34.73s/it] 21%|██ | 3623/17285 [32:35:08<132:18:13, 34.86s/it] 21%|██ | 3624/17285 [32:35:39<127:42:46, 33.66s/it] 21%|██ | 3625/17285 [32:36:12<127:37:08, 33.63s/it] 21%|██ | 3626/17285 [32:36:59<142:09:17, 37.47s/it] 21%|██ | 3627/17285 [32:37:32<137:03:56, 36.13s/it] 21%|██ | 3628/17285 [32:38:01<128:59:08, 34.00s/it] 21%|██ | 3629/17285 [32:38:37<131:12:12, 34.59s/it] 21%|██ | 3630/17285 [32:39:12<131:33:19, 34.68s/it] {'loss': 1.6377, 'learning_rate': 0.00018673298442450448, 'epoch': 0.63} + 21%|██ | 3630/17285 [32:39:12<131:33:19, 34.68s/it] 21%|██ | 3631/17285 [32:39:46<131:12:22, 34.59s/it] 21%|██ | 3632/17285 [32:40:14<123:06:59, 32.46s/it] 21%|██ | 3633/17285 [32:40:44<121:15:44, 31.98s/it] 21%|██ | 3634/17285 [32:41:13<117:23:35, 30.96s/it] 21%|██ | 3635/17285 [32:41:42<115:06:32, 30.36s/it] 21%|██ | 3636/17285 [32:42:14<117:22:19, 30.96s/it] 21%|██ | 3637/17285 [32:42:52<125:01:00, 32.98s/it] 21%|██ | 3638/17285 [32:43:23<122:27:12, 32.30s/it] 21%|██ | 3639/17285 [32:44:00<128:00:06, 33.77s/it] 21%|██ | 3640/17285 [32:44:30<124:22:56, 32.82s/it] {'loss': 1.6513, 'learning_rate': 0.00018663759571856952, 'epoch': 0.63} + 21%|██ | 3640/17285 [32:44:30<124:22:56, 32.82s/it] 21%|██ | 3641/17285 [32:45:04<125:23:43, 33.09s/it] 21%|██ | 3642/17285 [32:45:45<134:21:25, 35.45s/it] 21%|██ | 
3643/17285 [32:46:21<134:19:20, 35.45s/it] 21%|██ | 3644/17285 [32:46:52<129:23:13, 34.15s/it] 21%|██ | 3645/17285 [32:47:24<126:41:39, 33.44s/it] 21%|██ | 3646/17285 [32:48:02<132:52:20, 35.07s/it] 21%|██ | 3647/17285 [32:48:42<138:04:39, 36.45s/it][2023-08-24 08:43:45,192] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 21%|██ | 3648/17285 [32:49:08<125:35:03, 33.15s/it] 21%|██ | 3649/17285 [32:49:49<134:49:11, 35.59s/it] 21%|██ | 3650/17285 [32:50:21<131:11:07, 34.64s/it] {'loss': 1.6385, 'learning_rate': 0.0001865514747131347, 'epoch': 0.63} + 21%|██ | 3650/17285 [32:50:21<131:11:07, 34.64s/it] 21%|██ | 3651/17285 [32:50:56<131:07:30, 34.62s/it][2023-08-24 08:46:07,446] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 21%|██ | 3652/17285 [32:51:30<130:22:27, 34.43s/it] 21%|██ | 3653/17285 [32:51:55<120:24:55, 31.80s/it] 21%|██ | 3654/17285 [32:52:26<118:37:37, 31.33s/it] 21%|██ | 3655/17285 [32:52:57<118:49:37, 31.38s/it] 21%|██ | 3656/17285 [32:53:32<123:15:07, 32.56s/it] 21%|██ | 3657/17285 [32:54:04<121:38:53, 32.13s/it] 21%|██ | 3658/17285 [32:54:31<116:21:50, 30.74s/it] 21%|██ | 3659/17285 [32:55:14<130:33:46, 34.49s/it] 21%|██ | 3660/17285 [32:55:47<127:57:09, 33.81s/it] {'loss': 1.6137, 'learning_rate': 0.00018646509707450926, 'epoch': 0.64} + 21%|██ | 3660/17285 [32:55:47<127:57:09, 33.81s/it] 21%|██ | 3661/17285 [32:56:25<133:41:33, 35.33s/it] 21%|██ | 3662/17285 [32:56:56<128:00:26, 33.83s/it] 21%|██ | 3663/17285 [32:57:29<127:19:24, 33.65s/it] 21%|██ | 3664/17285 [32:58:03<127:55:59, 33.81s/it] 21%|██ | 3665/17285 [32:58:36<126:23:05, 33.41s/it] 21%|██ | 3666/17285 [32:59:03<119:45:22, 31.66s/it] 21%|██ | 3667/17285 [32:59:35<120:16:43, 31.80s/it] 21%|██ | 3668/17285 [33:00:12<125:38:15, 33.22s/it] 21%|██ | 3669/17285 [33:00:38<117:58:44, 31.19s/it] 
21%|██ | 3670/17285 [33:01:03<110:49:54, 29.31s/it] {'loss': 1.6402, 'learning_rate': 0.00018636882124247248, 'epoch': 0.64} + 21%|██ | 3670/17285 [33:01:03<110:49:54, 29.31s/it] 21%|██ | 3671/17285 [33:01:44<124:07:46, 32.82s/it] 21%|██ | 3672/17285 [33:02:22<129:47:37, 34.32s/it] 21%|██ | 3673/17285 [33:02:54<126:27:40, 33.45s/it] 21%|██▏ | 3674/17285 [33:03:20<118:19:03, 31.29s/it] 21%|██▏ | 3675/17285 [33:03:45<111:48:27, 29.57s/it] 21%|██▏ | 3676/17285 [33:04:13<109:13:54, 28.90s/it] 21%|██▏ | 3677/17285 [33:04:44<112:24:07, 29.74s/it] 21%|██▏ | 3678/17285 [33:05:12<109:33:01, 28.98s/it][2023-08-24 09:00:22,199] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 21%|██▏ | 3679/17285 [33:05:45<114:01:41, 30.17s/it] 21%|██▏ | 3680/17285 [33:06:16<115:16:37, 30.50s/it] {'loss': 1.6949, 'learning_rate': 0.0001862819026646694, 'epoch': 0.64} + 21%|██▏ | 3680/17285 [33:06:16<115:16:37, 30.50s/it] 21%|██▏ | 3681/17285 [33:06:51<120:06:21, 31.78s/it] 21%|██▏ | 3682/17285 [33:07:24<122:25:05, 32.40s/it] 21%|██▏ | 3683/17285 [33:07:49<114:04:08, 30.19s/it] 21%|██▏ | 3684/17285 [33:08:21<116:09:46, 30.75s/it] 21%|██▏ | 3685/17285 [33:08:50<114:02:43, 30.19s/it] 21%|██▏ | 3686/17285 [33:09:19<112:28:33, 29.78s/it] 21%|██▏ | 3687/17285 [33:09:44<106:25:23, 28.17s/it] 21%|██▏ | 3688/17285 [33:10:21<116:54:35, 30.95s/it] 21%|██▏ | 3689/17285 [33:10:54<119:35:50, 31.67s/it] 21%|██▏ | 3690/17285 [33:11:30<123:47:40, 32.78s/it] {'loss': 1.6283, 'learning_rate': 0.0001861850264262445, 'epoch': 0.64} + 21%|██▏ | 3690/17285 [33:11:30<123:47:40, 32.78s/it] 21%|██▏ | 3691/17285 [33:12:05<126:55:07, 33.61s/it] 21%|██▏ | 3692/17285 [33:12:36<123:15:22, 32.64s/it] 21%|██▏ | 3693/17285 [33:13:02<116:21:39, 30.82s/it] 21%|██▏ | 3694/17285 [33:13:33<115:46:20, 30.67s/it] 21%|██▏ | 3695/17285 [33:14:05<118:12:29, 31.31s/it] 21%|██▏ | 3696/17285 [33:14:41<123:32:01, 32.73s/it] 21%|██▏ | 3697/17285 
[33:15:09<117:32:39, 31.14s/it] 21%|██▏ | 3698/17285 [33:15:33<109:07:45, 28.91s/it] 21%|██▏ | 3699/17285 [33:16:06<114:25:14, 30.32s/it] 21%|██▏ | 3700/17285 [33:16:41<119:21:38, 31.63s/it] {'loss': 1.633, 'learning_rate': 0.00018608783469816221, 'epoch': 0.64} + 21%|██▏ | 3700/17285 [33:16:41<119:21:38, 31.63s/it] 21%|██▏ | 3701/17285 [33:17:10<116:45:32, 30.94s/it] 21%|██▏ | 3702/17285 [33:17:52<129:26:22, 34.31s/it] 21%|██▏ | 3703/17285 [33:18:27<130:04:22, 34.48s/it] 21%|██▏ | 3704/17285 [33:18:58<126:11:59, 33.45s/it] 21%|██▏ | 3705/17285 [33:19:28<122:28:00, 32.47s/it] 21%|██▏ | 3706/17285 [33:19:58<119:28:40, 31.68s/it] 21%|██▏ | 3707/17285 [33:20:32<121:21:45, 32.18s/it] 21%|██▏ | 3708/17285 [33:21:08<125:45:35, 33.35s/it] 21%|██▏ | 3709/17285 [33:21:34<117:35:21, 31.18s/it] 21%|██▏ | 3710/17285 [33:22:01<112:43:08, 29.89s/it] {'loss': 1.6442, 'learning_rate': 0.00018599032783620342, 'epoch': 0.64} + 21%|██▏ | 3710/17285 [33:22:01<112:43:08, 29.89s/it] 21%|██▏ | 3711/17285 [33:22:28<109:52:45, 29.14s/it] 21%|██▏ | 3712/17285 [33:22:54<106:36:21, 28.28s/it] 21%|██▏ | 3713/17285 [33:23:25<109:09:59, 28.96s/it] 21%|██▏ | 3714/17285 [33:23:52<106:45:28, 28.32s/it] 21%|██▏ | 3715/17285 [33:24:28<115:25:00, 30.62s/it] 21%|██▏ | 3716/17285 [33:25:06<123:41:21, 32.82s/it] 22%|██▏ | 3717/17285 [33:25:33<117:47:43, 31.25s/it] 22%|██▏ | 3718/17285 [33:26:00<112:56:57, 29.97s/it] 22%|██▏ | 3719/17285 [33:26:29<111:39:39, 29.63s/it] 22%|██▏ | 3720/17285 [33:27:08<122:00:24, 32.38s/it] {'loss': 1.629, 'learning_rate': 0.00018589250619730253, 'epoch': 0.65} + 22%|██▏ | 3720/17285 [33:27:08<122:00:24, 32.38s/it] 22%|██▏ | 3721/17285 [33:27:37<118:48:37, 31.53s/it] 22%|██▏ | 3722/17285 [33:28:07<116:07:17, 30.82s/it] 22%|██▏ | 3723/17285 [33:28:31<108:49:01, 28.89s/it] 22%|██▏ | 3724/17285 [33:29:02<110:46:54, 29.41s/it] 22%|██▏ | 3725/17285 [33:29:27<106:09:52, 28.19s/it] 22%|██▏ | 3726/17285 [33:29:56<107:01:15, 28.41s/it] 22%|██▏ | 3727/17285 [33:30:33<116:22:39, 
30.90s/it][2023-08-24 09:25:38,341] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384 + 22%|██▏ | 3728/17285 [33:31:01<113:09:23, 30.05s/it] 22%|██▏ | 3729/17285 [33:31:38<121:23:14, 32.24s/it] 22%|██▏ | 3730/17285 [33:32:07<118:12:07, 31.39s/it] {'loss': 1.599, 'learning_rate': 0.00018580419788394125, 'epoch': 0.65} + 22%|██▏ | 3730/17285 [33:32:07<118:12:07, 31.39s/it] 22%|██▏ | 3731/17285 [33:32:35<113:21:17, 30.11s/it] 22%|██▏ | 3732/17285 [33:33:16<126:07:14, 33.50s/it] 22%|██▏ | 3733/17285 [33:33:51<127:23:32, 33.84s/it] 22%|██▏ | 3734/17285 [33:34:20<122:55:28, 32.66s/it] 22%|██▏ | 3735/17285 [33:34:47<115:56:09, 30.80s/it] 22%|██▏ | 3736/17285 [33:35:14<112:11:41, 29.81s/it] 22%|██▏ | 3737/17285 [33:35:46<114:11:11, 30.34s/it] 22%|██▏ | 3738/17285 [33:36:17<114:24:31, 30.40s/it] 22%|██▏ | 3739/17285 [33:36:46<112:50:24, 29.99s/it] 22%|██▏ | 3740/17285 [33:37:29<127:46:11, 33.96s/it] {'loss': 1.6526, 'learning_rate': 0.00018570577915633075, 'epoch': 0.65} + 22%|██▏ | 3740/17285 [33:37:29<127:46:11, 33.96s/it] 22%|██▏ | 3741/17285 [33:38:01<125:39:27, 33.40s/it] 22%|██▏ | 3742/17285 [33:38:42<134:44:08, 35.82s/it] 22%|██▏ | 3743/17285 [33:39:21<137:36:02, 36.58s/it] 22%|██▏ | 3744/17285 [33:39:46<124:51:04, 33.19s/it] 22%|██▏ | 3745/17285 [33:40:21<127:12:08, 33.82s/it] 22%|██▏ | 3746/17285 [33:40:47<118:22:42, 31.48s/it] 22%|██▏ | 3747/17285 [33:41:18<117:14:05, 31.17s/it] 22%|██▏ | 3748/17285 [33:41:49<117:22:11, 31.21s/it] 22%|██▏ | 3749/17285 [33:42:20<117:01:13, 31.12s/it] 22%|██▏ | 3750/17285 [33:42:57<123:06:32, 32.74s/it] {'loss': 1.6345, 'learning_rate': 0.00018560704669339962, 'epoch': 0.65} + 22%|██▏ | 3750/17285 [33:42:57<123:06:32, 32.74s/it] 22%|██▏ | 3751/17285 [33:43:29<122:28:46, 32.58s/it] 22%|██▏ | 3752/17285 [33:43:55<115:34:33, 30.75s/it] 22%|██▏ | 3753/17285 [33:44:31<120:45:33, 32.13s/it] 22%|██▏ | 3754/17285 [33:45:01<118:55:33, 31.64s/it] 22%|██▏ | 
3755/17285 [33:45:31<117:25:28, 31.24s/it] 22%|██▏ | 3756/17285 [33:45:58<112:27:51, 29.93s/it] 22%|██▏ | 3757/17285 [33:46:31<115:38:22, 30.77s/it] 22%|██▏ | 3758/17285 [33:47:09<123:42:27, 32.92s/it] 22%|██▏ | 3759/17285 [33:47:44<126:37:13, 33.70s/it] 22%|██▏ | 3760/17285 [33:48:17<124:58:34, 33.27s/it] {'loss': 1.6609, 'learning_rate': 0.00018550800085656875, 'epoch': 0.65} + 22%|██▏ | 3760/17285 [33:48:17<124:58:34, 33.27s/it] 22%|██▏ | 3761/17285 [33:48:48<123:16:51, 32.82s/it] 22%|██▏ | 3762/17285 [33:49:30<132:55:35, 35.39s/it] 22%|██▏ | 3763/17285 [33:50:03<130:19:10, 34.70s/it] 22%|██▏ | 3764/17285 [33:50:33<125:12:00, 33.33s/it] 22%|██▏ | 3765/17285 [33:50:58<115:52:01, 30.85s/it] 22%|██▏ | 3766/17285 [33:51:25<110:59:18, 29.56s/it] 22%|██▏ | 3767/17285 [33:51:58<114:48:33, 30.58s/it] 22%|██▏ | 3768/17285 [33:52:36<123:58:19, 33.02s/it] 22%|██▏ | 3769/17285 [33:53:05<118:54:01, 31.67s/it] 22%|██▏ | 3770/17285 [33:53:37<119:20:17, 31.79s/it] {'loss': 1.6241, 'learning_rate': 0.00018540864200840615, 'epoch': 0.65} + 22%|██▏ | 3770/17285 [33:53:37<119:20:17, 31.79s/it] 22%|██▏ | 3771/17285 [33:54:09<119:21:40, 31.80s/it] 22%|██▏ | 3772/17285 [33:54:41<119:37:04, 31.87s/it] 22%|██▏ | 3773/17285 [33:55:12<118:33:46, 31.59s/it] 22%|██▏ | 3774/17285 [33:55:38<112:08:54, 29.88s/it] 22%|██▏ | 3775/17285 [33:56:10<115:14:15, 30.71s/it] 22%|██▏ | 3776/17285 [33:56:40<114:02:49, 30.39s/it] 22%|██▏ | 3777/17285 [33:57:14<118:33:29, 31.60s/it] 22%|██▏ | 3778/17285 [33:57:51<124:47:17, 33.26s/it] 22%|██▏ | 3779/17285 [33:58:23<122:58:17, 32.78s/it] 22%|██▏ | 3780/17285 [33:58:58<125:23:53, 33.43s/it] {'loss': 1.6081, 'learning_rate': 0.0001853089705126257, 'epoch': 0.66} + 22%|██▏ | 3780/17285 [33:58:58<125:23:53, 33.43s/it] 22%|██▏ | 3781/17285 [33:59:28<121:28:37, 32.38s/it] 22%|██▏ | 3782/17285 [33:59:58<119:20:52, 31.82s/it] 22%|██▏ | 3783/17285 [34:00:29<117:35:09, 31.35s/it] 22%|██▏ | 3784/17285 [34:00:59<116:39:25, 31.11s/it] 22%|██▏ | 3785/17285 
[34:01:26<111:27:20, 29.72s/it] 22%|██▏ | 3786/17285 [34:01:52<107:16:04, 28.61s/it] 22%|██▏ | 3787/17285 [34:02:22<109:10:09, 29.12s/it] 22%|██▏ | 3788/17285 [34:02:50<107:55:50, 28.79s/it] 22%|██▏ | 3789/17285 [34:03:25<115:20:23, 30.77s/it] 22%|██▏ | 3790/17285 [34:04:08<128:29:35, 34.28s/it] {'loss': 1.6591, 'learning_rate': 0.00018520898673408576, 'epoch': 0.66} + 22%|██▏ | 3790/17285 [34:04:08<128:29:35, 34.28s/it] 22%|██▏ | 3791/17285 [34:04:36<121:48:15, 32.50s/it] 22%|██▏ | 3792/17285 [34:05:12<125:15:38, 33.42s/it] 22%|██▏ | 3793/17285 [34:05:40<119:20:47, 31.84s/it] 22%|██▏ | 3794/17285 [34:06:07<113:50:47, 30.38s/it] 22%|██▏ | 3795/17285 [34:06:40<116:19:00, 31.04s/it] 22%|██▏ | 3796/17285 [34:07:13<119:06:34, 31.79s/it] 22%|██▏ | 3797/17285 [34:07:40<113:25:10, 30.27s/it] 22%|██▏ | 3798/17285 [34:08:19<123:03:53, 32.85s/it] 22%|██▏ | 3799/17285 [34:08:57<129:19:37, 34.52s/it] 22%|██▏ | 3800/17285 [34:09:24<120:13:08, 32.09s/it] {'loss': 1.6196, 'learning_rate': 0.00018510869103878781, 'epoch': 0.66} + 22%|██▏ | 3800/17285 [34:09:24<120:13:08, 32.09s/it] 22%|██▏ | 3801/17285 [34:09:59<123:27:22, 32.96s/it] 22%|██▏ | 3802/17285 [34:10:25<115:55:49, 30.95s/it] 22%|██▏ | 3803/17285 [34:10:54<113:52:41, 30.41s/it] 22%|██▏ | 3804/17285 [34:11:20<109:08:55, 29.15s/it] 22%|██▏ | 3805/17285 [34:11:49<108:56:39, 29.09s/it] 22%|██▏ | 3806/17285 [34:12:21<111:36:20, 29.81s/it] 22%|██▏ | 3807/17285 [34:12:52<113:20:46, 30.28s/it] 22%|██▏ | 3808/17285 [34:13:24<115:49:04, 30.94s/it] 22%|██▏ | 3809/17285 [34:13:58<118:13:12, 31.58s/it] 22%|██▏ | 3810/17285 [34:14:27<115:16:46, 30.80s/it] {'loss': 1.6015, 'learning_rate': 0.00018500808379387515, 'epoch': 0.66} + 22%|██▏ | 3810/17285 [34:14:27<115:16:46, 30.80s/it] 22%|██▏ | 3811/17285 [34:14:51<108:39:17, 29.03s/it] 22%|██▏ | 3812/17285 [34:15:18<106:18:32, 28.41s/it] 22%|██▏ | 3813/17285 [34:15:55<115:06:29, 30.76s/it] 22%|██▏ | 3814/17285 [34:16:28<118:14:27, 31.60s/it] 22%|██▏ | 3815/17285 [34:17:03<121:49:12, 
32.56s/it] 22%|██▏ | 3816/17285 [34:17:34<120:01:19, 32.08s/it] 22%|██▏ | 3817/17285 [34:18:13<127:26:48, 34.07s/it] 22%|██▏ | 3818/17285 [34:18:44<124:07:12, 33.18s/it] 22%|██▏ | 3819/17285 [34:19:17<124:42:16, 33.34s/it] 22%|██▏ | 3820/17285 [34:19:49<122:40:39, 32.80s/it] {'loss': 1.6196, 'learning_rate': 0.00018490716536763153, 'epoch': 0.66} + 22%|██▏ | 3820/17285 [34:19:49<122:40:39, 32.80s/it] 22%|██▏ | 3821/17285 [34:20:16<115:46:13, 30.95s/it] 22%|██▏ | 3822/17285 [34:20:41<110:00:49, 29.42s/it] 22%|██▏ | 3823/17285 [34:21:17<117:12:31, 31.34s/it] 22%|██▏ | 3824/17285 [34:21:53<121:36:18, 32.52s/it] 22%|██▏ | 3825/17285 [34:22:22<117:38:54, 31.47s/it] 22%|██▏ | 3826/17285 [34:23:00<124:54:29, 33.41s/it] 22%|██▏ | 3827/17285 [34:23:33<124:50:04, 33.39s/it] 22%|██▏ | 3828/17285 [34:24:02<119:44:06, 32.03s/it] 22%|██▏ | 3829/17285 [34:24:36<122:26:34, 32.76s/it] 22%|██▏ | 3830/17285 [34:25:17<131:22:21, 35.15s/it] {'loss': 1.6504, 'learning_rate': 0.00018480593612947978, 'epoch': 0.66} + 22%|██▏ | 3830/17285 [34:25:17<131:22:21, 35.15s/it] 22%|██▏ | 3831/17285 [34:25:53<132:31:00, 35.46s/it] 22%|██▏ | 3832/17285 [34:26:18<120:51:43, 32.34s/it] 22%|██▏ | 3833/17285 [34:26:57<128:36:01, 34.42s/it] 22%|██▏ | 3834/17285 [34:27:35<132:03:02, 35.34s/it] 22%|██▏ | 3835/17285 [34:28:02<122:41:08, 32.84s/it] 22%|██▏ | 3836/17285 [34:28:29<116:30:58, 31.19s/it] 22%|██▏ | 3837/17285 [34:29:01<116:49:46, 31.28s/it] 22%|██▏ | 3838/17285 [34:29:31<116:09:40, 31.10s/it] 22%|██▏ | 3839/17285 [34:30:11<125:59:11, 33.73s/it] 22%|██▏ | 3840/17285 [34:30:40<120:44:44, 32.33s/it] {'loss': 1.6474, 'learning_rate': 0.00018470439644998062, 'epoch': 0.67} + 22%|██▏ | 3840/17285 [34:30:40<120:44:44, 32.33s/it] 22%|██▏ | 3841/17285 [34:31:28<138:18:36, 37.04s/it] 22%|██▏ | 3842/17285 [34:32:02<135:00:44, 36.16s/it] 22%|██▏ | 3843/17285 [34:32:32<127:28:59, 34.14s/it] 22%|██▏ | 3844/17285 [34:33:02<122:45:08, 32.88s/it] 22%|██▏ | 3845/17285 [34:33:34<122:19:03, 32.76s/it] 22%|██▏ | 
3846/17285 [34:34:09<124:48:29, 33.43s/it] 22%|██▏ | 3847/17285 [34:34:40<121:26:57, 32.54s/it] 22%|██▏ | 3848/17285 [34:35:13<122:10:54, 32.73s/it] 22%|██▏ | 3849/17285 [34:35:44<119:48:27, 32.10s/it] 22%|██▏ | 3850/17285 [34:36:19<123:31:36, 33.10s/it] {'loss': 1.6038, 'learning_rate': 0.00018460254670083103, 'epoch': 0.67} + 22%|██▏ | 3850/17285 [34:36:19<123:31:36, 33.10s/it] 22%|██▏ | 3851/17285 [34:36:45<115:41:18, 31.00s/it] 22%|██▏ | 3852/17285 [34:37:16<115:16:00, 30.89s/it] 22%|██▏ | 3853/17285 [34:37:45<113:31:42, 30.43s/it] 22%|██▏ | 3854/17285 [34:38:27<125:59:48, 33.77s/it] 22%|██▏ | 3855/17285 [34:38:54<118:19:33, 31.72s/it] 22%|██▏ | 3856/17285 [34:39:20<112:27:36, 30.15s/it] 22%|██▏ | 3857/17285 [34:39:52<114:34:20, 30.72s/it] 22%|██▏ | 3858/17285 [34:40:28<119:58:28, 32.17s/it] 22%|██▏ | 3859/17285 [34:41:02<121:55:43, 32.69s/it] 22%|██▏ | 3860/17285 [34:41:30<117:15:02, 31.44s/it] {'loss': 1.712, 'learning_rate': 0.00018450038725486306, 'epoch': 0.67} + 22%|██▏ | 3860/17285 [34:41:30<117:15:02, 31.44s/it] 22%|██▏ | 3861/17285 [34:42:00<115:29:32, 30.97s/it] 22%|██▏ | 3862/17285 [34:42:35<120:23:03, 32.29s/it] 22%|██▏ | 3863/17285 [34:43:04<115:48:00, 31.06s/it] 22%|██▏ | 3864/17285 [34:43:39<120:36:25, 32.35s/it] 22%|██▏ | 3865/17285 [34:44:12<121:47:45, 32.67s/it] 22%|██▏ | 3866/17285 [34:44:42<118:24:37, 31.77s/it] 22%|██▏ | 3867/17285 [34:45:07<110:29:30, 29.64s/it] 22%|██▏ | 3868/17285 [34:45:41<115:37:55, 31.03s/it] 22%|██▏ | 3869/17285 [34:46:17<121:27:26, 32.59s/it] 22%|██▏ | 3870/17285 [34:46:55<127:45:11, 34.28s/it] {'loss': 1.6311, 'learning_rate': 0.00018439791848604253, 'epoch': 0.67} + 22%|██▏ | 3870/17285 [34:46:55<127:45:11, 34.28s/it] 22%|██▏ | 3871/17285 [34:47:29<126:26:17, 33.93s/it] 22%|██▏ | 3872/17285 [34:48:03<126:42:00, 34.01s/it] 22%|██▏ | 3873/17285 [34:48:32<121:31:54, 32.62s/it] 22%|██▏ | 3874/17285 [34:49:08<125:36:11, 33.72s/it] 22%|██▏ | 3875/17285 [34:49:40<123:10:14, 33.07s/it] 22%|██▏ | 3876/17285 
[34:50:10<119:20:29, 32.04s/it] 22%|██▏ | 3877/17285 [34:50:50<128:53:28, 34.61s/it] 22%|██▏ | 3878/17285 [34:51:16<118:42:04, 31.87s/it] 22%|██▏ | 3879/17285 [34:51:48<119:16:29, 32.03s/it] 22%|██▏ | 3880/17285 [34:52:23<122:41:30, 32.95s/it] {'loss': 1.626, 'learning_rate': 0.00018429514076946746, 'epoch': 0.67} + 22%|██▏ | 3880/17285 [34:52:23<122:41:30, 32.95s/it] 22%|██▏ | 3881/17285 [34:52:49<115:01:17, 30.89s/it] 22%|██▏ | 3882/17285 [34:53:22<117:22:57, 31.53s/it] 22%|██▏ | 3883/17285 [34:53:51<114:38:30, 30.79s/it] 22%|██▏ | 3884/17285 [34:54:26<118:27:50, 31.82s/it] 22%|██▏ | 3885/17285 [34:54:57<117:49:06, 31.65s/it] 22%|██▏ | 3886/17285 [34:55:27<115:43:58, 31.09s/it] 22%|██▏ | 3887/17285 [34:56:02<120:54:07, 32.49s/it] 22%|██▏ | 3888/17285 [34:56:41<127:45:24, 34.33s/it] 22%|██▏ | 3889/17285 [34:57:11<122:39:21, 32.96s/it] 23%|██▎ | 3890/17285 [34:57:42<120:34:43, 32.41s/it] {'loss': 1.6255, 'learning_rate': 0.00018419205448136686, 'epoch': 0.68} + 23%|██▎ | 3890/17285 [34:57:42<120:34:43, 32.41s/it] 23%|██▎ | 3891/17285 [34:58:21<127:48:09, 34.35s/it] 23%|██▎ | 3892/17285 [34:59:04<137:24:24, 36.93s/it] 23%|██▎ | 3893/17285 [34:59:40<136:08:44, 36.60s/it] 23%|██▎ | 3894/17285 [35:00:09<128:18:23, 34.49s/it] 23%|██▎ | 3895/17285 [35:00:51<136:20:02, 36.65s/it] 23%|██▎ | 3896/17285 [35:01:16<124:01:57, 33.35s/it] 23%|██▎ | 3897/17285 [35:01:46<119:16:12, 32.07s/it] 23%|██▎ | 3898/17285 [35:02:21<122:34:05, 32.96s/it] 23%|██▎ | 3899/17285 [35:02:57<126:25:18, 34.00s/it] 23%|██▎ | 3900/17285 [35:03:27<121:56:29, 32.80s/it] {'loss': 1.6269, 'learning_rate': 0.00018408865999909932, 'epoch': 0.68} + 23%|██▎ | 3900/17285 [35:03:27<121:56:29, 32.80s/it] 23%|██▎ | 3901/17285 [35:03:53<114:07:19, 30.70s/it] 23%|██▎ | 3902/17285 [35:04:27<117:36:07, 31.63s/it] 23%|██▎ | 3903/17285 [35:04:53<112:04:43, 30.15s/it] 23%|██▎ | 3904/17285 [35:05:31<120:27:38, 32.41s/it] 23%|██▎ | 3905/17285 [35:05:59<115:50:23, 31.17s/it] 23%|██▎ | 3906/17285 [35:06:32<117:08:49, 
31.52s/it] 23%|██▎ | 3907/17285 [35:07:10<124:26:44, 33.49s/it] 23%|██▎ | 3908/17285 [35:07:42<123:22:19, 33.20s/it] 23%|██▎ | 3909/17285 [35:08:16<123:45:57, 33.31s/it] 23%|██▎ | 3910/17285 [35:08:50<124:26:39, 33.50s/it] {'loss': 1.5649, 'learning_rate': 0.00018398495770115153, 'epoch': 0.68} + 23%|██▎ | 3910/17285 [35:08:50<124:26:39, 33.50s/it] 23%|██▎ | 3911/17285 [35:09:25<126:10:12, 33.96s/it] 23%|██▎ | 3912/17285 [35:09:51<117:33:28, 31.65s/it] 23%|██▎ | 3913/17285 [35:10:17<111:33:50, 30.04s/it] 23%|██▎ | 3914/17285 [35:10:42<106:09:41, 28.58s/it] 23%|██▎ | 3915/17285 [35:11:11<105:33:45, 28.42s/it] 23%|██▎ | 3916/17285 [35:11:38<104:20:23, 28.10s/it] 23%|██▎ | 3917/17285 [35:12:13<111:58:18, 30.15s/it] 23%|██▎ | 3918/17285 [35:12:40<108:17:41, 29.17s/it] 23%|██▎ | 3919/17285 [35:13:16<116:16:14, 31.32s/it] 23%|██▎ | 3920/17285 [35:13:49<118:29:59, 31.92s/it] {'loss': 1.6243, 'learning_rate': 0.0001838809479671371, 'epoch': 0.68} + 23%|██▎ | 3920/17285 [35:13:49<118:29:59, 31.92s/it] 23%|██▎ | 3921/17285 [35:14:26<123:47:38, 33.35s/it] 23%|██▎ | 3922/17285 [35:14:58<121:54:49, 32.84s/it] 23%|██▎ | 3923/17285 [35:15:28<118:42:47, 31.98s/it] 23%|██▎ | 3924/17285 [35:15:55<113:14:19, 30.51s/it] 23%|██▎ | 3925/17285 [35:16:24<112:15:05, 30.25s/it] 23%|██▎ | 3926/17285 [35:16:50<107:36:18, 29.00s/it] 23%|██▎ | 3927/17285 [35:17:21<109:45:29, 29.58s/it] 23%|██▎ | 3928/17285 [35:17:57<116:12:28, 31.32s/it] 23%|██▎ | 3929/17285 [35:18:32<120:01:33, 32.35s/it] 23%|██▎ | 3930/17285 [35:19:00<115:30:21, 31.14s/it] {'loss': 1.6366, 'learning_rate': 0.0001837766311777949, 'epoch': 0.68} + 23%|██▎ | 3930/17285 [35:19:00<115:30:21, 31.14s/it] 23%|██▎ | 3931/17285 [35:19:31<115:32:33, 31.15s/it] 23%|██▎ | 3932/17285 [35:20:01<113:51:45, 30.70s/it] 23%|██▎ | 3933/17285 [35:20:37<120:02:15, 32.36s/it] 23%|██▎ | 3934/17285 [35:21:10<121:18:13, 32.71s/it] 23%|██▎ | 3935/17285 [35:21:45<123:30:28, 33.31s/it] 23%|██▎ | 3936/17285 [35:22:12<116:27:25, 31.41s/it] 23%|██▎ | 
3937/17285 [35:22:47<119:58:50, 32.36s/it] 23%|██▎ | 3938/17285 [35:23:13<113:39:46, 30.66s/it] 23%|██▎ | 3939/17285 [35:23:39<108:10:28, 29.18s/it] 23%|██▎ | 3940/17285 [35:24:09<109:20:18, 29.50s/it] {'loss': 1.6387, 'learning_rate': 0.00018367200771498787, 'epoch': 0.68} + 23%|██▎ | 3940/17285 [35:24:09<109:20:18, 29.50s/it] 23%|██▎ | 3941/17285 [35:24:41<111:50:42, 30.17s/it] 23%|██▎ | 3942/17285 [35:25:15<115:44:02, 31.23s/it] 23%|██▎ | 3943/17285 [35:26:02<133:21:48, 35.98s/it] 23%|██▎ | 3944/17285 [35:26:36<131:20:39, 35.44s/it] 23%|██▎ | 3945/17285 [35:27:09<128:47:12, 34.76s/it] 23%|██▎ | 3946/17285 [35:27:34<117:43:43, 31.77s/it] 23%|██▎ | 3947/17285 [35:28:08<120:30:05, 32.52s/it] 23%|██▎ | 3948/17285 [35:28:42<121:50:59, 32.89s/it] 23%|██▎ | 3949/17285 [35:29:16<122:45:13, 33.14s/it] 23%|██▎ | 3950/17285 [35:29:46<119:07:01, 32.16s/it] {'loss': 1.6256, 'learning_rate': 0.00018356707796170161, 'epoch': 0.69} + 23%|██▎ | 3950/17285 [35:29:46<119:07:01, 32.16s/it] 23%|██▎ | 3951/17285 [35:30:23<124:59:03, 33.74s/it] 23%|██▎ | 3952/17285 [35:30:54<121:21:08, 32.77s/it] 23%|██▎ | 3953/17285 [35:31:26<120:37:29, 32.57s/it] 23%|██▎ | 3954/17285 [35:31:55<117:33:54, 31.75s/it] 23%|██▎ | 3955/17285 [35:32:28<118:37:33, 32.04s/it] 23%|██▎ | 3956/17285 [35:33:01<118:58:15, 32.13s/it] 23%|██▎ | 3957/17285 [35:33:26<111:52:17, 30.22s/it] 23%|██▎ | 3958/17285 [35:33:58<113:17:56, 30.61s/it] 23%|██▎ | 3959/17285 [35:34:31<115:55:33, 31.32s/it] 23%|██▎ | 3960/17285 [35:34:59<112:59:47, 30.53s/it] {'loss': 1.6158, 'learning_rate': 0.00018346184230204292, 'epoch': 0.69} + 23%|██▎ | 3960/17285 [35:34:59<112:59:47, 30.53s/it] 23%|██▎ | 3961/17285 [35:35:32<114:43:40, 31.00s/it] 23%|██▎ | 3962/17285 [35:36:06<118:58:22, 32.15s/it] 23%|██▎ | 3963/17285 [35:36:32<112:00:33, 30.27s/it] 23%|██▎ | 3964/17285 [35:36:58<106:46:47, 28.86s/it] 23%|██▎ | 3965/17285 [35:37:29<108:51:17, 29.42s/it] 23%|██▎ | 3966/17285 [35:38:01<112:39:09, 30.45s/it] 23%|██▎ | 3967/17285 
[35:38:36<117:36:44, 31.79s/it] 23%|██▎ | 3968/17285 [35:39:09<118:13:37, 31.96s/it] 23%|██▎ | 3969/17285 [35:39:47<125:13:28, 33.85s/it] 23%|██▎ | 3970/17285 [35:40:15<118:23:43, 32.01s/it] {'loss': 1.6103, 'learning_rate': 0.0001833563011212383, 'epoch': 0.69} + 23%|██▎ | 3970/17285 [35:40:15<118:23:43, 32.01s/it] 23%|██▎ | 3971/17285 [35:40:54<125:58:11, 34.06s/it] 23%|██▎ | 3972/17285 [35:41:26<124:21:25, 33.63s/it] 23%|██▎ | 3973/17285 [35:41:58<122:30:30, 33.13s/it] 23%|██▎ | 3974/17285 [35:42:26<116:43:30, 31.57s/it] 23%|██▎ | 3975/17285 [35:42:54<112:29:53, 30.43s/it] 23%|██▎ | 3976/17285 [35:43:29<117:48:15, 31.87s/it] 23%|██▎ | 3977/17285 [35:44:07<125:07:13, 33.85s/it] 23%|██▎ | 3978/17285 [35:44:41<125:02:27, 33.83s/it] 23%|██▎ | 3979/17285 [35:45:07<116:34:53, 31.54s/it] 23%|██▎ | 3980/17285 [35:45:38<115:26:35, 31.24s/it] {'loss': 1.6038, 'learning_rate': 0.00018325045480563273, 'epoch': 0.69} + 23%|██▎ | 3980/17285 [35:45:38<115:26:35, 31.24s/it] 23%|██▎ | 3981/17285 [35:46:03<108:36:15, 29.39s/it] 23%|██▎ | 3982/17285 [35:46:34<110:25:30, 29.88s/it] 23%|██▎ | 3983/17285 [35:47:07<113:56:51, 30.84s/it] 23%|██▎ | 3984/17285 [35:47:36<111:47:54, 30.26s/it] 23%|██▎ | 3985/17285 [35:48:03<107:37:02, 29.13s/it] 23%|██▎ | 3986/17285 [35:48:33<109:00:03, 29.51s/it] 23%|██▎ | 3987/17285 [35:49:06<113:01:57, 30.60s/it] 23%|██▎ | 3988/17285 [35:49:41<117:32:05, 31.82s/it] 23%|██▎ | 3989/17285 [35:50:07<111:10:55, 30.10s/it] 23%|██▎ | 3990/17285 [35:50:39<113:24:12, 30.71s/it] {'loss': 1.5909, 'learning_rate': 0.00018314430374268817, 'epoch': 0.69} + 23%|██▎ | 3990/17285 [35:50:39<113:24:12, 30.71s/it] 23%|██▎ | 3991/17285 [35:51:09<112:30:53, 30.47s/it] 23%|██▎ | 3992/17285 [35:51:38<111:31:42, 30.20s/it] 23%|██▎ | 3993/17285 [35:52:09<111:58:48, 30.33s/it] 23%|██▎ | 3994/17285 [35:52:44<116:28:50, 31.55s/it] 23%|██▎ | 3995/17285 [35:53:11<111:45:32, 30.27s/it] 23%|██▎ | 3996/17285 [35:53:36<106:02:34, 28.73s/it] 23%|██▎ | 3997/17285 [35:54:17<120:00:33, 
32.51s/it] 23%|██▎ | 3998/17285 [35:54:57<127:47:06, 34.62s/it] 23%|██▎ | 3999/17285 [35:55:39<135:42:50, 36.77s/it] 23%|██▎ | 4000/17285 [35:56:07<126:41:27, 34.33s/it] {'loss': 1.612, 'learning_rate': 0.0001830378483209821, 'epoch': 0.69} + 23%|██▎ | 4000/17285 [35:56:07<126:41:27, 34.33s/it][INFO|trainer.py:3081] 2023-08-24 11:50:44,955 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-24 11:50:44,956 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-24 11:50:44,956 >> Batch size = 2 + + 0%| | 0/33 [00:00> Deleting older checkpoint [20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-1000] due to args.save_total_limit +Save Peft Config at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-4000 +[INFO|tokenization_utils_base.py:2210] 2023-08-24 11:52:11,186 >> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-4000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-24 11:52:11,190 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-4000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-4000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-4000 + 23%|██▎ | 4001/17285 [35:58:16<231:15:07, 62.67s/it] 23%|██▎ | 4002/17285 [35:58:45<194:04:52, 52.60s/it] 23%|██▎ | 4003/17285 [35:59:19<173:02:43, 46.90s/it] 23%|██▎ | 4004/17285 [35:59:47<152:28:28, 41.33s/it] 23%|██▎ | 4005/17285 [36:00:22<145:25:43, 39.42s/it] 23%|██▎ | 4006/17285 [36:00:53<135:41:02, 36.78s/it] 23%|██▎ | 4007/17285 [36:01:22<127:09:34, 34.48s/it] 23%|██▎ | 4008/17285 [36:01:55<125:40:42, 34.08s/it] 23%|██▎ | 4009/17285 [36:02:25<121:32:52, 32.96s/it] 23%|██▎ | 4010/17285 
[36:02:57<120:16:05, 32.62s/it] {'loss': 1.6545, 'learning_rate': 0.0001829310889302062, 'epoch': 0.7} + 23%|██▎ | 4010/17285 [36:02:57<120:16:05, 32.62s/it][2023-08-24 11:58:02,775] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 + 23%|██▎ | 4011/17285 [36:03:25<115:10:05, 31.23s/it] 23%|██▎ | 4012/17285 [36:03:55<113:16:03, 30.72s/it] 23%|██▎ | 4013/17285 [36:04:23<110:44:34, 30.04s/it] 23%|██▎ | 4014/17285 [36:04:50<107:24:38, 29.14s/it] 23%|██▎ | 4015/17285 [36:05:28<117:11:37, 31.79s/it] 23%|██▎ | 4016/17285 [36:05:54<110:37:33, 30.01s/it] 23%|██▎ | 4017/17285 [36:06:20<106:36:47, 28.93s/it][2023-08-24 12:01:35,566] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 23%|██▎ | 4018/17285 [36:06:58<116:07:50, 31.51s/it] 23%|██▎ | 4019/17285 [36:07:33<120:10:06, 32.61s/it] 23%|██▎ | 4020/17285 [36:08:07<122:09:52, 33.15s/it] {'loss': 1.6391, 'learning_rate': 0.00018284546282243836, 'epoch': 0.7} + 23%|██▎ | 4020/17285 [36:08:09<122:09:52, 33.15s/it] 23%|██▎ | 4021/17285 [36:08:35<116:19:19, 31.57s/it] 23%|██▎ | 4022/17285 [36:09:11<120:44:28, 32.77s/it] 23%|██▎ | 4023/17285 [36:09:44<121:17:48, 32.93s/it] 23%|██▎ | 4024/17285 [36:10:26<130:53:38, 35.53s/it] 23%|██▎ | 4025/17285 [36:10:58<126:39:58, 34.39s/it] 23%|██▎ | 4026/17285 [36:11:22<116:02:43, 31.51s/it] 23%|██▎ | 4027/17285 [36:11:52<113:58:59, 30.95s/it] 23%|██▎ | 4028/17285 [36:12:27<118:11:24, 32.10s/it] 23%|██▎ | 4029/17285 [36:13:06<125:39:22, 34.13s/it] 23%|██▎ | 4030/17285 [36:13:30<115:20:07, 31.32s/it] {'loss': 1.6237, 'learning_rate': 0.00018273815727291054, 'epoch': 0.7} + 23%|██▎ | 4030/17285 [36:13:30<115:20:07, 31.32s/it] 23%|██▎ | 4031/17285 [36:14:17<132:29:05, 35.99s/it] 23%|██▎ | 4032/17285 [36:14:44<121:57:12, 33.13s/it] 23%|██▎ | 4033/17285 [36:15:18<123:34:44, 33.57s/it] 23%|██▎ | 4034/17285 
[36:15:50<120:57:54, 32.86s/it] 23%|██▎ | 4035/17285 [36:16:28<126:53:39, 34.48s/it] 23%|██▎ | 4036/17285 [36:16:55<118:53:24, 32.30s/it] 23%|██▎ | 4037/17285 [36:17:24<115:28:57, 31.38s/it] 23%|██▎ | 4038/17285 [36:17:55<114:42:06, 31.17s/it] 23%|██▎ | 4039/17285 [36:18:30<118:46:21, 32.28s/it] 23%|██▎ | 4040/17285 [36:19:02<118:10:37, 32.12s/it] {'loss': 1.6281, 'learning_rate': 0.00018263054885136454, 'epoch': 0.7} + 23%|██▎ | 4040/17285 [36:19:02<118:10:37, 32.12s/it] 23%|██▎ | 4041/17285 [36:19:34<118:17:05, 32.15s/it] 23%|██▎ | 4042/17285 [36:20:08<120:42:39, 32.81s/it] 23%|██▎ | 4043/17285 [36:20:35<113:39:04, 30.90s/it] 23%|██▎ | 4044/17285 [36:21:12<120:43:42, 32.82s/it] 23%|██▎ | 4045/17285 [36:21:41<116:38:44, 31.72s/it] 23%|██▎ | 4046/17285 [36:22:17<121:09:07, 32.94s/it] 23%|██▎ | 4047/17285 [36:22:55<126:32:45, 34.41s/it] 23%|██▎ | 4048/17285 [36:23:34<131:49:09, 35.85s/it] 23%|██▎ | 4049/17285 [36:24:07<128:58:25, 35.08s/it] 23%|██▎ | 4050/17285 [36:24:39<125:42:31, 34.19s/it] {'loss': 1.6102, 'learning_rate': 0.00018252263795171263, 'epoch': 0.7} + 23%|██▎ | 4050/17285 [36:24:39<125:42:31, 34.19s/it] 23%|██▎ | 4051/17285 [36:25:11<122:54:30, 33.43s/it] 23%|██▎ | 4052/17285 [36:25:45<123:33:35, 33.61s/it] 23%|██▎ | 4053/17285 [36:26:15<119:44:07, 32.58s/it] 23%|██▎ | 4054/17285 [36:26:46<118:07:39, 32.14s/it] 23%|██▎ | 4055/17285 [36:27:23<123:07:26, 33.50s/it] 23%|██▎ | 4056/17285 [36:27:56<122:39:06, 33.38s/it] 23%|██▎ | 4057/17285 [36:28:36<130:07:20, 35.41s/it] 23%|██▎ | 4058/17285 [36:29:09<127:20:45, 34.66s/it] 23%|██▎ | 4059/17285 [36:29:39<121:35:39, 33.10s/it] 23%|██▎ | 4060/17285 [36:30:13<122:42:26, 33.40s/it] {'loss': 1.6246, 'learning_rate': 0.00018241442496897444, 'epoch': 0.7} + 23%|██▎ | 4060/17285 [36:30:13<122:42:26, 33.40s/it] 23%|██▎ | 4061/17285 [36:30:52<129:32:05, 35.26s/it] 24%|██▎ | 4062/17285 [36:31:27<128:44:47, 35.05s/it] 24%|██▎ | 4063/17285 [36:31:57<123:45:45, 33.70s/it] 24%|██▎ | 4064/17285 [36:32:29<122:03:13, 
33.23s/it] 24%|██▎ | 4065/17285 [36:33:00<119:23:33, 32.51s/it] 24%|██▎ | 4066/17285 [36:33:31<117:25:44, 31.98s/it] 24%|██▎ | 4067/17285 [36:34:03<116:57:56, 31.86s/it] 24%|██▎ | 4068/17285 [36:34:33<115:51:58, 31.56s/it] 24%|██▎ | 4069/17285 [36:35:01<111:40:53, 30.42s/it] 24%|██▎ | 4070/17285 [36:35:38<118:17:07, 32.22s/it] {'loss': 1.5991, 'learning_rate': 0.00018230591029927537, 'epoch': 0.71} + 24%|██▎ | 4070/17285 [36:35:38<118:17:07, 32.22s/it] 24%|██▎ | 4071/17285 [36:36:09<117:21:30, 31.97s/it] 24%|██▎ | 4072/17285 [36:36:36<111:26:32, 30.36s/it] 24%|██▎ | 4073/17285 [36:37:18<125:01:53, 34.07s/it] 24%|██▎ | 4074/17285 [36:37:47<118:44:29, 32.36s/it] 24%|██▎ | 4075/17285 [36:38:21<120:22:20, 32.80s/it] 24%|██▎ | 4076/17285 [36:39:10<138:50:04, 37.84s/it] 24%|██▎ | 4077/17285 [36:39:40<129:57:35, 35.42s/it] 24%|██▎ | 4078/17285 [36:40:19<133:52:00, 36.49s/it] 24%|██▎ | 4079/17285 [36:40:56<133:58:49, 36.52s/it] 24%|██▎ | 4080/17285 [36:41:28<129:31:28, 35.31s/it] {'loss': 1.6252, 'learning_rate': 0.00018219709433984512, 'epoch': 0.71} + 24%|██▎ | 4080/17285 [36:41:28<129:31:28, 35.31s/it] 24%|██▎ | 4081/17285 [36:41:59<125:12:40, 34.14s/it] 24%|██▎ | 4082/17285 [36:42:30<121:21:43, 33.09s/it] 24%|██▎ | 4083/17285 [36:43:02<119:55:30, 32.70s/it] 24%|██▎ | 4084/17285 [36:43:37<122:34:00, 33.42s/it] 24%|██▎ | 4085/17285 [36:44:06<118:06:00, 32.21s/it] 24%|██▎ | 4086/17285 [36:44:35<114:30:05, 31.23s/it] 24%|██▎ | 4087/17285 [36:45:11<119:35:33, 32.62s/it] 24%|██▎ | 4088/17285 [36:45:42<117:36:07, 32.08s/it] 24%|██▎ | 4089/17285 [36:46:21<125:14:02, 34.17s/it] 24%|██▎ | 4090/17285 [36:46:51<121:04:47, 33.03s/it] {'loss': 1.6047, 'learning_rate': 0.00018208797748901637, 'epoch': 0.71} + 24%|██▎ | 4090/17285 [36:46:51<121:04:47, 33.03s/it] 24%|██▎ | 4091/17285 [36:47:25<121:43:39, 33.21s/it] 24%|██▎ | 4092/17285 [36:47:55<118:31:49, 32.34s/it] 24%|██▎ | 4093/17285 [36:48:26<116:25:14, 31.77s/it] 24%|██▎ | 4094/17285 [36:49:00<118:54:17, 32.45s/it] 24%|██▎ | 
4095/17285 [36:49:30<116:36:11, 31.82s/it] 24%|██▎ | 4096/17285 [36:50:06<120:43:54, 32.95s/it] 24%|██▎ | 4097/17285 [36:50:44<126:44:12, 34.60s/it] 24%|██▎ | 4098/17285 [36:51:14<121:58:25, 33.30s/it] 24%|██▎ | 4099/17285 [36:51:42<115:12:57, 31.46s/it] 24%|██▎ | 4100/17285 [36:52:08<109:03:47, 29.78s/it] {'loss': 1.6173, 'learning_rate': 0.0001819785601462232, 'epoch': 0.71} + 24%|██▎ | 4100/17285 [36:52:08<109:03:47, 29.78s/it] 24%|██▎ | 4101/17285 [36:52:42<114:21:17, 31.23s/it] 24%|██▎ | 4102/17285 [36:53:15<116:09:25, 31.72s/it] 24%|██▎ | 4103/17285 [36:53:48<117:06:14, 31.98s/it] 24%|██▎ | 4104/17285 [36:54:21<118:41:32, 32.42s/it] 24%|██▎ | 4105/17285 [36:55:02<128:00:18, 34.96s/it] 24%|██▍ | 4106/17285 [36:55:34<125:20:12, 34.24s/it] 24%|██▍ | 4107/17285 [36:56:08<124:59:20, 34.14s/it] 24%|██▍ | 4108/17285 [36:56:37<119:23:15, 32.62s/it] 24%|██▍ | 4109/17285 [36:57:09<117:44:34, 32.17s/it] 24%|██▍ | 4110/17285 [36:57:43<119:57:23, 32.78s/it] {'loss': 1.5678, 'learning_rate': 0.00018186884271199967, 'epoch': 0.71} + 24%|██▍ | 4110/17285 [36:57:43<119:57:23, 32.78s/it] 24%|██▍ | 4111/17285 [36:58:13<116:55:02, 31.95s/it] 24%|██▍ | 4112/17285 [36:58:47<119:53:41, 32.77s/it] 24%|██▍ | 4113/17285 [36:59:14<112:59:59, 30.88s/it] 24%|██▍ | 4114/17285 [36:59:46<114:25:56, 31.28s/it] 24%|██▍ | 4115/17285 [37:00:16<112:26:38, 30.74s/it] 24%|██▍ | 4116/17285 [37:00:54<120:58:02, 33.07s/it] 24%|██▍ | 4117/17285 [37:01:21<113:37:25, 31.06s/it] 24%|██▍ | 4118/17285 [37:01:51<113:00:14, 30.90s/it] 24%|██▍ | 4119/17285 [37:02:34<126:16:04, 34.53s/it] 24%|██▍ | 4120/17285 [37:03:10<127:53:31, 34.97s/it] {'loss': 1.6143, 'learning_rate': 0.0001817588255879784, 'epoch': 0.72} + 24%|██▍ | 4120/17285 [37:03:10<127:53:31, 34.97s/it] 24%|██▍ | 4121/17285 [37:03:41<123:52:50, 33.88s/it] 24%|██▍ | 4122/17285 [37:04:16<124:50:01, 34.14s/it] 24%|██▍ | 4123/17285 [37:04:53<127:29:07, 34.87s/it] 24%|██▍ | 4124/17285 [37:05:17<116:20:26, 31.82s/it] 24%|██▍ | 4125/17285 
[37:05:48<114:39:56, 31.37s/it] 24%|██▍ | 4126/17285 [37:06:22<117:59:11, 32.28s/it] 24%|██▍ | 4127/17285 [37:06:59<122:48:37, 33.60s/it] 24%|██▍ | 4128/17285 [37:07:32<122:25:39, 33.50s/it] 24%|██▍ | 4129/17285 [37:08:03<119:08:07, 32.60s/it] 24%|██▍ | 4130/17285 [37:08:38<122:47:35, 33.60s/it] {'loss': 1.6248, 'learning_rate': 0.000181648509176889, 'epoch': 0.72} + 24%|██▍ | 4130/17285 [37:08:39<122:47:35, 33.60s/it] 24%|██▍ | 4131/17285 [37:09:09<118:51:32, 32.53s/it] 24%|██▍ | 4132/17285 [37:09:48<126:13:09, 34.55s/it] 24%|██▍ | 4133/17285 [37:10:20<123:28:49, 33.80s/it] 24%|██▍ | 4134/17285 [37:10:47<116:14:39, 31.82s/it] 24%|██▍ | 4135/17285 [37:11:17<114:10:13, 31.26s/it] 24%|██▍ | 4136/17285 [37:11:52<118:09:37, 32.35s/it] 24%|██▍ | 4137/17285 [37:12:20<113:05:34, 30.97s/it] 24%|██▍ | 4138/17285 [37:12:48<110:11:38, 30.17s/it] 24%|██▍ | 4139/17285 [37:13:23<116:03:45, 31.78s/it] 24%|██▍ | 4140/17285 [37:13:54<114:25:00, 31.34s/it] {'loss': 1.6552, 'learning_rate': 0.00018153789388255677, 'epoch': 0.72} + 24%|██▍ | 4140/17285 [37:13:54<114:25:00, 31.34s/it] 24%|██▍ | 4141/17285 [37:14:32<122:18:47, 33.50s/it] 24%|██▍ | 4142/17285 [37:15:07<123:24:19, 33.80s/it] 24%|██▍ | 4143/17285 [37:15:44<127:02:59, 34.80s/it] 24%|██▍ | 4144/17285 [37:16:15<122:59:27, 33.69s/it] 24%|██▍ | 4145/17285 [37:16:41<114:25:22, 31.35s/it] 24%|██▍ | 4146/17285 [37:17:13<114:40:08, 31.42s/it] 24%|██▍ | 4147/17285 [37:17:44<115:15:23, 31.58s/it] 24%|██▍ | 4148/17285 [37:18:17<116:00:08, 31.79s/it] 24%|██▍ | 4149/17285 [37:18:50<118:00:56, 32.34s/it] 24%|██▍ | 4150/17285 [37:19:27<122:52:24, 33.68s/it] {'loss': 1.626, 'learning_rate': 0.0001814269801099009, 'epoch': 0.72} + 24%|██▍ | 4150/17285 [37:19:27<122:52:24, 33.68s/it] 24%|██▍ | 4151/17285 [37:20:10<132:37:20, 36.35s/it] 24%|██▍ | 4152/17285 [37:20:44<129:48:42, 35.58s/it] 24%|██▍ | 4153/17285 [37:21:12<121:32:23, 33.32s/it] 24%|██▍ | 4154/17285 [37:21:44<120:39:28, 33.08s/it] 24%|██▍ | 4155/17285 [37:22:13<115:34:11, 
31.69s/it] 24%|██▍ | 4156/17285 [37:22:44<115:44:26, 31.74s/it] 24%|██▍ | 4157/17285 [37:23:17<117:01:41, 32.09s/it] 24%|██▍ | 4158/17285 [37:23:52<120:07:28, 32.94s/it] 24%|██▍ | 4159/17285 [37:24:35<130:22:28, 35.76s/it] 24%|██▍ | 4160/17285 [37:25:09<129:19:13, 35.47s/it] {'loss': 1.6096, 'learning_rate': 0.00018131576826493337, 'epoch': 0.72} + 24%|██▍ | 4160/17285 [37:25:09<129:19:13, 35.47s/it] 24%|██▍ | 4161/17285 [37:25:39<123:22:48, 33.84s/it] 24%|██▍ | 4162/17285 [37:26:11<120:42:28, 33.11s/it] 24%|██▍ | 4163/17285 [37:26:44<120:44:28, 33.13s/it] 24%|██▍ | 4164/17285 [37:27:12<115:10:45, 31.60s/it] 24%|██▍ | 4165/17285 [37:27:45<116:35:24, 31.99s/it] 24%|██▍ | 4166/17285 [37:28:15<114:51:05, 31.52s/it] 24%|██▍ | 4167/17285 [37:28:42<109:08:10, 29.95s/it] 24%|██▍ | 4168/17285 [37:29:16<113:56:16, 31.27s/it] 24%|██▍ | 4169/17285 [37:29:50<116:22:50, 31.94s/it] 24%|██▍ | 4170/17285 [37:30:27<122:20:49, 33.58s/it] {'loss': 1.6182, 'learning_rate': 0.00018120425875475723, 'epoch': 0.72} + 24%|██▍ | 4170/17285 [37:30:27<122:20:49, 33.58s/it] 24%|██▍ | 4171/17285 [37:30:59<120:50:03, 33.17s/it] 24%|██▍ | 4172/17285 [37:31:34<123:06:37, 33.80s/it] 24%|██▍ | 4173/17285 [37:32:06<120:28:39, 33.08s/it] 24%|██▍ | 4174/17285 [37:32:32<112:45:27, 30.96s/it] 24%|██▍ | 4175/17285 [37:32:58<107:19:41, 29.47s/it] 24%|██▍ | 4176/17285 [37:33:38<119:21:39, 32.78s/it] 24%|██▍ | 4177/17285 [37:34:07<115:25:18, 31.70s/it] 24%|██▍ | 4178/17285 [37:34:43<119:39:34, 32.87s/it] 24%|██▍ | 4179/17285 [37:35:09<111:55:19, 30.74s/it] 24%|██▍ | 4180/17285 [37:35:38<110:06:21, 30.25s/it] {'loss': 1.6014, 'learning_rate': 0.00018109245198756518, 'epoch': 0.73} + 24%|██▍ | 4180/17285 [37:35:38<110:06:21, 30.25s/it] 24%|██▍ | 4181/17285 [37:36:12<114:34:51, 31.48s/it] 24%|██▍ | 4182/17285 [37:36:37<107:20:35, 29.49s/it] 24%|██▍ | 4183/17285 [37:37:04<104:12:21, 28.63s/it] 24%|██▍ | 4184/17285 [37:37:39<111:22:06, 30.60s/it] 24%|██▍ | 4185/17285 [37:38:17<119:06:06, 32.73s/it] 24%|██▍ | 
4186/17285 [37:38:50<120:07:43, 33.02s/it] 24%|██▍ | 4187/17285 [37:39:23<119:57:16, 32.97s/it][2023-08-24 13:34:37,537] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 + 24%|██▍ | 4188/17285 [37:40:00<123:55:57, 34.07s/it] 24%|██▍ | 4189/17285 [37:40:35<124:47:17, 34.30s/it] 24%|██▍ | 4190/17285 [37:41:07<122:18:14, 33.62s/it] {'loss': 1.5923, 'learning_rate': 0.00018099157208059183, 'epoch': 0.73} + 24%|██▍ | 4190/17285 [37:41:07<122:18:14, 33.62s/it] 24%|██▍ | 4191/17285 [37:41:41<122:39:49, 33.72s/it] 24%|██▍ | 4192/17285 [37:42:16<124:04:49, 34.12s/it] 24%|██▍ | 4193/17285 [37:42:45<118:19:28, 32.54s/it] 24%|██▍ | 4194/17285 [37:43:17<117:45:28, 32.38s/it] 24%|██▍ | 4195/17285 [37:43:46<114:40:04, 31.54s/it] 24%|██▍ | 4196/17285 [37:44:25<122:23:19, 33.66s/it] 24%|██▍ | 4197/17285 [37:45:02<126:19:22, 34.75s/it] 24%|██▍ | 4198/17285 [37:45:34<122:54:19, 33.81s/it] 24%|██▍ | 4199/17285 [37:46:03<117:45:16, 32.39s/it] 24%|██▍ | 4200/17285 [37:46:29<111:32:25, 30.69s/it] {'loss': 1.5841, 'learning_rate': 0.0001808792016535363, 'epoch': 0.73} + 24%|██▍ | 4200/17285 [37:46:29<111:32:25, 30.69s/it] 24%|██▍ | 4201/17285 [37:47:02<113:39:19, 31.27s/it] 24%|██▍ | 4202/17285 [37:47:32<111:53:37, 30.79s/it] 24%|██▍ | 4203/17285 [37:48:09<118:26:38, 32.59s/it] 24%|██▍ | 4204/17285 [37:48:36<112:46:28, 31.04s/it] 24%|██▍ | 4205/17285 [37:49:15<120:53:51, 33.27s/it] 24%|██▍ | 4206/17285 [37:49:44<116:28:52, 32.06s/it] 24%|██▍ | 4207/17285 [37:50:11<111:28:33, 30.69s/it] 24%|██▍ | 4208/17285 [37:50:43<112:30:09, 30.97s/it] 24%|██▍ | 4209/17285 [37:51:20<118:57:25, 32.75s/it] 24%|██▍ | 4210/17285 [37:51:49<114:59:21, 31.66s/it] {'loss': 1.5837, 'learning_rate': 0.00018076653515937166, 'epoch': 0.73} + 24%|██▍ | 4210/17285 [37:51:49<114:59:21, 31.66s/it] 24%|██▍ | 4211/17285 [37:52:14<107:26:19, 29.58s/it] 24%|██▍ | 4212/17285 [37:52:42<105:48:59, 29.14s/it] 24%|██▍ 
| 4213/17285 [37:53:14<108:48:57, 29.97s/it] 24%|██▍ | 4214/17285 [37:53:54<119:49:02, 33.00s/it] 24%|██▍ | 4215/17285 [37:54:24<116:24:33, 32.06s/it] 24%|██▍ | 4216/17285 [37:54:54<114:42:20, 31.60s/it] 24%|██▍ | 4217/17285 [37:55:25<113:45:56, 31.34s/it] 24%|██▍ | 4218/17285 [37:55:56<113:40:03, 31.32s/it] 24%|██▍ | 4219/17285 [37:56:33<119:27:39, 32.91s/it] 24%|██▍ | 4220/17285 [37:57:13<127:57:01, 35.26s/it] {'loss': 1.6354, 'learning_rate': 0.00018065357301052593, 'epoch': 0.73} + 24%|██▍ | 4220/17285 [37:57:13<127:57:01, 35.26s/it] 24%|██▍ | 4221/17285 [37:57:47<126:04:45, 34.74s/it] 24%|██▍ | 4222/17285 [37:58:16<120:01:13, 33.08s/it] 24%|██▍ | 4223/17285 [37:58:53<123:59:20, 34.17s/it] 24%|██▍ | 4224/17285 [37:59:32<129:28:39, 35.69s/it] 24%|██▍ | 4225/17285 [38:00:09<130:37:55, 36.01s/it] 24%|██▍ | 4226/17285 [38:00:38<123:32:38, 34.06s/it] 24%|██▍ | 4227/17285 [38:01:13<124:07:41, 34.22s/it] 24%|██▍ | 4228/17285 [38:01:45<122:14:18, 33.70s/it] 24%|██▍ | 4229/17285 [38:02:30<133:50:45, 36.91s/it] 24%|██▍ | 4230/17285 [38:03:06<132:31:23, 36.54s/it] {'loss': 1.6433, 'learning_rate': 0.00018054031562050928, 'epoch': 0.73} + 24%|██▍ | 4230/17285 [38:03:06<132:31:23, 36.54s/it] 24%|██▍ | 4231/17285 [38:03:41<130:47:38, 36.07s/it] 24%|██▍ | 4232/17285 [38:04:15<129:09:38, 35.62s/it] 24%|██▍ | 4233/17285 [38:04:54<132:12:43, 36.47s/it] 24%|██▍ | 4234/17285 [38:05:21<122:46:13, 33.87s/it] 25%|██▍ | 4235/17285 [38:05:53<120:40:22, 33.29s/it] 25%|██▍ | 4236/17285 [38:06:27<120:59:21, 33.38s/it] 25%|██▍ | 4237/17285 [38:06:56<116:48:34, 32.23s/it] 25%|██▍ | 4238/17285 [38:07:37<125:54:29, 34.74s/it] 25%|██▍ | 4239/17285 [38:08:07<120:24:58, 33.23s/it] 25%|██▍ | 4240/17285 [38:08:35<114:47:05, 31.68s/it] {'loss': 1.6109, 'learning_rate': 0.0001804267634039127, 'epoch': 0.74} + 25%|██▍ | 4240/17285 [38:08:35<114:47:05, 31.68s/it] 25%|██▍ | 4241/17285 [38:09:05<113:16:17, 31.26s/it] 25%|██▍ | 4242/17285 [38:09:33<109:17:46, 30.17s/it] 25%|██▍ | 4243/17285 
[38:10:02<108:32:11, 29.96s/it] 25%|██▍ | 4244/17285 [38:10:35<111:10:07, 30.69s/it] 25%|██▍ | 4245/17285 [38:11:03<109:02:20, 30.10s/it] 25%|██▍ | 4246/17285 [38:11:31<106:43:48, 29.47s/it] 25%|██▍ | 4247/17285 [38:12:05<111:10:03, 30.70s/it] 25%|██▍ | 4248/17285 [38:12:38<114:03:22, 31.50s/it][2023-08-24 14:07:40,062] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 25%|██▍ | 4249/17285 [38:13:02<106:07:17, 29.31s/it] 25%|██▍ | 4250/17285 [38:13:35<109:52:00, 30.34s/it] {'loss': 1.6059, 'learning_rate': 0.0001803243146757791, 'epoch': 0.74} + 25%|██▍ | 4250/17285 [38:13:35<109:52:00, 30.34s/it] 25%|██▍ | 4251/17285 [38:14:09<113:21:22, 31.31s/it] 25%|██▍ | 4252/17285 [38:14:45<118:40:14, 32.78s/it] 25%|██▍ | 4253/17285 [38:15:19<119:44:25, 33.08s/it] 25%|██▍ | 4254/17285 [38:15:51<118:24:59, 32.71s/it] 25%|██▍ | 4255/17285 [38:16:18<112:20:33, 31.04s/it] 25%|██▍ | 4256/17285 [38:16:46<109:38:45, 30.30s/it] 25%|██▍ | 4257/17285 [38:17:23<116:12:03, 32.11s/it] 25%|██▍ | 4258/17285 [38:17:50<111:04:24, 30.70s/it] 25%|██▍ | 4259/17285 [38:18:15<104:54:39, 28.99s/it] 25%|██▍ | 4260/17285 [38:18:42<102:54:53, 28.44s/it] {'loss': 1.6127, 'learning_rate': 0.00018021020343474294, 'epoch': 0.74} + 25%|██▍ | 4260/17285 [38:18:42<102:54:53, 28.44s/it] 25%|██▍ | 4261/17285 [38:19:16<108:53:52, 30.10s/it] 25%|██▍ | 4262/17285 [38:19:46<108:19:22, 29.94s/it] 25%|██▍ | 4263/17285 [38:20:24<117:16:35, 32.42s/it] 25%|██▍ | 4264/17285 [38:21:00<121:26:52, 33.58s/it] 25%|██▍ | 4265/17285 [38:21:39<127:11:04, 35.17s/it] 25%|██▍ | 4266/17285 [38:22:08<120:48:14, 33.40s/it] 25%|██▍ | 4267/17285 [38:22:45<124:18:29, 34.38s/it] 25%|██▍ | 4268/17285 [38:23:17<121:25:22, 33.58s/it] 25%|██▍ | 4269/17285 [38:23:46<116:16:14, 32.16s/it] 25%|██▍ | 4270/17285 [38:24:22<120:33:58, 33.35s/it] {'loss': 1.6006, 'learning_rate': 0.0001800957985755384, 'epoch': 0.74} + 25%|██▍ | 4270/17285 [38:24:22<120:33:58, 
33.35s/it] 25%|██▍ | 4271/17285 [38:24:51<116:30:54, 32.23s/it] 25%|██▍ | 4272/17285 [38:25:29<122:04:38, 33.77s/it] 25%|██▍ | 4273/17285 [38:26:00<119:42:20, 33.12s/it] 25%|██▍ | 4274/17285 [38:26:28<113:45:24, 31.48s/it] 25%|██▍ | 4275/17285 [38:26:56<109:43:17, 30.36s/it] 25%|██▍ | 4276/17285 [38:27:21<103:58:30, 28.77s/it] 25%|██▍ | 4277/17285 [38:27:48<102:22:51, 28.33s/it] 25%|██▍ | 4278/17285 [38:28:16<101:26:33, 28.08s/it] 25%|██▍ | 4279/17285 [38:28:50<107:51:52, 29.86s/it] 25%|██▍ | 4280/17285 [38:29:18<106:45:57, 29.55s/it] {'loss': 1.5927, 'learning_rate': 0.00017998110051695688, 'epoch': 0.74} + 25%|██▍ | 4280/17285 [38:29:18<106:45:57, 29.55s/it] 25%|██▍ | 4281/17285 [38:29:49<108:11:45, 29.95s/it] 25%|██▍ | 4282/17285 [38:30:20<108:58:25, 30.17s/it] 25%|██▍ | 4283/17285 [38:30:51<110:19:15, 30.55s/it] 25%|██▍ | 4284/17285 [38:31:24<112:15:28, 31.08s/it] 25%|██▍ | 4285/17285 [38:32:05<123:32:22, 34.21s/it] 25%|██▍ | 4286/17285 [38:32:36<119:53:16, 33.20s/it] 25%|██▍ | 4287/17285 [38:33:06<116:09:53, 32.17s/it] 25%|██▍ | 4288/17285 [38:33:39<116:44:01, 32.33s/it] 25%|██▍ | 4289/17285 [38:34:16<122:11:59, 33.85s/it] 25%|██▍ | 4290/17285 [38:34:45<117:23:50, 32.52s/it] {'loss': 1.6142, 'learning_rate': 0.0001798661096788631, 'epoch': 0.74} + 25%|██▍ | 4290/17285 [38:34:45<117:23:50, 32.52s/it] 25%|██▍ | 4291/17285 [38:35:17<116:12:27, 32.20s/it] 25%|██▍ | 4292/17285 [38:35:52<118:56:17, 32.95s/it] 25%|██▍ | 4293/17285 [38:36:18<112:12:30, 31.09s/it] 25%|██▍ | 4294/17285 [38:36:48<111:09:18, 30.80s/it] 25%|██▍ | 4295/17285 [38:37:16<108:00:31, 29.93s/it] 25%|██▍ | 4296/17285 [38:37:51<113:26:25, 31.44s/it] 25%|██▍ | 4297/17285 [38:38:20<110:47:04, 30.71s/it] 25%|██▍ | 4298/17285 [38:38:50<109:14:43, 30.28s/it] 25%|██▍ | 4299/17285 [38:39:27<116:51:26, 32.40s/it] 25%|██▍ | 4300/17285 [38:40:06<123:57:46, 34.37s/it] {'loss': 1.6272, 'learning_rate': 0.00017975082648219356, 'epoch': 0.75} + 25%|██▍ | 4300/17285 [38:40:06<123:57:46, 34.37s/it] 25%|██▍ | 
4301/17285 [38:40:40<123:32:12, 34.25s/it] 25%|██▍ | 4302/17285 [38:41:07<115:58:48, 32.16s/it] 25%|██▍ | 4303/17285 [38:41:36<112:13:19, 31.12s/it] 25%|██▍ | 4304/17285 [38:42:11<116:54:10, 32.42s/it] 25%|██▍ | 4305/17285 [38:42:46<119:00:48, 33.01s/it] 25%|██▍ | 4306/17285 [38:43:23<124:04:12, 34.41s/it] 25%|██▍ | 4307/17285 [38:43:52<117:56:29, 32.72s/it] 25%|██▍ | 4308/17285 [38:44:25<118:14:58, 32.80s/it] 25%|██▍ | 4309/17285 [38:44:54<114:25:47, 31.75s/it] 25%|██▍ | 4310/17285 [38:45:34<123:23:16, 34.23s/it] {'loss': 1.6459, 'learning_rate': 0.0001796352513489549, 'epoch': 0.75} + 25%|██▍ | 4310/17285 [38:45:34<123:23:16, 34.23s/it] 25%|██▍ | 4311/17285 [38:46:08<122:17:56, 33.94s/it] 25%|██▍ | 4312/17285 [38:46:34<114:08:50, 31.68s/it] 25%|██▍ | 4313/17285 [38:47:15<123:41:30, 34.33s/it] 25%|██▍ | 4314/17285 [38:47:41<114:48:08, 31.86s/it] 25%|██▍ | 4315/17285 [38:48:18<120:33:08, 33.46s/it] 25%|██▍ | 4316/17285 [38:48:54<123:05:09, 34.17s/it] 25%|██▍ | 4317/17285 [38:49:23<117:37:59, 32.66s/it] 25%|██▍ | 4318/17285 [38:50:08<130:58:15, 36.36s/it] 25%|██▍ | 4319/17285 [38:50:42<128:17:28, 35.62s/it] 25%|██▍ | 4320/17285 [38:51:10<119:51:02, 33.28s/it] {'loss': 1.6373, 'learning_rate': 0.00017951938470222247, 'epoch': 0.75} + 25%|██▍ | 4320/17285 [38:51:10<119:51:02, 33.28s/it] 25%|██▍ | 4321/17285 [38:51:46<122:57:57, 34.15s/it] 25%|██▌ | 4322/17285 [38:52:12<114:08:28, 31.70s/it] 25%|██▌ | 4323/17285 [38:52:45<115:53:18, 32.19s/it] 25%|██▌ | 4324/17285 [38:53:11<108:53:32, 30.25s/it] 25%|██▌ | 4325/17285 [38:53:41<109:09:39, 30.32s/it] 25%|██▌ | 4326/17285 [38:54:06<102:39:28, 28.52s/it] 25%|██▌ | 4327/17285 [38:54:33<101:06:51, 28.09s/it] 25%|██▌ | 4328/17285 [38:55:11<112:15:21, 31.19s/it] 25%|██▌ | 4329/17285 [38:55:40<109:24:12, 30.40s/it] 25%|██▌ | 4330/17285 [38:56:09<108:50:41, 30.25s/it] {'loss': 1.6331, 'learning_rate': 0.0001794032269661387, 'epoch': 0.75} + 25%|██▌ | 4330/17285 [38:56:09<108:50:41, 30.25s/it] 25%|██▌ | 4331/17285 
[38:56:44<113:06:06, 31.43s/it] 25%|██▌ | 4332/17285 [38:57:17<114:51:13, 31.92s/it] 25%|██▌ | 4333/17285 [38:57:51<117:06:37, 32.55s/it] 25%|██▌ | 4334/17285 [38:58:16<109:33:03, 30.45s/it] 25%|██▌ | 4335/17285 [38:58:58<122:00:20, 33.92s/it] 25%|██▌ | 4336/17285 [38:59:33<122:59:02, 34.19s/it] 25%|██▌ | 4337/17285 [39:00:12<128:07:11, 35.62s/it] 25%|██▌ | 4338/17285 [39:00:42<122:16:27, 34.00s/it] 25%|██▌ | 4339/17285 [39:01:19<125:17:12, 34.84s/it] 25%|██▌ | 4340/17285 [39:01:56<126:59:17, 35.32s/it] {'loss': 1.6007, 'learning_rate': 0.00017928677856591163, 'epoch': 0.75} + 25%|██▌ | 4340/17285 [39:01:56<126:59:17, 35.32s/it] 25%|██▌ | 4341/17285 [39:02:22<117:38:46, 32.72s/it] 25%|██▌ | 4342/17285 [39:02:49<111:10:20, 30.92s/it] 25%|██▌ | 4343/17285 [39:03:26<117:56:57, 32.81s/it] 25%|██▌ | 4344/17285 [39:04:02<121:01:05, 33.67s/it] 25%|██▌ | 4345/17285 [39:04:35<120:56:59, 33.65s/it] 25%|██▌ | 4346/17285 [39:05:11<123:14:23, 34.29s/it] 25%|██▌ | 4347/17285 [39:05:37<113:37:06, 31.61s/it] 25%|██▌ | 4348/17285 [39:06:07<112:24:47, 31.28s/it] 25%|██▌ | 4349/17285 [39:06:36<110:03:27, 30.63s/it] 25%|██▌ | 4350/17285 [39:07:07<110:39:48, 30.80s/it] {'loss': 1.6359, 'learning_rate': 0.0001791700399278133, 'epoch': 0.75} + 25%|██▌ | 4350/17285 [39:07:07<110:39:48, 30.80s/it] 25%|██▌ | 4351/17285 [39:07:38<110:11:10, 30.67s/it] 25%|██▌ | 4352/17285 [39:08:03<103:55:36, 28.93s/it] 25%|██▌ | 4353/17285 [39:08:42<115:07:23, 32.05s/it] 25%|██▌ | 4354/17285 [39:09:16<116:52:57, 32.54s/it] 25%|██▌ | 4355/17285 [39:09:44<112:09:12, 31.23s/it] 25%|██▌ | 4356/17285 [39:10:12<108:29:28, 30.21s/it] 25%|██▌ | 4357/17285 [39:10:45<112:02:54, 31.20s/it] 25%|██▌ | 4358/17285 [39:11:18<114:13:48, 31.81s/it] 25%|██▌ | 4359/17285 [39:11:52<115:43:15, 32.23s/it] 25%|██▌ | 4360/17285 [39:12:26<117:56:07, 32.85s/it] {'loss': 1.5939, 'learning_rate': 0.00017905301147917816, 'epoch': 0.76} + 25%|██▌ | 4360/17285 [39:12:26<117:56:07, 32.85s/it] 25%|██▌ | 4361/17285 [39:12:59<118:06:51, 
32.90s/it] 25%|██▌ | 4362/17285 [39:13:30<116:13:24, 32.38s/it] 25%|██▌ | 4363/17285 [39:13:57<110:11:12, 30.70s/it] 25%|██▌ | 4364/17285 [39:14:36<119:47:45, 33.38s/it] 25%|██▌ | 4365/17285 [39:15:03<112:41:52, 31.40s/it] 25%|██▌ | 4366/17285 [39:15:32<110:15:00, 30.72s/it] 25%|██▌ | 4367/17285 [39:16:10<117:54:29, 32.86s/it] 25%|██▌ | 4368/17285 [39:16:47<121:48:27, 33.95s/it] 25%|██▌ | 4369/17285 [39:17:16<117:15:22, 32.68s/it] 25%|██▌ | 4370/17285 [39:17:47<114:49:47, 32.01s/it] {'loss': 1.5889, 'learning_rate': 0.00017893569364840154, 'epoch': 0.76} + 25%|██▌ | 4370/17285 [39:17:47<114:49:47, 32.01s/it] 25%|██▌ | 4371/17285 [39:18:17<112:35:06, 31.39s/it] 25%|██▌ | 4372/17285 [39:18:42<105:56:48, 29.54s/it] 25%|██▌ | 4373/17285 [39:19:17<111:49:43, 31.18s/it] 25%|██▌ | 4374/17285 [39:19:44<107:27:07, 29.96s/it] 25%|██▌ | 4375/17285 [39:20:18<111:32:35, 31.10s/it] 25%|██▌ | 4376/17285 [39:20:48<110:43:16, 30.88s/it] 25%|██▌ | 4377/17285 [39:21:22<113:53:49, 31.77s/it] 25%|██▌ | 4378/17285 [39:21:59<119:01:08, 33.20s/it] 25%|██▌ | 4379/17285 [39:22:24<110:33:00, 30.84s/it] 25%|██▌ | 4380/17285 [39:23:03<119:43:08, 33.40s/it] {'loss': 1.6206, 'learning_rate': 0.0001788180868649382, 'epoch': 0.76} + 25%|██▌ | 4380/17285 [39:23:03<119:43:08, 33.40s/it] 25%|██▌ | 4381/17285 [39:23:40<122:55:32, 34.29s/it] 25%|██▌ | 4382/17285 [39:24:16<125:18:45, 34.96s/it] 25%|██▌ | 4383/17285 [39:24:44<117:38:13, 32.82s/it] 25%|██▌ | 4384/17285 [39:25:17<117:29:08, 32.78s/it] 25%|██▌ | 4385/17285 [39:25:47<114:30:29, 31.96s/it] 25%|██▌ | 4386/17285 [39:26:24<120:00:52, 33.50s/it] 25%|██▌ | 4387/17285 [39:26:55<117:52:16, 32.90s/it] 25%|██▌ | 4388/17285 [39:27:24<113:30:14, 31.68s/it] 25%|██▌ | 4389/17285 [39:27:56<113:16:09, 31.62s/it] 25%|██▌ | 4390/17285 [39:28:33<119:27:36, 33.35s/it] {'loss': 1.5902, 'learning_rate': 0.00017870019155930047, 'epoch': 0.76} + 25%|██▌ | 4390/17285 [39:28:33<119:27:36, 33.35s/it] 25%|██▌ | 4391/17285 [39:29:06<119:05:05, 33.25s/it] 25%|██▌ | 
4392/17285 [39:29:42<122:13:13, 34.13s/it] 25%|██▌ | 4393/17285 [39:30:18<123:28:13, 34.48s/it] 25%|██▌ | 4394/17285 [39:30:50<121:36:20, 33.96s/it] 25%|██▌ | 4395/17285 [39:31:25<122:26:39, 34.20s/it] 25%|██▌ | 4396/17285 [39:31:57<119:41:44, 33.43s/it] 25%|██▌ | 4397/17285 [39:32:27<116:42:24, 32.60s/it] 25%|██▌ | 4398/17285 [39:33:02<118:25:27, 33.08s/it] 25%|██▌ | 4399/17285 [39:33:36<119:15:22, 33.32s/it] 25%|██▌ | 4400/17285 [39:34:01<110:51:45, 30.97s/it] {'loss': 1.6394, 'learning_rate': 0.00017858200816305697, 'epoch': 0.76} + 25%|██▌ | 4400/17285 [39:34:01<110:51:45, 30.97s/it] 25%|██▌ | 4401/17285 [39:34:29<107:53:38, 30.15s/it] 25%|██▌ | 4402/17285 [39:34:54<101:40:04, 28.41s/it] 25%|██▌ | 4403/17285 [39:35:27<107:16:29, 29.98s/it] 25%|██▌ | 4404/17285 [39:35:59<108:43:40, 30.39s/it] 25%|██▌ | 4405/17285 [39:36:25<103:57:22, 29.06s/it] 25%|██▌ | 4406/17285 [39:37:05<116:38:59, 32.61s/it] 25%|██▌ | 4407/17285 [39:37:38<116:59:13, 32.70s/it] 26%|██▌ | 4408/17285 [39:38:10<115:54:50, 32.41s/it] 26%|██▌ | 4409/17285 [39:38:45<119:00:51, 33.28s/it] 26%|██▌ | 4410/17285 [39:39:13<112:37:48, 31.49s/it] {'loss': 1.6193, 'learning_rate': 0.00017846353710883087, 'epoch': 0.77} + 26%|██▌ | 4410/17285 [39:39:13<112:37:48, 31.49s/it] 26%|██▌ | 4411/17285 [39:39:47<115:59:44, 32.44s/it] 26%|██▌ | 4412/17285 [39:40:20<115:57:08, 32.43s/it] 26%|██▌ | 4413/17285 [39:40:51<114:22:18, 31.99s/it] 26%|██▌ | 4414/17285 [39:41:25<117:14:12, 32.79s/it] 26%|██▌ | 4415/17285 [39:41:59<118:05:35, 33.03s/it] 26%|██▌ | 4416/17285 [39:42:30<116:23:58, 32.56s/it] 26%|██▌ | 4417/17285 [39:43:01<114:37:57, 32.07s/it][2023-08-24 15:38:03,736] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. 
Reducing hysteresis to 1 + 26%|██▌ | 4418/17285 [39:43:26<106:44:04, 29.86s/it] 26%|██▌ | 4419/17285 [39:44:00<110:57:29, 31.05s/it] 26%|██▌ | 4420/17285 [39:44:30<109:59:33, 30.78s/it] {'loss': 1.6162, 'learning_rate': 0.00017835666757086383, 'epoch': 0.77} + 26%|██▌ | 4420/17285 [39:44:30<109:59:33, 30.78s/it] 26%|██▌ | 4421/17285 [39:44:59<107:34:16, 30.10s/it] 26%|██▌ | 4422/17285 [39:45:27<105:56:46, 29.65s/it] 26%|██▌ | 4423/17285 [39:46:01<110:30:23, 30.93s/it] 26%|██▌ | 4424/17285 [39:46:33<112:05:18, 31.38s/it] 26%|██▌ | 4425/17285 [39:47:04<111:06:31, 31.10s/it] 26%|██▌ | 4426/17285 [39:47:38<113:45:05, 31.85s/it] 26%|██▌ | 4427/17285 [39:48:08<112:20:08, 31.45s/it] 26%|██▌ | 4428/17285 [39:48:42<115:23:31, 32.31s/it] 26%|██▌ | 4429/17285 [39:49:14<114:26:35, 32.05s/it] 26%|██▌ | 4430/17285 [39:49:50<118:44:35, 33.25s/it] {'loss': 1.6329, 'learning_rate': 0.00017823765116211767, 'epoch': 0.77} + 26%|██▌ | 4430/17285 [39:49:50<118:44:35, 33.25s/it] 26%|██▌ | 4431/17285 [39:50:27<122:41:19, 34.36s/it] 26%|██▌ | 4432/17285 [39:51:00<121:13:07, 33.95s/it] 26%|██▌ | 4433/17285 [39:51:27<113:51:59, 31.90s/it] 26%|██▌ | 4434/17285 [39:52:06<121:28:33, 34.03s/it] 26%|██▌ | 4435/17285 [39:52:37<118:02:36, 33.07s/it] 26%|██▌ | 4436/17285 [39:53:13<121:56:23, 34.16s/it] 26%|██▌ | 4437/17285 [39:53:41<114:26:03, 32.06s/it] 26%|██▌ | 4438/17285 [39:54:10<111:15:20, 31.18s/it] 26%|██▌ | 4439/17285 [39:54:37<107:28:26, 30.12s/it] 26%|██▌ | 4440/17285 [39:55:05<105:16:35, 29.51s/it] {'loss': 1.6248, 'learning_rate': 0.0001781183483559451, 'epoch': 0.77} + 26%|██▌ | 4440/17285 [39:55:05<105:16:35, 29.51s/it] 26%|██▌ | 4441/17285 [39:55:40<110:59:36, 31.11s/it] 26%|██▌ | 4442/17285 [39:56:13<113:00:42, 31.68s/it] 26%|██▌ | 4443/17285 [39:56:51<119:39:02, 33.54s/it] 26%|██▌ | 4444/17285 [39:57:24<119:18:27, 33.45s/it] 26%|██▌ | 4445/17285 [39:57:54<115:17:31, 32.32s/it] 26%|██▌ | 4446/17285 [39:58:26<114:59:07, 32.24s/it] 26%|██▌ | 4447/17285 [39:59:04<120:53:52, 
33.90s/it] 26%|██▌ | 4448/17285 [39:59:38<121:13:07, 33.99s/it] 26%|██▌ | 4449/17285 [40:00:08<117:08:51, 32.86s/it] 26%|██▌ | 4450/17285 [40:00:41<116:58:39, 32.81s/it] {'loss': 1.6109, 'learning_rate': 0.00017799875958906703, 'epoch': 0.77} + 26%|██▌ | 4450/17285 [40:00:41<116:58:39, 32.81s/it] 26%|██▌ | 4451/17285 [40:01:21<125:03:58, 35.08s/it] 26%|██▌ | 4452/17285 [40:02:00<128:43:50, 36.11s/it] 26%|██▌ | 4453/17285 [40:02:40<133:14:01, 37.38s/it] 26%|██▌ | 4454/17285 [40:03:10<124:30:03, 34.93s/it] 26%|██▌ | 4455/17285 [40:03:48<127:49:50, 35.87s/it] 26%|██▌ | 4456/17285 [40:04:20<124:23:26, 34.91s/it] 26%|██▌ | 4457/17285 [40:05:04<134:02:59, 37.62s/it] 26%|██▌ | 4458/17285 [40:05:32<123:28:46, 34.66s/it] 26%|██▌ | 4459/17285 [40:06:05<121:41:30, 34.16s/it] 26%|██▌ | 4460/17285 [40:06:41<123:36:17, 34.70s/it] {'loss': 1.5499, 'learning_rate': 0.0001778788852992512, 'epoch': 0.77} + 26%|██▌ | 4460/17285 [40:06:41<123:36:17, 34.70s/it] 26%|██▌ | 4461/17285 [40:07:08<115:25:02, 32.40s/it] 26%|██▌ | 4462/17285 [40:07:34<108:52:51, 30.57s/it] 26%|██▌ | 4463/17285 [40:08:03<106:53:00, 30.01s/it] 26%|██▌ | 4464/17285 [40:08:47<121:26:18, 34.10s/it] 26%|██▌ | 4465/17285 [40:09:22<123:01:46, 34.55s/it] 26%|██▌ | 4466/17285 [40:09:56<122:25:30, 34.38s/it] 26%|██▌ | 4467/17285 [40:10:26<118:03:14, 33.16s/it] 26%|██▌ | 4468/17285 [40:10:57<115:20:30, 32.40s/it] 26%|██▌ | 4469/17285 [40:11:24<109:40:45, 30.81s/it] 26%|██▌ | 4470/17285 [40:12:00<114:58:46, 32.30s/it] {'loss': 1.6107, 'learning_rate': 0.0001777587259253104, 'epoch': 0.78} + 26%|██▌ | 4470/17285 [40:12:00<114:58:46, 32.30s/it] 26%|██▌ | 4471/17285 [40:12:26<108:40:18, 30.53s/it] 26%|██▌ | 4472/17285 [40:13:03<115:02:36, 32.32s/it] 26%|██▌ | 4473/17285 [40:13:31<110:02:19, 30.92s/it] 26%|██▌ | 4474/17285 [40:13:56<104:00:36, 29.23s/it] 26%|██▌ | 4475/17285 [40:14:31<110:47:39, 31.14s/it] 26%|██▌ | 4476/17285 [40:15:08<116:40:43, 32.79s/it] 26%|██▌ | 4477/17285 [40:15:33<107:50:12, 30.31s/it] 26%|██▌ | 
4478/17285 [40:16:04<108:58:12, 30.63s/it] 26%|██▌ | 4479/17285 [40:16:38<113:08:51, 31.81s/it] 26%|██▌ | 4480/17285 [40:17:09<111:42:52, 31.41s/it] {'loss': 1.5865, 'learning_rate': 0.00017763828190710113, 'epoch': 0.78} + 26%|██▌ | 4480/17285 [40:17:09<111:42:52, 31.41s/it] 26%|██▌ | 4481/17285 [40:17:53<125:28:28, 35.28s/it] 26%|██▌ | 4482/17285 [40:18:28<125:02:28, 35.16s/it] 26%|██▌ | 4483/17285 [40:19:02<123:09:10, 34.63s/it] 26%|██▌ | 4484/17285 [40:19:44<131:33:30, 37.00s/it] 26%|██▌ | 4485/17285 [40:20:17<127:01:45, 35.73s/it] 26%|██▌ | 4486/17285 [40:20:53<127:29:20, 35.86s/it] 26%|██▌ | 4487/17285 [40:21:18<115:50:11, 32.58s/it] 26%|██▌ | 4488/17285 [40:21:49<114:24:33, 32.19s/it] 26%|██▌ | 4489/17285 [40:22:34<128:16:28, 36.09s/it] 26%|██▌ | 4490/17285 [40:23:01<118:38:58, 33.38s/it] {'loss': 1.6013, 'learning_rate': 0.00017751755368552178, 'epoch': 0.78} + 26%|██▌ | 4490/17285 [40:23:01<118:38:58, 33.38s/it] 26%|██▌ | 4491/17285 [40:23:33<116:24:23, 32.75s/it] 26%|██▌ | 4492/17285 [40:24:05<116:16:50, 32.72s/it] 26%|██▌ | 4493/17285 [40:24:40<118:24:44, 33.32s/it] 26%|██▌ | 4494/17285 [40:25:10<115:12:32, 32.43s/it] 26%|██▌ | 4495/17285 [40:25:50<122:40:39, 34.53s/it] 26%|██▌ | 4496/17285 [40:26:27<124:57:51, 35.18s/it] 26%|██▌ | 4497/17285 [40:26:58<120:25:45, 33.90s/it] 26%|██▌ | 4498/17285 [40:27:41<130:53:18, 36.85s/it] 26%|██▌ | 4499/17285 [40:28:08<120:02:48, 33.80s/it] 26%|██▌ | 4500/17285 [40:28:40<118:43:00, 33.43s/it] {'loss': 1.5829, 'learning_rate': 0.00017739654170251116, 'epoch': 0.78} + 26%|██▌ | 4500/17285 [40:28:40<118:43:00, 33.43s/it] 26%|██▌ | 4501/17285 [40:29:11<115:11:03, 32.44s/it] 26%|██▌ | 4502/17285 [40:29:42<114:15:10, 32.18s/it] 26%|██▌ | 4503/17285 [40:30:12<112:10:13, 31.59s/it] 26%|██▌ | 4504/17285 [40:30:41<109:21:44, 30.80s/it] 26%|██▌ | 4505/17285 [40:31:21<118:22:56, 33.35s/it] 26%|██▌ | 4506/17285 [40:31:48<111:56:09, 31.53s/it] 26%|██▌ | 4507/17285 [40:32:27<120:02:52, 33.82s/it] 26%|██▌ | 4508/17285 
[40:33:00<119:26:41, 33.65s/it] 26%|██▌ | 4509/17285 [40:33:27<112:07:56, 31.60s/it] 26%|██▌ | 4510/17285 [40:34:05<119:15:48, 33.61s/it] {'loss': 1.6356, 'learning_rate': 0.00017727524640104674, 'epoch': 0.78} + 26%|██▌ | 4510/17285 [40:34:05<119:15:48, 33.61s/it] 26%|██▌ | 4511/17285 [40:34:31<111:06:16, 31.31s/it] 26%|██▌ | 4512/17285 [40:35:04<112:13:07, 31.63s/it] 26%|██▌ | 4513/17285 [40:35:37<114:05:14, 32.16s/it] 26%|██▌ | 4514/17285 [40:36:03<107:24:03, 30.28s/it] 26%|██▌ | 4515/17285 [40:36:38<111:48:48, 31.52s/it] 26%|██▌ | 4516/17285 [40:37:04<106:29:08, 30.02s/it] 26%|██▌ | 4517/17285 [40:37:33<104:51:30, 29.57s/it] 26%|██▌ | 4518/17285 [40:38:11<114:34:00, 32.31s/it] 26%|██▌ | 4519/17285 [40:38:49<120:24:33, 33.96s/it] 26%|██▌ | 4520/17285 [40:39:16<113:16:57, 31.95s/it] {'loss': 1.6237, 'learning_rate': 0.00017715366822514318, 'epoch': 0.78} + 26%|██▌ | 4520/17285 [40:39:16<113:16:57, 31.95s/it] 26%|██▌ | 4521/17285 [40:39:47<112:17:00, 31.67s/it] 26%|██▌ | 4522/17285 [40:40:21<114:30:45, 32.30s/it] 26%|██▌ | 4523/17285 [40:40:52<113:11:04, 31.93s/it] 26%|██▌ | 4524/17285 [40:41:27<116:07:14, 32.76s/it] 26%|██▌ | 4525/17285 [40:41:58<114:11:18, 32.22s/it] 26%|██▌ | 4526/17285 [40:42:32<116:19:26, 32.82s/it] 26%|██▌ | 4527/17285 [40:43:14<125:57:58, 35.54s/it] 26%|██▌ | 4528/17285 [40:43:53<130:08:59, 36.73s/it] 26%|██▌ | 4529/17285 [40:44:22<121:19:52, 34.24s/it] 26%|██▌ | 4530/17285 [40:45:01<126:26:56, 35.69s/it] {'loss': 1.5802, 'learning_rate': 0.00017703180761985063, 'epoch': 0.79} + 26%|██▌ | 4530/17285 [40:45:01<126:26:56, 35.69s/it] 26%|██▌ | 4531/17285 [40:45:37<126:44:00, 35.77s/it] 26%|██▌ | 4532/17285 [40:46:08<121:42:57, 34.36s/it] 26%|██▌ | 4533/17285 [40:46:43<122:34:14, 34.60s/it] 26%|██▌ | 4534/17285 [40:47:10<114:05:48, 32.21s/it] 26%|██▌ | 4535/17285 [40:47:41<113:22:36, 32.01s/it] 26%|██▌ | 4536/17285 [40:48:17<117:13:21, 33.10s/it] 26%|██▌ | 4537/17285 [40:48:49<115:47:29, 32.70s/it] 26%|██▋ | 4538/17285 [40:49:20<113:56:53, 
32.18s/it] 26%|██▋ | 4539/17285 [40:49:46<107:13:00, 30.28s/it] 26%|██▋ | 4540/17285 [40:50:18<109:07:52, 30.83s/it] {'loss': 1.5659, 'learning_rate': 0.00017690966503125307, 'epoch': 0.79} + 26%|██▋ | 4540/17285 [40:50:18<109:07:52, 30.83s/it] 26%|██▋ | 4541/17285 [40:50:54<115:20:33, 32.58s/it] 26%|██▋ | 4542/17285 [40:51:24<112:25:32, 31.76s/it] 26%|██▋ | 4543/17285 [40:52:02<119:22:21, 33.73s/it] 26%|██▋ | 4544/17285 [40:52:37<119:51:00, 33.86s/it] 26%|██▋ | 4545/17285 [40:53:07<115:54:57, 32.75s/it] 26%|██▋ | 4546/17285 [40:53:34<109:31:26, 30.95s/it] 26%|██▋ | 4547/17285 [40:54:04<109:11:15, 30.86s/it] 26%|██▋ | 4548/17285 [40:54:31<105:14:20, 29.74s/it] 26%|██▋ | 4549/17285 [40:55:06<110:31:07, 31.24s/it] 26%|██▋ | 4550/17285 [40:55:45<118:59:50, 33.64s/it] {'loss': 1.61, 'learning_rate': 0.0001767872409064667, 'epoch': 0.79} + 26%|██▋ | 4550/17285 [40:55:45<118:59:50, 33.64s/it] 26%|██▋ | 4551/17285 [40:56:14<113:56:15, 32.21s/it] 26%|██▋ | 4552/17285 [40:56:41<108:21:25, 30.64s/it] 26%|██▋ | 4553/17285 [40:57:14<110:54:33, 31.36s/it] 26%|██▋ | 4554/17285 [40:57:44<109:05:41, 30.85s/it] 26%|██▋ | 4555/17285 [40:58:12<106:23:21, 30.09s/it] 26%|██▋ | 4556/17285 [40:58:43<106:43:02, 30.18s/it] 26%|██▋ | 4557/17285 [40:59:12<105:29:25, 29.84s/it] 26%|██▋ | 4558/17285 [40:59:42<105:57:27, 29.97s/it] 26%|██▋ | 4559/17285 [41:00:07<100:41:26, 28.48s/it] 26%|██▋ | 4560/17285 [41:00:42<107:11:19, 30.32s/it] {'loss': 1.6184, 'learning_rate': 0.00017666453569363836, 'epoch': 0.79} + 26%|██▋ | 4560/17285 [41:00:42<107:11:19, 30.32s/it] 26%|██▋ | 4561/17285 [41:01:17<112:41:02, 31.88s/it] 26%|██▋ | 4562/17285 [41:01:44<107:55:31, 30.54s/it] 26%|██▋ | 4563/17285 [41:02:26<119:21:32, 33.78s/it] 26%|██▋ | 4564/17285 [41:03:06<125:47:40, 35.60s/it] 26%|██▋ | 4565/17285 [41:03:32<116:20:08, 32.93s/it] 26%|██▋ | 4566/17285 [41:04:01<111:52:30, 31.67s/it] 26%|██▋ | 4567/17285 [41:04:30<108:34:28, 30.73s/it] 26%|██▋ | 4568/17285 [41:05:13<122:05:00, 34.56s/it] 26%|██▋ | 
4569/17285 [41:05:41<115:19:03, 32.65s/it] 26%|██▋ | 4570/17285 [41:06:11<112:04:49, 31.73s/it] {'loss': 1.5797, 'learning_rate': 0.00017654154984194382, 'epoch': 0.79} + 26%|██▋ | 4570/17285 [41:06:11<112:04:49, 31.73s/it] 26%|██▋ | 4571/17285 [41:06:42<111:42:41, 31.63s/it] 26%|██▋ | 4572/17285 [41:07:20<118:35:42, 33.58s/it] 26%|██▋ | 4573/17285 [41:07:55<119:41:36, 33.90s/it] 26%|██▋ | 4574/17285 [41:08:34<125:10:15, 35.45s/it] 26%|██▋ | 4575/17285 [41:09:19<134:40:24, 38.15s/it] 26%|██▋ | 4576/17285 [41:09:49<126:49:52, 35.93s/it] 26%|██▋ | 4577/17285 [41:10:15<115:36:59, 32.75s/it] 26%|██▋ | 4578/17285 [41:10:43<110:34:50, 31.33s/it] 26%|██▋ | 4579/17285 [41:11:12<108:20:47, 30.70s/it] 26%|██▋ | 4580/17285 [41:11:41<106:18:04, 30.12s/it] {'loss': 1.6256, 'learning_rate': 0.00017641828380158612, 'epoch': 0.79} + 26%|██▋ | 4580/17285 [41:11:41<106:18:04, 30.12s/it] 27%|██▋ | 4581/17285 [41:12:16<111:52:51, 31.70s/it] 27%|██▋ | 4582/17285 [41:12:43<106:59:31, 30.32s/it] 27%|██▋ | 4583/17285 [41:13:12<105:37:47, 29.94s/it] 27%|██▋ | 4584/17285 [41:13:49<112:43:10, 31.95s/it] 27%|██▋ | 4585/17285 [41:14:23<115:14:35, 32.67s/it] 27%|██▋ | 4586/17285 [41:14:54<112:56:15, 32.02s/it] 27%|██▋ | 4587/17285 [41:15:21<108:01:16, 30.63s/it] 27%|██▋ | 4588/17285 [41:15:48<103:47:39, 29.43s/it] 27%|██▋ | 4589/17285 [41:16:14<100:03:11, 28.37s/it] 27%|██▋ | 4590/17285 [41:16:48<106:40:37, 30.25s/it] {'loss': 1.5783, 'learning_rate': 0.00017629473802379403, 'epoch': 0.8} + 27%|██▋ | 4590/17285 [41:16:48<106:40:37, 30.25s/it] 27%|██▋ | 4591/17285 [41:17:20<107:58:17, 30.62s/it] 27%|██▋ | 4592/17285 [41:17:58<116:20:41, 33.00s/it] 27%|██▋ | 4593/17285 [41:18:31<116:13:12, 32.97s/it] 27%|██▋ | 4594/17285 [41:18:58<109:56:21, 31.19s/it] 27%|██▋ | 4595/17285 [41:19:26<106:20:44, 30.17s/it] 27%|██▋ | 4596/17285 [41:19:57<107:14:39, 30.43s/it] 27%|██▋ | 4597/17285 [41:20:36<116:18:24, 33.00s/it] 27%|██▋ | 4598/17285 [41:21:01<108:10:58, 30.70s/it] 27%|██▋ | 4599/17285 
[41:21:27<102:21:04, 29.05s/it] 27%|██▋ | 4600/17285 [41:22:04<111:14:28, 31.57s/it] {'loss': 1.5988, 'learning_rate': 0.00017617091296082032, 'epoch': 0.8} + 27%|██▋ | 4600/17285 [41:22:09<111:14:28, 31.57s/it] 27%|██▋ | 4601/17285 [41:22:39<115:07:12, 32.67s/it] 27%|██▋ | 4602/17285 [41:23:05<107:55:54, 30.64s/it] 27%|██▋ | 4603/17285 [41:23:38<110:02:33, 31.24s/it] 27%|██▋ | 4604/17285 [41:24:11<111:39:01, 31.70s/it] 27%|██▋ | 4605/17285 [41:24:40<109:38:14, 31.13s/it] 27%|██▋ | 4606/17285 [41:25:12<110:29:24, 31.37s/it] 27%|██▋ | 4607/17285 [41:25:42<108:27:43, 30.80s/it] 27%|██▋ | 4608/17285 [41:26:14<110:31:55, 31.39s/it] 27%|██▋ | 4609/17285 [41:26:44<108:31:57, 30.82s/it] 27%|██▋ | 4610/17285 [41:27:09<102:10:37, 29.02s/it] {'loss': 1.5904, 'learning_rate': 0.0001760468090659401, 'epoch': 0.8} + 27%|██▋ | 4610/17285 [41:27:09<102:10:37, 29.02s/it] 27%|██▋ | 4611/17285 [41:27:44<109:04:11, 30.98s/it] 27%|██▋ | 4612/17285 [41:28:22<116:22:32, 33.06s/it] 27%|██▋ | 4613/17285 [41:28:56<117:09:33, 33.28s/it] 27%|██▋ | 4614/17285 [41:29:38<126:11:35, 35.85s/it] 27%|██▋ | 4615/17285 [41:30:05<116:37:12, 33.14s/it] 27%|██▋ | 4616/17285 [41:30:38<116:34:30, 33.13s/it] 27%|██▋ | 4617/17285 [41:31:20<125:45:14, 35.74s/it] 27%|██▋ | 4618/17285 [41:31:55<124:52:39, 35.49s/it] 27%|██▋ | 4619/17285 [41:32:20<114:07:24, 32.44s/it] 27%|██▋ | 4620/17285 [41:32:50<111:54:05, 31.81s/it] {'loss': 1.611, 'learning_rate': 0.0001759224267934491, 'epoch': 0.8} + 27%|██▋ | 4620/17285 [41:32:50<111:54:05, 31.81s/it] 27%|██▋ | 4621/17285 [41:33:20<109:35:48, 31.16s/it] 27%|██▋ | 4622/17285 [41:33:48<106:36:13, 30.31s/it] 27%|██▋ | 4623/17285 [41:34:23<111:53:30, 31.81s/it] 27%|██▋ | 4624/17285 [41:34:56<112:58:43, 32.12s/it] 27%|██▋ | 4625/17285 [41:35:27<110:58:07, 31.56s/it] 27%|██▋ | 4626/17285 [41:36:00<112:58:27, 32.13s/it] 27%|██▋ | 4627/17285 [41:36:40<120:57:45, 34.40s/it] 27%|██▋ | 4628/17285 [41:37:25<132:49:45, 37.78s/it] 27%|██▋ | 4629/17285 [41:37:55<123:52:33, 35.24s/it] 
27%|██▋ | 4630/17285 [41:38:26<120:06:44, 34.17s/it] {'loss': 1.6066, 'learning_rate': 0.00017579776659866218, 'epoch': 0.8} + 27%|██▋ | 4630/17285 [41:38:26<120:06:44, 34.17s/it] 27%|██▋ | 4631/17285 [41:38:59<118:56:51, 33.84s/it][2023-08-24 17:34:14,373] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, but hysteresis is 2. Reducing hysteresis to 1 + 27%|██▋ | 4632/17285 [41:39:37<122:31:18, 34.86s/it] 27%|██▋ | 4633/17285 [41:40:10<121:11:04, 34.48s/it] 27%|██▋ | 4634/17285 [41:40:43<119:20:34, 33.96s/it][2023-08-24 17:35:56,862] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, reducing to 131072 + 27%|██▋ | 4635/17285 [41:41:19<121:38:11, 34.62s/it] 27%|██▋ | 4636/17285 [41:41:51<118:45:54, 33.80s/it] 27%|██▋ | 4637/17285 [41:42:22<115:24:44, 32.85s/it] 27%|██▋ | 4638/17285 [41:42:52<113:07:45, 32.20s/it] 27%|██▋ | 4639/17285 [41:43:32<120:50:23, 34.40s/it] 27%|██▋ | 4640/17285 [41:44:07<121:40:23, 34.64s/it] {'loss': 1.5947, 'learning_rate': 0.00017569783864540068, 'epoch': 0.81} + 27%|██▋ | 4640/17285 [41:44:07<121:40:23, 34.64s/it] 27%|██▋ | 4641/17285 [41:44:39<119:13:41, 33.95s/it] 27%|██▋ | 4642/17285 [41:45:16<121:51:49, 34.70s/it] 27%|██▋ | 4643/17285 [41:45:57<128:19:18, 36.54s/it] 27%|██▋ | 4644/17285 [41:46:36<131:06:00, 37.34s/it] 27%|██▋ | 4645/17285 [41:47:02<119:30:38, 34.04s/it] 27%|██▋ | 4646/17285 [41:47:31<113:56:31, 32.45s/it] 27%|██▋ | 4647/17285 [41:48:07<117:16:19, 33.41s/it] 27%|██▋ | 4648/17285 [41:48:40<116:51:14, 33.29s/it] 27%|██▋ | 4649/17285 [41:49:16<119:40:49, 34.10s/it] 27%|██▋ | 4650/17285 [41:49:45<114:37:23, 32.66s/it] {'loss': 1.576, 'learning_rate': 0.00017557267934112085, 'epoch': 0.81} + 27%|██▋ | 4650/17285 [41:49:45<114:37:23, 32.66s/it] 27%|██▋ | 4651/17285 [41:50:20<116:55:57, 33.32s/it] 27%|██▋ | 4652/17285 [41:50:55<119:01:22, 33.92s/it] 27%|██▋ | 4653/17285 [41:51:29<119:18:02, 
34.00s/it] 27%|██▋ | 4654/17285 [41:52:06<121:37:14, 34.66s/it] 27%|██▋ | 4655/17285 [41:52:38<119:15:33, 33.99s/it] 27%|██▋ | 4656/17285 [41:53:13<119:52:19, 34.17s/it] 27%|██▋ | 4657/17285 [41:53:44<116:34:27, 33.23s/it] 27%|██▋ | 4658/17285 [41:54:13<112:51:51, 32.18s/it] 27%|██▋ | 4659/17285 [41:54:40<107:12:25, 30.57s/it] 27%|██▋ | 4660/17285 [41:55:10<106:02:53, 30.24s/it] {'loss': 1.6143, 'learning_rate': 0.00017544724339483368, 'epoch': 0.81} + 27%|██▋ | 4660/17285 [41:55:10<106:02:53, 30.24s/it] 27%|██▋ | 4661/17285 [41:55:40<106:35:38, 30.40s/it] 27%|██▋ | 4662/17285 [41:56:12<107:56:21, 30.78s/it] 27%|██▋ | 4663/17285 [41:56:41<105:55:49, 30.21s/it] 27%|██▋ | 4664/17285 [41:57:21<116:19:19, 33.18s/it] 27%|██▋ | 4665/17285 [41:57:53<115:32:47, 32.96s/it] 27%|██▋ | 4666/17285 [41:58:20<109:10:57, 31.15s/it] 27%|██▋ | 4667/17285 [41:58:52<109:23:55, 31.21s/it] 27%|██▋ | 4668/17285 [41:59:22<108:06:50, 30.85s/it] 27%|██▋ | 4669/17285 [41:59:52<107:22:24, 30.64s/it] 27%|██▋ | 4670/17285 [42:00:31<116:10:52, 33.16s/it] {'loss': 1.5985, 'learning_rate': 0.00017532153126571107, 'epoch': 0.81} + 27%|██▋ | 4670/17285 [42:00:31<116:10:52, 33.16s/it] 27%|██▋ | 4671/17285 [42:01:01<112:31:33, 32.11s/it] 27%|██▋ | 4672/17285 [42:01:31<110:09:08, 31.44s/it] 27%|██▋ | 4673/17285 [42:02:07<115:45:21, 33.04s/it] 27%|██▋ | 4674/17285 [42:02:40<115:26:36, 32.96s/it] 27%|██▋ | 4675/17285 [42:03:12<113:52:15, 32.51s/it] 27%|██▋ | 4676/17285 [42:03:42<111:37:22, 31.87s/it] 27%|██▋ | 4677/17285 [42:04:13<111:08:16, 31.73s/it] 27%|██▋ | 4678/17285 [42:04:52<118:22:13, 33.80s/it] 27%|██▋ | 4679/17285 [42:05:34<127:11:44, 36.32s/it] 27%|██▋ | 4680/17285 [42:06:08<124:40:15, 35.61s/it] {'loss': 1.5992, 'learning_rate': 0.00017519554341393593, 'epoch': 0.81} + 27%|██▋ | 4680/17285 [42:06:08<124:40:15, 35.61s/it] 27%|██▋ | 4681/17285 [42:06:41<122:01:32, 34.85s/it] 27%|██▋ | 4682/17285 [42:07:20<126:11:05, 36.04s/it] 27%|██▋ | 4683/17285 [42:07:54<124:24:13, 35.54s/it] 27%|██▋ | 
4684/17285 [42:08:21<114:49:50, 32.81s/it] 27%|██▋ | 4685/17285 [42:08:48<108:59:14, 31.14s/it] 27%|██▋ | 4686/17285 [42:09:19<108:43:19, 31.07s/it] 27%|██▋ | 4687/17285 [42:09:52<110:24:27, 31.55s/it] 27%|██▋ | 4688/17285 [42:10:27<114:08:20, 32.62s/it] 27%|██▋ | 4689/17285 [42:11:04<119:02:19, 34.02s/it] 27%|██▋ | 4690/17285 [42:11:30<110:18:53, 31.53s/it] {'loss': 1.5891, 'learning_rate': 0.00017506928030070054, 'epoch': 0.81} + 27%|██▋ | 4690/17285 [42:11:30<110:18:53, 31.53s/it] 27%|██▋ | 4691/17285 [42:12:01<109:38:07, 31.34s/it] 27%|██▋ | 4692/17285 [42:12:35<113:14:15, 32.37s/it] 27%|██▋ | 4693/17285 [42:13:04<109:33:13, 31.32s/it] 27%|██▋ | 4694/17285 [42:13:37<110:43:00, 31.66s/it] 27%|██▋ | 4695/17285 [42:14:06<108:43:04, 31.09s/it] 27%|██▋ | 4696/17285 [42:14:39<109:55:06, 31.43s/it] 27%|██▋ | 4697/17285 [42:15:15<115:21:06, 32.99s/it] 27%|██▋ | 4698/17285 [42:15:46<112:51:03, 32.28s/it] 27%|██▋ | 4699/17285 [42:16:12<106:24:18, 30.44s/it] 27%|██▋ | 4700/17285 [42:16:42<105:25:22, 30.16s/it] {'loss': 1.5622, 'learning_rate': 0.00017494274238820468, 'epoch': 0.82} + 27%|██▋ | 4700/17285 [42:16:42<105:25:22, 30.16s/it] 27%|██▋ | 4701/17285 [42:17:06<99:24:15, 28.44s/it] 27%|██▋ | 4702/17285 [42:17:34<99:17:48, 28.41s/it] 27%|██▋ | 4703/17285 [42:18:04<100:05:29, 28.64s/it] 27%|██▋ | 4704/17285 [42:18:30<98:06:58, 28.08s/it] 27%|██▋ | 4705/17285 [42:19:02<101:38:47, 29.09s/it] 27%|██▋ | 4706/17285 [42:19:29<99:32:34, 28.49s/it] [2023-08-24 18:14:39,324] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 27%|██▋ | 4707/17285 [42:20:02<104:04:11, 29.79s/it] 27%|██▋ | 4708/17285 [42:20:39<112:19:02, 32.15s/it] 27%|██▋ | 4709/17285 [42:21:15<115:31:21, 33.07s/it] 27%|██▋ | 4710/17285 [42:21:54<121:51:12, 34.88s/it] {'loss': 1.5646, 'learning_rate': 0.00017482862369640954, 'epoch': 0.82} + 27%|██▋ | 4710/17285 [42:21:54<121:51:12, 34.88s/it] 27%|██▋ | 4711/17285 [42:22:25<118:11:17, 33.84s/it] 27%|██▋ | 4712/17285 [42:22:56<115:20:13, 33.02s/it] 27%|██▋ | 4713/17285 [42:23:30<116:21:40, 33.32s/it] 27%|██▋ | 4714/17285 [42:23:57<109:49:06, 31.45s/it] 27%|██▋ | 4715/17285 [42:24:23<103:44:30, 29.71s/it] 27%|██▋ | 4716/17285 [42:25:05<116:44:25, 33.44s/it] 27%|██▋ | 4717/17285 [42:25:34<112:13:15, 32.14s/it] 27%|██▋ | 4718/17285 [42:26:02<107:31:02, 30.80s/it] 27%|██▋ | 4719/17285 [42:26:34<108:25:46, 31.06s/it] 27%|██▋ | 4720/17285 [42:27:11<114:55:15, 32.93s/it] {'loss': 1.6121, 'learning_rate': 0.00017470156494228082, 'epoch': 0.82} + 27%|██▋ | 4720/17285 [42:27:11<114:55:15, 32.93s/it] 27%|██▋ | 4721/17285 [42:27:41<111:33:44, 31.97s/it] 27%|██▋ | 4722/17285 [42:28:06<104:21:41, 29.91s/it] 27%|██▋ | 4723/17285 [42:28:41<109:45:11, 31.45s/it] 27%|██▋ | 4724/17285 [42:29:09<106:24:56, 30.50s/it] 27%|██▋ | 4725/17285 [42:29:42<108:47:53, 31.18s/it] 27%|██▋ | 4726/17285 [42:30:10<105:32:41, 30.25s/it] 27%|██▋ | 4727/17285 [42:30:49<115:12:52, 33.03s/it] 27%|██▋ | 4728/17285 [42:31:20<112:53:02, 32.36s/it] 27%|██▋ | 4729/17285 [42:31:56<116:07:01, 33.29s/it] 27%|██▋ | 4730/17285 [42:32:26<113:16:30, 32.48s/it] {'loss': 1.5766, 'learning_rate': 0.0001745742327349537, 'epoch': 0.82} + 27%|██▋ | 4730/17285 [42:32:26<113:16:30, 32.48s/it] 27%|██▋ | 4731/17285 [42:32:57<111:12:18, 31.89s/it] 27%|██▋ | 4732/17285 [42:33:28<110:19:01, 31.64s/it] 27%|██▋ | 4733/17285 [42:34:09<120:48:08, 34.65s/it] 27%|██▋ | 4734/17285 [42:34:42<118:33:15, 34.00s/it] 27%|██▋ | 4735/17285 [42:35:13<115:19:35, 33.08s/it] 27%|██▋ | 4736/17285 
[42:35:41<110:25:01, 31.68s/it] 27%|██▋ | 4737/17285 [42:36:08<105:30:29, 30.27s/it] 27%|██▋ | 4738/17285 [42:36:39<106:29:46, 30.56s/it] 27%|██▋ | 4739/17285 [42:37:20<116:33:21, 33.45s/it] 27%|██▋ | 4740/17285 [42:37:50<113:43:46, 32.64s/it] {'loss': 1.557, 'learning_rate': 0.00017444662754054156, 'epoch': 0.82} + 27%|██▋ | 4740/17285 [42:37:50<113:43:46, 32.64s/it] 27%|██▋ | 4741/17285 [42:38:20<110:12:26, 31.63s/it] 27%|██▋ | 4742/17285 [42:38:55<113:56:48, 32.70s/it] 27%|██▋ | 4743/17285 [42:39:38<124:47:46, 35.82s/it] 27%|██▋ | 4744/17285 [42:40:04<114:04:16, 32.75s/it] 27%|██▋ | 4745/17285 [42:40:35<112:14:06, 32.22s/it] 27%|██▋ | 4746/17285 [42:41:06<111:28:44, 32.01s/it] 27%|██▋ | 4747/17285 [42:41:36<109:27:35, 31.43s/it] 27%|██▋ | 4748/17285 [42:42:09<111:28:22, 32.01s/it] 27%|██▋ | 4749/17285 [42:42:45<114:59:32, 33.02s/it] 27%|██▋ | 4750/17285 [42:43:13<109:53:05, 31.56s/it] {'loss': 1.5716, 'learning_rate': 0.00017431874982615708, 'epoch': 0.82} + 27%|██▋ | 4750/17285 [42:43:13<109:53:05, 31.56s/it] 27%|██▋ | 4751/17285 [42:43:44<109:39:07, 31.49s/it] 27%|██▋ | 4752/17285 [42:44:16<109:46:29, 31.53s/it] 27%|██▋ | 4753/17285 [42:44:54<116:17:14, 33.41s/it] 28%|██▊ | 4754/17285 [42:45:19<107:58:29, 31.02s/it] 28%|██▊ | 4755/17285 [42:45:44<101:45:52, 29.24s/it] 28%|██▊ | 4756/17285 [42:46:25<113:31:14, 32.62s/it] 28%|██▊ | 4757/17285 [42:46:54<109:39:44, 31.51s/it] 28%|██▊ | 4758/17285 [42:47:29<113:57:09, 32.75s/it] 28%|██▊ | 4759/17285 [42:47:58<109:28:30, 31.46s/it] 28%|██▊ | 4760/17285 [42:48:28<107:59:39, 31.04s/it] {'loss': 1.5992, 'learning_rate': 0.00017419060005991054, 'epoch': 0.83} + 28%|██▊ | 4760/17285 [42:48:28<107:59:39, 31.04s/it] 28%|██▊ | 4761/17285 [42:49:05<114:07:30, 32.81s/it] 28%|██▊ | 4762/17285 [42:49:40<116:13:25, 33.41s/it] 28%|██▊ | 4763/17285 [42:50:13<116:32:44, 33.51s/it] 28%|██▊ | 4764/17285 [42:50:48<117:34:22, 33.80s/it] 28%|██▊ | 4765/17285 [42:51:31<127:39:13, 36.71s/it] 28%|██▊ | 4766/17285 [42:52:05<124:03:31, 
35.67s/it] 28%|██▊ | 4767/17285 [42:52:31<114:19:08, 32.88s/it] 28%|██▊ | 4768/17285 [42:53:00<110:33:37, 31.80s/it] 28%|██▊ | 4769/17285 [42:53:31<109:14:48, 31.42s/it] 28%|██▊ | 4770/17285 [42:54:05<111:50:58, 32.17s/it] {'loss': 1.6036, 'learning_rate': 0.0001740621787109081, 'epoch': 0.83} + 28%|██▊ | 4770/17285 [42:54:05<111:50:58, 32.17s/it] 28%|██▊ | 4771/17285 [42:54:40<115:00:22, 33.08s/it] 28%|██▊ | 4772/17285 [42:55:13<115:13:44, 33.15s/it] 28%|██▊ | 4773/17285 [42:55:51<120:22:40, 34.64s/it] 28%|██▊ | 4774/17285 [42:56:23<117:36:00, 33.84s/it] 28%|██▊ | 4775/17285 [42:56:57<117:35:12, 33.84s/it] 28%|██▊ | 4776/17285 [42:57:34<120:21:23, 34.64s/it] 28%|██▊ | 4777/17285 [42:58:04<116:19:24, 33.48s/it] 28%|██▊ | 4778/17285 [42:58:39<117:26:11, 33.80s/it] 28%|██▊ | 4779/17285 [42:59:05<109:49:24, 31.61s/it] 28%|██▊ | 4780/17285 [42:59:35<107:22:01, 30.91s/it] {'loss': 1.6121, 'learning_rate': 0.00017393348624925004, 'epoch': 0.83} + 28%|██▊ | 4780/17285 [42:59:35<107:22:01, 30.91s/it] 28%|██▊ | 4781/17285 [43:00:05<106:28:42, 30.66s/it] 28%|██▊ | 4782/17285 [43:00:32<102:26:33, 29.50s/it] 28%|██▊ | 4783/17285 [43:01:01<101:51:07, 29.33s/it] 28%|██▊ | 4784/17285 [43:01:35<106:56:14, 30.80s/it] 28%|██▊ | 4785/17285 [43:02:02<103:12:18, 29.72s/it] 28%|██▊ | 4786/17285 [43:02:32<103:14:30, 29.74s/it] 28%|██▊ | 4787/17285 [43:03:09<111:23:15, 32.08s/it] 28%|██▊ | 4788/17285 [43:03:43<113:17:41, 32.64s/it] 28%|██▊ | 4789/17285 [43:04:22<119:20:39, 34.38s/it] 28%|██▊ | 4790/17285 [43:04:52<115:29:33, 33.28s/it] {'loss': 1.6076, 'learning_rate': 0.00017380452314602916, 'epoch': 0.83} + 28%|██▊ | 4790/17285 [43:04:52<115:29:33, 33.28s/it] 28%|██▊ | 4791/17285 [43:05:23<112:53:55, 32.53s/it] 28%|██▊ | 4792/17285 [43:05:53<109:35:09, 31.58s/it] 28%|██▊ | 4793/17285 [43:06:29<115:08:26, 33.18s/it] 28%|██▊ | 4794/17285 [43:06:54<105:41:11, 30.46s/it] 28%|██▊ | 4795/17285 [43:07:22<103:18:12, 29.78s/it] 28%|██▊ | 4796/17285 [43:07:58<110:17:04, 31.79s/it] 28%|██▊ | 
4797/17285 [43:08:31<111:34:13, 32.16s/it] 28%|██▊ | 4798/17285 [43:09:06<114:18:27, 32.95s/it] 28%|██▊ | 4799/17285 [43:09:36<111:22:41, 32.11s/it] 28%|██▊ | 4800/17285 [43:10:06<108:52:17, 31.39s/it] {'loss': 1.5798, 'learning_rate': 0.00017367528987332885, 'epoch': 0.83} + 28%|██▊ | 4800/17285 [43:10:06<108:52:17, 31.39s/it] 28%|██▊ | 4801/17285 [43:10:35<105:58:28, 30.56s/it] 28%|██▊ | 4802/17285 [43:11:04<105:11:27, 30.34s/it] 28%|██▊ | 4803/17285 [43:11:35<105:04:52, 30.31s/it] 28%|██▊ | 4804/17285 [43:12:09<109:01:31, 31.45s/it] 28%|██▊ | 4805/17285 [43:12:38<107:07:05, 30.90s/it] 28%|██▊ | 4806/17285 [43:13:08<106:20:21, 30.68s/it] 28%|██▊ | 4807/17285 [43:13:44<111:16:44, 32.10s/it] 28%|██▊ | 4808/17285 [43:14:19<114:52:00, 33.14s/it] 28%|██▊ | 4809/17285 [43:14:48<110:20:47, 31.84s/it] 28%|██▊ | 4810/17285 [43:15:21<111:28:44, 32.17s/it] {'loss': 1.5597, 'learning_rate': 0.00017354578690422157, 'epoch': 0.83} + 28%|██▊ | 4810/17285 [43:15:21<111:28:44, 32.17s/it] 28%|██▊ | 4811/17285 [43:15:53<111:30:17, 32.18s/it] 28%|██▊ | 4812/17285 [43:16:20<106:00:06, 30.59s/it] 28%|██▊ | 4813/17285 [43:16:49<103:56:05, 30.00s/it] 28%|██▊ | 4814/17285 [43:17:20<104:38:07, 30.21s/it] 28%|██▊ | 4815/17285 [43:17:56<110:38:48, 31.94s/it] 28%|██▊ | 4816/17285 [43:18:31<114:20:17, 33.01s/it] 28%|██▊ | 4817/17285 [43:19:03<113:02:45, 32.64s/it] 28%|██▊ | 4818/17285 [43:19:33<109:53:34, 31.73s/it] 28%|██▊ | 4819/17285 [43:20:06<111:20:58, 32.16s/it] 28%|██▊ | 4820/17285 [43:20:39<112:17:14, 32.43s/it] {'loss': 1.5834, 'learning_rate': 0.00017341601471276708, 'epoch': 0.84} + 28%|██▊ | 4820/17285 [43:20:39<112:17:14, 32.43s/it] 28%|██▊ | 4821/17285 [43:21:07<108:11:07, 31.25s/it] 28%|██▊ | 4822/17285 [43:21:44<113:42:29, 32.85s/it] 28%|██▊ | 4823/17285 [43:22:19<116:06:26, 33.54s/it] 28%|██▊ | 4824/17285 [43:22:46<109:46:40, 31.71s/it] 28%|██▊ | 4825/17285 [43:23:15<106:30:16, 30.77s/it] 28%|██▊ | 4826/17285 [43:23:50<110:35:57, 31.96s/it] 28%|██▊ | 4827/17285 
[43:24:30<118:45:22, 34.32s/it] 28%|██▊ | 4828/17285 [43:25:00<114:45:53, 33.17s/it] 28%|██▊ | 4829/17285 [43:25:35<116:54:17, 33.79s/it] 28%|██▊ | 4830/17285 [43:26:10<118:03:36, 34.12s/it] {'loss': 1.6169, 'learning_rate': 0.0001732859737740105, 'epoch': 0.84} + 28%|██▊ | 4830/17285 [43:26:10<118:03:36, 34.12s/it] 28%|██▊ | 4831/17285 [43:26:45<119:07:14, 34.43s/it] 28%|██▊ | 4832/17285 [43:27:12<110:42:07, 32.00s/it] 28%|██▊ | 4833/17285 [43:27:40<107:20:03, 31.03s/it] 28%|██▊ | 4834/17285 [43:28:14<109:56:49, 31.79s/it] 28%|██▊ | 4835/17285 [43:28:52<115:58:27, 33.53s/it] 28%|██▊ | 4836/17285 [43:29:21<111:12:12, 32.16s/it] 28%|██▊ | 4837/17285 [43:29:58<116:31:08, 33.70s/it] 28%|██▊ | 4838/17285 [43:30:31<115:29:54, 33.41s/it] 28%|██▊ | 4839/17285 [43:30:58<109:22:20, 31.64s/it] 28%|██▊ | 4840/17285 [43:31:33<113:03:20, 32.70s/it] {'loss': 1.5933, 'learning_rate': 0.00017315566456398086, 'epoch': 0.84} + 28%|██▊ | 4840/17285 [43:31:33<113:03:20, 32.70s/it] 28%|██▊ | 4841/17285 [43:32:02<109:04:46, 31.56s/it] 28%|██▊ | 4842/17285 [43:32:36<111:05:45, 32.14s/it] 28%|██▊ | 4843/17285 [43:33:13<116:59:49, 33.85s/it] 28%|██▊ | 4844/17285 [43:33:45<114:17:27, 33.07s/it][2023-08-24 19:28:56,457] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 28%|██▊ | 4845/17285 [43:34:19<115:19:00, 33.37s/it] 28%|██▊ | 4846/17285 [43:34:52<114:42:07, 33.20s/it] 28%|██▊ | 4847/17285 [43:35:17<107:00:31, 30.97s/it] 28%|██▊ | 4848/17285 [43:35:52<110:45:10, 32.06s/it] 28%|██▊ | 4849/17285 [43:36:20<106:34:04, 30.85s/it] 28%|██▊ | 4850/17285 [43:36:45<100:52:03, 29.20s/it] {'loss': 1.6006, 'learning_rate': 0.00017303815729724509, 'epoch': 0.84} + 28%|██▊ | 4850/17285 [43:36:45<100:52:03, 29.20s/it] 28%|██▊ | 4851/17285 [43:37:09<94:37:12, 27.40s/it] 28%|██▊ | 4852/17285 [43:37:55<114:14:41, 33.08s/it] 28%|██▊ | 4853/17285 [43:38:24<110:38:38, 32.04s/it] 28%|██▊ | 4854/17285 [43:38:52<105:37:54, 30.59s/it] 28%|██▊ | 4855/17285 [43:39:23<106:11:54, 30.76s/it][2023-08-24 19:34:34,345] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 28%|██▊ | 4856/17285 [43:39:57<109:23:23, 31.68s/it] 28%|██▊ | 4857/17285 [43:40:26<106:58:26, 30.99s/it] 28%|██▊ | 4858/17285 [43:41:00<110:00:34, 31.87s/it] 28%|██▊ | 4859/17285 [43:41:30<107:47:27, 31.23s/it] 28%|██▊ | 4860/17285 [43:42:11<117:46:20, 34.12s/it] {'loss': 1.6013, 'learning_rate': 0.00017292043346556449, 'epoch': 0.84} + 28%|██▊ | 4860/17285 [43:42:11<117:46:20, 34.12s/it] 28%|██▊ | 4861/17285 [43:42:41<113:46:06, 32.97s/it] 28%|██▊ | 4862/17285 [43:43:08<107:56:48, 31.28s/it] 28%|██▊ | 4863/17285 [43:43:37<105:00:50, 30.43s/it] 28%|██▊ | 4864/17285 [43:44:06<103:23:11, 29.96s/it] 28%|██▊ | 4865/17285 [43:44:44<112:24:41, 32.58s/it] 28%|██▊ | 4866/17285 [43:45:19<115:04:04, 33.36s/it] 28%|██▊ | 4867/17285 [43:45:50<112:02:57, 32.48s/it] 28%|██▊ | 4868/17285 [43:46:15<104:56:25, 30.42s/it] 28%|██▊ | 4869/17285 [43:46:42<101:23:33, 29.40s/it] 28%|██▊ | 4870/17285 [43:47:12<101:48:57, 29.52s/it] {'loss': 1.6042, 'learning_rate': 0.0001727893756367969, 'epoch': 0.85} + 28%|██▊ | 4870/17285 [43:47:12<101:48:57, 29.52s/it] 28%|██▊ | 4871/17285 [43:47:49<109:32:27, 31.77s/it] 
28%|██▊ | 4872/17285 [43:48:17<105:38:45, 30.64s/it] 28%|██▊ | 4873/17285 [43:48:52<109:27:06, 31.75s/it] 28%|██▊ | 4874/17285 [43:49:28<114:15:05, 33.14s/it] 28%|██▊ | 4875/17285 [43:49:57<110:23:12, 32.02s/it] 28%|██▊ | 4876/17285 [43:50:30<110:44:18, 32.13s/it] 28%|██▊ | 4877/17285 [43:51:04<112:29:30, 32.64s/it] 28%|██▊ | 4878/17285 [43:51:29<105:24:27, 30.58s/it] 28%|██▊ | 4879/17285 [43:51:57<102:03:20, 29.61s/it] 28%|██▊ | 4880/17285 [43:52:35<111:15:34, 32.29s/it] {'loss': 1.5738, 'learning_rate': 0.00017265805135460778, 'epoch': 0.85} + 28%|██▊ | 4880/17285 [43:52:35<111:15:34, 32.29s/it] 28%|██▊ | 4881/17285 [43:53:03<106:36:26, 30.94s/it] 28%|██▊ | 4882/17285 [43:53:36<108:30:07, 31.49s/it] 28%|██▊ | 4883/17285 [43:54:03<103:54:07, 30.16s/it] 28%|██▊ | 4884/17285 [43:54:33<104:14:35, 30.26s/it] 28%|██▊ | 4885/17285 [43:55:05<106:08:37, 30.82s/it] 28%|██▊ | 4886/17285 [43:55:39<108:58:36, 31.64s/it] 28%|██▊ | 4887/17285 [43:56:07<104:52:37, 30.45s/it] 28%|██▊ | 4888/17285 [43:56:41<108:18:42, 31.45s/it] 28%|██▊ | 4889/17285 [43:57:11<107:30:55, 31.22s/it] 28%|██▊ | 4890/17285 [43:57:42<107:06:47, 31.11s/it] {'loss': 1.6376, 'learning_rate': 0.00017252646109972383, 'epoch': 0.85} + 28%|██▊ | 4890/17285 [43:57:42<107:06:47, 31.11s/it] 28%|██▊ | 4891/17285 [43:58:08<101:29:32, 29.48s/it] 28%|██▊ | 4892/17285 [43:58:46<110:25:01, 32.07s/it] 28%|██▊ | 4893/17285 [43:59:19<111:41:46, 32.45s/it] 28%|██▊ | 4894/17285 [43:59:53<113:08:01, 32.87s/it] 28%|██▊ | 4895/17285 [44:00:23<109:54:05, 31.93s/it][2023-08-24 19:55:31,806] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 + 28%|██▊ | 4896/17285 [44:00:54<109:16:29, 31.75s/it] 28%|██▊ | 4897/17285 [44:01:29<112:48:47, 32.78s/it] 28%|██▊ | 4898/17285 [44:01:58<108:46:09, 31.61s/it] 28%|██▊ | 4899/17285 [44:02:27<105:52:11, 30.77s/it] 28%|██▊ | 4900/17285 [44:03:04<112:11:51, 32.61s/it] {'loss': 1.5904, 'learning_rate': 0.00017240780286177955, 'epoch': 0.85} + 28%|██▊ | 4900/17285 [44:03:04<112:11:51, 32.61s/it] 28%|██▊ | 4901/17285 [44:03:34<109:41:46, 31.89s/it] 28%|██▊ | 4902/17285 [44:04:09<113:02:43, 32.86s/it] 28%|██▊ | 4903/17285 [44:04:39<109:34:08, 31.86s/it] 28%|██▊ | 4904/17285 [44:05:17<116:32:12, 33.89s/it] 28%|██▊ | 4905/17285 [44:05:50<114:57:31, 33.43s/it] 28%|██▊ | 4906/17285 [44:06:17<108:09:28, 31.45s/it] 28%|██▊ | 4907/17285 [44:06:52<111:58:50, 32.57s/it] 28%|██▊ | 4908/17285 [44:07:23<110:56:27, 32.27s/it] 28%|██▊ | 4909/17285 [44:07:52<107:03:23, 31.14s/it] 28%|██▊ | 4910/17285 [44:08:28<112:15:23, 32.66s/it] {'loss': 1.5651, 'learning_rate': 0.0001722757085866635, 'epoch': 0.85} + 28%|██▊ | 4910/17285 [44:08:28<112:15:23, 32.66s/it] 28%|██▊ | 4911/17285 [44:08:59<110:10:43, 32.05s/it] 28%|██▊ | 4912/17285 [44:09:32<111:28:45, 32.44s/it] 28%|██▊ | 4913/17285 [44:09:59<105:59:51, 30.84s/it] 28%|██▊ | 4914/17285 [44:10:30<105:52:32, 30.81s/it] 28%|██▊ | 4915/17285 [44:11:01<105:43:58, 30.77s/it] 28%|██▊ | 4916/17285 [44:11:30<104:21:01, 30.37s/it] 28%|██▊ | 4917/17285 [44:12:05<109:27:22, 31.86s/it] 28%|██▊ | 4918/17285 [44:12:35<106:54:47, 31.12s/it] 28%|██▊ | 4919/17285 [44:13:02<102:47:57, 29.93s/it] 28%|██▊ | 4920/17285 [44:13:40<111:08:08, 32.36s/it] {'loss': 1.5923, 'learning_rate': 0.00017214334973845988, 'epoch': 0.85} + 28%|██▊ | 4920/17285 [44:13:40<111:08:08, 32.36s/it] 28%|██▊ | 4921/17285 [44:14:11<109:59:45, 32.03s/it] 28%|██▊ | 4922/17285 [44:14:54<121:38:27, 35.42s/it] 28%|██▊ | 4923/17285 [44:15:31<122:58:39, 35.81s/it] 28%|██▊ | 4924/17285 [44:15:57<112:55:10, 32.89s/it] 28%|██▊ | 4925/17285 
[44:16:28<111:01:39, 32.34s/it] 28%|██▊ | 4926/17285 [44:17:05<115:47:40, 33.73s/it] 29%|██▊ | 4927/17285 [44:17:32<108:35:26, 31.63s/it] 29%|██▊ | 4928/17285 [44:18:00<105:07:23, 30.63s/it] 29%|██▊ | 4929/17285 [44:18:31<105:05:09, 30.62s/it] 29%|██▊ | 4930/17285 [44:19:00<103:08:03, 30.05s/it] {'loss': 1.6032, 'learning_rate': 0.0001720107268016827, 'epoch': 0.86} + 29%|██▊ | 4930/17285 [44:19:00<103:08:03, 30.05s/it] 29%|██▊ | 4931/17285 [44:19:32<105:07:11, 30.63s/it] 29%|██▊ | 4932/17285 [44:20:08<110:32:17, 32.21s/it] 29%|██▊ | 4933/17285 [44:20:49<120:14:50, 35.05s/it] 29%|██▊ | 4934/17285 [44:21:15<111:07:01, 32.39s/it] 29%|██▊ | 4935/17285 [44:21:42<105:15:19, 30.68s/it] 29%|██▊ | 4936/17285 [44:22:09<101:52:29, 29.70s/it] 29%|██▊ | 4937/17285 [44:22:38<100:38:17, 29.34s/it] 29%|██▊ | 4938/17285 [44:23:16<109:17:30, 31.87s/it] 29%|██▊ | 4939/17285 [44:23:50<111:17:03, 32.45s/it] 29%|██▊ | 4940/17285 [44:24:27<116:30:22, 33.98s/it] {'loss': 1.5859, 'learning_rate': 0.00017187784026181265, 'epoch': 0.86} + 29%|██▊ | 4940/17285 [44:24:27<116:30:22, 33.98s/it] 29%|██▊ | 4941/17285 [44:24:54<108:57:25, 31.78s/it] 29%|██▊ | 4942/17285 [44:25:25<108:03:01, 31.51s/it] 29%|██▊ | 4943/17285 [44:25:55<106:40:50, 31.12s/it] 29%|██▊ | 4944/17285 [44:26:23<103:34:33, 30.21s/it] 29%|██▊ | 4945/17285 [44:26:57<107:31:54, 31.37s/it] 29%|██▊ | 4946/17285 [44:27:34<112:52:59, 32.93s/it] 29%|██▊ | 4947/17285 [44:28:04<109:52:18, 32.06s/it] 29%|██▊ | 4948/17285 [44:28:40<114:06:26, 33.30s/it] 29%|██▊ | 4949/17285 [44:29:05<106:14:29, 31.00s/it] 29%|██▊ | 4950/17285 [44:29:38<107:55:53, 31.50s/it] {'loss': 1.5376, 'learning_rate': 0.00017174469060529527, 'epoch': 0.86} + 29%|██▊ | 4950/17285 [44:29:38<107:55:53, 31.50s/it] 29%|██▊ | 4951/17285 [44:30:18<116:31:12, 34.01s/it] 29%|██▊ | 4952/17285 [44:30:55<119:59:21, 35.02s/it] 29%|██▊ | 4953/17285 [44:31:27<116:33:49, 34.03s/it] 29%|██▊ | 4954/17285 [44:31:54<109:13:48, 31.89s/it] 29%|██▊ | 4955/17285 [44:32:20<103:28:17, 
30.21s/it] 29%|██▊ | 4956/17285 [44:32:53<105:58:33, 30.94s/it] 29%|██▊ | 4957/17285 [44:33:28<109:46:04, 32.05s/it] 29%|██▊ | 4958/17285 [44:33:55<105:16:54, 30.75s/it] 29%|██▊ | 4959/17285 [44:34:26<105:11:47, 30.72s/it] 29%|██▊ | 4960/17285 [44:34:53<101:43:55, 29.71s/it] {'loss': 1.5445, 'learning_rate': 0.00017161127831953946, 'epoch': 0.86} + 29%|██▊ | 4960/17285 [44:34:53<101:43:55, 29.71s/it] 29%|██▊ | 4961/17285 [44:35:19<97:48:06, 28.57s/it] 29%|██▊ | 4962/17285 [44:35:55<105:15:14, 30.75s/it] 29%|██▊ | 4963/17285 [44:36:23<102:45:41, 30.02s/it] 29%|██▊ | 4964/17285 [44:36:52<101:27:09, 29.64s/it] 29%|██▊ | 4965/17285 [44:37:22<102:05:21, 29.83s/it] 29%|██▊ | 4966/17285 [44:37:55<104:28:30, 30.53s/it] 29%|██▊ | 4967/17285 [44:38:29<108:15:12, 31.64s/it] 29%|██▊ | 4968/17285 [44:39:00<107:30:35, 31.42s/it] 29%|██▊ | 4969/17285 [44:39:29<105:20:38, 30.79s/it] 29%|██▉ | 4970/17285 [44:39:59<104:17:03, 30.49s/it] {'loss': 1.5652, 'learning_rate': 0.0001714776038929153, 'epoch': 0.86} + 29%|██▉ | 4970/17285 [44:39:59<104:17:03, 30.49s/it] 29%|██▉ | 4971/17285 [44:40:33<108:09:44, 31.62s/it] 29%|██▉ | 4972/17285 [44:41:08<112:04:42, 32.77s/it] 29%|██▉ | 4973/17285 [44:41:43<113:34:46, 33.21s/it] 29%|██▉ | 4974/17285 [44:42:20<117:51:04, 34.46s/it] 29%|██▉ | 4975/17285 [44:42:49<111:52:35, 32.72s/it] 29%|██▉ | 4976/17285 [44:43:21<111:27:45, 32.60s/it] 29%|██▉ | 4977/17285 [44:43:49<106:50:15, 31.25s/it] 29%|██▉ | 4978/17285 [44:44:32<118:44:05, 34.73s/it] 29%|██▉ | 4979/17285 [44:45:02<114:08:56, 33.39s/it] 29%|██▉ | 4980/17285 [44:45:34<112:26:41, 32.90s/it] {'loss': 1.5267, 'learning_rate': 0.00017134366781475262, 'epoch': 0.86} + 29%|██▉ | 4980/17285 [44:45:34<112:26:41, 32.90s/it] 29%|██▉ | 4981/17285 [44:46:01<106:05:00, 31.04s/it] 29%|██▉ | 4982/17285 [44:46:30<104:35:13, 30.60s/it] 29%|██▉ | 4983/17285 [44:47:01<104:12:01, 30.49s/it] 29%|██▉ | 4984/17285 [44:47:32<105:16:57, 30.81s/it] 29%|██▉ | 4985/17285 [44:48:06<108:44:49, 31.83s/it] 29%|██▉ | 
4986/17285 [44:48:52<122:36:27, 35.89s/it] 29%|██▉ | 4987/17285 [44:49:24<118:53:32, 34.80s/it] 29%|██▉ | 4988/17285 [44:50:01<120:43:21, 35.34s/it] 29%|██▉ | 4989/17285 [44:50:32<117:13:24, 34.32s/it] 29%|██▉ | 4990/17285 [44:51:04<114:27:18, 33.51s/it] {'loss': 1.5805, 'learning_rate': 0.00017120947057533897, 'epoch': 0.87} + 29%|██▉ | 4990/17285 [44:51:04<114:27:18, 33.51s/it] 29%|██▉ | 4991/17285 [44:51:34<110:15:27, 32.29s/it] 29%|██▉ | 4992/17285 [44:52:12<116:34:57, 34.14s/it] 29%|██▉ | 4993/17285 [44:52:41<111:41:45, 32.71s/it] 29%|██▉ | 4994/17285 [44:53:15<112:32:57, 32.97s/it] 29%|██▉ | 4995/17285 [44:53:46<110:46:56, 32.45s/it] 29%|██▉ | 4996/17285 [44:54:26<118:10:29, 34.62s/it] 29%|██▉ | 4997/17285 [44:54:57<114:36:24, 33.58s/it][2023-08-24 20:50:10,299] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 + 29%|██▉ | 4998/17285 [44:55:33<116:39:57, 34.18s/it] 29%|██▉ | 4999/17285 [44:56:08<117:33:25, 34.45s/it] 29%|██▉ | 5000/17285 [44:56:42<117:18:45, 34.38s/it] {'loss': 1.5876, 'learning_rate': 0.00017108847017299018, 'epoch': 0.87} + 29%|██▉ | 5000/17285 [44:56:42<117:18:45, 34.38s/it][INFO|trainer.py:3081] 2023-08-24 20:51:19,605 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-24 20:51:19,606 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-24 20:51:19,606 >> Batch size = 2 + + 0%| | 0/33 [00:00> Deleting older checkpoint [20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-2000] due to args.save_total_limit +Save Peft Config at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-5000 +[INFO|tokenization_utils_base.py:2210] 2023-08-24 20:52:44,881 >> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-5000/tokenizer_config.json 
+[INFO|tokenization_utils_base.py:2217] 2023-08-24 20:52:44,884 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-5000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-5000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-5000 + 29%|██▉ | 5001/17285 [44:58:41<204:14:32, 59.86s/it] 29%|██▉ | 5002/17285 [44:59:12<174:33:13, 51.16s/it] 29%|██▉ | 5003/17285 [44:59:57<167:47:53, 49.18s/it] 29%|██▉ | 5004/17285 [45:00:34<155:23:28, 45.55s/it] 29%|██▉ | 5005/17285 [45:01:09<144:23:58, 42.33s/it] 29%|██▉ | 5006/17285 [45:01:44<137:20:43, 40.27s/it] 29%|██▉ | 5007/17285 [45:02:21<134:08:17, 39.33s/it] 29%|██▉ | 5008/17285 [45:02:57<130:18:47, 38.21s/it] 29%|██▉ | 5009/17285 [45:03:23<117:39:25, 34.50s/it] 29%|██▉ | 5010/17285 [45:03:57<118:03:07, 34.62s/it] {'loss': 1.6163, 'learning_rate': 0.00017095377808136445, 'epoch': 0.87} + 29%|██▉ | 5010/17285 [45:03:57<118:03:07, 34.62s/it][2023-08-24 20:59:01,269] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 + 29%|██▉ | 5011/17285 [45:04:24<109:19:50, 32.07s/it] 29%|██▉ | 5012/17285 [45:04:56<109:21:08, 32.08s/it] 29%|██▉ | 5013/17285 [45:05:37<119:08:25, 34.95s/it] 29%|██▉ | 5014/17285 [45:06:14<120:37:49, 35.39s/it] 29%|██▉ | 5015/17285 [45:06:46<117:33:12, 34.49s/it] 29%|██▉ | 5016/17285 [45:07:16<112:45:52, 33.09s/it] 29%|██▉ | 5017/17285 [45:07:43<106:59:56, 31.40s/it] 29%|██▉ | 5018/17285 [45:08:14<106:05:45, 31.14s/it] 29%|██▉ | 5019/17285 [45:08:42<102:31:25, 30.09s/it] 29%|██▉ | 5020/17285 [45:09:11<102:08:05, 29.98s/it] {'loss': 1.629, 'learning_rate': 0.00017083233311224484, 'epoch': 0.87} + 29%|██▉ | 5020/17285 [45:09:11<102:08:05, 29.98s/it] 29%|██▉ | 5021/17285 [45:09:47<108:14:08, 31.77s/it] 29%|██▉ | 5022/17285 [45:10:17<106:12:43, 31.18s/it] 29%|██▉ | 5023/17285 [45:10:52<109:36:13, 32.18s/it] 29%|██▉ | 5024/17285 [45:11:31<116:58:33, 34.35s/it] 29%|██▉ | 5025/17285 [45:11:58<109:10:40, 32.06s/it] 29%|██▉ | 5026/17285 [45:12:37<116:29:47, 34.21s/it] 29%|██▉ | 5027/17285 [45:13:05<109:46:43, 32.24s/it] 29%|██▉ | 5028/17285 [45:13:41<114:04:58, 33.51s/it] 29%|██▉ | 5029/17285 [45:14:13<112:58:52, 33.19s/it] 29%|██▉ | 5030/17285 [45:14:46<112:06:14, 32.93s/it] {'loss': 1.5993, 'learning_rate': 0.0001706971479483343, 'epoch': 0.87} + 29%|██▉ | 5030/17285 [45:14:46<112:06:14, 32.93s/it] 29%|██▉ | 5031/17285 [45:15:13<106:34:50, 31.31s/it] 29%|██▉ | 5032/17285 [45:15:42<103:47:21, 30.49s/it] 29%|██▉ | 5033/17285 [45:16:07<98:32:34, 28.95s/it] 29%|██▉ | 5034/17285 [45:16:35<97:04:42, 28.53s/it] 29%|██▉ | 5035/17285 [45:17:05<98:50:32, 29.05s/it] 29%|██▉ | 5036/17285 [45:17:38<102:42:24, 30.19s/it] 29%|██▉ | 5037/17285 [45:18:10<104:28:35, 30.71s/it] 29%|██▉ | 5038/17285 [45:18:46<110:02:56, 32.35s/it] 29%|██▉ | 5039/17285 [45:19:24<115:59:04, 34.10s/it] 29%|██▉ | 5040/17285 [45:19:55<113:04:32, 33.24s/it] {'loss': 1.6056, 'learning_rate': 0.00017056170398982906, 'epoch': 0.87} + 29%|██▉ | 5040/17285 
[45:19:55<113:04:32, 33.24s/it] 29%|██▉ | 5041/17285 [45:20:22<106:20:48, 31.27s/it] 29%|██▉ | 5042/17285 [45:21:01<114:27:20, 33.66s/it] 29%|██▉ | 5043/17285 [45:21:34<113:52:16, 33.49s/it] 29%|██▉ | 5044/17285 [45:22:05<110:34:04, 32.52s/it] 29%|██▉ | 5045/17285 [45:22:49<122:11:58, 35.94s/it] 29%|██▉ | 5046/17285 [45:23:26<123:28:37, 36.32s/it] 29%|██▉ | 5047/17285 [45:23:52<112:40:34, 33.15s/it] 29%|██▉ | 5048/17285 [45:24:18<105:41:46, 31.09s/it] 29%|██▉ | 5049/17285 [45:24:56<112:41:43, 33.16s/it] 29%|██▉ | 5050/17285 [45:25:31<115:10:32, 33.89s/it] {'loss': 1.5728, 'learning_rate': 0.00017042600173253645, 'epoch': 0.88} + 29%|██▉ | 5050/17285 [45:25:31<115:10:32, 33.89s/it] 29%|██▉ | 5051/17285 [45:26:00<109:15:12, 32.15s/it] 29%|██▉ | 5052/17285 [45:26:24<101:38:47, 29.91s/it] 29%|██▉ | 5053/17285 [45:26:54<101:39:34, 29.92s/it] 29%|██▉ | 5054/17285 [45:27:25<102:51:47, 30.28s/it] 29%|██▉ | 5055/17285 [45:27:59<106:01:29, 31.21s/it] 29%|██▉ | 5056/17285 [45:28:29<105:33:58, 31.08s/it] 29%|██▉ | 5057/17285 [45:29:01<106:00:12, 31.21s/it] 29%|██▉ | 5058/17285 [45:29:29<102:28:09, 30.17s/it] 29%|██▉ | 5059/17285 [45:29:54<97:44:58, 28.78s/it] 29%|██▉ | 5060/17285 [45:30:30<105:11:58, 30.98s/it] {'loss': 1.6298, 'learning_rate': 0.00017029004167320926, 'epoch': 0.88} + 29%|██▉ | 5060/17285 [45:30:30<105:11:58, 30.98s/it] 29%|██▉ | 5061/17285 [45:30:59<102:21:12, 30.14s/it] 29%|██▉ | 5062/17285 [45:31:29<102:18:04, 30.13s/it] 29%|██▉ | 5063/17285 [45:32:08<111:21:41, 32.80s/it] 29%|██▉ | 5064/17285 [45:32:41<112:10:55, 33.05s/it] 29%|██▉ | 5065/17285 [45:33:10<108:09:14, 31.86s/it] 29%|██▉ | 5066/17285 [45:33:42<107:36:02, 31.70s/it] 29%|██▉ | 5067/17285 [45:34:08<102:18:44, 30.15s/it] 29%|██▉ | 5068/17285 [45:34:44<107:58:58, 31.82s/it] 29%|██▉ | 5069/17285 [45:35:22<114:36:23, 33.77s/it] 29%|██▉ | 5070/17285 [45:35:53<111:09:25, 32.76s/it] {'loss': 1.5792, 'learning_rate': 0.00017015382430954413, 'epoch': 0.88} + 29%|██▉ | 5070/17285 [45:35:53<111:09:25, 
32.76s/it] 29%|██▉ | 5071/17285 [45:36:25<110:13:28, 32.49s/it] 29%|██▉ | 5072/17285 [45:36:56<109:19:45, 32.23s/it] 29%|██▉ | 5073/17285 [45:37:27<108:13:36, 31.90s/it] 29%|██▉ | 5074/17285 [45:38:04<113:17:57, 33.40s/it] 29%|██▉ | 5075/17285 [45:38:38<113:39:31, 33.51s/it] 29%|██▉ | 5076/17285 [45:39:07<109:22:55, 32.25s/it] 29%|██▉ | 5077/17285 [45:39:53<122:37:25, 36.16s/it] 29%|██▉ | 5078/17285 [45:40:19<112:41:52, 33.24s/it] 29%|██▉ | 5079/17285 [45:40:51<111:14:25, 32.81s/it] 29%|██▉ | 5080/17285 [45:41:21<108:41:33, 32.06s/it] {'loss': 1.5615, 'learning_rate': 0.00017001735014017955, 'epoch': 0.88} + 29%|██▉ | 5080/17285 [45:41:21<108:41:33, 32.06s/it] 29%|██▉ | 5081/17285 [45:41:54<109:12:50, 32.22s/it] 29%|██▉ | 5082/17285 [45:42:21<104:02:49, 30.69s/it] 29%|██▉ | 5083/17285 [45:42:46<98:46:01, 29.14s/it] 29%|██▉ | 5084/17285 [45:43:13<95:50:21, 28.28s/it] 29%|██▉ | 5085/17285 [45:43:44<99:20:46, 29.32s/it] 29%|██▉ | 5086/17285 [45:44:13<98:58:41, 29.21s/it] 29%|██▉ | 5087/17285 [45:44:44<100:29:48, 29.66s/it] 29%|██▉ | 5088/17285 [45:45:17<103:51:44, 30.66s/it] 29%|██▉ | 5089/17285 [45:45:53<109:45:19, 32.40s/it] 29%|██▉ | 5090/17285 [45:46:21<104:34:50, 30.87s/it] {'loss': 1.6449, 'learning_rate': 0.000169880619664694, 'epoch': 0.88} + 29%|██▉ | 5090/17285 [45:46:21<104:34:50, 30.87s/it] 29%|██▉ | 5091/17285 [45:46:54<106:31:41, 31.45s/it] 29%|██▉ | 5092/17285 [45:47:24<105:03:59, 31.02s/it] 29%|██▉ | 5093/17285 [45:48:01<111:14:59, 32.85s/it] 29%|██▉ | 5094/17285 [45:48:36<113:25:27, 33.49s/it] 29%|██▉ | 5095/17285 [45:49:09<113:39:37, 33.57s/it] 29%|██▉ | 5096/17285 [45:49:47<117:36:22, 34.73s/it] 29%|██▉ | 5097/17285 [45:50:18<113:38:31, 33.57s/it] 29%|██▉ | 5098/17285 [45:50:55<117:38:18, 34.75s/it] 29%|██▉ | 5099/17285 [45:51:21<108:44:19, 32.12s/it] 30%|██▉ | 5100/17285 [45:51:57<112:02:50, 33.10s/it] {'loss': 1.5903, 'learning_rate': 0.00016974363338360425, 'epoch': 0.89} + 30%|██▉ | 5100/17285 [45:51:57<112:02:50, 33.10s/it] 30%|██▉ | 5101/17285 
[45:52:28<110:06:50, 32.54s/it] 30%|██▉ | 5102/17285 [45:52:58<107:58:05, 31.90s/it] 30%|██▉ | 5103/17285 [45:53:34<111:55:27, 33.08s/it] 30%|██▉ | 5104/17285 [45:54:14<118:29:27, 35.02s/it] 30%|██▉ | 5105/17285 [45:54:45<114:48:11, 33.93s/it] 30%|██▉ | 5106/17285 [45:55:18<113:32:57, 33.56s/it] 30%|██▉ | 5107/17285 [45:55:52<113:45:09, 33.63s/it] 30%|██▉ | 5108/17285 [45:56:16<104:10:48, 30.80s/it] 30%|██▉ | 5109/17285 [45:56:52<110:03:29, 32.54s/it] 30%|██▉ | 5110/17285 [45:57:28<113:15:06, 33.49s/it] {'loss': 1.5682, 'learning_rate': 0.0001696063917983635, 'epoch': 0.89} + 30%|██▉ | 5110/17285 [45:57:28<113:15:06, 33.49s/it] 30%|██▉ | 5111/17285 [45:57:57<108:28:36, 32.08s/it] 30%|██▉ | 5112/17285 [45:58:27<106:48:29, 31.59s/it] 30%|██▉ | 5113/17285 [45:58:57<104:46:50, 30.99s/it] 30%|██▉ | 5114/17285 [45:59:35<111:51:36, 33.09s/it] 30%|██▉ | 5115/17285 [46:00:09<112:43:13, 33.34s/it] 30%|██▉ | 5116/17285 [46:00:42<113:06:29, 33.46s/it] 30%|██▉ | 5117/17285 [46:01:12<109:01:12, 32.25s/it] 30%|██▉ | 5118/17285 [46:01:47<112:06:40, 33.17s/it] 30%|██▉ | 5119/17285 [46:02:21<112:22:40, 33.25s/it] 30%|██▉ | 5120/17285 [46:02:48<106:11:28, 31.43s/it] {'loss': 1.5754, 'learning_rate': 0.00016946889541135946, 'epoch': 0.89} + 30%|██▉ | 5120/17285 [46:02:48<106:11:28, 31.43s/it] 30%|██▉ | 5121/17285 [46:03:23<109:52:45, 32.52s/it] 30%|██▉ | 5122/17285 [46:03:48<102:06:40, 30.22s/it] 30%|██▉ | 5123/17285 [46:04:22<106:30:43, 31.53s/it] 30%|██▉ | 5124/17285 [46:04:56<108:59:46, 32.27s/it] 30%|██▉ | 5125/17285 [46:05:29<109:46:03, 32.50s/it] 30%|██▉ | 5126/17285 [46:05:54<102:11:47, 30.26s/it] 30%|██▉ | 5127/17285 [46:06:24<101:18:18, 30.00s/it] 30%|██▉ | 5128/17285 [46:06:55<102:25:24, 30.33s/it] 30%|██▉ | 5129/17285 [46:07:22<99:12:01, 29.38s/it] 30%|██▉ | 5130/17285 [46:07:47<95:08:01, 28.18s/it] {'loss': 1.6168, 'learning_rate': 0.00016933114472591262, 'epoch': 0.89} + 30%|██▉ | 5130/17285 [46:07:47<95:08:01, 28.18s/it] 30%|██▉ | 5131/17285 [46:08:23<102:47:04, 
30.44s/it] 30%|██▉ | 5132/17285 [46:08:52<101:17:26, 30.00s/it] 30%|██▉ | 5133/17285 [46:09:18<96:59:31, 28.73s/it] 30%|██▉ | 5134/17285 [46:09:56<106:38:34, 31.60s/it] 30%|██▉ | 5135/17285 [46:10:22<100:58:40, 29.92s/it] 30%|██▉ | 5136/17285 [46:10:47<95:34:25, 28.32s/it] 30%|██▉ | 5137/17285 [46:11:15<95:01:59, 28.16s/it] 30%|██▉ | 5138/17285 [46:11:40<92:32:58, 27.43s/it] 30%|██▉ | 5139/17285 [46:12:11<95:55:32, 28.43s/it] 30%|██▉ | 5140/17285 [46:12:40<96:07:41, 28.49s/it] {'loss': 1.5977, 'learning_rate': 0.00016919314024627436, 'epoch': 0.89} + 30%|██▉ | 5140/17285 [46:12:40<96:07:41, 28.49s/it] 30%|██▉ | 5141/17285 [46:13:12<99:29:31, 29.49s/it] 30%|██▉ | 5142/17285 [46:13:46<104:21:06, 30.94s/it] 30%|██▉ | 5143/17285 [46:14:18<105:29:57, 31.28s/it] 30%|██▉ | 5144/17285 [46:14:44<99:54:17, 29.62s/it] 30%|██▉ | 5145/17285 [46:15:16<102:40:59, 30.45s/it] 30%|██▉ | 5146/17285 [46:15:50<105:47:07, 31.37s/it][2023-08-24 22:10:55,107] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 + 30%|██▉ | 5147/17285 [46:16:17<102:12:13, 30.31s/it] 30%|██▉ | 5148/17285 [46:16:59<113:15:44, 33.60s/it] 30%|██▉ | 5149/17285 [46:17:31<111:46:36, 33.16s/it] 30%|██▉ | 5150/17285 [46:18:01<109:03:05, 32.35s/it] {'loss': 1.6037, 'learning_rate': 0.00016906871963807865, 'epoch': 0.89} + 30%|██▉ | 5150/17285 [46:18:01<109:03:05, 32.35s/it] 30%|██▉ | 5151/17285 [46:18:34<109:53:11, 32.60s/it] 30%|██▉ | 5152/17285 [46:19:04<106:20:55, 31.55s/it] 30%|██▉ | 5153/17285 [46:19:36<107:42:36, 31.96s/it] 30%|██▉ | 5154/17285 [46:20:07<106:16:53, 31.54s/it] 30%|██▉ | 5155/17285 [46:20:33<100:37:59, 29.87s/it][2023-08-24 22:15:36,128] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 + 30%|██▉ | 5156/17285 [46:20:58<96:08:34, 28.54s/it] 30%|██▉ | 5157/17285 [46:21:26<94:57:03, 28.18s/it] 30%|██▉ | 5158/17285 [46:21:58<99:04:07, 29.41s/it] 30%|██▉ | 5159/17285 [46:22:33<104:30:29, 31.03s/it] 30%|██▉ | 5160/17285 [46:23:01<101:56:36, 30.27s/it] {'loss': 1.582, 'learning_rate': 0.00016894409423469082, 'epoch': 0.9} + 30%|██▉ | 5160/17285 [46:23:01<101:56:36, 30.27s/it] 30%|██▉ | 5161/17285 [46:23:33<103:48:43, 30.83s/it] 30%|██▉ | 5162/17285 [46:24:06<105:30:20, 31.33s/it] 30%|██▉ | 5163/17285 [46:24:38<105:51:21, 31.44s/it] 30%|██▉ | 5164/17285 [46:25:08<104:12:55, 30.95s/it] 30%|██▉ | 5165/17285 [46:25:40<105:23:00, 31.30s/it] 30%|██▉ | 5166/17285 [46:26:14<108:02:46, 32.10s/it] 30%|██▉ | 5167/17285 [46:26:48<109:57:07, 32.66s/it] 30%|██▉ | 5168/17285 [46:27:17<106:27:01, 31.63s/it] 30%|██▉ | 5169/17285 [46:27:51<109:10:20, 32.44s/it] 30%|██▉ | 5170/17285 [46:28:33<118:19:16, 35.16s/it] {'loss': 1.5863, 'learning_rate': 0.00016880538182183466, 'epoch': 0.9} + 30%|██▉ | 5170/17285 [46:28:33<118:19:16, 35.16s/it] 30%|██▉ | 5171/17285 [46:29:12<122:07:46, 36.29s/it] 30%|██▉ | 5172/17285 [46:29:37<111:29:59, 33.14s/it] 30%|██▉ | 5173/17285 [46:30:11<112:06:04, 33.32s/it] 30%|██▉ | 5174/17285 [46:30:44<112:04:07, 33.31s/it] 30%|██▉ | 5175/17285 [46:31:27<121:40:38, 36.17s/it] 30%|██▉ | 5176/17285 [46:32:05<122:51:00, 36.52s/it] 30%|██▉ | 5177/17285 [46:32:33<114:50:43, 34.15s/it] 30%|██▉ | 5178/17285 [46:33:04<111:04:44, 33.03s/it] 30%|██▉ | 5179/17285 [46:33:40<114:04:12, 33.92s/it] 30%|██▉ | 5180/17285 [46:34:05<105:08:08, 31.27s/it] {'loss': 1.5792, 'learning_rate': 0.00016866641753939926, 'epoch': 0.9} + 30%|██▉ | 5180/17285 [46:34:05<105:08:08, 31.27s/it] 30%|██▉ | 5181/17285 [46:34:34<103:05:01, 30.66s/it] 30%|██▉ | 5182/17285 [46:35:05<103:10:04, 30.69s/it] 30%|██▉ | 5183/17285 [46:35:36<104:08:46, 30.98s/it] 30%|██▉ | 5184/17285 [46:36:11<107:50:54, 32.08s/it] 30%|██▉ | 5185/17285 
[46:36:50<114:29:24, 34.06s/it] 30%|███ | 5186/17285 [46:37:22<112:45:54, 33.55s/it] 30%|███ | 5187/17285 [46:37:58<115:01:35, 34.23s/it] 30%|███ | 5188/17285 [46:38:28<110:53:52, 33.00s/it] 30%|███ | 5189/17285 [46:39:06<116:12:45, 34.59s/it] 30%|███ | 5190/17285 [46:39:38<113:41:46, 33.84s/it] {'loss': 1.5481, 'learning_rate': 0.00016852720189607857, 'epoch': 0.9} + 30%|███ | 5190/17285 [46:39:38<113:41:46, 33.84s/it] 30%|███ | 5191/17285 [46:40:09<110:42:40, 32.96s/it] 30%|███ | 5192/17285 [46:40:39<107:02:34, 31.87s/it] 30%|███ | 5193/17285 [46:41:06<102:43:02, 30.58s/it] 30%|███ | 5194/17285 [46:41:37<102:41:21, 30.57s/it] 30%|███ | 5195/17285 [46:42:09<104:20:53, 31.07s/it] 30%|███ | 5196/17285 [46:42:41<104:50:29, 31.22s/it] 30%|███ | 5197/17285 [46:43:14<106:46:32, 31.80s/it] 30%|███ | 5198/17285 [46:43:39<100:37:16, 29.97s/it] 30%|███ | 5199/17285 [46:44:11<102:40:36, 30.58s/it] 30%|███ | 5200/17285 [46:44:51<111:19:39, 33.16s/it] {'loss': 1.594, 'learning_rate': 0.00016838773540148655, 'epoch': 0.9} + 30%|███ | 5200/17285 [46:44:51<111:19:39, 33.16s/it] 30%|███ | 5201/17285 [46:45:16<103:08:06, 30.73s/it] 30%|███ | 5202/17285 [46:45:54<111:04:25, 33.09s/it] 30%|███ | 5203/17285 [46:46:23<106:17:12, 31.67s/it] 30%|███ | 5204/17285 [46:46:50<101:40:27, 30.30s/it] 30%|███ | 5205/17285 [46:47:20<101:50:54, 30.35s/it] 30%|███ | 5206/17285 [46:47:52<103:24:21, 30.82s/it] 30%|███ | 5207/17285 [46:48:28<108:27:16, 32.33s/it] 30%|███ | 5208/17285 [46:49:01<109:12:43, 32.55s/it] 30%|███ | 5209/17285 [46:49:29<104:52:06, 31.26s/it] 30%|███ | 5210/17285 [46:49:59<103:34:29, 30.88s/it] {'loss': 1.5484, 'learning_rate': 0.00016824801856615547, 'epoch': 0.9} + 30%|███ | 5210/17285 [46:49:59<103:34:29, 30.88s/it] 30%|███ | 5211/17285 [46:50:32<105:26:07, 31.44s/it] 30%|███ | 5212/17285 [46:51:20<122:36:21, 36.56s/it] 30%|███ | 5213/17285 [46:51:50<115:43:12, 34.51s/it] 30%|███ | 5214/17285 [46:52:18<108:51:57, 32.47s/it] 30%|███ | 5215/17285 [46:52:49<107:02:39, 
31.93s/it] 30%|███ | 5216/17285 [46:53:13<99:22:02, 29.64s/it] 30%|███ | 5217/17285 [46:53:47<103:54:00, 30.99s/it] 30%|███ | 5218/17285 [46:54:16<102:10:59, 30.48s/it] 30%|███ | 5219/17285 [46:54:50<105:17:23, 31.41s/it] 30%|███ | 5220/17285 [46:55:24<107:34:57, 32.10s/it] {'loss': 1.551, 'learning_rate': 0.00016810805190153397, 'epoch': 0.91} + 30%|███ | 5220/17285 [46:55:24<107:34:57, 32.10s/it] 30%|███ | 5221/17285 [46:55:59<111:18:04, 33.21s/it] 30%|███ | 5222/17285 [46:56:29<108:04:25, 32.25s/it] 30%|███ | 5223/17285 [46:56:56<101:57:45, 30.43s/it] 30%|███ | 5224/17285 [46:57:22<98:24:18, 29.37s/it] 30%|███ | 5225/17285 [46:58:02<108:39:19, 32.43s/it] 30%|███ | 5226/17285 [46:58:27<101:32:33, 30.31s/it] 30%|███ | 5227/17285 [46:58:55<98:20:14, 29.36s/it] 30%|███ | 5228/17285 [46:59:30<104:05:07, 31.08s/it] 30%|███ | 5229/17285 [47:00:01<104:26:05, 31.18s/it] 30%|███ | 5230/17285 [47:00:26<97:37:43, 29.15s/it] {'loss': 1.6115, 'learning_rate': 0.0001679678359199853, 'epoch': 0.91} + 30%|███ | 5230/17285 [47:00:26<97:37:43, 29.15s/it] 30%|███ | 5231/17285 [47:00:56<99:04:12, 29.59s/it] 30%|███ | 5232/17285 [47:01:21<94:50:42, 28.33s/it] 30%|███ | 5233/17285 [47:01:58<102:51:18, 30.72s/it] 30%|███ | 5234/17285 [47:02:30<104:35:06, 31.24s/it] 30%|███ | 5235/17285 [47:03:04<106:40:49, 31.87s/it] 30%|███ | 5236/17285 [47:03:29<99:44:58, 29.80s/it] 30%|███ | 5237/17285 [47:04:00<101:01:23, 30.19s/it] 30%|███ | 5238/17285 [47:04:27<98:06:53, 29.32s/it] 30%|███ | 5239/17285 [47:04:55<97:06:55, 29.02s/it] 30%|███ | 5240/17285 [47:05:21<93:26:09, 27.93s/it] {'loss': 1.5713, 'learning_rate': 0.0001678273711347852, 'epoch': 0.91} + 30%|███ | 5240/17285 [47:05:21<93:26:09, 27.93s/it] 30%|███ | 5241/17285 [47:06:02<106:59:30, 31.98s/it] 30%|███ | 5242/17285 [47:06:32<104:36:28, 31.27s/it] 30%|███ | 5243/17285 [47:07:01<102:24:40, 30.62s/it] 30%|███ | 5244/17285 [47:07:32<102:53:13, 30.76s/it][2023-08-24 23:02:40,921] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] 
OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384 + 30%|███ | 5245/17285 [47:08:03<103:27:36, 30.93s/it] 30%|███ | 5246/17285 [47:08:31<99:54:27, 29.88s/it] 30%|███ | 5247/17285 [47:09:04<103:37:09, 30.99s/it] 30%|███ | 5248/17285 [47:09:38<106:11:26, 31.76s/it] 30%|███ | 5249/17285 [47:10:10<106:21:59, 31.81s/it] 30%|███ | 5250/17285 [47:10:41<105:34:59, 31.58s/it] {'loss': 1.532, 'learning_rate': 0.00016770074052593968, 'epoch': 0.91} + 30%|███ | 5250/17285 [47:10:41<105:34:59, 31.58s/it] 30%|███ | 5251/17285 [47:11:08<101:33:27, 30.38s/it] 30%|███ | 5252/17285 [47:11:41<103:41:31, 31.02s/it] 30%|███ | 5253/17285 [47:12:08<100:09:43, 29.97s/it] 30%|███ | 5254/17285 [47:12:37<99:03:00, 29.64s/it] 30%|███ | 5255/17285 [47:13:05<97:09:57, 29.08s/it] 30%|███ | 5256/17285 [47:13:35<98:27:32, 29.47s/it] 30%|███ | 5257/17285 [47:14:03<96:12:12, 28.79s/it] 30%|███ | 5258/17285 [47:14:28<92:25:01, 27.66s/it] 30%|███ | 5259/17285 [47:14:58<94:59:00, 28.43s/it] 30%|███ | 5260/17285 [47:15:26<94:47:17, 28.38s/it] {'loss': 1.6103, 'learning_rate': 0.00016755980443113736, 'epoch': 0.91} + 30%|███ | 5260/17285 [47:15:26<94:47:17, 28.38s/it] 30%|███ | 5261/17285 [47:15:57<97:30:54, 29.20s/it] 30%|███ | 5262/17285 [47:16:26<97:19:51, 29.14s/it] 30%|███ | 5263/17285 [47:16:58<99:56:06, 29.93s/it] 30%|███ | 5264/17285 [47:17:25<97:22:10, 29.16s/it] 30%|███ | 5265/17285 [47:17:52<94:44:22, 28.37s/it] 30%|███ | 5266/17285 [47:18:31<105:02:22, 31.46s/it] 30%|███ | 5267/17285 [47:18:59<101:42:40, 30.47s/it] 30%|███ | 5268/17285 [47:19:26<98:18:59, 29.45s/it] 30%|███ | 5269/17285 [47:19:53<96:28:21, 28.90s/it] 30%|███ | 5270/17285 [47:20:23<97:31:12, 29.22s/it] {'loss': 1.5881, 'learning_rate': 0.00016741862102632728, 'epoch': 0.91} + 30%|███ | 5270/17285 [47:20:23<97:31:12, 29.22s/it] 30%|███ | 5271/17285 [47:20:55<99:53:46, 29.93s/it] 31%|███ | 5272/17285 [47:21:22<97:02:22, 29.08s/it] 31%|███ | 5273/17285 [47:21:50<95:57:54, 28.76s/it] 31%|███ | 
5274/17285 [47:22:24<101:11:53, 30.33s/it] 31%|███ | 5275/17285 [47:22:59<105:33:31, 31.64s/it] 31%|███ | 5276/17285 [47:23:25<99:40:43, 29.88s/it] 31%|███ | 5277/17285 [47:23:58<102:53:17, 30.85s/it] 31%|███ | 5278/17285 [47:24:30<104:45:38, 31.41s/it] 31%|███ | 5279/17285 [47:25:01<104:25:15, 31.31s/it] 31%|███ | 5280/17285 [47:25:31<102:46:11, 30.82s/it] {'loss': 1.5909, 'learning_rate': 0.00016727719082832666, 'epoch': 0.92} + 31%|███ | 5280/17285 [47:25:31<102:46:11, 30.82s/it] 31%|███ | 5281/17285 [47:26:00<101:16:07, 30.37s/it] 31%|███ | 5282/17285 [47:26:30<100:26:19, 30.12s/it] 31%|███ | 5283/17285 [47:26:59<98:53:25, 29.66s/it] 31%|███ | 5284/17285 [47:27:25<95:23:23, 28.61s/it] 31%|███ | 5285/17285 [47:27:53<94:44:26, 28.42s/it] 31%|███ | 5286/17285 [47:28:27<100:42:18, 30.21s/it] 31%|███ | 5287/17285 [47:28:57<99:57:06, 29.99s/it] 31%|███ | 5288/17285 [47:29:30<103:14:04, 30.98s/it] 31%|███ | 5289/17285 [47:30:12<114:02:15, 34.22s/it] 31%|███ | 5290/17285 [47:30:38<105:49:26, 31.76s/it] {'loss': 1.5632, 'learning_rate': 0.00016713551435485608, 'epoch': 0.92} + 31%|███ | 5290/17285 [47:30:38<105:49:26, 31.76s/it] 31%|███ | 5291/17285 [47:31:09<105:05:50, 31.55s/it] 31%|███ | 5292/17285 [47:31:39<104:11:50, 31.28s/it] 31%|███ | 5293/17285 [47:32:12<105:32:12, 31.68s/it] 31%|███ | 5294/17285 [47:32:42<103:28:25, 31.07s/it] 31%|███ | 5295/17285 [47:33:25<115:43:45, 34.75s/it] 31%|███ | 5296/17285 [47:33:58<114:26:54, 34.37s/it] 31%|███ | 5297/17285 [47:34:35<116:39:15, 35.03s/it] 31%|███ | 5298/17285 [47:35:06<112:54:23, 33.91s/it] 31%|███ | 5299/17285 [47:35:33<105:49:15, 31.78s/it] 31%|███ | 5300/17285 [47:36:00<101:12:24, 30.40s/it] {'loss': 1.5794, 'learning_rate': 0.0001669935921245377, 'epoch': 0.92} + 31%|███ | 5300/17285 [47:36:00<101:12:24, 30.40s/it] 31%|███ | 5301/17285 [47:36:34<104:01:56, 31.25s/it] 31%|███ | 5302/17285 [47:37:03<102:23:41, 30.76s/it] 31%|███ | 5303/17285 [47:37:37<105:20:44, 31.65s/it] 31%|███ | 5304/17285 [47:38:07<104:06:33, 
31.28s/it] 31%|███ | 5305/17285 [47:38:40<105:57:21, 31.84s/it] 31%|███ | 5306/17285 [47:39:12<105:38:20, 31.75s/it] 31%|███ | 5307/17285 [47:39:40<102:07:10, 30.69s/it] 31%|███ | 5308/17285 [47:40:13<103:58:47, 31.25s/it] 31%|███ | 5309/17285 [47:40:52<111:48:15, 33.61s/it] 31%|███ | 5310/17285 [47:41:18<103:56:59, 31.25s/it] {'loss': 1.5779, 'learning_rate': 0.00016685142465689326, 'epoch': 0.92} + 31%|███ | 5310/17285 [47:41:18<103:56:59, 31.25s/it] 31%|███ | 5311/17285 [47:41:50<104:56:23, 31.55s/it] 31%|███ | 5312/17285 [47:42:23<106:07:11, 31.91s/it] 31%|███ | 5313/17285 [47:42:50<101:35:03, 30.55s/it] 31%|███ | 5314/17285 [47:43:22<103:07:17, 31.01s/it] 31%|███ | 5315/17285 [47:43:52<101:54:39, 30.65s/it] 31%|███ | 5316/17285 [47:44:21<100:20:56, 30.18s/it] 31%|███ | 5317/17285 [47:44:57<106:06:33, 31.92s/it] 31%|███ | 5318/17285 [47:45:26<103:05:32, 31.01s/it] 31%|███ | 5319/17285 [47:45:57<102:55:24, 30.96s/it] 31%|███ | 5320/17285 [47:46:29<103:48:19, 31.23s/it] {'loss': 1.615, 'learning_rate': 0.00016670901247234224, 'epoch': 0.92} + 31%|███ | 5320/17285 [47:46:29<103:48:19, 31.23s/it] 31%|███ | 5321/17285 [47:47:03<106:37:55, 32.09s/it] 31%|███ | 5322/17285 [47:47:33<104:52:28, 31.56s/it] 31%|███ | 5323/17285 [47:48:07<107:24:37, 32.33s/it] 31%|███ | 5324/17285 [47:48:42<110:19:45, 33.21s/it] 31%|███ | 5325/17285 [47:49:07<101:56:28, 30.68s/it] 31%|███ | 5326/17285 [47:49:38<102:06:26, 30.74s/it] 31%|███ | 5327/17285 [47:50:13<105:51:27, 31.87s/it] 31%|███ | 5328/17285 [47:50:48<109:40:13, 33.02s/it] 31%|███ | 5329/17285 [47:51:19<106:59:03, 32.21s/it] 31%|███ | 5330/17285 [47:51:52<108:12:41, 32.59s/it] {'loss': 1.6188, 'learning_rate': 0.0001665663560921999, 'epoch': 0.93} + 31%|███ | 5330/17285 [47:51:52<108:12:41, 32.59s/it] 31%|███ | 5331/17285 [47:52:26<109:49:55, 33.08s/it] 31%|███ | 5332/17285 [47:52:54<104:43:20, 31.54s/it] 31%|███ | 5333/17285 [47:53:30<109:01:11, 32.84s/it] 31%|███ | 5334/17285 [47:54:00<106:13:54, 32.00s/it] 31%|███ | 
5335/17285 [47:54:32<106:28:09, 32.07s/it] 31%|███ | 5336/17285 [47:55:05<107:07:31, 32.27s/it] 31%|███ | 5337/17285 [47:55:41<110:33:07, 33.31s/it] 31%|███ | 5338/17285 [47:56:06<102:17:42, 30.82s/it] 31%|███ | 5339/17285 [47:56:30<95:20:35, 28.73s/it] 31%|███ | 5340/17285 [47:57:06<103:21:04, 31.15s/it] {'loss': 1.5351, 'learning_rate': 0.00016642345603867545, 'epoch': 0.93} + 31%|███ | 5340/17285 [47:57:06<103:21:04, 31.15s/it] 31%|███ | 5341/17285 [47:57:39<104:12:19, 31.41s/it] 31%|███ | 5342/17285 [47:58:12<106:25:39, 32.08s/it] 31%|███ | 5343/17285 [47:58:44<105:41:31, 31.86s/it] 31%|███ | 5344/17285 [47:59:13<103:26:12, 31.18s/it] 31%|███ | 5345/17285 [47:59:45<104:26:03, 31.49s/it] 31%|███ | 5346/17285 [48:00:16<103:56:01, 31.34s/it] 31%|███ | 5347/17285 [48:00:49<105:10:34, 31.72s/it] 31%|███ | 5348/17285 [48:01:19<103:32:44, 31.23s/it] 31%|███ | 5349/17285 [48:01:54<107:21:13, 32.38s/it] 31%|███ | 5350/17285 [48:02:26<106:57:17, 32.26s/it] {'loss': 1.6056, 'learning_rate': 0.00016628031283487006, 'epoch': 0.93} + 31%|███ | 5350/17285 [48:02:26<106:57:17, 32.26s/it] 31%|███ | 5351/17285 [48:02:51<99:32:08, 30.03s/it] 31%|███ | 5352/17285 [48:03:19<97:12:51, 29.33s/it] 31%|███ | 5353/17285 [48:03:53<102:16:01, 30.85s/it] 31%|███ | 5354/17285 [48:04:21<99:54:48, 30.15s/it] 31%|███ | 5355/17285 [48:04:55<102:59:43, 31.08s/it] 31%|███ | 5356/17285 [48:05:30<107:34:21, 32.46s/it] 31%|███ | 5357/17285 [48:06:06<110:31:49, 33.36s/it] 31%|███ | 5358/17285 [48:06:34<105:47:58, 31.93s/it] 31%|███ | 5359/17285 [48:07:07<106:26:08, 32.13s/it] 31%|███ | 5360/17285 [48:07:41<108:21:05, 32.71s/it] {'loss': 1.6137, 'learning_rate': 0.00016613692700477494, 'epoch': 0.93} + 31%|███ | 5360/17285 [48:07:41<108:21:05, 32.71s/it] 31%|███ | 5361/17285 [48:08:07<101:54:41, 30.77s/it] 31%|███ | 5362/17285 [48:08:49<112:43:18, 34.03s/it] 31%|███ | 5363/17285 [48:09:21<110:15:59, 33.30s/it] 31%|███ | 5364/17285 [48:09:51<107:06:29, 32.35s/it] 31%|███ | 5365/17285 
[48:10:20<104:00:01, 31.41s/it] 31%|███ | 5366/17285 [48:10:57<109:36:21, 33.11s/it] 31%|███ | 5367/17285 [48:11:28<107:49:50, 32.57s/it] 31%|███ | 5368/17285 [48:12:04<110:37:44, 33.42s/it] 31%|███ | 5369/17285 [48:12:37<110:20:17, 33.33s/it] 31%|███ | 5370/17285 [48:13:06<106:17:21, 32.11s/it] {'loss': 1.5894, 'learning_rate': 0.0001659932990732696, 'epoch': 0.93} + 31%|███ | 5370/17285 [48:13:06<106:17:21, 32.11s/it] 31%|███ | 5371/17285 [48:13:36<103:41:27, 31.33s/it] 31%|███ | 5372/17285 [48:14:03<99:23:06, 30.03s/it] 31%|███ | 5373/17285 [48:14:32<98:29:21, 29.77s/it] 31%|███ | 5374/17285 [48:15:05<101:32:06, 30.69s/it] 31%|███ | 5375/17285 [48:15:34<100:36:55, 30.41s/it] 31%|███ | 5376/17285 [48:16:11<106:31:32, 32.20s/it] 31%|███ | 5377/17285 [48:16:48<111:44:39, 33.78s/it] 31%|███ | 5378/17285 [48:17:17<106:54:11, 32.32s/it] 31%|███ | 5379/17285 [48:17:49<106:12:12, 32.11s/it] 31%|███ | 5380/17285 [48:18:22<107:38:46, 32.55s/it] {'loss': 1.6144, 'learning_rate': 0.00016584942956611963, 'epoch': 0.93} + 31%|███ | 5380/17285 [48:18:22<107:38:46, 32.55s/it] 31%|███ | 5381/17285 [48:18:55<107:23:48, 32.48s/it] 31%|███ | 5382/17285 [48:19:20<99:53:01, 30.21s/it] 31%|███ | 5383/17285 [48:19:49<99:10:24, 30.00s/it] 31%|███ | 5384/17285 [48:20:20<100:11:20, 30.31s/it] 31%|███ | 5385/17285 [48:20:48<98:04:37, 29.67s/it] 31%|███ | 5386/17285 [48:21:14<94:09:27, 28.49s/it] 31%|███ | 5387/17285 [48:21:44<95:52:58, 29.01s/it] 31%|███ | 5388/17285 [48:22:15<97:07:36, 29.39s/it] 31%|███ | 5389/17285 [48:22:49<102:04:48, 30.89s/it] 31%|███ | 5390/17285 [48:23:17<99:46:38, 30.20s/it] {'loss': 1.5606, 'learning_rate': 0.00016570531900997497, 'epoch': 0.94} + 31%|███ | 5390/17285 [48:23:17<99:46:38, 30.20s/it] 31%|███ | 5391/17285 [48:23:50<102:09:06, 30.92s/it] 31%|███ | 5392/17285 [48:24:26<107:22:31, 32.50s/it] 31%|███ | 5393/17285 [48:24:58<106:39:34, 32.29s/it] 31%|███ | 5394/17285 [48:25:29<104:57:35, 31.78s/it] 31%|███ | 5395/17285 [48:25:56<100:37:35, 30.47s/it] 
31%|███ | 5396/17285 [48:26:37<110:42:34, 33.52s/it] 31%|███ | 5397/17285 [48:27:12<112:49:19, 34.17s/it] 31%|███ | 5398/17285 [48:27:48<114:23:42, 34.64s/it] 31%|███ | 5399/17285 [48:28:17<108:34:15, 32.88s/it] 31%|███ | 5400/17285 [48:28:47<106:09:19, 32.15s/it] {'loss': 1.5671, 'learning_rate': 0.00016556096793236805, 'epoch': 0.94} + 31%|███ | 5400/17285 [48:28:47<106:09:19, 32.15s/it] 31%|███ | 5401/17285 [48:29:23<109:19:49, 33.12s/it] 31%|███▏ | 5402/17285 [48:29:48<101:46:37, 30.83s/it] 31%|███▏ | 5403/17285 [48:30:32<114:39:39, 34.74s/it] 31%|███▏ | 5404/17285 [48:31:00<108:11:20, 32.78s/it] 31%|███▏ | 5405/17285 [48:31:32<107:25:45, 32.55s/it] 31%|███▏ | 5406/17285 [48:32:04<106:37:02, 32.31s/it] 31%|███▏ | 5407/17285 [48:32:33<103:18:38, 31.31s/it] 31%|███▏ | 5408/17285 [48:33:07<105:44:31, 32.05s/it] 31%|███▏ | 5409/17285 [48:33:38<104:46:56, 31.76s/it] 31%|███▏ | 5410/17285 [48:34:03<97:51:43, 29.67s/it] {'loss': 1.5839, 'learning_rate': 0.00016541637686171167, 'epoch': 0.94} + 31%|███▏ | 5410/17285 [48:34:03<97:51:43, 29.67s/it] 31%|███▏ | 5411/17285 [48:34:37<102:27:59, 31.07s/it] 31%|███▏ | 5412/17285 [48:35:14<108:10:56, 32.80s/it] 31%|███▏ | 5413/17285 [48:35:44<105:55:30, 32.12s/it] 31%|███▏ | 5414/17285 [48:36:21<110:33:24, 33.53s/it] 31%|███▏ | 5415/17285 [48:36:52<108:02:35, 32.77s/it] 31%|███▏ | 5416/17285 [48:37:18<100:41:39, 30.54s/it] 31%|███▏ | 5417/17285 [48:37:49<101:27:04, 30.77s/it] 31%|███▏ | 5418/17285 [48:38:15<96:42:22, 29.34s/it] 31%|███▏ | 5419/17285 [48:38:54<105:58:01, 32.15s/it] 31%|███▏ | 5420/17285 [48:39:32<112:24:54, 34.11s/it] {'loss': 1.5991, 'learning_rate': 0.00016527154632729713, 'epoch': 0.94} + 31%|███▏ | 5420/17285 [48:39:32<112:24:54, 34.11s/it] 31%|███▏ | 5421/17285 [48:40:00<106:05:17, 32.19s/it] 31%|███▏ | 5422/17285 [48:40:32<105:42:44, 32.08s/it] 31%|███▏ | 5423/17285 [48:41:07<108:43:41, 33.00s/it] 31%|███▏ | 5424/17285 [48:41:51<119:55:15, 36.40s/it] 31%|███▏ | 5425/17285 [48:42:19<111:29:01, 33.84s/it] 
31%|███▏ | 5426/17285 [48:42:51<109:14:57, 33.16s/it] 31%|███▏ | 5427/17285 [48:43:20<105:18:31, 31.97s/it] 31%|███▏ | 5428/17285 [48:43:51<103:57:07, 31.56s/it] 31%|███▏ | 5429/17285 [48:44:22<103:43:27, 31.50s/it] 31%|███▏ | 5430/17285 [48:44:50<100:13:42, 30.44s/it] {'loss': 1.5912, 'learning_rate': 0.00016512647685929235, 'epoch': 0.94} + 31%|███▏ | 5430/17285 [48:44:50<100:13:42, 30.44s/it] 31%|███▏ | 5431/17285 [48:45:18<98:05:35, 29.79s/it] 31%|███▏ | 5432/17285 [48:45:44<94:18:01, 28.64s/it] 31%|███▏ | 5433/17285 [48:46:19<100:23:15, 30.49s/it] 31%|███▏ | 5434/17285 [48:46:47<97:35:00, 29.64s/it] 31%|███▏ | 5435/17285 [48:47:21<101:56:04, 30.97s/it] 31%|███▏ | 5436/17285 [48:47:49<98:57:28, 30.07s/it] 31%|███▏ | 5437/17285 [48:48:23<103:29:14, 31.44s/it] 31%|███▏ | 5438/17285 [48:48:53<101:55:25, 30.97s/it] 31%|███▏ | 5439/17285 [48:49:32<109:46:40, 33.36s/it] 31%|███▏ | 5440/17285 [48:49:58<102:40:35, 31.21s/it] {'loss': 1.578, 'learning_rate': 0.0001649811689887399, 'epoch': 0.94} + 31%|███▏ | 5440/17285 [48:49:58<102:40:35, 31.21s/it] 31%|███▏ | 5441/17285 [48:50:23<96:06:34, 29.21s/it] 31%|███▏ | 5442/17285 [48:50:50<93:57:36, 28.56s/it] 31%|███▏ | 5443/17285 [48:51:29<104:24:42, 31.74s/it] 31%|███▏ | 5444/17285 [48:51:59<103:09:24, 31.36s/it] 32%|███▏ | 5445/17285 [48:52:34<106:27:42, 32.37s/it] 32%|███▏ | 5446/17285 [48:53:20<119:49:50, 36.44s/it] 32%|███▏ | 5447/17285 [48:53:46<109:19:40, 33.25s/it] 32%|███▏ | 5448/17285 [48:54:17<107:05:21, 32.57s/it] 32%|███▏ | 5449/17285 [48:54:50<107:08:09, 32.59s/it] 32%|███▏ | 5450/17285 [48:55:16<101:19:04, 30.82s/it] {'loss': 1.5861, 'learning_rate': 0.00016483562324755502, 'epoch': 0.95} + 32%|███▏ | 5450/17285 [48:55:16<101:19:04, 30.82s/it] 32%|███▏ | 5451/17285 [48:55:45<99:20:35, 30.22s/it] 32%|███▏ | 5452/17285 [48:56:16<100:10:30, 30.48s/it] 32%|███▏ | 5453/17285 [48:56:52<105:18:32, 32.04s/it] 32%|███▏ | 5454/17285 [48:57:19<100:56:00, 30.71s/it] 32%|███▏ | 5455/17285 [48:57:56<106:30:10, 32.41s/it] 
32%|███▏ | 5456/17285 [48:58:29<107:41:57, 32.78s/it] 32%|███▏ | 5457/17285 [48:58:55<100:29:20, 30.59s/it] 32%|███▏ | 5458/17285 [48:59:27<101:35:29, 30.92s/it] 32%|███▏ | 5459/17285 [49:00:02<105:56:26, 32.25s/it] 32%|███▏ | 5460/17285 [49:00:28<99:41:03, 30.35s/it] {'loss': 1.5587, 'learning_rate': 0.00016468984016852374, 'epoch': 0.95} + 32%|███▏ | 5460/17285 [49:00:28<99:41:03, 30.35s/it] 32%|███▏ | 5461/17285 [49:01:02<102:55:27, 31.34s/it] 32%|███▏ | 5462/17285 [49:01:26<95:55:49, 29.21s/it] 32%|███▏ | 5463/17285 [49:01:57<98:20:08, 29.94s/it] 32%|███▏ | 5464/17285 [49:02:28<98:54:53, 30.12s/it] 32%|███▏ | 5465/17285 [49:03:10<110:37:34, 33.69s/it] 32%|███▏ | 5466/17285 [49:03:49<116:06:13, 35.36s/it] 32%|███▏ | 5467/17285 [49:04:21<112:23:02, 34.23s/it] 32%|███▏ | 5468/17285 [49:04:50<107:00:36, 32.60s/it] 32%|███▏ | 5469/17285 [49:05:25<109:55:31, 33.49s/it] 32%|███▏ | 5470/17285 [49:05:57<108:31:57, 33.07s/it] {'loss': 1.5391, 'learning_rate': 0.0001645438202853008, 'epoch': 0.95} + 32%|███▏ | 5470/17285 [49:05:57<108:31:57, 33.07s/it] 32%|███▏ | 5471/17285 [49:06:30<108:12:30, 32.97s/it] 32%|███▏ | 5472/17285 [49:07:03<107:50:52, 32.87s/it] 32%|███▏ | 5473/17285 [49:07:38<110:11:48, 33.59s/it] 32%|███▏ | 5474/17285 [49:08:08<107:08:15, 32.66s/it] 32%|███▏ | 5475/17285 [49:08:34<100:25:44, 30.61s/it] 32%|███▏ | 5476/17285 [49:09:04<100:02:06, 30.50s/it] 32%|███▏ | 5477/17285 [49:09:39<103:52:51, 31.67s/it] 32%|███▏ | 5478/17285 [49:10:08<101:35:46, 30.98s/it] 32%|███▏ | 5479/17285 [49:10:36<98:28:57, 30.03s/it] 32%|███▏ | 5480/17285 [49:11:17<109:12:17, 33.30s/it] {'loss': 1.5762, 'learning_rate': 0.00016439756413240793, 'epoch': 0.95} + 32%|███▏ | 5480/17285 [49:11:17<109:12:17, 33.30s/it] 32%|███▏ | 5481/17285 [49:11:49<107:27:00, 32.77s/it] 32%|███▏ | 5482/17285 [49:12:18<104:19:28, 31.82s/it] 32%|███▏ | 5483/17285 [49:12:46<100:24:04, 30.63s/it] 32%|███▏ | 5484/17285 [49:13:19<102:29:28, 31.27s/it] 32%|███▏ | 5485/17285 [49:13:49<101:29:43, 30.96s/it] 
32%|███▏ | 5486/17285 [49:14:21<102:21:15, 31.23s/it] 32%|███▏ | 5487/17285 [49:14:51<101:02:27, 30.83s/it] 32%|███▏ | 5488/17285 [49:15:21<100:40:25, 30.72s/it] 32%|███▏ | 5489/17285 [49:15:54<103:11:04, 31.49s/it] 32%|███▏ | 5490/17285 [49:16:19<96:15:56, 29.38s/it] {'loss': 1.6125, 'learning_rate': 0.00016425107224523168, 'epoch': 0.95} + 32%|███▏ | 5490/17285 [49:16:19<96:15:56, 29.38s/it] 32%|███▏ | 5491/17285 [49:16:52<100:12:44, 30.59s/it] 32%|███▏ | 5492/17285 [49:17:18<95:20:45, 29.11s/it] [2023-08-25 01:12:22,815] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 + 32%|███▏ | 5493/17285 [49:17:45<93:23:40, 28.51s/it] 32%|███▏ | 5494/17285 [49:18:12<92:06:26, 28.12s/it] 32%|███▏ | 5495/17285 [49:18:41<92:25:58, 28.22s/it] 32%|███▏ | 5496/17285 [49:19:08<91:41:55, 28.00s/it] 32%|███▏ | 5497/17285 [49:19:43<98:01:27, 29.94s/it] 32%|███▏ | 5498/17285 [49:20:12<97:10:06, 29.68s/it] 32%|███▏ | 5499/17285 [49:20:39<94:21:01, 28.82s/it] 32%|███▏ | 5500/17285 [49:21:09<95:57:05, 29.31s/it] {'loss': 1.5776, 'learning_rate': 0.0001641190284371531, 'epoch': 0.95} + 32%|███▏ | 5500/17285 [49:21:09<95:57:05, 29.31s/it] 32%|███▏ | 5501/17285 [49:21:42<99:29:43, 30.40s/it] 32%|███▏ | 5502/17285 [49:22:17<104:15:37, 31.85s/it] 32%|███▏ | 5503/17285 [49:22:54<108:42:03, 33.21s/it] 32%|███▏ | 5504/17285 [49:23:24<106:08:36, 32.43s/it] 32%|███▏ | 5505/17285 [49:23:51<100:49:55, 30.81s/it] 32%|███▏ | 5506/17285 [49:24:20<98:23:13, 30.07s/it] 32%|███▏ | 5507/17285 [49:24:51<99:45:06, 30.49s/it] 32%|███▏ | 5508/17285 [49:25:22<99:50:39, 30.52s/it] 32%|███▏ | 5509/17285 [49:25:52<99:26:54, 30.40s/it] 32%|███▏ | 5510/17285 [49:26:19<95:48:18, 29.29s/it] {'loss': 1.5469, 'learning_rate': 0.00016397209013291726, 'epoch': 0.96} + 32%|███▏ | 5510/17285 [49:26:19<95:48:18, 29.29s/it] 32%|███▏ | 5511/17285 [49:26:49<96:51:10, 29.61s/it] 32%|███▏ | 5512/17285 
[49:27:23<101:26:52, 31.02s/it] 32%|███▏ | 5513/17285 [49:28:04<110:34:03, 33.81s/it] 32%|███▏ | 5514/17285 [49:28:42<115:27:25, 35.31s/it] 32%|███▏ | 5515/17285 [49:29:10<108:22:02, 33.15s/it] 32%|███▏ | 5516/17285 [49:29:46<110:58:04, 33.94s/it] 32%|███▏ | 5517/17285 [49:30:17<107:43:03, 32.95s/it] 32%|███▏ | 5518/17285 [49:30:59<116:53:32, 35.76s/it] 32%|███▏ | 5519/17285 [49:31:29<110:42:11, 33.87s/it] 32%|███▏ | 5520/17285 [49:32:03<110:54:11, 33.94s/it] {'loss': 1.5636, 'learning_rate': 0.00016382491765189186, 'epoch': 0.96} + 32%|███▏ | 5520/17285 [49:32:03<110:54:11, 33.94s/it] 32%|███▏ | 5521/17285 [49:32:33<106:56:37, 32.73s/it] 32%|███▏ | 5522/17285 [49:33:14<115:07:35, 35.23s/it] 32%|███▏ | 5523/17285 [49:33:46<112:11:15, 34.34s/it] 32%|███▏ | 5524/17285 [49:34:14<106:13:57, 32.52s/it] 32%|███▏ | 5525/17285 [49:34:47<106:14:03, 32.52s/it] 32%|███▏ | 5526/17285 [49:35:13<100:19:21, 30.71s/it] 32%|███▏ | 5527/17285 [49:35:44<99:58:08, 30.61s/it] 32%|███▏ | 5528/17285 [49:36:19<104:25:22, 31.97s/it] 32%|███▏ | 5529/17285 [49:36:48<101:43:29, 31.15s/it] 32%|███▏ | 5530/17285 [49:37:25<106:56:51, 32.75s/it] {'loss': 1.5732, 'learning_rate': 0.00016367751153281774, 'epoch': 0.96} + 32%|███▏ | 5530/17285 [49:37:25<106:56:51, 32.75s/it] 32%|███▏ | 5531/17285 [49:37:58<108:05:21, 33.11s/it] 32%|███▏ | 5532/17285 [49:38:28<104:47:32, 32.10s/it] 32%|███▏ | 5533/17285 [49:39:00<104:52:00, 32.12s/it] 32%|███▏ | 5534/17285 [49:39:32<104:00:00, 31.86s/it] 32%|███▏ | 5535/17285 [49:40:00<100:27:10, 30.78s/it] 32%|███▏ | 5536/17285 [49:40:35<105:08:38, 32.22s/it] 32%|███▏ | 5537/17285 [49:41:07<104:53:28, 32.14s/it] 32%|███▏ | 5538/17285 [49:41:43<108:30:12, 33.25s/it] 32%|███▏ | 5539/17285 [49:42:19<110:53:49, 33.99s/it] 32%|███▏ | 5540/17285 [49:42:47<105:09:24, 32.23s/it] {'loss': 1.547, 'learning_rate': 0.00016352987231529103, 'epoch': 0.96} + 32%|███▏ | 5540/17285 [49:42:47<105:09:24, 32.23s/it] 32%|███▏ | 5541/17285 [49:43:14<99:37:31, 30.54s/it] 32%|███▏ | 
5542/17285 [49:43:41<96:22:12, 29.54s/it] 32%|███▏ | 5543/17285 [49:44:10<96:09:28, 29.48s/it] 32%|███▏ | 5544/17285 [49:44:46<102:11:46, 31.34s/it] 32%|███▏ | 5545/17285 [49:45:25<109:31:11, 33.58s/it] 32%|███▏ | 5546/17285 [49:45:56<107:15:48, 32.89s/it] 32%|███▏ | 5547/17285 [49:46:29<107:25:40, 32.95s/it] 32%|███▏ | 5548/17285 [49:46:59<104:19:50, 32.00s/it] 32%|███▏ | 5549/17285 [49:47:30<103:22:10, 31.71s/it] 32%|███▏ | 5550/17285 [49:48:00<101:29:09, 31.13s/it] {'loss': 1.6132, 'learning_rate': 0.00016338200053976108, 'epoch': 0.96} + 32%|███▏ | 5550/17285 [49:48:00<101:29:09, 31.13s/it] 32%|███▏ | 5551/17285 [49:48:26<96:29:11, 29.60s/it] 32%|███▏ | 5552/17285 [49:49:01<101:49:26, 31.24s/it] 32%|███▏ | 5553/17285 [49:49:30<99:46:33, 30.62s/it] 32%|███▏ | 5554/17285 [49:49:56<95:31:28, 29.31s/it] 32%|███▏ | 5555/17285 [49:50:32<101:56:17, 31.29s/it] 32%|███▏ | 5556/17285 [49:51:10<108:28:02, 33.29s/it] 32%|███▏ | 5557/17285 [49:51:42<106:45:18, 32.77s/it] 32%|███▏ | 5558/17285 [49:52:07<99:29:47, 30.54s/it] 32%|███▏ | 5559/17285 [49:52:46<108:01:20, 33.16s/it] 32%|███▏ | 5560/17285 [49:53:25<113:13:21, 34.76s/it] {'loss': 1.5797, 'learning_rate': 0.00016323389674752868, 'epoch': 0.96} + 32%|███▏ | 5560/17285 [49:53:25<113:13:21, 34.76s/it] 32%|███▏ | 5561/17285 [49:53:55<108:38:55, 33.36s/it] 32%|███▏ | 5562/17285 [49:54:26<106:28:08, 32.70s/it] 32%|███▏ | 5563/17285 [49:55:01<108:43:55, 33.39s/it] 32%|███▏ | 5564/17285 [49:55:32<106:01:26, 32.56s/it] 32%|███▏ | 5565/17285 [49:55:57<99:18:14, 30.50s/it] 32%|███▏ | 5566/17285 [49:56:31<102:43:06, 31.55s/it] 32%|███▏ | 5567/17285 [49:57:00<99:30:14, 30.57s/it] 32%|███▏ | 5568/17285 [49:57:29<97:52:28, 30.07s/it] 32%|███▏ | 5569/17285 [49:58:05<103:39:53, 31.85s/it] 32%|███▏ | 5570/17285 [49:58:38<105:07:42, 32.31s/it] {'loss': 1.5448, 'learning_rate': 0.00016308556148074378, 'epoch': 0.97} + 32%|███▏ | 5570/17285 [49:58:38<105:07:42, 32.31s/it] 32%|███▏ | 5571/17285 [49:59:08<102:52:22, 31.62s/it] 32%|███▏ | 
5572/17285 [49:59:36<99:30:29, 30.58s/it] 32%|███▏ | 5573/17285 [50:00:08<100:31:55, 30.90s/it] 32%|███▏ | 5574/17285 [50:00:34<95:43:01, 29.42s/it] 32%|███▏ | 5575/17285 [50:01:10<102:09:13, 31.41s/it] 32%|███▏ | 5576/17285 [50:01:43<104:17:40, 32.07s/it] 32%|███▏ | 5577/17285 [50:02:23<111:46:47, 34.37s/it] 32%|███▏ | 5578/17285 [50:02:55<109:46:49, 33.76s/it] 32%|███▏ | 5579/17285 [50:03:34<114:23:40, 35.18s/it] 32%|███▏ | 5580/17285 [50:04:06<111:42:05, 34.35s/it] {'loss': 1.5573, 'learning_rate': 0.00016293699528240386, 'epoch': 0.97} + 32%|███▏ | 5580/17285 [50:04:06<111:42:05, 34.35s/it] 32%|███▏ | 5581/17285 [50:04:44<115:20:22, 35.48s/it] 32%|███▏ | 5582/17285 [50:05:19<114:06:23, 35.10s/it] 32%|███▏ | 5583/17285 [50:05:54<114:41:47, 35.29s/it] 32%|███▏ | 5584/17285 [50:06:28<113:13:22, 34.83s/it] 32%|███▏ | 5585/17285 [50:07:02<112:32:10, 34.63s/it] 32%|███▏ | 5586/17285 [50:07:31<106:53:54, 32.89s/it] 32%|███▏ | 5587/17285 [50:08:09<111:49:38, 34.41s/it] 32%|███▏ | 5588/17285 [50:08:41<109:42:17, 33.76s/it] 32%|███▏ | 5589/17285 [50:09:11<105:57:48, 32.62s/it] 32%|███▏ | 5590/17285 [50:09:43<105:25:01, 32.45s/it] {'loss': 1.5472, 'learning_rate': 0.0001627881986963515, 'epoch': 0.97} + 32%|███▏ | 5590/17285 [50:09:43<105:25:01, 32.45s/it] 32%|███▏ | 5591/17285 [50:10:15<104:45:42, 32.25s/it] 32%|███▏ | 5592/17285 [50:10:40<97:36:06, 30.05s/it] 32%|███▏ | 5593/17285 [50:11:19<105:53:59, 32.61s/it] 32%|███▏ | 5594/17285 [50:11:55<109:26:26, 33.70s/it] 32%|███▏ | 5595/17285 [50:12:30<110:34:55, 34.05s/it] 32%|███▏ | 5596/17285 [50:13:00<106:55:33, 32.93s/it] 32%|███▏ | 5597/17285 [50:13:32<105:37:52, 32.54s/it] 32%|███▏ | 5598/17285 [50:14:05<106:44:39, 32.88s/it] 32%|███▏ | 5599/17285 [50:14:41<109:22:45, 33.70s/it] 32%|███▏ | 5600/17285 [50:15:10<105:14:12, 32.42s/it] {'loss': 1.5518, 'learning_rate': 0.00016263917226727286, 'epoch': 0.97} + 32%|███▏ | 5600/17285 [50:15:10<105:14:12, 32.42s/it] 32%|███▏ | 5601/17285 [50:15:41<103:24:28, 31.86s/it] 
32%|███▏ | 5602/17285 [50:16:05<95:54:52, 29.56s/it] 32%|███▏ | 5603/17285 [50:16:39<99:51:33, 30.77s/it] 32%|███▏ | 5604/17285 [50:17:06<96:46:22, 29.82s/it] 32%|███▏ | 5605/17285 [50:17:40<100:23:00, 30.94s/it] 32%|███▏ | 5606/17285 [50:18:14<103:51:11, 32.01s/it] 32%|███▏ | 5607/17285 [50:18:40<97:58:45, 30.20s/it] 32%|███▏ | 5608/17285 [50:19:15<102:16:23, 31.53s/it] 32%|███▏ | 5609/17285 [50:19:40<95:44:55, 29.52s/it] 32%|███▏ | 5610/17285 [50:20:18<104:07:55, 32.11s/it] {'loss': 1.5813, 'learning_rate': 0.0001624899165406954, 'epoch': 0.97} + 32%|███▏ | 5610/17285 [50:20:18<104:07:55, 32.11s/it] 32%|███▏ | 5611/17285 [50:20:48<101:46:27, 31.38s/it] 32%|███▏ | 5612/17285 [50:21:13<95:50:38, 29.56s/it] 32%|███▏ | 5613/17285 [50:21:44<97:18:29, 30.01s/it] 32%|███▏ | 5614/17285 [50:22:17<99:43:54, 30.76s/it] 32%|███▏ | 5615/17285 [50:22:55<107:21:22, 33.12s/it] 32%|███▏ | 5616/17285 [50:23:25<104:24:07, 32.21s/it] 32%|███▏ | 5617/17285 [50:23:51<98:02:37, 30.25s/it] 33%|███▎ | 5618/17285 [50:24:20<97:19:35, 30.03s/it] 33%|███▎ | 5619/17285 [50:25:03<109:32:27, 33.80s/it] 33%|███▎ | 5620/17285 [50:25:33<105:48:28, 32.65s/it] {'loss': 1.5408, 'learning_rate': 0.00016234043206298586, 'epoch': 0.98} + 33%|███▎ | 5620/17285 [50:25:33<105:48:28, 32.65s/it] 33%|███▎ | 5621/17285 [50:26:07<107:21:32, 33.14s/it] 33%|███▎ | 5622/17285 [50:26:38<105:23:04, 32.53s/it] 33%|███▎ | 5623/17285 [50:27:05<99:25:04, 30.69s/it] 33%|███▎ | 5624/17285 [50:27:39<103:08:45, 31.84s/it] 33%|███▎ | 5625/17285 [50:28:14<106:02:36, 32.74s/it] 33%|███▎ | 5626/17285 [50:28:51<109:50:27, 33.92s/it] 33%|███▎ | 5627/17285 [50:29:16<101:27:21, 31.33s/it] 33%|███▎ | 5628/17285 [50:29:42<96:14:20, 29.72s/it] 33%|███▎ | 5629/17285 [50:30:15<99:37:02, 30.77s/it] 33%|███▎ | 5630/17285 [50:30:47<100:51:02, 31.15s/it] {'loss': 1.5346, 'learning_rate': 0.00016219071938134845, 'epoch': 0.98} + 33%|███▎ | 5630/17285 [50:30:47<100:51:02, 31.15s/it] 33%|███▎ | 5631/17285 [50:31:16<98:07:10, 30.31s/it] 
33%|███▎ | 5632/17285 [50:31:53<104:55:58, 32.42s/it] 33%|███▎ | 5633/17285 [50:32:32<111:25:06, 34.42s/it] 33%|███▎ | 5634/17285 [50:33:02<107:06:53, 33.10s/it] 33%|███▎ | 5635/17285 [50:33:30<101:35:45, 31.39s/it] 33%|███▎ | 5636/17285 [50:33:58<99:05:04, 30.62s/it] [2023-08-25 02:29:05,502] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 33%|███▎ | 5637/17285 [50:34:28<97:53:46, 30.26s/it] 33%|███▎ | 5638/17285 [50:34:54<94:10:22, 29.11s/it] 33%|███▎ | 5639/17285 [50:35:31<101:32:23, 31.39s/it] 33%|███▎ | 5640/17285 [50:36:00<99:09:14, 30.65s/it] {'loss': 1.5452, 'learning_rate': 0.0001620557833064539, 'epoch': 0.98} + 33%|███▎ | 5640/17285 [50:36:00<99:09:14, 30.65s/it] 33%|███▎ | 5641/17285 [50:36:28<96:13:19, 29.75s/it] 33%|███▎ | 5642/17285 [50:37:16<113:58:27, 35.24s/it] 33%|███▎ | 5643/17285 [50:37:45<108:34:18, 33.57s/it] 33%|███▎ | 5644/17285 [50:38:16<105:28:28, 32.62s/it][2023-08-25 02:33:21,965] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 33%|███▎ | 5645/17285 [50:38:44<101:35:16, 31.42s/it] 33%|███▎ | 5646/17285 [50:39:15<101:12:21, 31.30s/it] 33%|███▎ | 5647/17285 [50:39:55<108:54:06, 33.69s/it] 33%|███▎ | 5648/17285 [50:40:26<106:36:22, 32.98s/it] 33%|███▎ | 5649/17285 [50:40:57<104:34:08, 32.35s/it] 33%|███▎ | 5650/17285 [50:41:30<104:55:36, 32.47s/it] {'loss': 1.5825, 'learning_rate': 0.00016192066323037722, 'epoch': 0.98} + 33%|███▎ | 5650/17285 [50:41:30<104:55:36, 32.47s/it] 33%|███▎ | 5651/17285 [50:42:00<102:47:09, 31.81s/it] 33%|███▎ | 5652/17285 [50:42:31<102:39:04, 31.77s/it] 33%|███▎ | 5653/17285 [50:42:58<97:31:14, 30.18s/it] 33%|███▎ | 5654/17285 [50:43:34<102:50:57, 31.83s/it] 33%|███▎ | 5655/17285 [50:44:06<103:34:22, 32.06s/it] 33%|███▎ | 5656/17285 [50:44:39<104:15:30, 32.28s/it] 33%|███▎ | 5657/17285 [50:45:14<107:22:34, 33.24s/it] 33%|███▎ | 5658/17285 [50:45:46<105:26:46, 32.65s/it] 33%|███▎ | 5659/17285 [50:46:15<101:41:13, 31.49s/it] 33%|███▎ | 5660/17285 [50:46:42<97:58:45, 30.34s/it] {'loss': 1.5871, 'learning_rate': 0.00016177031449597098, 'epoch': 0.98} + 33%|███▎ | 5660/17285 [50:46:42<97:58:45, 30.34s/it] 33%|███▎ | 5661/17285 [50:47:20<105:40:10, 32.73s/it] 33%|███▎ | 5662/17285 [50:47:55<107:16:44, 33.23s/it] 33%|███▎ | 5663/17285 [50:48:24<103:10:59, 31.96s/it] 33%|███▎ | 5664/17285 [50:48:55<102:04:05, 31.62s/it] 33%|███▎ | 5665/17285 [50:49:19<95:04:45, 29.46s/it] 33%|███▎ | 5666/17285 [50:50:07<112:27:56, 34.85s/it] 33%|███▎ | 5667/17285 [50:50:35<105:51:01, 32.80s/it] 33%|███▎ | 5668/17285 [50:51:00<99:07:01, 30.72s/it] 33%|███▎ | 5669/17285 [50:51:35<103:13:58, 31.99s/it] 33%|███▎ | 5670/17285 [50:52:12<107:18:53, 33.26s/it] {'loss': 1.5518, 'learning_rate': 0.0001616197396446142, 'epoch': 0.98} + 33%|███▎ | 5670/17285 [50:52:12<107:18:53, 33.26s/it] 33%|███▎ | 5671/17285 [50:52:45<107:10:52, 33.22s/it] 33%|███▎ | 5672/17285 [50:53:15<104:39:08, 32.44s/it] 33%|███▎ | 5673/17285 [50:53:45<101:40:06, 31.52s/it] 
33%|███▎ | 5674/17285 [50:54:21<106:01:35, 32.87s/it] 33%|███▎ | 5675/17285 [50:54:53<105:30:18, 32.71s/it] 33%|███▎ | 5676/17285 [50:55:23<102:34:01, 31.81s/it] 33%|███▎ | 5677/17285 [50:55:58<105:58:00, 32.86s/it] 33%|███▎ | 5678/17285 [50:56:30<105:25:37, 32.70s/it] 33%|███▎ | 5679/17285 [50:57:05<107:13:44, 33.26s/it] 33%|███▎ | 5680/17285 [50:57:43<111:26:04, 34.57s/it] {'loss': 1.5645, 'learning_rate': 0.0001614689392275025, 'epoch': 0.99} + 33%|███▎ | 5680/17285 [50:57:43<111:26:04, 34.57s/it] 33%|███▎ | 5681/17285 [50:58:22<115:37:54, 35.87s/it] 33%|███▎ | 5682/17285 [50:58:56<114:04:51, 35.40s/it] 33%|███▎ | 5683/17285 [50:59:21<103:45:23, 32.19s/it] 33%|███▎ | 5684/17285 [50:59:48<98:59:28, 30.72s/it] 33%|███▎ | 5685/17285 [51:00:17<97:08:29, 30.15s/it] 33%|███▎ | 5686/17285 [51:00:50<100:21:24, 31.15s/it] 33%|███▎ | 5687/17285 [51:01:21<100:29:33, 31.19s/it] 33%|███▎ | 5688/17285 [51:01:57<104:21:26, 32.40s/it] 33%|███▎ | 5689/17285 [51:02:36<111:29:33, 34.61s/it] 33%|███▎ | 5690/17285 [51:03:09<109:39:14, 34.05s/it] {'loss': 1.549, 'learning_rate': 0.00016131791379665717, 'epoch': 0.99} + 33%|███▎ | 5690/17285 [51:03:09<109:39:14, 34.05s/it] 33%|███▎ | 5691/17285 [51:03:41<107:37:46, 33.42s/it] 33%|███▎ | 5692/17285 [51:04:07<99:55:03, 31.03s/it] 33%|███▎ | 5693/17285 [51:04:47<109:00:11, 33.85s/it] 33%|███▎ | 5694/17285 [51:05:17<104:59:38, 32.61s/it] 33%|███▎ | 5695/17285 [51:05:43<98:55:50, 30.73s/it] 33%|███▎ | 5696/17285 [51:06:12<97:02:51, 30.15s/it] 33%|███▎ | 5697/17285 [51:06:38<92:45:05, 28.81s/it] 33%|███▎ | 5698/17285 [51:07:03<89:10:48, 27.71s/it] 33%|███▎ | 5699/17285 [51:07:38<96:44:12, 30.06s/it] 33%|███▎ | 5700/17285 [51:08:19<106:39:17, 33.14s/it] {'loss': 1.5868, 'learning_rate': 0.00016116666390492325, 'epoch': 0.99} + 33%|███▎ | 5700/17285 [51:08:19<106:39:17, 33.14s/it] 33%|███▎ | 5701/17285 [51:08:44<99:32:39, 30.94s/it] 33%|███▎ | 5702/17285 [51:09:14<98:36:01, 30.65s/it] 33%|███▎ | 5703/17285 [51:09:46<99:37:31, 30.97s/it] 
33%|███▎ | 5704/17285 [51:10:18<100:47:04, 31.33s/it] 33%|███▎ | 5705/17285 [51:10:49<99:55:29, 31.06s/it] 33%|███▎ | 5706/17285 [51:11:23<103:28:28, 32.17s/it] 33%|███▎ | 5707/17285 [51:11:54<101:33:32, 31.58s/it] 33%|███▎ | 5708/17285 [51:12:28<103:48:47, 32.28s/it] 33%|███▎ | 5709/17285 [51:12:58<102:06:27, 31.75s/it] 33%|███▎ | 5710/17285 [51:13:24<96:34:07, 30.03s/it] {'loss': 1.5491, 'learning_rate': 0.00016101519010596743, 'epoch': 0.99} + 33%|███▎ | 5710/17285 [51:13:24<96:34:07, 30.03s/it] 33%|███▎ | 5711/17285 [51:13:55<97:34:26, 30.35s/it] 33%|███▎ | 5712/17285 [51:14:24<95:56:13, 29.84s/it] 33%|███▎ | 5713/17285 [51:15:03<105:19:47, 32.77s/it] 33%|███▎ | 5714/17285 [51:15:30<99:43:57, 31.03s/it] 33%|███▎ | 5715/17285 [51:16:04<102:15:09, 31.82s/it] 33%|███▎ | 5716/17285 [51:16:36<102:40:39, 31.95s/it] 33%|███▎ | 5717/17285 [51:17:03<97:59:25, 30.49s/it] 33%|███▎ | 5718/17285 [51:17:38<101:46:28, 31.68s/it] 33%|███▎ | 5719/17285 [51:18:05<97:10:58, 30.25s/it] 33%|███▎ | 5720/17285 [51:18:33<94:48:30, 29.51s/it] {'loss': 1.5768, 'learning_rate': 0.00016086349295427595, 'epoch': 0.99} + 33%|███▎ | 5720/17285 [51:18:33<94:48:30, 29.51s/it] 33%|███▎ | 5721/17285 [51:18:59<92:08:34, 28.69s/it] 33%|███▎ | 5722/17285 [51:19:42<105:25:36, 32.82s/it] 33%|███▎ | 5723/17285 [51:20:15<106:04:51, 33.03s/it] 33%|███▎ | 5724/17285 [51:20:45<102:59:30, 32.07s/it] 33%|███▎ | 5725/17285 [51:21:14<99:35:45, 31.02s/it] 33%|███▎ | 5726/17285 [51:21:48<103:10:52, 32.14s/it] 33%|███▎ | 5727/17285 [51:22:13<96:03:19, 29.92s/it] [2023-08-25 03:17:20,792] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 + 33%|███▎ | 5728/17285 [51:22:43<96:04:53, 29.93s/it] 33%|███▎ | 5729/17285 [51:23:14<96:45:13, 30.14s/it] 33%|███▎ | 5730/17285 [51:23:45<97:23:34, 30.34s/it] {'loss': 1.5492, 'learning_rate': 0.00016072677501010647, 'epoch': 0.99} + 33%|███▎ | 5730/17285 [51:23:45<97:23:34, 30.34s/it] 33%|███▎ | 5731/17285 [51:24:16<98:26:24, 30.67s/it] 33%|███▎ | 5732/17285 [51:24:51<102:43:17, 32.01s/it] 33%|███▎ | 5733/17285 [51:25:28<106:58:55, 33.34s/it] 33%|███▎ | 5734/17285 [51:25:52<98:27:25, 30.69s/it] 33%|███▎ | 5735/17285 [51:26:20<95:26:38, 29.75s/it] 33%|███▎ | 5736/17285 [51:26:44<90:36:59, 28.25s/it] 33%|███▎ | 5737/17285 [51:27:13<90:34:35, 28.24s/it] 33%|███▎ | 5738/17285 [51:27:43<92:36:43, 28.87s/it] 33%|███▎ | 5739/17285 [51:28:13<93:55:51, 29.29s/it] 33%|███▎ | 5740/17285 [51:28:45<96:18:55, 30.03s/it] {'loss': 1.5695, 'learning_rate': 0.00016057465501875367, 'epoch': 1.0} + 33%|███▎ | 5740/17285 [51:28:45<96:18:55, 30.03s/it] 33%|███▎ | 5741/17285 [51:29:21<101:47:41, 31.74s/it] 33%|███▎ | 5742/17285 [51:29:48<97:28:08, 30.40s/it] 33%|███▎ | 5743/17285 [51:30:26<105:07:09, 32.79s/it] 33%|███▎ | 5744/17285 [51:30:58<103:59:53, 32.44s/it] 33%|███▎ | 5745/17285 [51:31:31<104:12:22, 32.51s/it] 33%|███▎ | 5746/17285 [51:32:09<109:54:37, 34.29s/it] 33%|███▎ | 5747/17285 [51:32:50<115:51:20, 36.15s/it] 33%|███▎ | 5748/17285 [51:33:20<110:10:00, 34.38s/it] 33%|███▎ | 5749/17285 [51:33:50<106:09:33, 33.13s/it] 33%|███▎ | 5750/17285 [51:34:16<99:22:39, 31.02s/it] {'loss': 1.5693, 'learning_rate': 0.00016042231328729185, 'epoch': 1.0} + 33%|███▎ | 5750/17285 [51:34:16<99:22:39, 31.02s/it] 33%|███▎ | 5751/17285 [51:34:51<103:29:49, 32.30s/it] 33%|███▎ | 5752/17285 [51:35:19<99:22:52, 31.02s/it] 33%|███▎ | 5753/17285 [51:35:58<107:04:09, 33.42s/it] 33%|███▎ | 5754/17285 [51:36:29<104:19:20, 32.57s/it] 33%|███▎ | 5755/17285 [51:36:58<101:13:33, 31.61s/it] 33%|███▎ | 5756/17285 [51:37:31<102:29:44, 32.00s/it] 33%|███▎ | 
5757/17285 [51:38:01<100:04:37, 31.25s/it] 33%|███▎ | 5758/17285 [51:38:29<97:09:47, 30.35s/it] 33%|███▎ | 5759/17285 [51:39:03<100:20:38, 31.34s/it] 33%|███▎ | 5760/17285 [51:39:36<102:09:44, 31.91s/it] {'loss': 1.5273, 'learning_rate': 0.0001602697503733844, 'epoch': 1.0} + 33%|███▎ | 5760/17285 [51:39:36<102:09:44, 31.91s/it] 33%|███▎ | 5761/17285 [51:40:10<104:20:57, 32.60s/it] 33%|███▎ | 5762/17285 [51:40:42<103:48:18, 32.43s/it] 33%|███▎ | 5763/17285 [51:41:19<107:30:47, 33.59s/it] 33%|███▎ | 5764/17285 [51:41:47<102:39:47, 32.08s/it] 33%|███▎ | 5765/17285 [51:42:27<110:29:20, 34.53s/it] 33%|███▎ | 5766/17285 [51:43:01<110:04:51, 34.40s/it] 33%|███▎ | 5767/17285 [51:43:33<107:41:13, 33.66s/it] 33%|███▎ | 5768/17285 [51:44:11<111:53:32, 34.98s/it] 33%|███▎ | 5769/17285 [51:44:53<118:39:04, 37.09s/it] 33%|███▎ | 5770/17285 [51:45:23<111:17:37, 34.79s/it] {'loss': 1.4587, 'learning_rate': 0.00016011696683550456, 'epoch': 1.0} + 33%|███▎ | 5770/17285 [51:45:23<111:17:37, 34.79s/it] 33%|███▎ | 5771/17285 [51:45:56<109:22:45, 34.20s/it] 33%|███▎ | 5772/17285 [51:46:31<110:33:33, 34.57s/it] 33%|███▎ | 5773/17285 [51:47:00<104:56:07, 32.82s/it] 33%|███▎ | 5774/17285 [51:47:28<100:56:19, 31.57s/it] 33%|███▎ | 5775/17285 [51:48:01<102:15:33, 31.98s/it] 33%|███▎ | 5776/17285 [51:48:27<96:02:37, 30.04s/it] 33%|███▎ | 5777/17285 [51:49:03<101:56:36, 31.89s/it] 33%|███▎ | 5778/17285 [51:49:34<100:39:41, 31.49s/it] 33%|███▎ | 5779/17285 [51:50:07<102:03:36, 31.93s/it] 33%|███▎ | 5780/17285 [51:50:37<100:34:10, 31.47s/it] {'loss': 1.4827, 'learning_rate': 0.00015996396323293295, 'epoch': 1.0} + 33%|███▎ | 5780/17285 [51:50:37<100:34:10, 31.47s/it] 33%|███▎ | 5781/17285 [51:51:09<100:42:31, 31.52s/it] 33%|███▎ | 5782/17285 [51:51:37<97:31:43, 30.52s/it] 33%|███▎ | 5783/17285 [51:52:03<93:38:11, 29.31s/it] 33%|███▎ | 5784/17285 [51:52:34<94:36:17, 29.61s/it] 33%|███▎ | 5785/17285 [51:53:01<92:35:09, 28.98s/it] 33%|███▎ | 5786/17285 [51:53:36<98:04:37, 30.71s/it] 33%|███▎ | 
5787/17285 [51:54:03<94:38:12, 29.63s/it] 33%|███▎ | 5788/17285 [51:54:36<98:07:58, 30.73s/it] 33%|███▎ | 5789/17285 [51:55:09<99:42:17, 31.22s/it] 33%|███▎ | 5790/17285 [51:55:40<99:47:24, 31.25s/it] {'loss': 1.4734, 'learning_rate': 0.00015981074012575593, 'epoch': 1.0} + 33%|███▎ | 5790/17285 [51:55:40<99:47:24, 31.25s/it] 34%|███▎ | 5791/17285 [51:56:08<96:30:00, 30.22s/it] 34%|███▎ | 5792/17285 [51:56:41<99:26:00, 31.15s/it] 34%|███▎ | 5793/17285 [51:57:18<104:32:10, 32.75s/it] 34%|███▎ | 5794/17285 [51:57:48<102:21:52, 32.07s/it] 34%|███▎ | 5795/17285 [51:58:16<98:18:39, 30.80s/it] 34%|███▎ | 5796/17285 [51:58:47<98:46:14, 30.95s/it] 34%|███▎ | 5797/17285 [51:59:13<93:50:39, 29.41s/it] 34%|███▎ | 5798/17285 [51:59:53<103:29:06, 32.43s/it] 34%|███▎ | 5799/17285 [52:00:28<106:11:37, 33.28s/it] 34%|███▎ | 5800/17285 [52:01:09<114:06:53, 35.77s/it] {'loss': 1.4913, 'learning_rate': 0.0001596572980748634, 'epoch': 1.01} + 34%|███▎ | 5800/17285 [52:01:09<114:06:53, 35.77s/it] 34%|███▎ | 5801/17285 [52:01:35<104:41:51, 32.82s/it] 34%|███▎ | 5802/17285 [52:02:09<105:54:45, 33.20s/it] 34%|███▎ | 5803/17285 [52:02:43<106:09:43, 33.29s/it] 34%|███▎ | 5804/17285 [52:03:13<103:20:15, 32.40s/it] 34%|███▎ | 5805/17285 [52:03:44<102:09:56, 32.04s/it] 34%|███▎ | 5806/17285 [52:04:17<102:27:15, 32.13s/it] 34%|███▎ | 5807/17285 [52:04:56<109:14:02, 34.26s/it] 34%|███▎ | 5808/17285 [52:05:25<104:12:00, 32.68s/it] 34%|███▎ | 5809/17285 [52:05:56<102:18:18, 32.09s/it] 34%|███▎ | 5810/17285 [52:06:32<106:17:46, 33.35s/it] {'loss': 1.4804, 'learning_rate': 0.00015950363764194662, 'epoch': 1.01} + 34%|███▎ | 5810/17285 [52:06:32<106:17:46, 33.35s/it] 34%|███▎ | 5811/17285 [52:07:04<105:21:22, 33.06s/it] 34%|███▎ | 5812/17285 [52:07:38<105:52:57, 33.22s/it] 34%|███▎ | 5813/17285 [52:08:15<109:34:47, 34.39s/it] 34%|███▎ | 5814/17285 [52:08:50<109:58:32, 34.51s/it] 34%|███▎ | 5815/17285 [52:09:22<107:13:06, 33.65s/it] 34%|███▎ | 5816/17285 [52:09:50<101:56:36, 32.00s/it] 34%|███▎ | 
5817/17285 [52:10:19<99:17:08, 31.17s/it] 34%|███▎ | 5818/17285 [52:10:57<105:50:39, 33.23s/it] 34%|███▎ | 5819/17285 [52:11:28<103:40:58, 32.55s/it] 34%|███▎ | 5820/17285 [52:11:53<96:29:34, 30.30s/it] {'loss': 1.4434, 'learning_rate': 0.0001593497593894963, 'epoch': 1.01} + 34%|███▎ | 5820/17285 [52:11:53<96:29:34, 30.30s/it] 34%|███▎ | 5821/17285 [52:12:22<95:43:53, 30.06s/it] 34%|███▎ | 5822/17285 [52:12:52<95:09:58, 29.89s/it] 34%|███▎ | 5823/17285 [52:13:27<99:39:54, 31.30s/it] 34%|███▎ | 5824/17285 [52:13:57<99:03:30, 31.12s/it] 34%|███▎ | 5825/17285 [52:14:27<98:00:19, 30.79s/it] 34%|███▎ | 5826/17285 [52:14:55<94:50:04, 29.79s/it] 34%|███▎ | 5827/17285 [52:15:24<94:34:00, 29.71s/it] 34%|███▎ | 5828/17285 [52:15:50<90:56:35, 28.58s/it] 34%|███▎ | 5829/17285 [52:16:23<94:44:13, 29.77s/it] 34%|███▎ | 5830/17285 [52:16:49<91:13:11, 28.67s/it] {'loss': 1.4501, 'learning_rate': 0.00015919566388080048, 'epoch': 1.01} + 34%|███▎ | 5830/17285 [52:16:49<91:13:11, 28.67s/it] 34%|███▎ | 5831/17285 [52:17:24<97:33:59, 30.67s/it] 34%|███▎ | 5832/17285 [52:17:52<94:59:14, 29.86s/it] 34%|███▎ | 5833/17285 [52:18:18<90:50:15, 28.56s/it] 34%|███▍ | 5834/17285 [52:18:44<88:26:08, 27.80s/it] 34%|███▍ | 5835/17285 [52:19:13<90:18:17, 28.39s/it] 34%|███▍ | 5836/17285 [52:19:47<94:43:46, 29.79s/it] 34%|███▍ | 5837/17285 [52:20:18<96:16:55, 30.28s/it] 34%|███▍ | 5838/17285 [52:20:52<99:56:43, 31.43s/it] 34%|███▍ | 5839/17285 [52:21:28<104:27:43, 32.86s/it] 34%|███▍ | 5840/17285 [52:21:58<101:55:28, 32.06s/it] {'loss': 1.4815, 'learning_rate': 0.00015904135167994264, 'epoch': 1.01} + 34%|███▍ | 5840/17285 [52:21:58<101:55:28, 32.06s/it] 34%|███▍ | 5841/17285 [52:22:32<103:35:27, 32.59s/it] 34%|███▍ | 5842/17285 [52:23:10<108:24:04, 34.10s/it] 34%|███▍ | 5843/17285 [52:23:43<107:22:29, 33.78s/it] 34%|███▍ | 5844/17285 [52:24:15<105:59:50, 33.35s/it] 34%|███▍ | 5845/17285 [52:24:45<102:09:58, 32.15s/it] 34%|███▍ | 5846/17285 [52:25:08<94:04:48, 29.61s/it] 34%|███▍ | 5847/17285 
[52:25:42<97:56:02, 30.82s/it] 34%|███▍ | 5848/17285 [52:26:11<96:01:22, 30.22s/it] 34%|███▍ | 5849/17285 [52:26:38<93:04:03, 29.30s/it] 34%|███▍ | 5850/17285 [52:27:06<91:32:46, 28.82s/it] {'loss': 1.4588, 'learning_rate': 0.00015888682335179924, 'epoch': 1.02} + 34%|███▍ | 5850/17285 [52:27:06<91:32:46, 28.82s/it] 34%|███▍ | 5851/17285 [52:27:32<88:54:11, 27.99s/it] 34%|███▍ | 5852/17285 [52:28:03<91:35:39, 28.84s/it] 34%|███▍ | 5853/17285 [52:28:36<96:22:04, 30.35s/it] 34%|███▍ | 5854/17285 [52:29:06<95:18:32, 30.02s/it] 34%|███▍ | 5855/17285 [52:29:45<104:15:01, 32.83s/it] 34%|███▍ | 5856/17285 [52:30:14<100:39:21, 31.71s/it] 34%|███▍ | 5857/17285 [52:30:49<103:55:23, 32.74s/it] 34%|███▍ | 5858/17285 [52:31:26<108:11:29, 34.09s/it] 34%|███▍ | 5859/17285 [52:32:06<113:26:42, 35.74s/it] 34%|███▍ | 5860/17285 [52:32:36<108:01:19, 34.04s/it] {'loss': 1.4268, 'learning_rate': 0.00015873207946203802, 'epoch': 1.02} + 34%|███▍ | 5860/17285 [52:32:36<108:01:19, 34.04s/it] 34%|███▍ | 5861/17285 [52:33:14<112:04:03, 35.32s/it] 34%|███▍ | 5862/17285 [52:33:41<104:03:53, 32.80s/it] 34%|███▍ | 5863/17285 [52:34:10<100:32:30, 31.69s/it] 34%|███▍ | 5864/17285 [52:34:45<103:36:20, 32.66s/it] 34%|███▍ | 5865/17285 [52:35:13<98:42:22, 31.12s/it] 34%|███▍ | 5866/17285 [52:35:51<105:38:52, 33.31s/it] 34%|███▍ | 5867/17285 [52:36:28<108:31:55, 34.22s/it] 34%|███▍ | 5868/17285 [52:37:03<109:50:14, 34.63s/it] 34%|███▍ | 5869/17285 [52:37:35<107:07:40, 33.78s/it] 34%|███▍ | 5870/17285 [52:38:02<100:19:03, 31.64s/it] {'loss': 1.4502, 'learning_rate': 0.00015857712057711592, 'epoch': 1.02} + 34%|███▍ | 5870/17285 [52:38:02<100:19:03, 31.64s/it] 34%|███▍ | 5871/17285 [52:38:30<97:20:29, 30.70s/it] 34%|███▍ | 5872/17285 [52:39:01<97:40:05, 30.81s/it] 34%|███▍ | 5873/17285 [52:39:42<106:52:43, 33.72s/it] 34%|███▍ | 5874/17285 [52:40:15<106:21:10, 33.55s/it] 34%|███▍ | 5875/17285 [52:40:41<99:10:32, 31.29s/it] 34%|███▍ | 5876/17285 [52:41:12<98:50:30, 31.19s/it] 34%|███▍ | 5877/17285 
[52:41:46<101:09:33, 31.92s/it] 34%|███▍ | 5878/17285 [52:42:24<107:00:18, 33.77s/it] 34%|███▍ | 5879/17285 [52:42:55<104:30:58, 32.99s/it][2023-08-25 04:37:59,467] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 + 34%|███▍ | 5880/17285 [52:43:22<98:48:22, 31.19s/it] {'loss': 1.4486, 'learning_rate': 0.00015843747422863421, 'epoch': 1.02} + 34%|███▍ | 5880/17285 [52:43:22<98:48:22, 31.19s/it] 34%|███▍ | 5881/17285 [52:43:47<93:16:33, 29.45s/it] 34%|███▍ | 5882/17285 [52:44:14<90:41:21, 28.63s/it] 34%|███▍ | 5883/17285 [52:44:52<99:29:36, 31.41s/it] 34%|███▍ | 5884/17285 [52:45:27<103:23:02, 32.64s/it] 34%|███▍ | 5885/17285 [52:45:52<95:57:32, 30.30s/it] 34%|███▍ | 5886/17285 [52:46:27<100:19:24, 31.68s/it] 34%|███▍ | 5887/17285 [52:46:55<96:33:22, 30.50s/it] 34%|███▍ | 5888/17285 [52:47:20<91:22:57, 28.87s/it] 34%|███▍ | 5889/17285 [52:47:55<97:27:14, 30.79s/it] 34%|███▍ | 5890/17285 [52:48:26<97:12:25, 30.71s/it] {'loss': 1.4331, 'learning_rate': 0.00015828210841631188, 'epoch': 1.02} + 34%|███▍ | 5890/17285 [52:48:26<97:12:25, 30.71s/it] 34%|███▍ | 5891/17285 [52:48:50<91:12:25, 28.82s/it] 34%|███▍ | 5892/17285 [52:49:25<96:51:50, 30.61s/it] 34%|███▍ | 5893/17285 [52:50:03<104:14:25, 32.94s/it] 34%|███▍ | 5894/17285 [52:50:33<101:03:00, 31.94s/it] 34%|███▍ | 5895/17285 [52:50:58<94:58:11, 30.02s/it] 34%|███▍ | 5896/17285 [52:51:26<92:46:44, 29.33s/it] 34%|███▍ | 5897/17285 [52:51:50<87:55:58, 27.80s/it] 34%|███▍ | 5898/17285 [52:52:29<98:42:45, 31.21s/it] 34%|███▍ | 5899/17285 [52:52:59<97:33:47, 30.85s/it] 34%|███▍ | 5900/17285 [52:53:34<101:06:54, 31.97s/it] {'loss': 1.4565, 'learning_rate': 0.0001581265292559965, 'epoch': 1.02} + 34%|███▍ | 5900/17285 [52:53:34<101:06:54, 31.97s/it] 34%|███▍ | 5901/17285 [52:54:02<97:37:18, 30.87s/it] 34%|███▍ | 5902/17285 [52:54:40<104:02:32, 32.90s/it] 34%|███▍ | 5903/17285 [52:55:12<103:35:04, 32.76s/it] 
34%|███▍ | 5904/17285 [52:55:50<108:14:44, 34.24s/it] 34%|███▍ | 5905/17285 [52:56:22<105:45:38, 33.46s/it] 34%|███▍ | 5906/17285 [52:56:54<104:27:21, 33.05s/it] 34%|███▍ | 5907/17285 [52:57:36<113:18:30, 35.85s/it] 34%|███▍ | 5908/17285 [52:58:07<108:56:10, 34.47s/it] 34%|███▍ | 5909/17285 [52:58:47<113:28:42, 35.91s/it] 34%|███▍ | 5910/17285 [52:59:20<111:10:19, 35.18s/it] {'loss': 1.453, 'learning_rate': 0.00015797073731720253, 'epoch': 1.03} + 34%|███▍ | 5910/17285 [52:59:20<111:10:19, 35.18s/it] 34%|███▍ | 5911/17285 [52:59:50<105:56:29, 33.53s/it] 34%|███▍ | 5912/17285 [53:00:18<100:45:29, 31.89s/it] 34%|███▍ | 5913/17285 [53:00:52<102:34:26, 32.47s/it] 34%|███▍ | 5914/17285 [53:01:21<99:41:15, 31.56s/it] 34%|███▍ | 5915/17285 [53:01:56<103:03:54, 32.63s/it] 34%|███▍ | 5916/17285 [53:02:37<110:55:25, 35.12s/it] 34%|███▍ | 5917/17285 [53:03:13<111:40:33, 35.37s/it] 34%|███▍ | 5918/17285 [53:03:44<107:44:04, 34.12s/it] 34%|███▍ | 5919/17285 [53:04:21<110:11:25, 34.90s/it] 34%|███▍ | 5920/17285 [53:04:47<101:26:59, 32.14s/it] {'loss': 1.45, 'learning_rate': 0.00015781473317022333, 'epoch': 1.03} + 34%|███▍ | 5920/17285 [53:04:47<101:26:59, 32.14s/it] 34%|███▍ | 5921/17285 [53:05:13<95:25:17, 30.23s/it] 34%|███▍ | 5922/17285 [53:05:39<91:34:01, 29.01s/it] 34%|███▍ | 5923/17285 [53:06:19<102:05:26, 32.35s/it] 34%|███▍ | 5924/17285 [53:06:48<98:57:28, 31.36s/it] 34%|███▍ | 5925/17285 [53:07:19<98:25:43, 31.19s/it] 34%|███▍ | 5926/17285 [53:07:48<96:34:13, 30.61s/it] 34%|███▍ | 5927/17285 [53:08:24<102:01:40, 32.34s/it] 34%|███▍ | 5928/17285 [53:08:59<104:08:40, 33.01s/it] 34%|███▍ | 5929/17285 [53:09:28<100:26:23, 31.84s/it] 34%|███▍ | 5930/17285 [53:10:06<106:14:15, 33.68s/it] {'loss': 1.4356, 'learning_rate': 0.00015765851738612895, 'epoch': 1.03} + 34%|███▍ | 5930/17285 [53:10:06<106:14:15, 33.68s/it] 34%|███▍ | 5931/17285 [53:10:37<103:09:06, 32.71s/it] 34%|███▍ | 5932/17285 [53:11:09<103:11:09, 32.72s/it] 34%|███▍ | 5933/17285 [53:11:39<100:45:52, 31.95s/it] 
34%|███▍ | 5934/17285 [53:12:06<95:22:54, 30.25s/it] 34%|███▍ | 5935/17285 [53:12:44<103:25:22, 32.80s/it] 34%|███▍ | 5936/17285 [53:13:20<106:18:15, 33.72s/it] 34%|███▍ | 5937/17285 [53:13:50<102:15:21, 32.44s/it] 34%|███▍ | 5938/17285 [53:14:27<106:46:34, 33.88s/it] 34%|███▍ | 5939/17285 [53:14:58<104:23:15, 33.12s/it] 34%|███▍ | 5940/17285 [53:15:26<98:57:41, 31.40s/it] {'loss': 1.4915, 'learning_rate': 0.00015750209053676432, 'epoch': 1.03} + 34%|███▍ | 5940/17285 [53:15:26<98:57:41, 31.40s/it] 34%|███▍ | 5941/17285 [53:15:54<95:53:30, 30.43s/it] 34%|███▍ | 5942/17285 [53:16:19<91:14:21, 28.96s/it] 34%|███▍ | 5943/17285 [53:16:50<92:46:40, 29.45s/it] 34%|███▍ | 5944/17285 [53:17:26<99:15:07, 31.51s/it] 34%|███▍ | 5945/17285 [53:18:14<114:47:40, 36.44s/it] 34%|███▍ | 5946/17285 [53:18:47<111:24:56, 35.37s/it] 34%|███▍ | 5947/17285 [53:19:17<106:16:26, 33.74s/it] 34%|███▍ | 5948/17285 [53:19:51<106:29:21, 33.82s/it] 34%|███▍ | 5949/17285 [53:20:16<97:49:29, 31.07s/it] 34%|███▍ | 5950/17285 [53:20:51<102:04:32, 32.42s/it] {'loss': 1.4443, 'learning_rate': 0.00015734545319474693, 'epoch': 1.03} + 34%|███▍ | 5950/17285 [53:20:51<102:04:32, 32.42s/it] 34%|███▍ | 5951/17285 [53:21:27<105:14:51, 33.43s/it] 34%|███▍ | 5952/17285 [53:22:02<106:40:16, 33.88s/it] 34%|███▍ | 5953/17285 [53:22:36<106:58:24, 33.98s/it] 34%|███▍ | 5954/17285 [53:23:12<108:50:14, 34.58s/it] 34%|███▍ | 5955/17285 [53:23:50<111:32:40, 35.44s/it] 34%|███▍ | 5956/17285 [53:24:23<109:33:24, 34.81s/it] 34%|███▍ | 5957/17285 [53:24:50<101:56:13, 32.40s/it] 34%|███▍ | 5958/17285 [53:25:21<101:06:36, 32.14s/it] 34%|███▍ | 5959/17285 [53:25:57<104:33:37, 33.23s/it] 34%|███▍ | 5960/17285 [53:26:31<104:42:57, 33.29s/it] {'loss': 1.4245, 'learning_rate': 0.00015718860593346473, 'epoch': 1.03} + 34%|███▍ | 5960/17285 [53:26:31<104:42:57, 33.29s/it] 34%|███▍ | 5961/17285 [53:27:09<109:37:22, 34.85s/it] 34%|███▍ | 5962/17285 [53:27:44<109:14:03, 34.73s/it] 34%|███▍ | 5963/17285 [53:28:09<100:38:44, 32.00s/it] 
35%|███▍ | 5964/17285 [53:28:35<95:09:23, 30.26s/it] 35%|███▍ | 5965/17285 [53:29:13<101:46:18, 32.37s/it] 35%|███▍ | 5966/17285 [53:29:43<99:35:34, 31.68s/it] 35%|███▍ | 5967/17285 [53:30:11<96:39:24, 30.74s/it] 35%|███▍ | 5968/17285 [53:30:44<98:39:08, 31.38s/it] 35%|███▍ | 5969/17285 [53:31:15<98:11:36, 31.24s/it] 35%|███▍ | 5970/17285 [53:31:45<97:17:26, 30.95s/it] {'loss': 1.4927, 'learning_rate': 0.0001570315493270742, 'epoch': 1.04} + 35%|███▍ | 5970/17285 [53:31:45<97:17:26, 30.95s/it] 35%|███▍ | 5971/17285 [53:32:19<99:36:41, 31.70s/it] 35%|███▍ | 5972/17285 [53:32:54<103:09:06, 32.82s/it] 35%|███▍ | 5973/17285 [53:33:28<103:47:06, 33.03s/it] 35%|███▍ | 5974/17285 [53:33:54<97:43:02, 31.10s/it] 35%|███▍ | 5975/17285 [53:34:23<95:11:08, 30.30s/it] 35%|███▍ | 5976/17285 [53:34:54<96:10:25, 30.62s/it] 35%|███▍ | 5977/17285 [53:35:28<99:38:15, 31.72s/it] 35%|███▍ | 5978/17285 [53:36:05<104:23:47, 33.24s/it] 35%|███▍ | 5979/17285 [53:36:41<106:49:04, 34.01s/it] 35%|███▍ | 5980/17285 [53:37:10<101:40:16, 32.38s/it] {'loss': 1.4698, 'learning_rate': 0.00015687428395049814, 'epoch': 1.04} + 35%|███▍ | 5980/17285 [53:37:10<101:40:16, 32.38s/it] 35%|███▍ | 5981/17285 [53:37:39<98:41:11, 31.43s/it] 35%|███▍ | 5982/17285 [53:38:13<101:01:46, 32.18s/it] 35%|███▍ | 5983/17285 [53:38:39<95:45:35, 30.50s/it] 35%|███▍ | 5984/17285 [53:39:05<91:35:36, 29.18s/it] 35%|███▍ | 5985/17285 [53:39:38<94:55:18, 30.24s/it] 35%|███▍ | 5986/17285 [53:40:10<96:07:23, 30.63s/it] 35%|███▍ | 5987/17285 [53:40:35<91:27:07, 29.14s/it] 35%|███▍ | 5988/17285 [53:41:05<92:09:56, 29.37s/it] 35%|███▍ | 5989/17285 [53:41:34<91:12:49, 29.07s/it] 35%|███▍ | 5990/17285 [53:42:03<91:22:53, 29.13s/it] {'loss': 1.4568, 'learning_rate': 0.00015671681037942355, 'epoch': 1.04} + 35%|███▍ | 5990/17285 [53:42:03<91:22:53, 29.13s/it] 35%|███▍ | 5991/17285 [53:42:45<103:59:28, 33.15s/it] 35%|███▍ | 5992/17285 [53:43:19<104:29:39, 33.31s/it] 35%|███▍ | 5993/17285 [53:43:58<109:23:54, 34.88s/it] 35%|███▍ | 
5994/17285 [53:44:28<105:05:26, 33.51s/it] 35%|███▍ | 5995/17285 [53:45:12<115:02:14, 36.68s/it] 35%|███▍ | 5996/17285 [53:45:50<115:59:32, 36.99s/it] 35%|███▍ | 5997/17285 [53:46:22<111:10:57, 35.46s/it] 35%|███▍ | 5998/17285 [53:46:48<103:05:01, 32.88s/it] 35%|███▍ | 5999/17285 [53:47:21<102:39:29, 32.75s/it] 35%|███▍ | 6000/17285 [53:47:47<96:38:17, 30.83s/it] {'loss': 1.4498, 'learning_rate': 0.00015655912919029953, 'epoch': 1.04} + 35%|███▍ | 6000/17285 [53:47:47<96:38:17, 30.83s/it][INFO|trainer.py:3081] 2023-08-25 05:42:24,938 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-25 05:42:24,939 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-25 05:42:24,939 >> Batch size = 2 + + 0%| | 0/33 [00:00> Deleting older checkpoint [20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-3000] due to args.save_total_limit +Save Peft Config at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-6000 +[INFO|tokenization_utils_base.py:2210] 2023-08-25 05:43:49,639 >> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-6000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-25 05:43:49,645 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-6000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-6000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-6000 + 35%|███▍ | 6001/17285 [53:49:56<188:17:04, 60.07s/it] 35%|███▍ | 6002/17285 [53:50:29<163:28:37, 52.16s/it] 35%|███▍ | 6003/17285 [53:51:01<143:50:27, 45.90s/it] 35%|███▍ | 6004/17285 [53:51:30<128:09:02, 40.90s/it] 35%|███▍ | 6005/17285 [53:52:00<117:45:12, 37.58s/it] 35%|███▍ | 6006/17285 
[53:52:33<113:59:40, 36.38s/it] 35%|███▍ | 6007/17285 [53:53:07<111:38:17, 35.64s/it] 35%|███▍ | 6008/17285 [53:53:38<106:54:33, 34.13s/it] 35%|███▍ | 6009/17285 [53:54:07<102:27:38, 32.71s/it] 35%|███▍ | 6010/17285 [53:54:40<102:55:10, 32.86s/it] {'loss': 1.4347, 'learning_rate': 0.00015640124096033526, 'epoch': 1.04} + 35%|███▍ | 6010/17285 [53:54:40<102:55:10, 32.86s/it] 35%|███▍ | 6011/17285 [53:55:06<96:18:10, 30.75s/it] 35%|███▍ | 6012/17285 [53:55:39<98:16:04, 31.38s/it] 35%|███▍ | 6013/17285 [53:56:14<101:54:03, 32.54s/it] 35%|███▍ | 6014/17285 [53:56:38<94:03:00, 30.04s/it] 35%|███▍ | 6015/17285 [53:57:13<97:49:45, 31.25s/it] 35%|███▍ | 6016/17285 [53:57:49<102:46:49, 32.83s/it] 35%|███▍ | 6017/17285 [53:58:25<105:19:57, 33.65s/it] 35%|███▍ | 6018/17285 [53:58:50<97:56:26, 31.29s/it] 35%|███▍ | 6019/17285 [53:59:16<92:39:51, 29.61s/it] 35%|███▍ | 6020/17285 [53:59:47<94:09:39, 30.09s/it] {'loss': 1.4754, 'learning_rate': 0.0001562431462674977, 'epoch': 1.04} + 35%|███▍ | 6020/17285 [53:59:47<94:09:39, 30.09s/it] 35%|███▍ | 6021/17285 [54:00:18<95:04:31, 30.39s/it] 35%|███▍ | 6022/17285 [54:00:50<96:35:21, 30.87s/it] 35%|███▍ | 6023/17285 [54:01:21<96:28:59, 30.84s/it] 35%|███▍ | 6024/17285 [54:01:49<93:16:53, 29.82s/it] 35%|███▍ | 6025/17285 [54:02:31<104:47:18, 33.50s/it] 35%|███▍ | 6026/17285 [54:03:01<101:55:03, 32.59s/it] 35%|███▍ | 6027/17285 [54:03:28<96:49:52, 30.96s/it] 35%|███▍ | 6028/17285 [54:04:04<100:54:51, 32.27s/it] 35%|███▍ | 6029/17285 [54:04:35<100:31:45, 32.15s/it] 35%|███▍ | 6030/17285 [54:05:08<100:51:21, 32.26s/it] {'loss': 1.4716, 'learning_rate': 0.00015608484569050975, 'epoch': 1.05} + 35%|███▍ | 6030/17285 [54:05:08<100:51:21, 32.26s/it] 35%|███▍ | 6031/17285 [54:05:43<103:32:05, 33.12s/it] 35%|███▍ | 6032/17285 [54:06:12<99:55:25, 31.97s/it] 35%|███▍ | 6033/17285 [54:06:43<98:21:35, 31.47s/it] 35%|███▍ | 6034/17285 [54:07:17<101:05:30, 32.35s/it] 35%|███▍ | 6035/17285 [54:07:52<103:55:32, 33.26s/it] 35%|███▍ | 6036/17285 
[54:08:23<101:34:10, 32.51s/it] 35%|███▍ | 6037/17285 [54:08:58<104:07:41, 33.33s/it] 35%|███▍ | 6038/17285 [54:09:24<96:51:58, 31.01s/it] 35%|███▍ | 6039/17285 [54:09:53<94:38:17, 30.30s/it] 35%|███▍ | 6040/17285 [54:10:23<94:55:45, 30.39s/it] {'loss': 1.4523, 'learning_rate': 0.00015592633980884778, 'epoch': 1.05} + 35%|███▍ | 6040/17285 [54:10:23<94:55:45, 30.39s/it] 35%|███▍ | 6041/17285 [54:10:58<99:16:29, 31.78s/it] 35%|███▍ | 6042/17285 [54:11:23<92:56:05, 29.76s/it] 35%|███▍ | 6043/17285 [54:11:47<87:25:51, 28.00s/it][2023-08-25 06:06:55,423] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 35%|███▍ | 6044/17285 [54:12:18<89:43:38, 28.74s/it] 35%|███▍ | 6045/17285 [54:12:49<92:16:23, 29.55s/it] 35%|███▍ | 6046/17285 [54:13:15<89:11:00, 28.57s/it] 35%|███▍ | 6047/17285 [54:13:46<91:27:50, 29.30s/it] 35%|███▍ | 6048/17285 [54:14:27<102:03:02, 32.69s/it] 35%|███▍ | 6049/17285 [54:14:57<99:43:28, 31.95s/it] 35%|███▌ | 6050/17285 [54:15:24<95:10:49, 30.50s/it] {'loss': 1.4578, 'learning_rate': 0.00015578350945939874, 'epoch': 1.05} + 35%|███▌ | 6050/17285 [54:15:24<95:10:49, 30.50s/it] 35%|███▌ | 6051/17285 [54:15:55<95:25:06, 30.58s/it] 35%|███▌ | 6052/17285 [54:16:30<99:06:44, 31.76s/it] 35%|███▌ | 6053/17285 [54:17:04<101:50:32, 32.64s/it] 35%|███▌ | 6054/17285 [54:17:35<99:43:02, 31.96s/it] 35%|███▌ | 6055/17285 [54:18:14<106:52:48, 34.26s/it] 35%|███▌ | 6056/17285 [54:18:49<107:35:33, 34.49s/it] 35%|███▌ | 6057/17285 [54:19:19<103:01:13, 33.03s/it] 35%|███▌ | 6058/17285 [54:19:46<97:13:54, 31.18s/it] 35%|███▌ | 6059/17285 [54:20:14<93:52:03, 30.10s/it] 35%|███▌ | 6060/17285 [54:20:45<94:51:18, 30.42s/it] {'loss': 1.4311, 'learning_rate': 0.00015562461509800382, 'epoch': 1.05} + 35%|███▌ | 6060/17285 [54:20:45<94:51:18, 30.42s/it] 35%|███▌ | 6061/17285 [54:21:15<94:59:59, 30.47s/it] 35%|███▌ | 6062/17285 [54:21:45<94:00:49, 30.16s/it] 35%|███▌ | 
6063/17285 [54:22:14<93:09:36, 29.89s/it] 35%|███▌ | 6064/17285 [54:22:44<93:37:08, 30.04s/it][2023-08-25 06:17:52,267] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 35%|███▌ | 6065/17285 [54:23:15<93:49:24, 30.10s/it] 35%|███▌ | 6066/17285 [54:23:47<95:40:23, 30.70s/it] 35%|███▌ | 6067/17285 [54:24:24<101:48:17, 32.67s/it] 35%|███▌ | 6068/17285 [54:24:50<95:44:52, 30.73s/it] 35%|███▌ | 6069/17285 [54:25:21<95:52:24, 30.77s/it] 35%|███▌ | 6070/17285 [54:26:02<104:59:04, 33.70s/it] {'loss': 1.4419, 'learning_rate': 0.0001554814360610988, 'epoch': 1.05} + 35%|███▌ | 6070/17285 [54:26:02<104:59:04, 33.70s/it] 35%|███▌ | 6071/17285 [54:26:35<104:57:00, 33.69s/it] 35%|███▌ | 6072/17285 [54:27:10<105:41:35, 33.93s/it] 35%|███▌ | 6073/17285 [54:27:43<104:56:29, 33.70s/it] 35%|███▌ | 6074/17285 [54:28:11<99:50:16, 32.06s/it] 35%|███▌ | 6075/17285 [54:28:39<95:32:47, 30.68s/it] 35%|███▌ | 6076/17285 [54:29:03<89:55:21, 28.88s/it] 35%|███▌ | 6077/17285 [54:29:33<90:43:23, 29.14s/it] 35%|███▌ | 6078/17285 [54:30:03<91:53:36, 29.52s/it] 35%|███▌ | 6079/17285 [54:30:34<93:01:00, 29.88s/it] 35%|███▌ | 6080/17285 [54:31:09<97:48:55, 31.43s/it] {'loss': 1.449, 'learning_rate': 0.00015532215531972608, 'epoch': 1.06} + 35%|███▌ | 6080/17285 [54:31:09<97:48:55, 31.43s/it] 35%|███▌ | 6081/17285 [54:31:40<96:56:16, 31.15s/it] 35%|███▌ | 6082/17285 [54:32:20<105:37:44, 33.94s/it] 35%|███▌ | 6083/17285 [54:32:55<106:31:26, 34.23s/it] 35%|███▌ | 6084/17285 [54:33:30<107:02:48, 34.40s/it] 35%|███▌ | 6085/17285 [54:34:00<102:48:08, 33.04s/it] 35%|███▌ | 6086/17285 [54:34:35<105:08:13, 33.80s/it] 35%|███▌ | 6087/17285 [54:35:14<109:21:52, 35.16s/it] 35%|███▌ | 6088/17285 [54:35:49<109:46:36, 35.29s/it][2023-08-25 06:30:57,166] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 + 35%|███▌ | 6089/17285 [54:36:19<105:05:13, 33.79s/it] 35%|███▌ | 6090/17285 [54:36:59<109:58:08, 35.36s/it] {'loss': 1.4604, 'learning_rate': 0.0001551786294874456, 'epoch': 1.06} + 35%|███▌ | 6090/17285 [54:36:59<109:58:08, 35.36s/it] 35%|███▌ | 6091/17285 [54:37:30<106:16:43, 34.18s/it] 35%|███▌ | 6092/17285 [54:38:00<102:53:22, 33.09s/it] 35%|███▌ | 6093/17285 [54:38:40<108:34:48, 34.93s/it] 35%|███▌ | 6094/17285 [54:39:05<99:31:28, 32.02s/it] 35%|███▌ | 6095/17285 [54:39:37<99:58:31, 32.16s/it] 35%|███▌ | 6096/17285 [54:40:06<96:51:17, 31.16s/it] 35%|███▌ | 6097/17285 [54:40:40<99:09:00, 31.90s/it] 35%|███▌ | 6098/17285 [54:41:12<99:02:15, 31.87s/it] 35%|███▌ | 6099/17285 [54:41:54<108:26:20, 34.90s/it] 35%|███▌ | 6100/17285 [54:42:33<112:12:06, 36.11s/it] {'loss': 1.4673, 'learning_rate': 0.0001550189644709561, 'epoch': 1.06} + 35%|███▌ | 6100/17285 [54:42:33<112:12:06, 36.11s/it] 35%|███▌ | 6101/17285 [54:43:14<117:20:10, 37.77s/it] 35%|███▌ | 6102/17285 [54:43:50<115:01:52, 37.03s/it] 35%|███▌ | 6103/17285 [54:44:17<105:47:17, 34.06s/it] 35%|███▌ | 6104/17285 [54:44:53<107:28:37, 34.60s/it] 35%|███▌ | 6105/17285 [54:45:21<101:54:22, 32.81s/it] 35%|███▌ | 6106/17285 [54:45:52<99:38:31, 32.09s/it] 35%|███▌ | 6107/17285 [54:46:19<95:40:36, 30.81s/it] 35%|███▌ | 6108/17285 [54:47:01<105:43:50, 34.05s/it] 35%|███▌ | 6109/17285 [54:47:28<98:41:09, 31.79s/it] 35%|███▌ | 6110/17285 [54:48:11<109:11:37, 35.18s/it] {'loss': 1.4787, 'learning_rate': 0.00015485909805156665, 'epoch': 1.06} + 35%|███▌ | 6110/17285 [54:48:11<109:11:37, 35.18s/it] 35%|███▌ | 6111/17285 [54:48:38<101:34:30, 32.73s/it] 35%|███▌ | 6112/17285 [54:49:11<102:05:09, 32.89s/it] 35%|███▌ | 6113/17285 [54:49:41<99:33:04, 32.08s/it] 35%|███▌ | 6114/17285 [54:50:12<98:19:00, 31.68s/it] 35%|███▌ | 6115/17285 [54:50:49<103:06:23, 33.23s/it] 35%|███▌ | 6116/17285 [54:51:18<99:44:33, 32.15s/it] 35%|███▌ | 6117/17285 [54:51:51<99:52:06, 32.19s/it] 35%|███▌ | 
6118/17285 [54:52:27<104:08:34, 33.57s/it] 35%|███▌ | 6119/17285 [54:52:55<98:10:35, 31.65s/it] 35%|███▌ | 6120/17285 [54:53:26<98:20:39, 31.71s/it] {'loss': 1.4571, 'learning_rate': 0.0001546990308144857, 'epoch': 1.06} + 35%|███▌ | 6120/17285 [54:53:26<98:20:39, 31.71s/it] 35%|███▌ | 6121/17285 [54:54:01<101:11:37, 32.63s/it] 35%|███▌ | 6122/17285 [54:54:35<102:42:06, 33.12s/it] 35%|███▌ | 6123/17285 [54:55:09<102:50:56, 33.17s/it] 35%|███▌ | 6124/17285 [54:55:35<96:21:32, 31.08s/it] 35%|███▌ | 6125/17285 [54:56:11<100:39:12, 32.47s/it] 35%|███▌ | 6126/17285 [54:56:41<98:33:03, 31.79s/it] 35%|███▌ | 6127/17285 [54:57:14<99:20:04, 32.05s/it] 35%|███▌ | 6128/17285 [54:57:44<97:52:46, 31.58s/it] 35%|███▌ | 6129/17285 [54:58:18<100:05:33, 32.30s/it] 35%|███▌ | 6130/17285 [54:58:43<93:20:25, 30.12s/it] {'loss': 1.4773, 'learning_rate': 0.0001545387633456568, 'epoch': 1.06} + 35%|███▌ | 6130/17285 [54:58:43<93:20:25, 30.12s/it] 35%|███▌ | 6131/17285 [54:59:16<95:56:30, 30.97s/it] 35%|███▌ | 6132/17285 [54:59:43<92:29:18, 29.85s/it] 35%|███▌ | 6133/17285 [55:00:13<92:35:38, 29.89s/it] 35%|███▌ | 6134/17285 [55:00:50<98:36:06, 31.83s/it] 35%|███▌ | 6135/17285 [55:01:23<100:11:44, 32.35s/it] 35%|███▌ | 6136/17285 [55:01:50<95:27:06, 30.82s/it] 36%|███▌ | 6137/17285 [55:02:18<92:07:33, 29.75s/it] 36%|███▌ | 6138/17285 [55:02:54<98:15:26, 31.73s/it] 36%|███▌ | 6139/17285 [55:03:23<95:49:53, 30.95s/it] 36%|███▌ | 6140/17285 [55:03:57<98:48:53, 31.92s/it] {'loss': 1.4816, 'learning_rate': 0.00015437829623175637, 'epoch': 1.07} + 36%|███▌ | 6140/17285 [55:03:57<98:48:53, 31.92s/it] 36%|███▌ | 6141/17285 [55:04:26<95:45:08, 30.93s/it] 36%|███▌ | 6142/17285 [55:04:58<97:03:59, 31.36s/it] 36%|███▌ | 6143/17285 [55:05:29<96:43:55, 31.25s/it] 36%|███▌ | 6144/17285 [55:06:00<96:05:42, 31.05s/it] 36%|███▌ | 6145/17285 [55:06:29<94:19:36, 30.48s/it] 36%|███▌ | 6146/17285 [55:07:05<99:25:13, 32.13s/it] 36%|███▌ | 6147/17285 [55:07:41<102:51:33, 33.25s/it] 36%|███▌ | 6148/17285 
[55:08:07<96:27:49, 31.18s/it] 36%|███▌ | 6149/17285 [55:08:38<96:18:36, 31.13s/it] 36%|███▌ | 6150/17285 [55:09:09<96:11:32, 31.10s/it] {'loss': 1.5024, 'learning_rate': 0.00015421763006019177, 'epoch': 1.07} + 36%|███▌ | 6150/17285 [55:09:09<96:11:32, 31.10s/it] 36%|███▌ | 6151/17285 [55:09:42<97:37:57, 31.57s/it] 36%|███▌ | 6152/17285 [55:10:22<105:46:12, 34.20s/it] 36%|███▌ | 6153/17285 [55:10:47<96:44:21, 31.28s/it] 36%|███▌ | 6154/17285 [55:11:15<94:09:54, 30.45s/it] 36%|███▌ | 6155/17285 [55:11:44<92:56:16, 30.06s/it] 36%|███▌ | 6156/17285 [55:12:11<89:23:25, 28.92s/it] 36%|███▌ | 6157/17285 [55:12:43<92:57:03, 30.07s/it] 36%|███▌ | 6158/17285 [55:13:14<93:18:52, 30.19s/it] 36%|███▌ | 6159/17285 [55:13:53<101:27:06, 32.83s/it] 36%|███▌ | 6160/17285 [55:14:24<99:53:21, 32.32s/it] {'loss': 1.4956, 'learning_rate': 0.00015405676541909897, 'epoch': 1.07} + 36%|███▌ | 6160/17285 [55:14:24<99:53:21, 32.32s/it] 36%|███▌ | 6161/17285 [55:14:59<102:32:12, 33.18s/it] 36%|███▌ | 6162/17285 [55:15:32<102:03:16, 33.03s/it] 36%|███▌ | 6163/17285 [55:16:04<101:30:59, 32.86s/it] 36%|███▌ | 6164/17285 [55:16:36<100:07:24, 32.41s/it] 36%|███▌ | 6165/17285 [55:17:10<101:52:12, 32.98s/it] 36%|███▌ | 6166/17285 [55:17:35<94:35:34, 30.63s/it] 36%|███▌ | 6167/17285 [55:18:09<98:00:17, 31.73s/it] 36%|███▌ | 6168/17285 [55:18:38<94:41:51, 30.67s/it] 36%|███▌ | 6169/17285 [55:19:13<99:25:20, 32.20s/it] 36%|███▌ | 6170/17285 [55:19:44<97:34:31, 31.60s/it] {'loss': 1.4515, 'learning_rate': 0.00015389570289734046, 'epoch': 1.07} + 36%|███▌ | 6170/17285 [55:19:44<97:34:31, 31.60s/it] 36%|███▌ | 6171/17285 [55:20:18<100:15:57, 32.48s/it] 36%|███▌ | 6172/17285 [55:20:58<107:28:34, 34.82s/it] 36%|███▌ | 6173/17285 [55:21:24<99:02:31, 32.09s/it] 36%|███▌ | 6174/17285 [55:21:58<100:47:36, 32.66s/it] 36%|███▌ | 6175/17285 [55:22:26<96:43:01, 31.34s/it] 36%|███▌ | 6176/17285 [55:22:53<92:30:33, 29.98s/it] 36%|███▌ | 6177/17285 [55:23:25<94:09:41, 30.52s/it] 36%|███▌ | 6178/17285 
[55:23:56<94:39:57, 30.68s/it] 36%|███▌ | 6179/17285 [55:24:23<91:27:25, 29.65s/it] 36%|███▌ | 6180/17285 [55:24:58<96:12:40, 31.19s/it] {'loss': 1.4745, 'learning_rate': 0.00015373444308450313, 'epoch': 1.07} + 36%|███▌ | 6180/17285 [55:24:58<96:12:40, 31.19s/it] 36%|███▌ | 6181/17285 [55:25:28<94:53:11, 30.76s/it] 36%|███▌ | 6182/17285 [55:26:02<97:49:07, 31.72s/it] 36%|███▌ | 6183/17285 [55:26:32<96:45:49, 31.38s/it] 36%|███▌ | 6184/17285 [55:27:16<108:07:12, 35.06s/it] 36%|███▌ | 6185/17285 [55:27:45<102:16:44, 33.17s/it] 36%|███▌ | 6186/17285 [55:28:26<110:09:18, 35.73s/it] 36%|███▌ | 6187/17285 [55:28:57<105:06:21, 34.09s/it] 36%|███▌ | 6188/17285 [55:29:30<104:06:09, 33.77s/it] 36%|███▌ | 6189/17285 [55:29:56<96:57:52, 31.46s/it] 36%|███▌ | 6190/17285 [55:30:28<97:33:35, 31.66s/it] {'loss': 1.4185, 'learning_rate': 0.00015357298657089606, 'epoch': 1.07} + 36%|███▌ | 6190/17285 [55:30:28<97:33:35, 31.66s/it] 36%|███▌ | 6191/17285 [55:30:54<92:03:07, 29.87s/it] 36%|███▌ | 6192/17285 [55:31:32<99:38:30, 32.34s/it] 36%|███▌ | 6193/17285 [55:32:01<96:42:11, 31.39s/it] 36%|███▌ | 6194/17285 [55:32:26<91:09:40, 29.59s/it] 36%|███▌ | 6195/17285 [55:32:59<94:18:10, 30.61s/it] 36%|███▌ | 6196/17285 [55:33:27<91:32:52, 29.72s/it] 36%|███▌ | 6197/17285 [55:33:59<93:39:10, 30.41s/it] 36%|███▌ | 6198/17285 [55:34:40<103:17:06, 33.54s/it] 36%|███▌ | 6199/17285 [55:35:06<96:44:21, 31.41s/it] 36%|███▌ | 6200/17285 [55:35:41<99:39:09, 32.36s/it] {'loss': 1.4462, 'learning_rate': 0.00015341133394754838, 'epoch': 1.08} + 36%|███▌ | 6200/17285 [55:35:41<99:39:09, 32.36s/it] 36%|███▌ | 6201/17285 [55:36:12<98:23:29, 31.96s/it] 36%|███▌ | 6202/17285 [55:36:41<96:03:53, 31.20s/it] 36%|███▌ | 6203/17285 [55:37:11<94:18:32, 30.64s/it] 36%|███▌ | 6204/17285 [55:37:38<90:57:31, 29.55s/it] 36%|███▌ | 6205/17285 [55:38:11<94:39:05, 30.75s/it] 36%|███▌ | 6206/17285 [55:38:44<96:42:56, 31.43s/it] 36%|███▌ | 6207/17285 [55:39:20<101:14:40, 32.90s/it] 36%|███▌ | 6208/17285 
[55:39:54<102:08:03, 33.19s/it] 36%|███▌ | 6209/17285 [55:40:29<103:25:38, 33.62s/it] 36%|███▌ | 6210/17285 [55:41:04<104:44:05, 34.04s/it] {'loss': 1.4664, 'learning_rate': 0.00015324948580620703, 'epoch': 1.08} + 36%|███▌ | 6210/17285 [55:41:04<104:44:05, 34.04s/it] 36%|███▌ | 6211/17285 [55:41:35<101:49:29, 33.10s/it] 36%|███▌ | 6212/17285 [55:42:08<102:12:44, 33.23s/it] 36%|███▌ | 6213/17285 [55:42:40<100:42:52, 32.75s/it] 36%|███▌ | 6214/17285 [55:43:12<100:04:17, 32.54s/it] 36%|███▌ | 6215/17285 [55:43:49<103:41:28, 33.72s/it] 36%|███▌ | 6216/17285 [55:44:20<101:16:55, 32.94s/it] 36%|███▌ | 6217/17285 [55:44:53<102:01:32, 33.19s/it] 36%|███▌ | 6218/17285 [55:45:25<100:07:08, 32.57s/it] 36%|███▌ | 6219/17285 [55:45:58<100:47:14, 32.79s/it] 36%|███▌ | 6220/17285 [55:46:27<97:30:16, 31.72s/it] {'loss': 1.4747, 'learning_rate': 0.00015308744273933477, 'epoch': 1.08} + 36%|███▌ | 6220/17285 [55:46:27<97:30:16, 31.72s/it] 36%|███▌ | 6221/17285 [55:47:00<98:56:07, 32.19s/it] 36%|███▌ | 6222/17285 [55:47:37<102:52:03, 33.47s/it] 36%|███▌ | 6223/17285 [55:48:06<98:52:08, 32.18s/it] 36%|███▌ | 6224/17285 [55:48:36<96:38:11, 31.45s/it] 36%|███▌ | 6225/17285 [55:49:08<97:27:54, 31.72s/it] 36%|███▌ | 6226/17285 [55:49:37<95:11:00, 30.98s/it] 36%|███▌ | 6227/17285 [55:50:13<99:26:21, 32.37s/it] 36%|███▌ | 6228/17285 [55:50:38<92:41:52, 30.18s/it] 36%|███▌ | 6229/17285 [55:51:17<100:53:47, 32.85s/it] 36%|███▌ | 6230/17285 [55:51:44<95:03:04, 30.95s/it] {'loss': 1.4091, 'learning_rate': 0.00015292520534010784, 'epoch': 1.08} + 36%|███▌ | 6230/17285 [55:51:44<95:03:04, 30.95s/it] 36%|███▌ | 6231/17285 [55:52:21<100:28:37, 32.72s/it][2023-08-25 07:47:37,357] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. 
Reducing hysteresis to 1 + 36%|███▌ | 6232/17285 [55:53:00<106:20:53, 34.64s/it] 36%|███▌ | 6233/17285 [55:53:39<110:59:15, 36.15s/it] 36%|███▌ | 6234/17285 [55:54:10<105:31:04, 34.37s/it] 36%|███▌ | 6235/17285 [55:54:45<106:50:36, 34.81s/it] 36%|███▌ | 6236/17285 [55:55:18<104:50:37, 34.16s/it] 36%|███▌ | 6237/17285 [55:55:48<100:37:49, 32.79s/it] 36%|███▌ | 6238/17285 [55:56:19<98:59:50, 32.26s/it] 36%|███▌ | 6239/17285 [55:56:46<94:00:17, 30.64s/it] 36%|███▌ | 6240/17285 [55:57:23<100:04:28, 32.62s/it] {'loss': 1.4634, 'learning_rate': 0.00015277902601747382, 'epoch': 1.08} + 36%|███▌ | 6240/17285 [55:57:23<100:04:28, 32.62s/it] 36%|███▌ | 6241/17285 [55:57:57<101:43:05, 33.16s/it] 36%|███▌ | 6242/17285 [55:58:30<101:10:14, 32.98s/it] 36%|███▌ | 6243/17285 [55:58:59<97:34:20, 31.81s/it] 36%|███▌ | 6244/17285 [55:59:28<95:20:23, 31.09s/it] 36%|███▌ | 6245/17285 [56:00:01<96:43:55, 31.54s/it] 36%|███▌ | 6246/17285 [56:00:34<98:25:25, 32.10s/it] 36%|███▌ | 6247/17285 [56:01:15<106:17:42, 34.67s/it] 36%|███▌ | 6248/17285 [56:01:49<105:27:35, 34.40s/it] 36%|███▌ | 6249/17285 [56:02:15<97:46:37, 31.90s/it] 36%|███▌ | 6250/17285 [56:02:42<93:14:30, 30.42s/it] {'loss': 1.4462, 'learning_rate': 0.0001526164210235197, 'epoch': 1.08} + 36%|███▌ | 6250/17285 [56:02:42<93:14:30, 30.42s/it] 36%|███▌ | 6251/17285 [56:03:14<94:35:25, 30.86s/it] 36%|███▌ | 6252/17285 [56:03:50<99:44:46, 32.55s/it] 36%|███▌ | 6253/17285 [56:04:28<104:35:24, 34.13s/it] 36%|███▌ | 6254/17285 [56:05:00<102:18:32, 33.39s/it] 36%|███▌ | 6255/17285 [56:05:28<97:47:22, 31.92s/it] 36%|███▌ | 6256/17285 [56:06:04<101:34:35, 33.16s/it] 36%|███▌ | 6257/17285 [56:06:33<97:40:52, 31.89s/it] 36%|███▌ | 6258/17285 [56:07:09<101:23:58, 33.10s/it] 36%|███▌ | 6259/17285 [56:07:40<99:41:52, 32.55s/it] 36%|███▌ | 6260/17285 [56:08:06<93:13:18, 30.44s/it] {'loss': 1.4771, 'learning_rate': 0.0001524536234214371, 'epoch': 1.09} + 36%|███▌ | 6260/17285 [56:08:06<93:13:18, 30.44s/it] 36%|███▌ | 6261/17285 
[56:08:35<92:24:53, 30.18s/it] 36%|███▌ | 6262/17285 [56:09:02<89:20:30, 29.18s/it] 36%|███▌ | 6263/17285 [56:09:32<89:31:27, 29.24s/it] 36%|███▌ | 6264/17285 [56:10:00<88:41:19, 28.97s/it] 36%|███▌ | 6265/17285 [56:10:27<86:41:53, 28.32s/it] 36%|███▋ | 6266/17285 [56:10:51<83:20:01, 27.23s/it] 36%|███▋ | 6267/17285 [56:11:26<90:02:23, 29.42s/it] 36%|███▋ | 6268/17285 [56:11:55<89:27:04, 29.23s/it] 36%|███▋ | 6269/17285 [56:12:22<87:41:38, 28.66s/it] 36%|███▋ | 6270/17285 [56:12:50<87:13:28, 28.51s/it] {'loss': 1.4639, 'learning_rate': 0.0001522906338071643, 'epoch': 1.09} + 36%|███▋ | 6270/17285 [56:12:50<87:13:28, 28.51s/it] 36%|███▋ | 6271/17285 [56:13:23<91:29:29, 29.90s/it] 36%|███▋ | 6272/17285 [56:13:59<96:35:35, 31.57s/it] 36%|███▋ | 6273/17285 [56:14:27<93:25:07, 30.54s/it] 36%|███▋ | 6274/17285 [56:15:05<100:36:01, 32.89s/it] 36%|███▋ | 6275/17285 [56:15:40<102:32:37, 33.53s/it] 36%|███▋ | 6276/17285 [56:16:12<101:07:47, 33.07s/it] 36%|███▋ | 6277/17285 [56:16:46<102:03:09, 33.37s/it] 36%|███▋ | 6278/17285 [56:17:15<97:48:49, 31.99s/it] 36%|███▋ | 6279/17285 [56:17:47<97:17:06, 31.82s/it] 36%|███▋ | 6280/17285 [56:18:19<97:46:12, 31.98s/it] {'loss': 1.4302, 'learning_rate': 0.00015212745277734259, 'epoch': 1.09} + 36%|███▋ | 6280/17285 [56:18:20<97:46:12, 31.98s/it] 36%|███▋ | 6281/17285 [56:18:54<100:17:31, 32.81s/it] 36%|███▋ | 6282/17285 [56:19:26<99:25:51, 32.53s/it] 36%|███▋ | 6283/17285 [56:19:55<96:21:32, 31.53s/it] 36%|███▋ | 6284/17285 [56:20:28<97:33:25, 31.92s/it] 36%|███▋ | 6285/17285 [56:20:53<91:45:28, 30.03s/it] 36%|███▋ | 6286/17285 [56:21:25<93:33:13, 30.62s/it] 36%|███▋ | 6287/17285 [56:21:54<92:15:50, 30.20s/it] 36%|███▋ | 6288/17285 [56:22:24<91:42:36, 30.02s/it] 36%|███▋ | 6289/17285 [56:22:53<90:36:50, 29.67s/it] 36%|███▋ | 6290/17285 [56:23:24<91:41:00, 30.02s/it] {'loss': 1.4412, 'learning_rate': 0.00015196408092931383, 'epoch': 1.09} + 36%|███▋ | 6290/17285 [56:23:24<91:41:00, 30.02s/it] 36%|███▋ | 6291/17285 [56:24:03<100:07:47, 
32.79s/it] 36%|███▋ | 6292/17285 [56:24:35<99:11:31, 32.48s/it] 36%|███▋ | 6293/17285 [56:25:07<99:03:17, 32.44s/it] 36%|███▋ | 6294/17285 [56:25:38<98:06:10, 32.13s/it] 36%|███▋ | 6295/17285 [56:26:11<98:26:36, 32.25s/it] 36%|███▋ | 6296/17285 [56:26:50<105:03:40, 34.42s/it] 36%|███▋ | 6297/17285 [56:27:21<101:01:25, 33.10s/it] 36%|███▋ | 6298/17285 [56:27:49<97:12:57, 31.85s/it] 36%|███▋ | 6299/17285 [56:28:15<91:38:09, 30.03s/it] 36%|███▋ | 6300/17285 [56:28:52<98:06:43, 32.15s/it] {'loss': 1.4518, 'learning_rate': 0.0001518005188611184, 'epoch': 1.09} + 36%|███▋ | 6300/17285 [56:28:52<98:06:43, 32.15s/it] 36%|███▋ | 6301/17285 [56:29:22<96:15:42, 31.55s/it] 36%|███▋ | 6302/17285 [56:29:55<97:14:46, 31.88s/it] 36%|███▋ | 6303/17285 [56:30:26<96:16:07, 31.56s/it] 36%|███▋ | 6304/17285 [56:30:58<96:37:29, 31.68s/it] 36%|███▋ | 6305/17285 [56:31:29<96:24:53, 31.61s/it] 36%|███▋ | 6306/17285 [56:32:07<102:13:32, 33.52s/it] 36%|███▋ | 6307/17285 [56:32:41<102:20:24, 33.56s/it] 36%|███▋ | 6308/17285 [56:33:19<105:58:58, 34.76s/it] 36%|███▋ | 6309/17285 [56:33:55<107:55:12, 35.40s/it] 37%|███▋ | 6310/17285 [56:34:28<105:24:36, 34.58s/it] {'loss': 1.4316, 'learning_rate': 0.00015163676717149308, 'epoch': 1.1} + 37%|███▋ | 6310/17285 [56:34:28<105:24:36, 34.58s/it] 37%|███▋ | 6311/17285 [56:34:55<98:28:42, 32.31s/it] 37%|███▋ | 6312/17285 [56:35:30<100:48:03, 33.07s/it] 37%|███▋ | 6313/17285 [56:35:57<95:00:52, 31.18s/it] 37%|███▋ | 6314/17285 [56:36:39<104:46:58, 34.38s/it] 37%|███▋ | 6315/17285 [56:37:07<99:43:04, 32.72s/it] 37%|███▋ | 6316/17285 [56:37:48<106:47:09, 35.05s/it] 37%|███▋ | 6317/17285 [56:38:23<106:31:46, 34.97s/it] 37%|███▋ | 6318/17285 [56:38:56<105:28:30, 34.62s/it] 37%|███▋ | 6319/17285 [56:39:28<102:35:39, 33.68s/it] 37%|███▋ | 6320/17285 [56:39:55<96:52:17, 31.80s/it] {'loss': 1.4317, 'learning_rate': 0.00015147282645986866, 'epoch': 1.1} + 37%|███▋ | 6320/17285 [56:39:55<96:52:17, 31.80s/it] 37%|███▋ | 6321/17285 [56:40:31<100:33:09, 33.02s/it] 
37%|███▋ | 6322/17285 [56:41:02<98:26:56, 32.33s/it] 37%|███▋ | 6323/17285 [56:41:34<98:34:18, 32.37s/it] 37%|███▋ | 6324/17285 [56:42:02<93:57:43, 30.86s/it] 37%|███▋ | 6325/17285 [56:42:38<98:49:02, 32.46s/it] 37%|███▋ | 6326/17285 [56:43:09<97:12:23, 31.93s/it] 37%|███▋ | 6327/17285 [56:43:38<94:37:35, 31.09s/it] 37%|███▋ | 6328/17285 [56:44:14<98:55:10, 32.50s/it] 37%|███▋ | 6329/17285 [56:44:41<93:57:21, 30.87s/it] 37%|███▋ | 6330/17285 [56:45:08<91:01:27, 29.91s/it] {'loss': 1.4911, 'learning_rate': 0.00015130869732636804, 'epoch': 1.1} + 37%|███▋ | 6330/17285 [56:45:08<91:01:27, 29.91s/it] 37%|███▋ | 6331/17285 [56:45:42<94:36:22, 31.09s/it] 37%|███▋ | 6332/17285 [56:46:15<96:17:30, 31.65s/it] 37%|███▋ | 6333/17285 [56:46:45<94:38:35, 31.11s/it] 37%|███▋ | 6334/17285 [56:47:16<94:38:06, 31.11s/it] 37%|███▋ | 6335/17285 [56:47:52<98:55:58, 32.53s/it] 37%|███▋ | 6336/17285 [56:48:24<98:46:07, 32.47s/it] 37%|███▋ | 6337/17285 [56:48:51<93:54:46, 30.88s/it] 37%|███▋ | 6338/17285 [56:49:29<100:02:00, 32.90s/it] 37%|███▋ | 6339/17285 [56:50:08<105:56:15, 34.84s/it] 37%|███▋ | 6340/17285 [56:50:45<107:14:29, 35.27s/it] {'loss': 1.4359, 'learning_rate': 0.00015114438037180364, 'epoch': 1.1} + 37%|███▋ | 6340/17285 [56:50:45<107:14:29, 35.27s/it] 37%|███▋ | 6341/17285 [56:51:17<104:32:30, 34.39s/it] 37%|███▋ | 6342/17285 [56:51:48<101:09:19, 33.28s/it] 37%|███▋ | 6343/17285 [56:52:12<92:49:15, 30.54s/it] 37%|███▋ | 6344/17285 [56:52:41<91:59:09, 30.27s/it] 37%|███▋ | 6345/17285 [56:53:16<96:12:29, 31.66s/it] 37%|███▋ | 6346/17285 [56:54:00<106:44:07, 35.13s/it] 37%|███▋ | 6347/17285 [56:54:44<115:33:40, 38.03s/it] 37%|███▋ | 6348/17285 [56:55:15<108:57:58, 35.87s/it] 37%|███▋ | 6349/17285 [56:55:48<106:14:53, 34.98s/it] 37%|███▋ | 6350/17285 [56:56:15<98:26:08, 32.41s/it] {'loss': 1.4517, 'learning_rate': 0.00015097987619767556, 'epoch': 1.1} + 37%|███▋ | 6350/17285 [56:56:15<98:26:08, 32.41s/it] 37%|███▋ | 6351/17285 [56:56:39<91:14:54, 30.04s/it] 37%|███▋ | 
6352/17285 [56:57:11<93:17:53, 30.72s/it] 37%|███▋ | 6353/17285 [56:57:39<90:50:02, 29.91s/it] 37%|███▋ | 6354/17285 [56:58:13<94:24:37, 31.09s/it] 37%|███▋ | 6355/17285 [56:58:39<89:50:44, 29.59s/it][2023-08-25 08:53:46,761] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 37%|███▋ | 6356/17285 [56:59:09<89:59:26, 29.64s/it] 37%|███▋ | 6357/17285 [56:59:37<88:51:05, 29.27s/it] 37%|███▋ | 6358/17285 [57:00:15<96:06:56, 31.67s/it] 37%|███▋ | 6359/17285 [57:00:49<98:13:12, 32.36s/it] 37%|███▋ | 6360/17285 [57:01:30<105:52:39, 34.89s/it] {'loss': 1.4638, 'learning_rate': 0.0001508316628659255, 'epoch': 1.1} + 37%|███▋ | 6360/17285 [57:01:30<105:52:39, 34.89s/it] 37%|███▋ | 6361/17285 [57:02:10<110:50:40, 36.53s/it] 37%|███▋ | 6362/17285 [57:02:44<108:28:59, 35.75s/it] 37%|███▋ | 6363/17285 [57:03:28<115:50:08, 38.18s/it] 37%|███▋ | 6364/17285 [57:04:09<118:15:39, 38.98s/it] 37%|███▋ | 6365/17285 [57:04:35<107:06:18, 35.31s/it] 37%|███▋ | 6366/17285 [57:05:13<109:35:59, 36.14s/it] 37%|███▋ | 6367/17285 [57:05:50<109:55:01, 36.24s/it] 37%|███▋ | 6368/17285 [57:06:23<107:05:12, 35.31s/it] 37%|███▋ | 6369/17285 [57:06:56<105:22:05, 34.75s/it] 37%|███▋ | 6370/17285 [57:07:24<98:59:02, 32.65s/it] {'loss': 1.4206, 'learning_rate': 0.000150666804634212, 'epoch': 1.11} + 37%|███▋ | 6370/17285 [57:07:24<98:59:02, 32.65s/it] 37%|███▋ | 6371/17285 [57:07:54<96:29:08, 31.83s/it] 37%|███▋ | 6372/17285 [57:08:28<98:24:58, 32.47s/it] 37%|███▋ | 6373/17285 [57:08:58<96:02:19, 31.68s/it] 37%|███▋ | 6374/17285 [57:09:24<91:01:20, 30.03s/it] 37%|███▋ | 6375/17285 [57:10:01<96:54:30, 31.98s/it] 37%|███▋ | 6376/17285 [57:10:28<92:27:11, 30.51s/it] 37%|███▋ | 6377/17285 [57:11:04<97:31:36, 32.19s/it] 37%|███▋ | 6378/17285 [57:11:33<95:06:16, 31.39s/it] 37%|███▋ | 6379/17285 [57:12:02<92:27:33, 30.52s/it] 37%|███▋ | 6380/17285 [57:12:28<88:09:20, 29.10s/it] {'loss': 1.4665, 
'learning_rate': 0.0001505017609311527, 'epoch': 1.11} + 37%|███▋ | 6380/17285 [57:12:28<88:09:20, 29.10s/it] 37%|███▋ | 6381/17285 [57:12:56<87:38:53, 28.94s/it] 37%|███▋ | 6382/17285 [57:13:35<96:48:02, 31.96s/it] 37%|███▋ | 6383/17285 [57:13:59<89:46:37, 29.65s/it] 37%|███▋ | 6384/17285 [57:14:34<94:24:30, 31.18s/it] 37%|███▋ | 6385/17285 [57:15:09<98:10:29, 32.42s/it] 37%|███▋ | 6386/17285 [57:15:35<92:17:10, 30.48s/it] 37%|███▋ | 6387/17285 [57:16:03<89:54:30, 29.70s/it] 37%|███▋ | 6388/17285 [57:16:35<91:21:18, 30.18s/it] 37%|███▋ | 6389/17285 [57:17:08<94:42:23, 31.29s/it] 37%|███▋ | 6390/17285 [57:17:38<93:11:08, 30.79s/it] {'loss': 1.4544, 'learning_rate': 0.00015033653236090806, 'epoch': 1.11} + 37%|███▋ | 6390/17285 [57:17:38<93:11:08, 30.79s/it][2023-08-25 09:12:43,270] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 37%|███▋ | 6391/17285 [57:18:06<90:11:53, 29.81s/it] 37%|███▋ | 6392/17285 [57:18:38<92:15:41, 30.49s/it] 37%|███▋ | 6393/17285 [57:19:17<100:09:06, 33.10s/it] 37%|███▋ | 6394/17285 [57:19:47<97:38:10, 32.27s/it] 37%|███▋ | 6395/17285 [57:20:18<95:52:21, 31.69s/it] 37%|███▋ | 6396/17285 [57:20:50<96:15:14, 31.82s/it] 37%|███▋ | 6397/17285 [57:21:24<98:41:16, 32.63s/it] 37%|███▋ | 6398/17285 [57:22:03<104:01:22, 34.40s/it] 37%|███▋ | 6399/17285 [57:22:38<104:49:01, 34.66s/it] 37%|███▋ | 6400/17285 [57:23:07<99:17:22, 32.84s/it] {'loss': 1.4356, 'learning_rate': 0.00015018766908612838, 'epoch': 1.11} + 37%|███▋ | 6400/17285 [57:23:07<99:17:22, 32.84s/it] 37%|███▋ | 6401/17285 [57:23:40<100:00:42, 33.08s/it] 37%|███▋ | 6402/17285 [57:24:08<95:37:46, 31.63s/it] 37%|███▋ | 6403/17285 [57:24:43<97:47:49, 32.35s/it] 37%|███▋ | 6404/17285 [57:25:19<101:52:22, 33.70s/it] 37%|███▋ | 6405/17285 [57:25:50<99:23:12, 32.89s/it] 37%|███▋ | 6406/17285 [57:26:26<101:38:32, 33.63s/it] 37%|███▋ | 6407/17285 [57:27:00<102:32:51, 33.94s/it] 37%|███▋ | 6408/17285 
[57:27:29<97:46:50, 32.36s/it] 37%|███▋ | 6409/17285 [57:28:11<106:08:18, 35.13s/it] 37%|███▋ | 6410/17285 [57:28:40<100:57:45, 33.42s/it] {'loss': 1.4497, 'learning_rate': 0.00015002209093511546, 'epoch': 1.11} + 37%|███▋ | 6410/17285 [57:28:40<100:57:45, 33.42s/it] 37%|███▋ | 6411/17285 [57:29:07<94:45:14, 31.37s/it] 37%|███▋ | 6412/17285 [57:29:32<88:51:43, 29.42s/it] 37%|███▋ | 6413/17285 [57:29:59<87:13:33, 28.88s/it] 37%|███▋ | 6414/17285 [57:30:33<91:16:14, 30.22s/it] 37%|███▋ | 6415/17285 [57:31:03<91:06:33, 30.17s/it] 37%|███▋ | 6416/17285 [57:31:33<91:30:56, 30.31s/it] 37%|███▋ | 6417/17285 [57:32:10<97:09:32, 32.18s/it] 37%|███▋ | 6418/17285 [57:32:46<100:33:41, 33.31s/it] 37%|███▋ | 6419/17285 [57:33:14<95:50:15, 31.75s/it] 37%|███▋ | 6420/17285 [57:33:47<96:42:28, 32.04s/it] {'loss': 1.4469, 'learning_rate': 0.00014985632967280134, 'epoch': 1.11} + 37%|███▋ | 6420/17285 [57:33:47<96:42:28, 32.04s/it] 37%|███▋ | 6421/17285 [57:34:19<97:26:37, 32.29s/it] 37%|███▋ | 6422/17285 [57:34:47<93:19:17, 30.93s/it] 37%|███▋ | 6423/17285 [57:35:16<91:02:00, 30.17s/it] 37%|███▋ | 6424/17285 [57:35:44<89:06:51, 29.54s/it] 37%|███▋ | 6425/17285 [57:36:16<91:31:11, 30.34s/it] 37%|███▋ | 6426/17285 [57:36:48<93:33:14, 31.02s/it] 37%|███▋ | 6427/17285 [57:37:22<95:47:22, 31.76s/it] 37%|███▋ | 6428/17285 [57:37:55<97:06:54, 32.20s/it] 37%|███▋ | 6429/17285 [57:38:31<100:26:07, 33.31s/it] 37%|███▋ | 6430/17285 [57:39:04<100:05:43, 33.20s/it] {'loss': 1.4334, 'learning_rate': 0.00014969038590597315, 'epoch': 1.12} + 37%|███▋ | 6430/17285 [57:39:04<100:05:43, 33.20s/it] 37%|███▋ | 6431/17285 [57:39:31<94:11:26, 31.24s/it] 37%|███▋ | 6432/17285 [57:40:09<100:39:52, 33.39s/it] 37%|███▋ | 6433/17285 [57:40:45<102:36:18, 34.04s/it] 37%|███▋ | 6434/17285 [57:41:17<100:50:10, 33.45s/it] 37%|███▋ | 6435/17285 [57:41:53<103:27:51, 34.33s/it] 37%|███▋ | 6436/17285 [57:42:18<94:46:17, 31.45s/it] 37%|███▋ | 6437/17285 [57:42:51<96:02:30, 31.87s/it] 37%|███▋ | 6438/17285 
[57:43:21<94:14:16, 31.28s/it] 37%|███▋ | 6439/17285 [57:43:50<93:00:10, 30.87s/it] 37%|███▋ | 6440/17285 [57:44:24<95:01:27, 31.54s/it] {'loss': 1.4714, 'learning_rate': 0.0001495242602420861, 'epoch': 1.12} + 37%|███▋ | 6440/17285 [57:44:24<95:01:27, 31.54s/it] 37%|███▋ | 6441/17285 [57:44:50<90:27:15, 30.03s/it] 37%|███▋ | 6442/17285 [57:45:16<86:24:11, 28.69s/it] 37%|███▋ | 6443/17285 [57:45:53<94:27:40, 31.37s/it] 37%|███▋ | 6444/17285 [57:46:24<94:17:35, 31.31s/it] 37%|███▋ | 6445/17285 [57:46:53<91:31:09, 30.39s/it] 37%|███▋ | 6446/17285 [57:47:28<96:03:29, 31.90s/it] 37%|███▋ | 6447/17285 [57:47:58<94:38:42, 31.44s/it] 37%|███▋ | 6448/17285 [57:48:34<98:08:27, 32.60s/it] 37%|███▋ | 6449/17285 [57:49:07<98:37:49, 32.77s/it] 37%|███▋ | 6450/17285 [57:49:38<97:14:49, 32.31s/it] {'loss': 1.4577, 'learning_rate': 0.00014935795328926125, 'epoch': 1.12} + 37%|███▋ | 6450/17285 [57:49:38<97:14:49, 32.31s/it] 37%|███▋ | 6451/17285 [57:50:14<100:08:37, 33.28s/it] 37%|███▋ | 6452/17285 [57:50:48<101:16:10, 33.65s/it] 37%|███▋ | 6453/17285 [57:51:17<96:55:52, 32.21s/it] 37%|███▋ | 6454/17285 [57:51:52<99:00:14, 32.91s/it] 37%|███▋ | 6455/17285 [57:52:18<93:16:20, 31.00s/it] 37%|███▋ | 6456/17285 [57:52:48<91:57:31, 30.57s/it] 37%|███▋ | 6457/17285 [57:53:24<96:41:52, 32.15s/it] 37%|███▋ | 6458/17285 [57:53:55<95:39:11, 31.80s/it] 37%|███▋ | 6459/17285 [57:54:29<97:43:35, 32.50s/it] 37%|███▋ | 6460/17285 [57:54:55<92:15:34, 30.68s/it] {'loss': 1.4247, 'learning_rate': 0.00014919146565628327, 'epoch': 1.12} + 37%|███▋ | 6460/17285 [57:54:55<92:15:34, 30.68s/it] 37%|███▋ | 6461/17285 [57:55:25<91:51:18, 30.55s/it] 37%|███▋ | 6462/17285 [57:56:00<95:26:59, 31.75s/it] 37%|███▋ | 6463/17285 [57:56:31<94:38:25, 31.48s/it] 37%|███▋ | 6464/17285 [57:57:06<97:50:17, 32.55s/it] 37%|███▋ | 6465/17285 [57:57:41<100:19:49, 33.38s/it] 37%|███▋ | 6466/17285 [57:58:09<95:36:50, 31.82s/it] 37%|███▋ | 6467/17285 [57:58:41<95:51:00, 31.90s/it] 37%|███▋ | 6468/17285 [57:59:07<90:21:05, 
30.07s/it] 37%|███▋ | 6469/17285 [57:59:43<95:08:09, 31.67s/it] 37%|███▋ | 6470/17285 [58:00:14<94:34:48, 31.48s/it] {'loss': 1.4811, 'learning_rate': 0.00014902479795259822, 'epoch': 1.12} + 37%|███▋ | 6470/17285 [58:00:14<94:34:48, 31.48s/it] 37%|███▋ | 6471/17285 [58:00:40<89:38:52, 29.84s/it] 37%|███▋ | 6472/17285 [58:01:17<96:47:22, 32.22s/it] 37%|███▋ | 6473/17285 [58:01:53<99:44:27, 33.21s/it] 37%|███▋ | 6474/17285 [58:02:26<99:25:59, 33.11s/it] 37%|███▋ | 6475/17285 [58:02:56<96:23:38, 32.10s/it] 37%|███▋ | 6476/17285 [58:03:24<93:24:18, 31.11s/it] 37%|███▋ | 6477/17285 [58:03:54<92:18:53, 30.75s/it] 37%|███▋ | 6478/17285 [58:04:42<107:22:19, 35.77s/it] 37%|███▋ | 6479/17285 [58:05:13<103:25:00, 34.45s/it] 37%|███▋ | 6480/17285 [58:05:51<106:15:00, 35.40s/it] {'loss': 1.4309, 'learning_rate': 0.00014885795078831132, 'epoch': 1.12} + 37%|███▋ | 6480/17285 [58:05:51<106:15:00, 35.40s/it] 37%|███▋ | 6481/17285 [58:06:21<101:39:30, 33.87s/it] 38%|███▊ | 6482/17285 [58:06:49<96:06:02, 32.02s/it] 38%|███▊ | 6483/17285 [58:07:18<93:32:44, 31.18s/it] 38%|███▊ | 6484/17285 [58:07:57<100:44:08, 33.58s/it] 38%|███▊ | 6485/17285 [58:08:29<99:35:19, 33.20s/it] 38%|███▊ | 6486/17285 [58:09:02<99:26:00, 33.15s/it] 38%|███▊ | 6487/17285 [58:09:37<100:55:03, 33.65s/it] 38%|███▊ | 6488/17285 [58:10:04<94:53:01, 31.64s/it] 38%|███▊ | 6489/17285 [58:10:33<92:17:23, 30.77s/it] 38%|███▊ | 6490/17285 [58:11:11<98:35:15, 32.88s/it] {'loss': 1.4853, 'learning_rate': 0.00014869092477418482, 'epoch': 1.13} + 38%|███▊ | 6490/17285 [58:11:11<98:35:15, 32.88s/it] 38%|███▊ | 6491/17285 [58:11:37<92:21:28, 30.80s/it] 38%|███▊ | 6492/17285 [58:12:10<94:43:05, 31.59s/it] 38%|███▊ | 6493/17285 [58:12:45<97:16:25, 32.45s/it] 38%|███▊ | 6494/17285 [58:13:20<99:28:04, 33.18s/it] 38%|███▊ | 6495/17285 [58:13:49<95:46:41, 31.96s/it] 38%|███▊ | 6496/17285 [58:14:20<95:21:49, 31.82s/it] 38%|███▊ | 6497/17285 [58:14:54<97:12:46, 32.44s/it] 38%|███▊ | 6498/17285 [58:15:21<92:30:12, 30.87s/it] 
38%|███▊ | 6499/17285 [58:15:51<91:29:14, 30.54s/it] 38%|███▊ | 6500/17285 [58:16:17<87:25:11, 29.18s/it] {'loss': 1.4507, 'learning_rate': 0.00014852372052163553, 'epoch': 1.13} + 38%|███▊ | 6500/17285 [58:16:17<87:25:11, 29.18s/it] 38%|███▊ | 6501/17285 [58:16:51<92:00:18, 30.71s/it] 38%|███▊ | 6502/17285 [58:17:29<98:25:25, 32.86s/it] 38%|███▊ | 6503/17285 [58:17:58<94:29:10, 31.55s/it] 38%|███▊ | 6504/17285 [58:18:27<92:04:38, 30.75s/it] 38%|███▊ | 6505/17285 [58:18:58<92:20:51, 30.84s/it] 38%|███▊ | 6506/17285 [58:19:34<97:25:33, 32.54s/it] 38%|███▊ | 6507/17285 [58:20:02<93:03:36, 31.08s/it] 38%|███▊ | 6508/17285 [58:20:26<87:18:49, 29.17s/it] 38%|███▊ | 6509/17285 [58:20:55<86:33:19, 28.92s/it] 38%|███▊ | 6510/17285 [58:21:28<90:10:32, 30.13s/it] {'loss': 1.4455, 'learning_rate': 0.00014835633864273287, 'epoch': 1.13} + 38%|███▊ | 6510/17285 [58:21:28<90:10:32, 30.13s/it] 38%|███▊ | 6511/17285 [58:22:00<92:22:00, 30.86s/it] 38%|███▊ | 6512/17285 [58:22:32<92:49:49, 31.02s/it] 38%|███▊ | 6513/17285 [58:23:05<94:26:26, 31.56s/it] 38%|███▊ | 6514/17285 [58:23:36<94:06:21, 31.45s/it] 38%|███▊ | 6515/17285 [58:24:17<103:18:04, 34.53s/it][2023-08-25 10:19:27,567] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 38%|███▊ | 6516/17285 [58:24:50<101:24:03, 33.90s/it] 38%|███▊ | 6517/17285 [58:25:19<96:41:44, 32.33s/it] 38%|███▊ | 6518/17285 [58:25:53<98:10:49, 32.83s/it] 38%|███▊ | 6519/17285 [58:26:31<103:30:59, 34.61s/it] 38%|███▊ | 6520/17285 [58:27:11<107:45:56, 36.04s/it] {'loss': 1.4421, 'learning_rate': 0.0001482055435875876, 'epoch': 1.13} + 38%|███▊ | 6520/17285 [58:27:11<107:45:56, 36.04s/it] 38%|███▊ | 6521/17285 [58:27:41<102:28:08, 34.27s/it] 38%|███▊ | 6522/17285 [58:28:09<96:46:04, 32.37s/it] 38%|███▊ | 6523/17285 [58:28:40<95:55:49, 32.09s/it] 38%|███▊ | 6524/17285 [58:29:07<91:02:26, 30.46s/it] 38%|███▊ | 6525/17285 [58:29:36<90:06:22, 30.15s/it] 38%|███▊ | 6526/17285 [58:30:13<95:50:57, 32.07s/it] 38%|███▊ | 6527/17285 [58:30:44<95:04:24, 31.81s/it] 38%|███▊ | 6528/17285 [58:31:15<94:24:13, 31.59s/it] 38%|███▊ | 6529/17285 [58:31:50<97:43:34, 32.71s/it] 38%|███▊ | 6530/17285 [58:32:28<101:49:44, 34.09s/it] {'loss': 1.4758, 'learning_rate': 0.0001480378259071914, 'epoch': 1.13} + 38%|███▊ | 6530/17285 [58:32:28<101:49:44, 34.09s/it] 38%|███▊ | 6531/17285 [58:33:00<99:57:21, 33.46s/it] 38%|███▊ | 6532/17285 [58:33:29<96:30:30, 32.31s/it] 38%|███▊ | 6533/17285 [58:34:12<106:03:56, 35.51s/it] 38%|███▊ | 6534/17285 [58:34:44<103:01:52, 34.50s/it] 38%|███▊ | 6535/17285 [58:35:19<103:16:25, 34.58s/it] 38%|███▊ | 6536/17285 [58:35:52<101:45:14, 34.08s/it] 38%|███▊ | 6537/17285 [58:36:23<98:37:01, 33.03s/it] 38%|███▊ | 6538/17285 [58:36:53<95:42:32, 32.06s/it] 38%|███▊ | 6539/17285 [58:37:24<95:29:14, 31.99s/it] 38%|███▊ | 6540/17285 [58:38:03<101:08:11, 33.88s/it] {'loss': 1.4564, 'learning_rate': 0.00014786993237911187, 'epoch': 1.14} + 38%|███▊ | 6540/17285 [58:38:03<101:08:11, 33.88s/it] 38%|███▊ | 6541/17285 [58:38:32<96:53:52, 32.47s/it] 38%|███▊ | 6542/17285 [58:38:58<91:23:12, 30.62s/it] 38%|███▊ | 6543/17285 [58:39:26<89:18:00, 29.93s/it] 38%|███▊ | 6544/17285 [58:39:57<89:44:36, 30.08s/it] 38%|███▊ | 6545/17285 
[58:40:26<89:11:08, 29.89s/it] 38%|███▊ | 6546/17285 [58:41:01<93:05:31, 31.21s/it] 38%|███▊ | 6547/17285 [58:41:29<90:48:01, 30.44s/it][2023-08-25 10:36:34,163] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 38%|███▊ | 6548/17285 [58:41:56<87:53:12, 29.47s/it] 38%|███▊ | 6549/17285 [58:42:31<92:41:13, 31.08s/it] 38%|███▊ | 6550/17285 [58:43:12<100:52:28, 33.83s/it] {'loss': 1.4497, 'learning_rate': 0.00014771867836201847, 'epoch': 1.14} + 38%|███▊ | 6550/17285 [58:43:12<100:52:28, 33.83s/it] 38%|███▊ | 6551/17285 [58:43:47<102:09:45, 34.26s/it] 38%|███▊ | 6552/17285 [58:44:27<106:59:36, 35.89s/it] 38%|███▊ | 6553/17285 [58:45:01<105:49:53, 35.50s/it] 38%|███▊ | 6554/17285 [58:45:27<97:36:44, 32.75s/it] 38%|███▊ | 6555/17285 [58:45:57<94:44:27, 31.79s/it] 38%|███▊ | 6556/17285 [58:46:27<93:21:48, 31.33s/it] 38%|███▊ | 6557/17285 [58:47:02<96:18:33, 32.32s/it] 38%|███▊ | 6558/17285 [58:47:42<102:54:07, 34.53s/it] 38%|███▊ | 6559/17285 [58:48:07<94:43:22, 31.79s/it] 38%|███▊ | 6560/17285 [58:48:38<94:27:38, 31.71s/it] {'loss': 1.4975, 'learning_rate': 0.00014755045241707308, 'epoch': 1.14} + 38%|███▊ | 6560/17285 [58:48:38<94:27:38, 31.71s/it] 38%|███▊ | 6561/17285 [58:49:07<91:45:31, 30.80s/it] 38%|███▊ | 6562/17285 [58:49:36<89:45:50, 30.14s/it] 38%|███▊ | 6563/17285 [58:50:06<89:40:46, 30.11s/it] 38%|███▊ | 6564/17285 [58:50:43<96:00:31, 32.24s/it] 38%|███▊ | 6565/17285 [58:51:13<93:43:34, 31.48s/it] 38%|███▊ | 6566/17285 [58:51:37<87:32:18, 29.40s/it] 38%|███▊ | 6567/17285 [58:52:09<89:56:24, 30.21s/it] 38%|███▊ | 6568/17285 [58:52:44<93:49:15, 31.52s/it] 38%|███▊ | 6569/17285 [58:53:15<93:47:11, 31.51s/it] 38%|███▊ | 6570/17285 [58:53:47<93:28:50, 31.41s/it] {'loss': 1.4897, 'learning_rate': 0.00014738205240852806, 'epoch': 1.14} + 38%|███▊ | 6570/17285 [58:53:47<93:28:50, 31.41s/it] 38%|███▊ | 6571/17285 [58:54:14<89:56:05, 30.22s/it] 38%|███▊ | 6572/17285 
[58:54:52<96:50:17, 32.54s/it] 38%|███▊ | 6573/17285 [58:55:27<98:50:14, 33.22s/it] 38%|███▊ | 6574/17285 [58:55:56<95:07:33, 31.97s/it] 38%|███▊ | 6575/17285 [58:56:31<98:20:06, 33.05s/it] 38%|███▊ | 6576/17285 [58:56:58<92:09:01, 30.98s/it] 38%|███▊ | 6577/17285 [58:57:32<95:04:26, 31.96s/it] 38%|███▊ | 6578/17285 [58:58:06<96:41:14, 32.51s/it] 38%|███▊ | 6579/17285 [58:58:38<96:15:21, 32.37s/it] 38%|███▊ | 6580/17285 [58:59:07<93:26:50, 31.43s/it] {'loss': 1.4567, 'learning_rate': 0.00014721347895282978, 'epoch': 1.14} + 38%|███▊ | 6580/17285 [58:59:07<93:26:50, 31.43s/it] 38%|███▊ | 6581/17285 [58:59:42<96:44:08, 32.53s/it] 38%|███▊ | 6582/17285 [59:00:13<95:21:57, 32.08s/it] 38%|███▊ | 6583/17285 [59:00:47<96:53:22, 32.59s/it] 38%|███▊ | 6584/17285 [59:01:24<101:01:06, 33.98s/it] 38%|███▊ | 6585/17285 [59:01:58<100:51:39, 33.93s/it] 38%|███▊ | 6586/17285 [59:02:30<99:12:17, 33.38s/it] 38%|███▊ | 6587/17285 [59:03:05<100:41:35, 33.88s/it] 38%|███▊ | 6588/17285 [59:03:32<94:22:28, 31.76s/it] 38%|███▊ | 6589/17285 [59:04:07<97:30:23, 32.82s/it] 38%|███▊ | 6590/17285 [59:04:37<95:17:39, 32.08s/it] {'loss': 1.4813, 'learning_rate': 0.0001470447326670598, 'epoch': 1.14} + 38%|███▊ | 6590/17285 [59:04:37<95:17:39, 32.08s/it] 38%|███▊ | 6591/17285 [59:05:07<92:58:36, 31.30s/it] 38%|███▊ | 6592/17285 [59:05:43<97:07:49, 32.70s/it] 38%|███▊ | 6593/17285 [59:06:16<97:25:33, 32.80s/it] 38%|███▊ | 6594/17285 [59:06:45<93:55:45, 31.63s/it] 38%|███▊ | 6595/17285 [59:07:13<91:08:35, 30.69s/it] 38%|███▊ | 6596/17285 [59:07:56<101:42:26, 34.25s/it] 38%|███▊ | 6597/17285 [59:08:35<105:36:01, 35.57s/it] 38%|███▊ | 6598/17285 [59:09:04<100:13:10, 33.76s/it] 38%|███▊ | 6599/17285 [59:09:34<96:59:47, 32.68s/it] 38%|███▊ | 6600/17285 [59:10:02<92:21:28, 31.12s/it] {'loss': 1.4861, 'learning_rate': 0.00014687581416893218, 'epoch': 1.15} + 38%|███▊ | 6600/17285 [59:10:02<92:21:28, 31.12s/it] 38%|███▊ | 6601/17285 [59:10:28<88:06:29, 29.69s/it] 38%|███▊ | 6602/17285 
[59:11:01<91:12:00, 30.73s/it] 38%|███▊ | 6603/17285 [59:11:32<90:55:37, 30.64s/it] 38%|███▊ | 6604/17285 [59:12:00<88:26:48, 29.81s/it] 38%|███▊ | 6605/17285 [59:12:34<93:01:32, 31.36s/it] 38%|███▊ | 6606/17285 [59:13:03<90:16:44, 30.43s/it] 38%|███▊ | 6607/17285 [59:13:29<86:28:39, 29.16s/it] 38%|███▊ | 6608/17285 [59:13:57<85:39:58, 28.88s/it] 38%|███▊ | 6609/17285 [59:14:29<88:10:01, 29.73s/it] 38%|███▊ | 6610/17285 [59:14:56<86:13:56, 29.08s/it] {'loss': 1.4759, 'learning_rate': 0.0001467067240767915, 'epoch': 1.15} + 38%|███▊ | 6610/17285 [59:14:56<86:13:56, 29.08s/it] 38%|███▊ | 6611/17285 [59:15:27<87:10:30, 29.40s/it] 38%|███▊ | 6612/17285 [59:15:58<89:11:30, 30.08s/it] 38%|███▊ | 6613/17285 [59:16:40<99:49:39, 33.68s/it] 38%|███▊ | 6614/17285 [59:17:14<99:43:09, 33.64s/it] 38%|███▊ | 6615/17285 [59:17:51<102:43:03, 34.66s/it] 38%|███▊ | 6616/17285 [59:18:21<98:27:10, 33.22s/it] 38%|███▊ | 6617/17285 [59:19:00<103:50:38, 35.04s/it] 38%|███▊ | 6618/17285 [59:19:40<107:52:46, 36.41s/it] 38%|███▊ | 6619/17285 [59:20:11<103:05:03, 34.79s/it] 38%|███▊ | 6620/17285 [59:20:41<98:46:13, 33.34s/it] {'loss': 1.4753, 'learning_rate': 0.00014653746300961038, 'epoch': 1.15} + 38%|███▊ | 6620/17285 [59:20:41<98:46:13, 33.34s/it] 38%|███▊ | 6621/17285 [59:21:12<97:18:25, 32.85s/it] 38%|███▊ | 6622/17285 [59:21:43<95:46:04, 32.33s/it] 38%|███▊ | 6623/17285 [59:22:14<94:14:45, 31.82s/it] 38%|███▊ | 6624/17285 [59:22:45<93:36:31, 31.61s/it] 38%|███▊ | 6625/17285 [59:23:24<99:51:58, 33.73s/it] 38%|███▊ | 6626/17285 [59:23:52<95:10:04, 32.14s/it] 38%|███▊ | 6627/17285 [59:24:25<95:52:24, 32.38s/it] 38%|███▊ | 6628/17285 [59:24:52<90:49:26, 30.68s/it] 38%|███▊ | 6629/17285 [59:25:28<95:27:26, 32.25s/it] 38%|███▊ | 6630/17285 [59:25:58<93:48:09, 31.69s/it] {'loss': 1.464, 'learning_rate': 0.00014636803158698738, 'epoch': 1.15} + 38%|███▊ | 6630/17285 [59:25:58<93:48:09, 31.69s/it] 38%|███▊ | 6631/17285 [59:26:27<91:14:59, 30.83s/it] 38%|███▊ | 6632/17285 [59:26:52<86:18:26, 
29.17s/it] 38%|███▊ | 6633/17285 [59:27:30<93:42:13, 31.67s/it] 38%|███▊ | 6634/17285 [59:28:05<96:30:08, 32.62s/it] 38%|███▊ | 6635/17285 [59:28:32<91:33:33, 30.95s/it] 38%|███▊ | 6636/17285 [59:29:02<91:08:59, 30.81s/it] 38%|███▊ | 6637/17285 [59:29:32<89:53:41, 30.39s/it] 38%|███▊ | 6638/17285 [59:30:04<91:51:30, 31.06s/it] 38%|███▊ | 6639/17285 [59:30:34<90:16:25, 30.53s/it] 38%|███▊ | 6640/17285 [59:31:02<88:07:15, 29.80s/it] {'loss': 1.4849, 'learning_rate': 0.00014619843042914466, 'epoch': 1.15} + 38%|███▊ | 6640/17285 [59:31:02<88:07:15, 29.80s/it] 38%|███▊ | 6641/17285 [59:31:28<85:06:18, 28.78s/it] 38%|███▊ | 6642/17285 [59:31:53<81:51:17, 27.69s/it] 38%|███▊ | 6643/17285 [59:32:25<85:04:41, 28.78s/it] 38%|███▊ | 6644/17285 [59:32:56<87:17:28, 29.53s/it] 38%|███▊ | 6645/17285 [59:33:26<88:00:37, 29.78s/it] 38%|███▊ | 6646/17285 [59:33:53<85:24:34, 28.90s/it] 38%|███▊ | 6647/17285 [59:34:25<88:14:16, 29.86s/it] 38%|███▊ | 6648/17285 [59:35:00<92:39:44, 31.36s/it] 38%|███▊ | 6649/17285 [59:35:25<87:05:29, 29.48s/it] 38%|███▊ | 6650/17285 [59:35:53<85:40:48, 29.00s/it] {'loss': 1.4503, 'learning_rate': 0.00014602866015692563, 'epoch': 1.15} + 38%|███▊ | 6650/17285 [59:35:53<85:40:48, 29.00s/it] 38%|███▊ | 6651/17285 [59:36:31<93:58:11, 31.81s/it] 38%|███▊ | 6652/17285 [59:37:05<95:31:09, 32.34s/it][2023-08-25 11:32:13,218] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 38%|███▊ | 6653/17285 [59:37:36<93:56:15, 31.81s/it] 38%|███▊ | 6654/17285 [59:38:05<92:15:17, 31.24s/it] 39%|███▊ | 6655/17285 [59:38:31<86:53:32, 29.43s/it] 39%|███▊ | 6656/17285 [59:39:03<89:13:09, 30.22s/it] 39%|███▊ | 6657/17285 [59:39:38<93:55:04, 31.81s/it] 39%|███▊ | 6658/17285 [59:40:13<96:22:28, 32.65s/it] 39%|███▊ | 6659/17285 [59:40:51<101:28:17, 34.38s/it] 39%|███▊ | 6660/17285 [59:41:26<101:41:36, 34.46s/it] {'loss': 1.4429, 'learning_rate': 0.00014587572283276284, 'epoch': 1.16} + 39%|███▊ | 6660/17285 [59:41:26<101:41:36, 34.46s/it] 39%|███▊ | 6661/17285 [59:41:53<95:19:55, 32.30s/it] 39%|███▊ | 6662/17285 [59:42:19<89:12:08, 30.23s/it] 39%|███▊ | 6663/17285 [59:42:53<92:30:29, 31.35s/it] 39%|███▊ | 6664/17285 [59:43:24<92:21:30, 31.31s/it] 39%|███▊ | 6665/17285 [59:43:58<94:42:38, 32.11s/it] 39%|███▊ | 6666/17285 [59:44:32<96:21:32, 32.67s/it] 39%|███▊ | 6667/17285 [59:45:09<100:13:27, 33.98s/it] 39%|███▊ | 6668/17285 [59:45:35<93:25:45, 31.68s/it] 39%|███▊ | 6669/17285 [59:46:10<95:55:50, 32.53s/it][2023-08-25 11:41:12,880] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 39%|███▊ | 6670/17285 [59:46:35<89:49:25, 30.46s/it] {'loss': 1.481, 'learning_rate': 0.00014572264948280539, 'epoch': 1.16} + 39%|███▊ | 6670/17285 [59:46:35<89:49:25, 30.46s/it] 39%|███▊ | 6671/17285 [59:47:02<86:18:36, 29.27s/it] 39%|███▊ | 6672/17285 [59:47:39<93:03:01, 31.56s/it] 39%|███▊ | 6673/17285 [59:48:06<89:47:39, 30.46s/it] 39%|███▊ | 6674/17285 [59:48:44<96:19:31, 32.68s/it] 39%|███▊ | 6675/17285 [59:49:13<93:02:25, 31.57s/it] 39%|███▊ | 6676/17285 [59:49:48<95:59:23, 32.57s/it] 39%|███▊ | 6677/17285 [59:50:16<92:03:15, 31.24s/it] 39%|███▊ | 6678/17285 [59:50:50<93:47:19, 31.83s/it] 39%|███▊ | 6679/17285 [59:51:20<92:55:55, 31.54s/it] 39%|███▊ | 6680/17285 [59:51:54<94:44:25, 32.16s/it] {'loss': 1.4358, 'learning_rate': 0.00014555240899848083, 'epoch': 1.16} + 39%|███▊ | 6680/17285 [59:51:55<94:44:25, 32.16s/it] 39%|███▊ | 6681/17285 [59:52:31<98:38:45, 33.49s/it] 39%|███▊ | 6682/17285 [59:53:03<97:39:00, 33.15s/it] 39%|███▊ | 6683/17285 [59:53:43<103:15:28, 35.06s/it] 39%|███▊ | 6684/17285 [59:54:23<107:42:15, 36.58s/it] 39%|███▊ | 6685/17285 [59:54:53<101:52:30, 34.60s/it] 39%|███▊ | 6686/17285 [59:55:32<106:04:00, 36.03s/it] 39%|███▊ | 6687/17285 [59:56:08<106:29:22, 36.17s/it] 39%|███▊ | 6688/17285 [59:56:38<100:54:02, 34.28s/it] 39%|███▊ | 6689/17285 [59:57:05<93:48:14, 31.87s/it] 39%|███▊ | 6690/17285 [59:57:47<103:00:12, 35.00s/it] {'loss': 1.4236, 'learning_rate': 0.00014538200176461162, 'epoch': 1.16} + 39%|███▊ | 6690/17285 [59:57:47<103:00:12, 35.00s/it] 39%|███▊ | 6691/17285 [59:58:19<100:31:14, 34.16s/it] 39%|███▊ | 6692/17285 [59:58:51<98:23:57, 33.44s/it] 39%|███▊ | 6693/17285 [59:59:16<91:02:53, 30.95s/it] 39%|███▊ | 6694/17285 [59:59:47<90:46:45, 30.86s/it] 39%|███▊ | 6695/17285 [60:00:16<89:15:06, 30.34s/it] 39%|███▊ | 6696/17285 [60:00:42<85:38:50, 29.12s/it] 39%|███▊ | 6697/17285 [60:01:13<87:14:20, 29.66s/it] 39%|███▉ | 6698/17285 [60:01:48<91:49:21, 31.22s/it] 39%|███▉ | 
6699/17285 [60:02:13<86:04:19, 29.27s/it] 39%|███▉ | 6700/17285 [60:02:40<84:23:22, 28.70s/it] {'loss': 1.4508, 'learning_rate': 0.00014521142840499203, 'epoch': 1.16} + 39%|███▉ | 6700/17285 [60:02:40<84:23:22, 28.70s/it] 39%|███▉ | 6701/17285 [60:03:22<95:51:38, 32.61s/it] 39%|███▉ | 6702/17285 [60:04:05<105:06:45, 35.76s/it] 39%|███▉ | 6703/17285 [60:04:30<95:30:30, 32.49s/it] 39%|███▉ | 6704/17285 [60:05:10<102:52:07, 35.00s/it] 39%|███▉ | 6705/17285 [60:05:39<97:18:20, 33.11s/it] 39%|███▉ | 6706/17285 [60:06:10<94:53:24, 32.29s/it] 39%|███▉ | 6707/17285 [60:06:41<94:22:55, 32.12s/it] 39%|███▉ | 6708/17285 [60:07:12<93:31:16, 31.83s/it] 39%|███▉ | 6709/17285 [60:07:47<95:57:22, 32.66s/it] 39%|███▉ | 6710/17285 [60:08:13<90:18:23, 30.74s/it] {'loss': 1.4858, 'learning_rate': 0.0001450406895440244, 'epoch': 1.16} + 39%|███▉ | 6710/17285 [60:08:13<90:18:23, 30.74s/it] 39%|███▉ | 6711/17285 [60:08:45<91:13:39, 31.06s/it] 39%|███▉ | 6712/17285 [60:09:18<92:49:17, 31.60s/it] 39%|███▉ | 6713/17285 [60:09:45<89:05:29, 30.34s/it] 39%|███▉ | 6714/17285 [60:10:19<92:10:33, 31.39s/it] 39%|███▉ | 6715/17285 [60:11:00<100:18:41, 34.16s/it] 39%|███▉ | 6716/17285 [60:11:34<100:43:19, 34.31s/it] 39%|███▉ | 6717/17285 [60:12:12<103:13:49, 35.17s/it] 39%|███▉ | 6718/17285 [60:12:37<94:23:15, 32.16s/it] 39%|███▉ | 6719/17285 [60:13:03<88:53:55, 30.29s/it] 39%|███▉ | 6720/17285 [60:13:31<87:22:38, 29.77s/it] {'loss': 1.4396, 'learning_rate': 0.0001448697858067168, 'epoch': 1.17} + 39%|███▉ | 6720/17285 [60:13:31<87:22:38, 29.77s/it] 39%|███▉ | 6721/17285 [60:14:10<94:58:38, 32.37s/it] 39%|███▉ | 6722/17285 [60:14:36<89:56:29, 30.65s/it] 39%|███▉ | 6723/17285 [60:15:07<89:37:50, 30.55s/it] 39%|███▉ | 6724/17285 [60:15:32<84:39:24, 28.86s/it] 39%|███▉ | 6725/17285 [60:16:02<86:16:04, 29.41s/it] 39%|███▉ | 6726/17285 [60:16:37<90:55:48, 31.00s/it] 39%|███▉ | 6727/17285 [60:17:06<89:18:11, 30.45s/it] 39%|███▉ | 6728/17285 [60:17:37<90:04:40, 30.72s/it] 39%|███▉ | 6729/17285 
[60:18:05<87:08:37, 29.72s/it] 39%|███▉ | 6730/17285 [60:18:43<94:34:58, 32.26s/it] {'loss': 1.4119, 'learning_rate': 0.00014469871781868098, 'epoch': 1.17} + 39%|███▉ | 6730/17285 [60:18:45<94:34:58, 32.26s/it] 39%|███▉ | 6731/17285 [60:19:14<93:23:15, 31.85s/it] 39%|███▉ | 6732/17285 [60:19:47<94:07:13, 32.11s/it] 39%|███▉ | 6733/17285 [60:20:20<94:59:45, 32.41s/it] 39%|███▉ | 6734/17285 [60:20:53<95:36:27, 32.62s/it] 39%|███▉ | 6735/17285 [60:21:22<92:20:07, 31.51s/it] 39%|███▉ | 6736/17285 [60:21:51<89:53:32, 30.68s/it] 39%|███▉ | 6737/17285 [60:22:24<92:19:00, 31.51s/it] 39%|███▉ | 6738/17285 [60:22:54<90:49:52, 31.00s/it] 39%|███▉ | 6739/17285 [60:23:23<89:25:47, 30.53s/it] 39%|███▉ | 6740/17285 [60:23:57<92:00:49, 31.41s/it] {'loss': 1.4862, 'learning_rate': 0.00014452748620612992, 'epoch': 1.17} + 39%|███▉ | 6740/17285 [60:23:57<92:00:49, 31.41s/it] 39%|███▉ | 6741/17285 [60:24:28<91:43:27, 31.32s/it] 39%|███▉ | 6742/17285 [60:24:53<86:26:43, 29.52s/it] 39%|███▉ | 6743/17285 [60:25:34<96:33:54, 32.98s/it] 39%|███▉ | 6744/17285 [60:26:06<95:53:30, 32.75s/it] 39%|███▉ | 6745/17285 [60:26:42<98:46:59, 33.74s/it] 39%|███▉ | 6746/17285 [60:27:09<92:01:41, 31.44s/it] 39%|███▉ | 6747/17285 [60:27:41<92:41:28, 31.67s/it] 39%|███▉ | 6748/17285 [60:28:18<97:50:58, 33.43s/it] 39%|███▉ | 6749/17285 [60:28:45<91:58:51, 31.43s/it] 39%|███▉ | 6750/17285 [60:29:12<87:46:00, 29.99s/it] {'loss': 1.4585, 'learning_rate': 0.00014435609159587555, 'epoch': 1.17} + 39%|███▉ | 6750/17285 [60:29:12<87:46:00, 29.99s/it] 39%|███▉ | 6751/17285 [60:29:43<88:48:35, 30.35s/it] 39%|███▉ | 6752/17285 [60:30:16<91:06:01, 31.14s/it] 39%|███▉ | 6753/17285 [60:30:50<93:20:26, 31.91s/it] 39%|███▉ | 6754/17285 [60:31:16<88:15:15, 30.17s/it][2023-08-25 12:26:26,855] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 + 39%|███▉ | 6755/17285 [60:31:49<91:11:35, 31.18s/it] 39%|███▉ | 6756/17285 [60:32:33<102:19:35, 34.99s/it] 39%|███▉ | 6757/17285 [60:33:05<99:28:59, 34.02s/it] 39%|███▉ | 6758/17285 [60:33:32<93:12:44, 31.88s/it] 39%|███▉ | 6759/17285 [60:34:04<93:14:51, 31.89s/it] 39%|███▉ | 6760/17285 [60:34:28<86:34:02, 29.61s/it] {'loss': 1.4725, 'learning_rate': 0.0001442016976021512, 'epoch': 1.17} + 39%|███▉ | 6760/17285 [60:34:28<86:34:02, 29.61s/it] 39%|███▉ | 6761/17285 [60:34:55<84:36:57, 28.94s/it] 39%|███▉ | 6762/17285 [60:35:26<86:03:49, 29.44s/it] 39%|███▉ | 6763/17285 [60:35:54<84:55:49, 29.06s/it] 39%|███▉ | 6764/17285 [60:36:24<85:58:41, 29.42s/it] 39%|███▉ | 6765/17285 [60:36:57<88:38:24, 30.33s/it] 39%|███▉ | 6766/17285 [60:37:30<91:21:17, 31.27s/it] 39%|███▉ | 6767/17285 [60:37:55<85:27:00, 29.25s/it] 39%|███▉ | 6768/17285 [60:38:34<94:24:38, 32.32s/it] 39%|███▉ | 6769/17285 [60:39:04<91:52:02, 31.45s/it] 39%|███▉ | 6770/17285 [60:39:34<91:15:21, 31.24s/it] {'loss': 1.4497, 'learning_rate': 0.00014402999502526254, 'epoch': 1.17} + 39%|███▉ | 6770/17285 [60:39:34<91:15:21, 31.24s/it] 39%|███▉ | 6771/17285 [60:40:05<90:53:15, 31.12s/it] 39%|███▉ | 6772/17285 [60:40:39<93:15:24, 31.93s/it] 39%|███▉ | 6773/17285 [60:41:14<95:47:30, 32.81s/it] 39%|███▉ | 6774/17285 [60:41:49<97:38:02, 33.44s/it] 39%|███▉ | 6775/17285 [60:42:22<97:15:23, 33.31s/it] 39%|███▉ | 6776/17285 [60:42:53<95:22:15, 32.67s/it] 39%|███▉ | 6777/17285 [60:43:30<99:28:09, 34.08s/it] 39%|███▉ | 6778/17285 [60:44:06<100:56:35, 34.59s/it] 39%|███▉ | 6779/17285 [60:44:46<105:08:51, 36.03s/it] 39%|███▉ | 6780/17285 [60:45:21<104:37:00, 35.85s/it] {'loss': 1.4455, 'learning_rate': 0.00014385813127179106, 'epoch': 1.18} + 39%|███▉ | 6780/17285 [60:45:21<104:37:00, 35.85s/it] 39%|███▉ | 6781/17285 [60:45:54<102:29:05, 35.12s/it] 39%|███▉ | 6782/17285 [60:46:20<93:55:38, 32.19s/it] 39%|███▉ | 6783/17285 [60:46:54<95:16:28, 32.66s/it] 39%|███▉ | 6784/17285 
[60:47:26<95:23:43, 32.70s/it] 39%|███▉ | 6785/17285 [60:48:00<95:57:05, 32.90s/it] 39%|███▉ | 6786/17285 [60:48:30<94:01:52, 32.24s/it] 39%|███▉ | 6787/17285 [60:49:00<92:05:20, 31.58s/it] 39%|███▉ | 6788/17285 [60:49:28<88:48:00, 30.45s/it] 39%|███▉ | 6789/17285 [60:49:58<88:07:34, 30.23s/it] 39%|███▉ | 6790/17285 [60:50:42<100:17:12, 34.40s/it] {'loss': 1.4252, 'learning_rate': 0.00014368610697086277, 'epoch': 1.18} + 39%|███▉ | 6790/17285 [60:50:42<100:17:12, 34.40s/it] 39%|███▉ | 6791/17285 [60:51:09<93:48:19, 32.18s/it] 39%|███▉ | 6792/17285 [60:51:42<94:07:34, 32.29s/it] 39%|███▉ | 6793/17285 [60:52:06<86:54:35, 29.82s/it] 39%|███▉ | 6794/17285 [60:52:42<92:33:14, 31.76s/it] 39%|███▉ | 6795/17285 [60:53:10<89:33:56, 30.74s/it] 39%|███▉ | 6796/17285 [60:53:41<89:08:31, 30.60s/it] 39%|███▉ | 6797/17285 [60:54:10<88:02:26, 30.22s/it] 39%|███▉ | 6798/17285 [60:54:40<88:15:40, 30.30s/it] 39%|███▉ | 6799/17285 [60:55:20<95:58:59, 32.95s/it] 39%|███▉ | 6800/17285 [60:55:50<93:52:50, 32.23s/it] {'loss': 1.4629, 'learning_rate': 0.00014351392275219134, 'epoch': 1.18} + 39%|███▉ | 6800/17285 [60:55:50<93:52:50, 32.23s/it] 39%|███▉ | 6801/17285 [60:56:18<90:17:18, 31.00s/it] 39%|███▉ | 6802/17285 [60:56:50<90:55:55, 31.23s/it] 39%|███▉ | 6803/17285 [60:57:29<98:07:24, 33.70s/it] 39%|███▉ | 6804/17285 [60:57:57<92:49:46, 31.88s/it] 39%|███▉ | 6805/17285 [60:58:32<95:37:30, 32.85s/it] 39%|███▉ | 6806/17285 [60:59:10<99:33:22, 34.20s/it] 39%|███▉ | 6807/17285 [60:59:43<98:57:54, 34.00s/it] 39%|███▉ | 6808/17285 [61:00:08<90:41:05, 31.16s/it] 39%|███▉ | 6809/17285 [61:00:45<96:07:44, 33.03s/it] 39%|███▉ | 6810/17285 [61:01:17<95:09:00, 32.70s/it] {'loss': 1.4628, 'learning_rate': 0.00014334157924607578, 'epoch': 1.18} + 39%|███▉ | 6810/17285 [61:01:17<95:09:00, 32.70s/it] 39%|███▉ | 6811/17285 [61:01:48<94:03:55, 32.33s/it] 39%|███▉ | 6812/17285 [61:02:28<100:24:03, 34.51s/it] 39%|███▉ | 6813/17285 [61:02:58<96:33:02, 33.19s/it] 39%|███▉ | 6814/17285 [61:03:27<92:49:53, 
31.92s/it] 39%|███▉ | 6815/17285 [61:04:04<96:53:53, 33.32s/it] 39%|███▉ | 6816/17285 [61:04:40<99:11:45, 34.11s/it] 39%|███▉ | 6817/17285 [61:05:06<92:36:14, 31.85s/it] 39%|███▉ | 6818/17285 [61:05:47<100:46:06, 34.66s/it] 39%|███▉ | 6819/17285 [61:06:20<98:35:19, 33.91s/it] 39%|███▉ | 6820/17285 [61:06:50<95:44:02, 32.93s/it] {'loss': 1.4321, 'learning_rate': 0.00014316907708339822, 'epoch': 1.18} + 39%|███▉ | 6820/17285 [61:06:50<95:44:02, 32.93s/it] 39%|███▉ | 6821/17285 [61:07:27<98:40:20, 33.95s/it] 39%|███▉ | 6822/17285 [61:08:02<99:55:09, 34.38s/it] 39%|███▉ | 6823/17285 [61:08:38<100:55:15, 34.73s/it] 39%|███▉ | 6824/17285 [61:09:04<93:45:23, 32.26s/it] 39%|███▉ | 6825/17285 [61:09:37<94:26:56, 32.51s/it] 39%|███▉ | 6826/17285 [61:10:03<88:37:14, 30.50s/it] 39%|███▉ | 6827/17285 [61:10:35<89:56:33, 30.96s/it] 40%|███▉ | 6828/17285 [61:11:08<91:55:42, 31.65s/it] 40%|███▉ | 6829/17285 [61:11:38<90:16:40, 31.08s/it] 40%|███▉ | 6830/17285 [61:12:11<92:12:55, 31.75s/it] {'loss': 1.4558, 'learning_rate': 0.00014299641689562156, 'epoch': 1.19} + 40%|███▉ | 6830/17285 [61:12:11<92:12:55, 31.75s/it] 40%|███▉ | 6831/17285 [61:12:45<93:34:17, 32.22s/it] 40%|███▉ | 6832/17285 [61:13:14<91:11:04, 31.40s/it] 40%|███▉ | 6833/17285 [61:13:45<90:55:47, 31.32s/it] 40%|███▉ | 6834/17285 [61:14:13<88:07:33, 30.36s/it] 40%|███▉ | 6835/17285 [61:14:49<92:40:24, 31.93s/it] 40%|███▉ | 6836/17285 [61:15:20<91:51:00, 31.65s/it] 40%|███▉ | 6837/17285 [61:15:55<94:38:17, 32.61s/it] 40%|███▉ | 6838/17285 [61:16:27<94:22:33, 32.52s/it] 40%|███▉ | 6839/17285 [61:16:52<87:55:05, 30.30s/it] 40%|███▉ | 6840/17285 [61:17:22<87:38:29, 30.21s/it] {'loss': 1.4762, 'learning_rate': 0.0001428235993147873, 'epoch': 1.19} + 40%|███▉ | 6840/17285 [61:17:22<87:38:29, 30.21s/it] 40%|███▉ | 6841/17285 [61:17:52<87:09:47, 30.04s/it] 40%|███▉ | 6842/17285 [61:18:23<87:43:26, 30.24s/it] 40%|███▉ | 6843/17285 [61:19:01<94:51:51, 32.71s/it] 40%|███▉ | 6844/17285 [61:19:35<95:48:27, 33.03s/it] 40%|███▉ | 
6845/17285 [61:20:03<91:55:57, 31.70s/it] 40%|███▉ | 6846/17285 [61:20:33<89:43:11, 30.94s/it] 40%|███▉ | 6847/17285 [61:21:06<92:00:05, 31.73s/it] 40%|███▉ | 6848/17285 [61:21:40<93:41:20, 32.32s/it] 40%|███▉ | 6849/17285 [61:22:06<88:06:33, 30.39s/it] 40%|███▉ | 6850/17285 [61:22:31<83:13:55, 28.71s/it] {'loss': 1.4601, 'learning_rate': 0.00014265062497351285, 'epoch': 1.19} + 40%|███▉ | 6850/17285 [61:22:31<83:13:55, 28.71s/it] 40%|███▉ | 6851/17285 [61:23:02<85:35:33, 29.53s/it] 40%|███▉ | 6852/17285 [61:23:34<87:20:45, 30.14s/it] 40%|███▉ | 6853/17285 [61:24:14<96:40:00, 33.36s/it] 40%|███▉ | 6854/17285 [61:24:50<98:52:50, 34.13s/it] 40%|███▉ | 6855/17285 [61:25:16<91:28:40, 31.57s/it] 40%|███▉ | 6856/17285 [61:25:45<89:04:15, 30.75s/it] 40%|███▉ | 6857/17285 [61:26:15<88:41:27, 30.62s/it] 40%|███▉ | 6858/17285 [61:26:45<87:40:24, 30.27s/it] 40%|███▉ | 6859/17285 [61:27:22<94:05:13, 32.49s/it] 40%|███▉ | 6860/17285 [61:28:00<98:16:52, 33.94s/it] {'loss': 1.4782, 'learning_rate': 0.00014247749450498962, 'epoch': 1.19} + 40%|███▉ | 6860/17285 [61:28:00<98:16:52, 33.94s/it] 40%|███▉ | 6861/17285 [61:28:35<99:12:56, 34.26s/it] 40%|███▉ | 6862/17285 [61:29:12<101:57:20, 35.21s/it] 40%|███▉ | 6863/17285 [61:29:42<97:05:32, 33.54s/it] 40%|███▉ | 6864/17285 [61:30:08<90:56:21, 31.42s/it] 40%|███▉ | 6865/17285 [61:30:46<96:47:32, 33.44s/it] 40%|███▉ | 6866/17285 [61:31:20<97:26:53, 33.67s/it] 40%|███▉ | 6867/17285 [61:31:50<94:01:24, 32.49s/it] 40%|███▉ | 6868/17285 [61:32:27<98:09:19, 33.92s/it] 40%|███▉ | 6869/17285 [61:33:04<100:10:58, 34.63s/it] 40%|███▉ | 6870/17285 [61:33:43<103:51:04, 35.90s/it] {'loss': 1.4407, 'learning_rate': 0.00014230420854298054, 'epoch': 1.19} + 40%|███▉ | 6870/17285 [61:33:43<103:51:04, 35.90s/it] 40%|███▉ | 6871/17285 [61:34:13<99:17:11, 34.32s/it] 40%|███▉ | 6872/17285 [61:34:38<90:45:23, 31.38s/it] 40%|███▉ | 6873/17285 [61:35:12<93:05:06, 32.18s/it] 40%|███▉ | 6874/17285 [61:35:47<95:17:18, 32.95s/it] 40%|███▉ | 6875/17285 
[61:36:12<88:31:56, 30.62s/it] 40%|███▉ | 6876/17285 [61:36:49<94:23:15, 32.64s/it] 40%|███▉ | 6877/17285 [61:37:20<92:36:36, 32.03s/it] 40%|███▉ | 6878/17285 [61:37:56<96:17:13, 33.31s/it] 40%|███▉ | 6879/17285 [61:38:32<98:35:29, 34.11s/it] 40%|███▉ | 6880/17285 [61:39:06<98:16:42, 34.00s/it] {'loss': 1.4164, 'learning_rate': 0.00014213076772181767, 'epoch': 1.19} + 40%|███▉ | 6880/17285 [61:39:06<98:16:42, 34.00s/it] 40%|███▉ | 6881/17285 [61:39:37<96:14:22, 33.30s/it] 40%|███▉ | 6882/17285 [61:40:14<99:33:51, 34.45s/it] 40%|███▉ | 6883/17285 [61:40:45<95:54:51, 33.19s/it] 40%|███▉ | 6884/17285 [61:41:21<98:37:19, 34.14s/it] 40%|███▉ | 6885/17285 [61:41:49<93:20:10, 32.31s/it] 40%|███▉ | 6886/17285 [61:42:29<100:07:46, 34.66s/it] 40%|███▉ | 6887/17285 [61:43:05<101:10:49, 35.03s/it] 40%|███▉ | 6888/17285 [61:43:32<94:04:29, 32.57s/it] 40%|███▉ | 6889/17285 [61:43:58<88:22:47, 30.60s/it] 40%|███▉ | 6890/17285 [61:44:28<87:53:42, 30.44s/it] {'loss': 1.4223, 'learning_rate': 0.00014195717267640004, 'epoch': 1.2} + 40%|███▉ | 6890/17285 [61:44:28<87:53:42, 30.44s/it] 40%|███▉ | 6891/17285 [61:44:57<86:45:12, 30.05s/it] 40%|███▉ | 6892/17285 [61:45:31<90:24:51, 31.32s/it] 40%|███▉ | 6893/17285 [61:46:06<93:13:02, 32.29s/it] 40%|███▉ | 6894/17285 [61:46:36<91:30:06, 31.70s/it] 40%|███▉ | 6895/17285 [61:47:11<94:12:32, 32.64s/it] 40%|███▉ | 6896/17285 [61:47:41<91:55:04, 31.85s/it] 40%|███▉ | 6897/17285 [61:48:12<90:38:05, 31.41s/it] 40%|███▉ | 6898/17285 [61:48:50<96:29:17, 33.44s/it] 40%|███▉ | 6899/17285 [61:49:25<97:46:18, 33.89s/it] 40%|███▉ | 6900/17285 [61:50:03<101:27:15, 35.17s/it] {'loss': 1.3949, 'learning_rate': 0.00014178342404219118, 'epoch': 1.2} + 40%|███▉ | 6900/17285 [61:50:03<101:27:15, 35.17s/it] 40%|███▉ | 6901/17285 [61:50:34<98:00:03, 33.98s/it] 40%|███▉ | 6902/17285 [61:51:07<96:48:47, 33.57s/it] 40%|███▉ | 6903/17285 [61:51:39<95:38:03, 33.16s/it] 40%|███▉ | 6904/17285 [61:52:06<90:31:27, 31.39s/it] 40%|███▉ | 6905/17285 [61:52:36<89:10:21, 
30.93s/it] 40%|███▉ | 6906/17285 [61:53:02<84:39:21, 29.36s/it] 40%|███▉ | 6907/17285 [61:53:44<95:41:28, 33.19s/it] 40%|███▉ | 6908/17285 [61:54:10<89:39:16, 31.10s/it] 40%|███▉ | 6909/17285 [61:54:43<91:05:29, 31.60s/it] 40%|███▉ | 6910/17285 [61:55:13<89:43:02, 31.13s/it] {'loss': 1.4718, 'learning_rate': 0.00014160952245521682, 'epoch': 1.2} + 40%|███▉ | 6910/17285 [61:55:13<89:43:02, 31.13s/it] 40%|███▉ | 6911/17285 [61:55:42<87:56:24, 30.52s/it] 40%|███▉ | 6912/17285 [61:56:24<97:49:44, 33.95s/it] 40%|███▉ | 6913/17285 [61:57:01<100:52:20, 35.01s/it][2023-08-25 13:52:13,921] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 + 40%|████ | 6914/17285 [61:57:36<100:42:23, 34.96s/it] 40%|████ | 6915/17285 [61:58:08<97:30:49, 33.85s/it] 40%|████ | 6916/17285 [61:58:32<89:27:14, 31.06s/it] 40%|████ | 6917/17285 [61:58:57<84:25:31, 29.31s/it] 40%|████ | 6918/17285 [61:59:27<84:32:43, 29.36s/it] 40%|████ | 6919/17285 [61:59:57<85:21:00, 29.64s/it] 40%|████ | 6920/17285 [62:00:32<90:15:36, 31.35s/it] {'loss': 1.4722, 'learning_rate': 0.00014145288077845185, 'epoch': 1.2} + 40%|████ | 6920/17285 [62:00:32<90:15:36, 31.35s/it] 40%|████ | 6921/17285 [62:01:00<86:46:36, 30.14s/it] 40%|████ | 6922/17285 [62:01:35<91:00:15, 31.61s/it] 40%|████ | 6923/17285 [62:02:09<93:12:21, 32.38s/it] 40%|████ | 6924/17285 [62:02:40<91:55:04, 31.94s/it] 40%|████ | 6925/17285 [62:03:13<93:08:07, 32.36s/it] 40%|████ | 6926/17285 [62:03:43<90:53:25, 31.59s/it] 40%|████ | 6927/17285 [62:04:13<89:35:42, 31.14s/it] 40%|████ | 6928/17285 [62:04:42<87:36:36, 30.45s/it] 40%|████ | 6929/17285 [62:05:14<88:46:54, 30.86s/it] 40%|████ | 6930/17285 [62:05:45<89:30:16, 31.12s/it] {'loss': 1.4446, 'learning_rate': 0.00014127869033547745, 'epoch': 1.2} + 40%|████ | 6930/17285 [62:05:45<89:30:16, 31.12s/it] 40%|████ | 6931/17285 [62:06:15<88:19:51, 30.71s/it] 40%|████ | 6932/17285 
[62:06:47<89:06:06, 30.98s/it] 40%|████ | 6933/17285 [62:07:22<92:38:30, 32.22s/it][2023-08-25 14:02:30,238] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 40%|████ | 6934/17285 [62:07:53<91:15:43, 31.74s/it] 40%|████ | 6935/17285 [62:08:19<86:59:02, 30.26s/it] 40%|████ | 6936/17285 [62:08:51<87:56:38, 30.59s/it] 40%|████ | 6937/17285 [62:09:30<95:43:31, 33.30s/it] 40%|████ | 6938/17285 [62:09:56<89:31:36, 31.15s/it] 40%|████ | 6939/17285 [62:10:34<95:17:35, 33.16s/it] 40%|████ | 6940/17285 [62:10:59<87:49:50, 30.56s/it] {'loss': 1.4475, 'learning_rate': 0.00014112178972372757, 'epoch': 1.2} + 40%|████ | 6940/17285 [62:10:59<87:49:50, 30.56s/it] 40%|████ | 6941/17285 [62:11:32<89:44:38, 31.23s/it] 40%|████ | 6942/17285 [62:12:12<97:25:57, 33.91s/it] 40%|████ | 6943/17285 [62:12:44<95:57:32, 33.40s/it] 40%|████ | 6944/17285 [62:13:09<88:48:31, 30.92s/it] 40%|████ | 6945/17285 [62:13:37<86:28:12, 30.11s/it] 40%|████ | 6946/17285 [62:14:13<91:37:41, 31.90s/it] 40%|████ | 6947/17285 [62:14:51<96:50:34, 33.72s/it] 40%|████ | 6948/17285 [62:15:26<97:53:34, 34.09s/it] 40%|████ | 6949/17285 [62:16:00<97:55:34, 34.11s/it] 40%|████ | 6950/17285 [62:16:36<99:05:33, 34.52s/it] {'loss': 1.4202, 'learning_rate': 0.00014094731272664267, 'epoch': 1.21} + 40%|████ | 6950/17285 [62:16:36<99:05:33, 34.52s/it] 40%|████ | 6951/17285 [62:17:08<96:31:17, 33.62s/it] 40%|████ | 6952/17285 [62:17:47<101:31:07, 35.37s/it] 40%|████ | 6953/17285 [62:18:16<96:10:36, 33.51s/it] 40%|████ | 6954/17285 [62:18:44<91:26:38, 31.87s/it] 40%|████ | 6955/17285 [62:19:19<93:46:54, 32.68s/it] 40%|████ | 6956/17285 [62:19:49<91:47:52, 31.99s/it] 40%|████ | 6957/17285 [62:20:17<88:03:59, 30.70s/it] 40%|████ | 6958/17285 [62:20:47<87:39:57, 30.56s/it] 40%|████ | 6959/17285 [62:21:19<88:48:38, 30.96s/it] 40%|████ | 6960/17285 [62:21:45<84:28:39, 29.45s/it] {'loss': 1.4854, 'learning_rate': 0.00014077268583746858, 'epoch': 
1.21} + 40%|████ | 6960/17285 [62:21:45<84:28:39, 29.45s/it] 40%|████ | 6961/17285 [62:22:19<88:09:12, 30.74s/it] 40%|████ | 6962/17285 [62:22:44<83:33:23, 29.14s/it] 40%|████ | 6963/17285 [62:23:18<87:50:25, 30.64s/it] 40%|████ | 6964/17285 [62:23:49<87:54:46, 30.66s/it] 40%|████ | 6965/17285 [62:24:24<91:32:30, 31.93s/it] 40%|████ | 6966/17285 [62:24:52<88:02:12, 30.71s/it] 40%|████ | 6967/17285 [62:25:22<87:36:22, 30.57s/it] 40%|████ | 6968/17285 [62:25:52<87:00:01, 30.36s/it] 40%|████ | 6969/17285 [62:26:16<82:04:24, 28.64s/it] 40%|████ | 6970/17285 [62:26:53<88:35:31, 30.92s/it] {'loss': 1.4541, 'learning_rate': 0.0001405979096954461, 'epoch': 1.21} + 40%|████ | 6970/17285 [62:26:53<88:35:31, 30.92s/it] 40%|████ | 6971/17285 [62:27:27<91:31:58, 31.95s/it] 40%|████ | 6972/17285 [62:28:00<92:03:07, 32.13s/it] 40%|████ | 6973/17285 [62:28:26<87:22:11, 30.50s/it] 40%|████ | 6974/17285 [62:28:54<84:41:45, 29.57s/it] 40%|████ | 6975/17285 [62:29:23<84:48:54, 29.62s/it] 40%|████ | 6976/17285 [62:29:55<86:42:46, 30.28s/it] 40%|████ | 6977/17285 [62:30:27<88:01:46, 30.74s/it] 40%|████ | 6978/17285 [62:31:03<92:14:41, 32.22s/it] 40%|████ | 6979/17285 [62:31:33<90:34:09, 31.64s/it] 40%|████ | 6980/17285 [62:32:11<95:56:33, 33.52s/it] {'loss': 1.4756, 'learning_rate': 0.00014042298494036228, 'epoch': 1.21} + 40%|████ | 6980/17285 [62:32:11<95:56:33, 33.52s/it] 40%|████ | 6981/17285 [62:32:44<95:48:46, 33.48s/it] 40%|████ | 6982/17285 [62:33:18<95:46:44, 33.47s/it] 40%|████ | 6983/17285 [62:33:52<96:30:26, 33.72s/it] 40%|████ | 6984/17285 [62:34:23<93:48:54, 32.79s/it] 40%|████ | 6985/17285 [62:34:58<96:03:55, 33.58s/it] 40%|████ | 6986/17285 [62:35:29<93:56:12, 32.84s/it] 40%|████ | 6987/17285 [62:35:59<91:29:56, 31.99s/it] 40%|████ | 6988/17285 [62:36:31<91:20:03, 31.93s/it] 40%|████ | 6989/17285 [62:37:01<89:53:07, 31.43s/it] 40%|████ | 6990/17285 [62:37:28<86:02:58, 30.09s/it] {'loss': 1.435, 'learning_rate': 0.00014024791221254815, 'epoch': 1.21} + 40%|████ | 
6990/17285 [62:37:28<86:02:58, 30.09s/it] 40%|████ | 6991/17285 [62:37:53<81:46:36, 28.60s/it] 40%|████ | 6992/17285 [62:38:20<80:08:47, 28.03s/it] 40%|████ | 6993/17285 [62:38:46<78:27:17, 27.44s/it] 40%|████ | 6994/17285 [62:39:18<82:27:49, 28.85s/it] 40%|████ | 6995/17285 [62:39:57<90:43:33, 31.74s/it] 40%|████ | 6996/17285 [62:40:35<96:06:26, 33.63s/it] 40%|████ | 6997/17285 [62:41:06<94:24:49, 33.04s/it] 40%|████ | 6998/17285 [62:41:44<98:15:23, 34.39s/it] 40%|████ | 6999/17285 [62:42:14<94:14:28, 32.98s/it] 40%|████ | 7000/17285 [62:42:46<93:28:49, 32.72s/it] {'loss': 1.3868, 'learning_rate': 0.0001400726921528765, 'epoch': 1.21} + 40%|████ | 7000/17285 [62:42:46<93:28:49, 32.72s/it][INFO|trainer.py:3081] 2023-08-25 14:37:23,407 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-25 14:37:23,408 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-25 14:37:23,408 >> Batch size = 2 + + 0%| | 0/33 [00:00> Deleting older checkpoint [20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-4000] due to args.save_total_limit +Save Peft Config at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-7000 +[INFO|tokenization_utils_base.py:2210] 2023-08-25 14:38:48,636 >> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-7000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-25 14:38:48,641 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-7000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-7000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-7000 + 41%|████ | 7001/17285 [62:44:46<168:23:36, 58.95s/it] 41%|████ | 7002/17285 [62:45:16<143:40:53, 
50.30s/it] 41%|████ | 7003/17285 [62:45:52<131:16:37, 45.96s/it] 41%|████ | 7004/17285 [62:46:17<113:27:50, 39.73s/it] 41%|████ | 7005/17285 [62:46:48<106:16:17, 37.22s/it] 41%|████ | 7006/17285 [62:47:20<101:20:00, 35.49s/it] 41%|████ | 7007/17285 [62:47:58<103:45:17, 36.34s/it] 41%|████ | 7008/17285 [62:48:23<94:09:07, 32.98s/it] 41%|████ | 7009/17285 [62:48:54<92:15:54, 32.32s/it] 41%|████ | 7010/17285 [62:49:25<91:08:04, 31.93s/it] {'loss': 1.4526, 'learning_rate': 0.0001398973254027594, 'epoch': 1.22} + 41%|████ | 7010/17285 [62:49:25<91:08:04, 31.93s/it] 41%|████ | 7011/17285 [62:49:58<91:34:06, 32.09s/it] 41%|████ | 7012/17285 [62:50:33<94:27:00, 33.10s/it] 41%|████ | 7013/17285 [62:51:03<91:22:22, 32.02s/it] 41%|████ | 7014/17285 [62:51:40<96:01:58, 33.66s/it] 41%|████ | 7015/17285 [62:52:09<92:00:41, 32.25s/it] 41%|████ | 7016/17285 [62:52:38<88:51:25, 31.15s/it] 41%|████ | 7017/17285 [62:53:10<89:46:36, 31.48s/it] 41%|████ | 7018/17285 [62:53:45<92:35:22, 32.47s/it] 41%|████ | 7019/17285 [62:54:19<94:02:29, 32.98s/it] 41%|████ | 7020/17285 [62:54:47<89:59:36, 31.56s/it] {'loss': 1.4434, 'learning_rate': 0.00013972181260414585, 'epoch': 1.22} + 41%|████ | 7020/17285 [62:54:47<89:59:36, 31.56s/it] 41%|████ | 7021/17285 [62:55:28<98:07:18, 34.42s/it] 41%|████ | 7022/17285 [62:55:58<94:43:30, 33.23s/it] 41%|████ | 7023/17285 [62:56:34<96:43:02, 33.93s/it] 41%|████ | 7024/17285 [62:57:08<97:05:02, 34.06s/it] 41%|████ | 7025/17285 [62:57:34<90:04:22, 31.60s/it] 41%|████ | 7026/17285 [62:58:07<90:53:47, 31.90s/it] 41%|████ | 7027/17285 [62:58:41<92:31:21, 32.47s/it] 41%|████ | 7028/17285 [62:59:12<91:55:53, 32.27s/it] 41%|████ | 7029/17285 [62:59:38<86:31:09, 30.37s/it] 41%|████ | 7030/17285 [63:00:08<85:31:40, 30.02s/it] {'loss': 1.4733, 'learning_rate': 0.0001395461543995196, 'epoch': 1.22} + 41%|████ | 7030/17285 [63:00:08<85:31:40, 30.02s/it] 41%|████ | 7031/17285 [63:00:35<83:31:21, 29.32s/it] 41%|████ | 7032/17285 [63:01:13<90:32:45, 31.79s/it] 41%|████ | 
7033/17285 [63:01:38<85:11:35, 29.92s/it] 41%|████ | 7034/17285 [63:02:06<83:04:29, 29.17s/it] 41%|████ | 7035/17285 [63:02:47<93:41:30, 32.91s/it] 41%|████ | 7036/17285 [63:03:13<87:46:51, 30.83s/it] 41%|████ | 7037/17285 [63:03:44<87:45:52, 30.83s/it] 41%|████ | 7038/17285 [63:04:18<90:00:38, 31.62s/it] 41%|████ | 7039/17285 [63:04:49<89:53:39, 31.58s/it] 41%|████ | 7040/17285 [63:05:23<91:42:58, 32.23s/it] {'loss': 1.4456, 'learning_rate': 0.00013937035143189657, 'epoch': 1.22} + 41%|████ | 7040/17285 [63:05:23<91:42:58, 32.23s/it] 41%|████ | 7041/17285 [63:05:53<89:30:15, 31.45s/it] 41%|████ | 7042/17285 [63:06:21<87:14:13, 30.66s/it] 41%|████ | 7043/17285 [63:07:03<96:35:31, 33.95s/it] 41%|████ | 7044/17285 [63:07:36<96:02:45, 33.76s/it] 41%|████ | 7045/17285 [63:08:21<105:07:32, 36.96s/it] 41%|████ | 7046/17285 [63:08:53<101:12:32, 35.58s/it] 41%|████ | 7047/17285 [63:09:28<100:25:27, 35.31s/it] 41%|████ | 7048/17285 [63:09:58<95:41:54, 33.65s/it] 41%|████ | 7049/17285 [63:10:31<95:31:21, 33.60s/it] 41%|████ | 7050/17285 [63:11:04<95:16:06, 33.51s/it] {'loss': 1.4451, 'learning_rate': 0.00013919440434482266, 'epoch': 1.22} + 41%|████ | 7050/17285 [63:11:04<95:16:06, 33.51s/it] 41%|████ | 7051/17285 [63:11:34<91:33:03, 32.20s/it] 41%|████ | 7052/17285 [63:12:04<89:55:24, 31.64s/it] 41%|████ | 7053/17285 [63:12:35<89:27:06, 31.47s/it] 41%|████ | 7054/17285 [63:13:04<87:14:39, 30.70s/it] 41%|████ | 7055/17285 [63:13:40<91:26:27, 32.18s/it] 41%|████ | 7056/17285 [63:14:13<92:33:15, 32.57s/it] 41%|████ | 7057/17285 [63:14:47<93:55:17, 33.06s/it] 41%|████ | 7058/17285 [63:15:21<94:44:18, 33.35s/it] 41%|████ | 7059/17285 [63:15:51<91:19:27, 32.15s/it] 41%|████ | 7060/17285 [63:16:22<90:36:38, 31.90s/it] {'loss': 1.4572, 'learning_rate': 0.00013901831378237124, 'epoch': 1.23} + 41%|████ | 7060/17285 [63:16:22<90:36:38, 31.90s/it] 41%|████ | 7061/17285 [63:16:49<86:29:07, 30.45s/it] 41%|████ | 7062/17285 [63:17:20<86:44:51, 30.55s/it] 41%|████ | 7063/17285 
[63:18:05<99:28:06, 35.03s/it] 41%|████ | 7064/17285 [63:18:38<97:24:02, 34.31s/it] 41%|████ | 7065/17285 [63:19:11<96:14:33, 33.90s/it] 41%|████ | 7066/17285 [63:19:35<88:02:48, 31.02s/it] 41%|████ | 7067/17285 [63:20:09<90:34:11, 31.91s/it] 41%|████ | 7068/17285 [63:20:46<94:41:58, 33.37s/it] 41%|████ | 7069/17285 [63:21:12<88:34:33, 31.21s/it] 41%|████ | 7070/17285 [63:21:49<93:03:51, 32.80s/it] {'loss': 1.4455, 'learning_rate': 0.000138842080389141, 'epoch': 1.23} + 41%|████ | 7070/17285 [63:21:49<93:03:51, 32.80s/it] 41%|████ | 7071/17285 [63:22:30<100:08:07, 35.29s/it] 41%|████ | 7072/17285 [63:22:57<93:07:46, 32.83s/it] 41%|████ | 7073/17285 [63:23:26<90:21:04, 31.85s/it] 41%|████ | 7074/17285 [63:23:58<89:55:30, 31.70s/it] 41%|████ | 7075/17285 [63:24:23<84:49:44, 29.91s/it] 41%|████ | 7076/17285 [63:24:49<80:55:50, 28.54s/it] 41%|████ | 7077/17285 [63:25:22<85:08:04, 30.02s/it] 41%|████ | 7078/17285 [63:25:51<84:16:16, 29.72s/it] 41%|████ | 7079/17285 [63:26:21<84:17:53, 29.73s/it] 41%|████ | 7080/17285 [63:26:55<88:08:17, 31.09s/it] {'loss': 1.438, 'learning_rate': 0.00013866570481025346, 'epoch': 1.23} + 41%|████ | 7080/17285 [63:26:55<88:08:17, 31.09s/it] 41%|████ | 7081/17285 [63:27:32<93:13:09, 32.89s/it] 41%|████ | 7082/17285 [63:28:07<94:48:51, 33.45s/it] 41%|████ | 7083/17285 [63:28:47<100:31:42, 35.47s/it] 41%|████ | 7084/17285 [63:29:21<99:15:11, 35.03s/it] 41%|████ | 7085/17285 [63:29:57<99:40:50, 35.18s/it] 41%|████ | 7086/17285 [63:30:21<90:30:27, 31.95s/it] 41%|████ | 7087/17285 [63:30:50<88:11:21, 31.13s/it] 41%|████ | 7088/17285 [63:31:17<84:07:22, 29.70s/it] 41%|████ | 7089/17285 [63:31:50<86:42:14, 30.61s/it] 41%|████ | 7090/17285 [63:32:32<97:09:44, 34.31s/it] {'loss': 1.4261, 'learning_rate': 0.00013848918769135055, 'epoch': 1.23} + 41%|████ | 7090/17285 [63:32:32<97:09:44, 34.31s/it] 41%|████ | 7091/17285 [63:33:12<101:43:33, 35.92s/it] 41%|████ | 7092/17285 [63:33:43<97:46:23, 34.53s/it] 41%|████ | 7093/17285 [63:34:10<91:11:29, 
32.21s/it] 41%|████ | 7094/17285 [63:34:46<93:51:52, 33.16s/it] 41%|████ | 7095/17285 [63:35:16<91:08:36, 32.20s/it] 41%|████ | 7096/17285 [63:35:49<92:21:51, 32.63s/it] 41%|████ | 7097/17285 [63:36:20<90:47:59, 32.08s/it] 41%|████ | 7098/17285 [63:36:53<91:12:42, 32.23s/it] 41%|████ | 7099/17285 [63:37:24<90:35:56, 32.02s/it] 41%|████ | 7100/17285 [63:38:05<97:54:55, 34.61s/it] {'loss': 1.436, 'learning_rate': 0.00013831252967859238, 'epoch': 1.23} + 41%|████ | 7100/17285 [63:38:05<97:54:55, 34.61s/it] 41%|████ | 7101/17285 [63:38:31<90:57:35, 32.15s/it] 41%|████ | 7102/17285 [63:39:08<94:34:46, 33.44s/it] 41%|████ | 7103/17285 [63:39:35<89:02:46, 31.48s/it] 41%|████ | 7104/17285 [63:40:12<93:49:48, 33.18s/it] 41%|████ | 7105/17285 [63:40:38<87:54:45, 31.09s/it] 41%|████ | 7106/17285 [63:41:08<87:12:35, 30.84s/it] 41%|████ | 7107/17285 [63:41:37<85:48:29, 30.35s/it] 41%|████ | 7108/17285 [63:42:07<85:11:30, 30.14s/it] 41%|████ | 7109/17285 [63:42:33<81:27:22, 28.82s/it] 41%|████ | 7110/17285 [63:43:05<84:24:30, 29.86s/it] {'loss': 1.4295, 'learning_rate': 0.00013813573141865484, 'epoch': 1.23} + 41%|████ | 7110/17285 [63:43:05<84:24:30, 29.86s/it] 41%|████ | 7111/17285 [63:43:44<91:51:01, 32.50s/it] 41%|████ | 7112/17285 [63:44:16<91:27:35, 32.37s/it][2023-08-25 15:39:23,436] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. 
Reducing hysteresis to 1 + 41%|████ | 7113/17285 [63:44:46<89:25:18, 31.65s/it] 41%|████ | 7114/17285 [63:45:21<92:44:16, 32.82s/it] 41%|████ | 7115/17285 [63:45:54<92:16:39, 32.66s/it] 41%|████ | 7116/17285 [63:46:23<89:23:52, 31.65s/it] 41%|████ | 7117/17285 [63:46:49<84:46:00, 30.01s/it] 41%|████ | 7118/17285 [63:47:20<85:07:14, 30.14s/it] 41%|████ | 7119/17285 [63:47:45<81:07:25, 28.73s/it] 41%|████ | 7120/17285 [63:48:15<81:53:22, 29.00s/it] {'loss': 1.4416, 'learning_rate': 0.00013797649360826399, 'epoch': 1.24} + 41%|████ | 7120/17285 [63:48:15<81:53:22, 29.00s/it] 41%|████ | 7121/17285 [63:48:48<85:20:22, 30.23s/it] 41%|████ | 7122/17285 [63:49:14<82:12:56, 29.12s/it] 41%|████ | 7123/17285 [63:49:50<88:14:55, 31.26s/it] 41%|████ | 7124/17285 [63:50:30<95:21:14, 33.78s/it] 41%|████ | 7125/17285 [63:50:58<90:42:11, 32.14s/it] 41%|████ | 7126/17285 [63:51:30<90:31:10, 32.08s/it] 41%|████ | 7127/17285 [63:52:15<101:29:02, 35.97s/it] 41%|████ | 7128/17285 [63:52:41<92:29:26, 32.78s/it] 41%|████ | 7129/17285 [63:53:16<94:52:12, 33.63s/it] 41%|████ | 7130/17285 [63:53:47<92:14:02, 32.70s/it] {'loss': 1.4494, 'learning_rate': 0.00013779943066211437, 'epoch': 1.24} + 41%|████ | 7130/17285 [63:53:47<92:14:02, 32.70s/it] 41%|████▏ | 7131/17285 [63:54:18<90:29:26, 32.08s/it] 41%|████▏ | 7132/17285 [63:54:50<91:07:42, 32.31s/it] 41%|████▏ | 7133/17285 [63:55:33<100:11:08, 35.53s/it] 41%|████▏ | 7134/17285 [63:56:07<98:42:40, 35.01s/it] 41%|████▏ | 7135/17285 [63:56:35<92:13:28, 32.71s/it] 41%|████▏ | 7136/17285 [63:57:04<89:27:05, 31.73s/it] 41%|████▏ | 7137/17285 [63:57:37<90:50:03, 32.22s/it] 41%|████▏ | 7138/17285 [63:58:08<89:31:31, 31.76s/it] 41%|████▏ | 7139/17285 [63:58:44<92:47:45, 32.93s/it] 41%|████▏ | 7140/17285 [63:59:20<95:38:41, 33.94s/it] {'loss': 1.4558, 'learning_rate': 0.0001376222293470401, 'epoch': 1.24} + 41%|████▏ | 7140/17285 [63:59:20<95:38:41, 33.94s/it] 41%|████▏ | 7141/17285 [63:59:54<96:04:17, 34.09s/it] 41%|████▏ | 7142/17285 
[64:00:28<95:56:29, 34.05s/it] 41%|████▏ | 7143/17285 [64:01:00<93:51:07, 33.31s/it] 41%|████▏ | 7144/17285 [64:01:25<86:48:55, 30.82s/it] 41%|████▏ | 7145/17285 [64:02:00<90:27:34, 32.12s/it] 41%|████▏ | 7146/17285 [64:02:31<89:09:49, 31.66s/it] 41%|████▏ | 7147/17285 [64:02:56<83:55:13, 29.80s/it] 41%|████▏ | 7148/17285 [64:03:23<81:11:59, 28.84s/it] 41%|████▏ | 7149/17285 [64:04:02<90:07:48, 32.01s/it] 41%|████▏ | 7150/17285 [64:04:35<91:03:18, 32.34s/it] {'loss': 1.4371, 'learning_rate': 0.00013744489031170578, 'epoch': 1.24} + 41%|████▏ | 7150/17285 [64:04:35<91:03:18, 32.34s/it] 41%|████▏ | 7151/17285 [64:05:07<90:30:20, 32.15s/it] 41%|████▏ | 7152/17285 [64:05:46<96:11:01, 34.17s/it] 41%|████▏ | 7153/17285 [64:06:17<93:42:57, 33.30s/it] 41%|████▏ | 7154/17285 [64:06:47<90:25:09, 32.13s/it][2023-08-25 16:01:59,158] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 41%|████▏ | 7155/17285 [64:07:21<92:43:34, 32.95s/it] 41%|████▏ | 7156/17285 [64:07:52<90:51:02, 32.29s/it] 41%|████▏ | 7157/17285 [64:08:26<92:28:52, 32.87s/it] 41%|████▏ | 7158/17285 [64:08:53<87:25:43, 31.08s/it] 41%|████▏ | 7159/17285 [64:09:23<86:34:49, 30.78s/it] 41%|████▏ | 7160/17285 [64:09:53<85:48:49, 30.51s/it] {'loss': 1.4876, 'learning_rate': 0.0001372851679656103, 'epoch': 1.24} + 41%|████▏ | 7160/17285 [64:09:53<85:48:49, 30.51s/it] 41%|████▏ | 7161/17285 [64:10:23<85:11:16, 30.29s/it] 41%|████▏ | 7162/17285 [64:10:54<85:37:53, 30.45s/it] 41%|████▏ | 7163/17285 [64:11:32<91:51:45, 32.67s/it] 41%|████▏ | 7164/17285 [64:12:11<97:46:45, 34.78s/it] 41%|████▏ | 7165/17285 [64:12:42<94:14:55, 33.53s/it] 41%|████▏ | 7166/17285 [64:13:11<90:33:51, 32.22s/it] 41%|████▏ | 7167/17285 [64:13:45<91:55:01, 32.70s/it] 41%|████▏ | 7168/17285 [64:14:12<87:05:32, 30.99s/it] 41%|████▏ | 7169/17285 [64:14:41<85:12:07, 30.32s/it] 41%|████▏ | 7170/17285 [64:15:09<83:07:56, 29.59s/it] {'loss': 1.4465, 'learning_rate': 
0.00013710756905065686, 'epoch': 1.24} + 41%|████▏ | 7170/17285 [64:15:09<83:07:56, 29.59s/it] 41%|████▏ | 7171/17285 [64:15:38<82:39:57, 29.42s/it] 41%|████▏ | 7172/17285 [64:16:05<80:47:04, 28.76s/it] 41%|████▏ | 7173/17285 [64:16:37<83:44:59, 29.82s/it] 42%|████▏ | 7174/17285 [64:17:07<83:38:55, 29.78s/it] 42%|████▏ | 7175/17285 [64:17:47<92:16:07, 32.86s/it] 42%|████▏ | 7176/17285 [64:18:24<95:40:15, 34.07s/it] 42%|████▏ | 7177/17285 [64:18:54<92:29:29, 32.94s/it] 42%|████▏ | 7178/17285 [64:19:20<86:40:33, 30.87s/it] 42%|████▏ | 7179/17285 [64:20:00<94:06:01, 33.52s/it] 42%|████▏ | 7180/17285 [64:20:29<90:34:31, 32.27s/it] {'loss': 1.5011, 'learning_rate': 0.00013692983429941337, 'epoch': 1.25} + 42%|████▏ | 7180/17285 [64:20:29<90:34:31, 32.27s/it] 42%|████▏ | 7181/17285 [64:21:08<96:15:10, 34.29s/it] 42%|████▏ | 7182/17285 [64:21:43<96:44:39, 34.47s/it] 42%|████▏ | 7183/17285 [64:22:32<108:32:35, 38.68s/it] 42%|████▏ | 7184/17285 [64:23:19<115:39:50, 41.22s/it] 42%|████▏ | 7185/17285 [64:23:55<111:24:21, 39.71s/it] 42%|████▏ | 7186/17285 [64:24:24<102:02:15, 36.37s/it] 42%|████▏ | 7187/17285 [64:24:54<97:20:56, 34.71s/it] 42%|████▏ | 7188/17285 [64:25:22<91:04:15, 32.47s/it] 42%|████▏ | 7189/17285 [64:25:53<90:19:28, 32.21s/it] 42%|████▏ | 7190/17285 [64:26:20<85:40:54, 30.56s/it] {'loss': 1.4399, 'learning_rate': 0.00013675196436249725, 'epoch': 1.25} + 42%|████▏ | 7190/17285 [64:26:20<85:40:54, 30.56s/it] 42%|████▏ | 7191/17285 [64:26:53<87:42:12, 31.28s/it] 42%|████▏ | 7192/17285 [64:27:34<95:39:59, 34.12s/it] 42%|████▏ | 7193/17285 [64:28:03<91:15:40, 32.55s/it] 42%|████▏ | 7194/17285 [64:28:40<95:02:57, 33.91s/it] 42%|████▏ | 7195/17285 [64:29:09<91:14:53, 32.56s/it] 42%|████▏ | 7196/17285 [64:29:38<87:58:28, 31.39s/it] 42%|████▏ | 7197/17285 [64:30:08<87:08:46, 31.10s/it] 42%|████▏ | 7198/17285 [64:30:37<85:08:32, 30.39s/it] 42%|████▏ | 7199/17285 [64:31:09<86:54:26, 31.02s/it] 42%|████▏ | 7200/17285 [64:31:45<90:51:12, 32.43s/it] {'loss': 1.4586, 
'learning_rate': 0.00013657395989102067, 'epoch': 1.25} + 42%|████▏ | 7200/17285 [64:31:45<90:51:12, 32.43s/it] 42%|████▏ | 7201/17285 [64:32:12<86:29:27, 30.88s/it] 42%|████▏ | 7202/17285 [64:32:41<84:27:53, 30.16s/it] 42%|████▏ | 7203/17285 [64:33:16<88:17:36, 31.53s/it] 42%|████▏ | 7204/17285 [64:33:51<91:21:49, 32.63s/it] 42%|████▏ | 7205/17285 [64:34:30<97:02:44, 34.66s/it] 42%|████▏ | 7206/17285 [64:35:08<99:58:05, 35.71s/it] 42%|████▏ | 7207/17285 [64:35:39<95:33:45, 34.14s/it] 42%|████▏ | 7208/17285 [64:36:05<88:36:15, 31.65s/it] 42%|████▏ | 7209/17285 [64:36:48<98:21:56, 35.14s/it] 42%|████▏ | 7210/17285 [64:37:19<94:38:51, 33.82s/it] {'loss': 1.431, 'learning_rate': 0.00013639582153658842, 'epoch': 1.25} + 42%|████▏ | 7210/17285 [64:37:19<94:38:51, 33.82s/it] 42%|████▏ | 7211/17285 [64:37:46<89:27:02, 31.97s/it] 42%|████▏ | 7212/17285 [64:38:19<89:55:54, 32.14s/it] 42%|████▏ | 7213/17285 [64:38:47<86:14:33, 30.83s/it] 42%|████▏ | 7214/17285 [64:39:19<87:44:03, 31.36s/it] 42%|████▏ | 7215/17285 [64:39:53<89:40:25, 32.06s/it] 42%|████▏ | 7216/17285 [64:40:20<85:14:17, 30.48s/it] 42%|████▏ | 7217/17285 [64:40:54<88:11:47, 31.54s/it] 42%|████▏ | 7218/17285 [64:41:32<93:31:23, 33.44s/it] 42%|████▏ | 7219/17285 [64:42:04<92:47:41, 33.19s/it] 42%|████▏ | 7220/17285 [64:42:32<88:10:46, 31.54s/it] {'loss': 1.4681, 'learning_rate': 0.00013621754995129522, 'epoch': 1.25} + 42%|████▏ | 7220/17285 [64:42:32<88:10:46, 31.54s/it] 42%|████▏ | 7221/17285 [64:43:04<88:24:20, 31.62s/it] 42%|████▏ | 7222/17285 [64:43:41<93:19:52, 33.39s/it] 42%|████▏ | 7223/17285 [64:44:18<96:05:02, 34.38s/it] 42%|████▏ | 7224/17285 [64:44:47<91:55:47, 32.89s/it] 42%|████▏ | 7225/17285 [64:45:15<87:28:16, 31.30s/it] 42%|████▏ | 7226/17285 [64:45:45<86:10:12, 30.84s/it] 42%|████▏ | 7227/17285 [64:46:20<90:18:05, 32.32s/it] 42%|████▏ | 7228/17285 [64:46:54<91:34:45, 32.78s/it] 42%|████▏ | 7229/17285 [64:47:21<86:44:57, 31.06s/it] 42%|████▏ | 7230/17285 [64:47:51<85:29:45, 30.61s/it] {'loss': 
1.47, 'learning_rate': 0.0001360391457877237, 'epoch': 1.25} + 42%|████▏ | 7230/17285 [64:47:52<85:29:45, 30.61s/it] 42%|████▏ | 7231/17285 [64:48:24<87:10:32, 31.21s/it] 42%|████▏ | 7232/17285 [64:48:53<85:38:57, 30.67s/it] 42%|████▏ | 7233/17285 [64:49:29<90:05:47, 32.27s/it] 42%|████▏ | 7234/17285 [64:50:01<89:36:58, 32.10s/it] 42%|████▏ | 7235/17285 [64:50:34<90:32:49, 32.43s/it] 42%|████▏ | 7236/17285 [64:51:06<89:57:31, 32.23s/it] 42%|████▏ | 7237/17285 [64:51:40<92:06:12, 33.00s/it] 42%|████▏ | 7238/17285 [64:52:12<90:45:17, 32.52s/it] 42%|████▏ | 7239/17285 [64:52:45<91:24:14, 32.75s/it] 42%|████▏ | 7240/17285 [64:53:11<85:48:50, 30.75s/it] {'loss': 1.4658, 'learning_rate': 0.0001358606096989416, 'epoch': 1.26} + 42%|████▏ | 7240/17285 [64:53:15<85:48:50, 30.75s/it] 42%|████▏ | 7241/17285 [64:53:49<91:58:16, 32.96s/it] 42%|████▏ | 7242/17285 [64:54:20<90:07:08, 32.30s/it] 42%|████▏ | 7243/17285 [64:54:49<87:11:25, 31.26s/it] 42%|████▏ | 7244/17285 [64:55:20<86:54:33, 31.16s/it] 42%|████▏ | 7245/17285 [64:55:52<87:32:40, 31.39s/it] 42%|████▏ | 7246/17285 [64:56:29<92:46:54, 33.27s/it] 42%|████▏ | 7247/17285 [64:57:02<91:56:54, 32.98s/it] 42%|████▏ | 7248/17285 [64:57:39<95:57:26, 34.42s/it] 42%|████▏ | 7249/17285 [64:58:05<88:13:26, 31.65s/it] 42%|████▏ | 7250/17285 [64:58:33<85:05:42, 30.53s/it] {'loss': 1.4789, 'learning_rate': 0.0001356819423384997, 'epoch': 1.26} + 42%|████▏ | 7250/17285 [64:58:33<85:05:42, 30.53s/it] 42%|████▏ | 7251/17285 [64:59:06<87:13:57, 31.30s/it] 42%|████▏ | 7252/17285 [64:59:37<87:00:22, 31.22s/it] 42%|████▏ | 7253/17285 [65:00:11<89:12:02, 32.01s/it] 42%|████▏ | 7254/17285 [65:00:45<91:03:27, 32.68s/it] 42%|████▏ | 7255/17285 [65:01:10<84:37:33, 30.37s/it] 42%|████▏ | 7256/17285 [65:01:39<83:54:30, 30.12s/it] 42%|████▏ | 7257/17285 [65:02:11<85:27:57, 30.68s/it] 42%|████▏ | 7258/17285 [65:02:45<88:14:56, 31.68s/it] 42%|████▏ | 7259/17285 [65:03:20<90:29:25, 32.49s/it] 42%|████▏ | 7260/17285 [65:03:57<94:20:04, 33.88s/it] 
{'loss': 1.4218, 'learning_rate': 0.00013550314436042932, 'epoch': 1.26} + 42%|████▏ | 7260/17285 [65:04:02<94:20:04, 33.88s/it] 42%|████▏ | 7261/17285 [65:04:40<102:20:20, 36.75s/it] 42%|████▏ | 7262/17285 [65:05:10<96:42:11, 34.73s/it] 42%|████▏ | 7263/17285 [65:05:40<92:10:32, 33.11s/it] 42%|████▏ | 7264/17285 [65:06:10<89:46:29, 32.25s/it] 42%|████▏ | 7265/17285 [65:06:41<88:53:48, 31.94s/it] 42%|████▏ | 7266/17285 [65:07:16<91:03:14, 32.72s/it] 42%|████▏ | 7267/17285 [65:07:43<86:46:48, 31.18s/it] 42%|████▏ | 7268/17285 [65:08:10<82:52:34, 29.78s/it] 42%|████▏ | 7269/17285 [65:08:40<83:41:54, 30.08s/it] 42%|████▏ | 7270/17285 [65:09:16<88:30:56, 31.82s/it] {'loss': 1.4351, 'learning_rate': 0.0001353242164192399, 'epoch': 1.26} + 42%|████▏ | 7270/17285 [65:09:16<88:30:56, 31.82s/it] 42%|████▏ | 7271/17285 [65:09:53<92:37:10, 33.30s/it] 42%|████▏ | 7272/17285 [65:10:24<90:21:47, 32.49s/it] 42%|████▏ | 7273/17285 [65:10:53<88:00:52, 31.65s/it] 42%|████▏ | 7274/17285 [65:11:28<90:51:29, 32.67s/it] 42%|████▏ | 7275/17285 [65:11:58<88:01:32, 31.66s/it] 42%|████▏ | 7276/17285 [65:12:24<83:22:58, 29.99s/it] 42%|████▏ | 7277/17285 [65:13:09<96:16:27, 34.63s/it] 42%|████▏ | 7278/17285 [65:13:41<93:25:26, 33.61s/it] 42%|████▏ | 7279/17285 [65:14:19<97:10:25, 34.96s/it] 42%|████▏ | 7280/17285 [65:14:57<100:08:19, 36.03s/it] {'loss': 1.4711, 'learning_rate': 0.00013514515916991657, 'epoch': 1.26} + 42%|████▏ | 7280/17285 [65:14:57<100:08:19, 36.03s/it] 42%|████▏ | 7281/17285 [65:15:29<96:43:11, 34.81s/it] 42%|████▏ | 7282/17285 [65:15:55<89:26:35, 32.19s/it] 42%|████▏ | 7283/17285 [65:16:29<91:06:01, 32.79s/it] 42%|████▏ | 7284/17285 [65:16:57<86:36:19, 31.17s/it] 42%|████▏ | 7285/17285 [65:17:28<86:14:01, 31.04s/it] 42%|████▏ | 7286/17285 [65:17:59<86:16:14, 31.06s/it] 42%|████▏ | 7287/17285 [65:18:27<83:47:06, 30.17s/it] 42%|████▏ | 7288/17285 [65:18:59<85:42:34, 30.86s/it] 42%|████▏ | 7289/17285 [65:19:31<86:32:50, 31.17s/it] 42%|████▏ | 7290/17285 [65:20:04<87:37:02, 
31.56s/it] {'loss': 1.4263, 'learning_rate': 0.00013496597326791786, 'epoch': 1.27} + 42%|████▏ | 7290/17285 [65:20:04<87:37:02, 31.56s/it] 42%|████▏ | 7291/17285 [65:20:46<96:24:38, 34.73s/it] 42%|████▏ | 7292/17285 [65:21:16<92:41:23, 33.39s/it] 42%|████▏ | 7293/17285 [65:21:52<94:40:14, 34.11s/it] 42%|████▏ | 7294/17285 [65:22:18<87:42:26, 31.60s/it] 42%|████▏ | 7295/17285 [65:22:48<86:46:22, 31.27s/it] 42%|████▏ | 7296/17285 [65:23:34<99:02:37, 35.70s/it] 42%|████▏ | 7297/17285 [65:24:02<92:20:04, 33.28s/it] 42%|████▏ | 7298/17285 [65:24:27<85:33:57, 30.84s/it] 42%|████▏ | 7299/17285 [65:24:54<82:30:09, 29.74s/it] 42%|████▏ | 7300/17285 [65:25:26<84:07:51, 30.33s/it] {'loss': 1.4692, 'learning_rate': 0.00013478665936917332, 'epoch': 1.27} + 42%|████▏ | 7300/17285 [65:25:26<84:07:51, 30.33s/it] 42%|████▏ | 7301/17285 [65:25:59<86:39:34, 31.25s/it] 42%|████▏ | 7302/17285 [65:26:28<84:18:56, 30.41s/it] 42%|████▏ | 7303/17285 [65:26:57<83:44:34, 30.20s/it] 42%|████▏ | 7304/17285 [65:27:29<85:07:26, 30.70s/it] 42%|████▏ | 7305/17285 [65:28:02<86:32:50, 31.22s/it] 42%|████▏ | 7306/17285 [65:28:33<87:02:06, 31.40s/it] 42%|████▏ | 7307/17285 [65:29:06<88:13:20, 31.83s/it] 42%|████▏ | 7308/17285 [65:29:41<90:25:53, 32.63s/it] 42%|████▏ | 7309/17285 [65:30:10<87:30:43, 31.58s/it] 42%|████▏ | 7310/17285 [65:30:42<87:53:36, 31.72s/it] {'loss': 1.457, 'learning_rate': 0.00013460721813008086, 'epoch': 1.27} + 42%|████▏ | 7310/17285 [65:30:42<87:53:36, 31.72s/it] 42%|████▏ | 7311/17285 [65:31:21<94:02:49, 33.95s/it] 42%|████▏ | 7312/17285 [65:31:51<90:34:41, 32.70s/it] 42%|████▏ | 7313/17285 [65:32:25<92:03:17, 33.23s/it] 42%|████▏ | 7314/17285 [65:32:57<91:11:41, 32.93s/it] 42%|████▏ | 7315/17285 [65:33:29<89:59:28, 32.49s/it] 42%|████▏ | 7316/17285 [65:34:03<91:32:57, 33.06s/it] 42%|████▏ | 7317/17285 [65:34:34<89:40:39, 32.39s/it] 42%|████▏ | 7318/17285 [65:35:03<86:32:07, 31.26s/it] 42%|████▏ | 7319/17285 [65:35:33<85:45:37, 30.98s/it] 42%|████▏ | 7320/17285 
[65:36:09<89:29:49, 32.33s/it] {'loss': 1.4114, 'learning_rate': 0.0001344276502075047, 'epoch': 1.27} + 42%|████▏ | 7320/17285 [65:36:09<89:29:49, 32.33s/it] 42%|████▏ | 7321/17285 [65:36:44<92:17:35, 33.35s/it] 42%|████▏ | 7322/17285 [65:37:21<94:39:32, 34.20s/it] 42%|████▏ | 7323/17285 [65:38:08<105:17:42, 38.05s/it] 42%|████▏ | 7324/17285 [65:38:44<103:54:09, 37.55s/it] 42%|████▏ | 7325/17285 [65:39:14<97:30:18, 35.24s/it] 42%|████▏ | 7326/17285 [65:39:54<101:43:11, 36.77s/it] 42%|████▏ | 7327/17285 [65:40:23<95:25:59, 34.50s/it] 42%|████▏ | 7328/17285 [65:41:04<100:15:45, 36.25s/it] 42%|████▏ | 7329/17285 [65:41:50<108:38:26, 39.28s/it] 42%|████▏ | 7330/17285 [65:42:25<104:44:15, 37.88s/it] {'loss': 1.395, 'learning_rate': 0.00013424795625877276, 'epoch': 1.27} + 42%|████▏ | 7330/17285 [65:42:25<104:44:15, 37.88s/it] 42%|████▏ | 7331/17285 [65:42:52<96:03:15, 34.74s/it] 42%|████▏ | 7332/17285 [65:43:19<89:45:49, 32.47s/it] 42%|████▏ | 7333/17285 [65:43:53<90:42:41, 32.81s/it] 42%|████▏ | 7334/17285 [65:44:19<85:11:10, 30.82s/it] 42%|████▏ | 7335/17285 [65:44:50<85:00:31, 30.76s/it] 42%|████▏ | 7336/17285 [65:45:23<87:13:04, 31.56s/it] 42%|████▏ | 7337/17285 [65:45:53<85:57:30, 31.11s/it] 42%|████▏ | 7338/17285 [65:46:25<86:41:52, 31.38s/it] 42%|████▏ | 7339/17285 [65:46:57<86:46:15, 31.41s/it] 42%|████▏ | 7340/17285 [65:47:29<87:56:26, 31.83s/it] {'loss': 1.4456, 'learning_rate': 0.0001340681369416742, 'epoch': 1.27} + 42%|████▏ | 7340/17285 [65:47:29<87:56:26, 31.83s/it] 42%|████▏ | 7341/17285 [65:47:59<86:10:40, 31.20s/it] 42%|████▏ | 7342/17285 [65:48:30<85:41:47, 31.03s/it] 42%|████▏ | 7343/17285 [65:48:54<80:10:37, 29.03s/it] 42%|████▏ | 7344/17285 [65:49:27<83:29:57, 30.24s/it] 42%|████▏ | 7345/17285 [65:49:57<83:20:20, 30.18s/it] 42%|████▏ | 7346/17285 [65:50:24<80:35:45, 29.19s/it] 43%|████▎ | 7347/17285 [65:50:58<84:46:47, 30.71s/it] 43%|████▎ | 7348/17285 [65:51:32<87:27:52, 31.69s/it] 43%|████▎ | 7349/17285 [65:51:57<81:37:49, 29.58s/it] 43%|████▎ | 
7350/17285 [65:52:31<85:27:34, 30.97s/it] {'loss': 1.4459, 'learning_rate': 0.00013388819291445723, 'epoch': 1.28} + 43%|████▎ | 7350/17285 [65:52:31<85:27:34, 30.97s/it] 43%|████▎ | 7351/17285 [65:53:06<88:55:16, 32.22s/it] 43%|████▎ | 7352/17285 [65:53:41<90:37:50, 32.85s/it] 43%|████▎ | 7353/17285 [65:54:08<86:16:50, 31.27s/it] 43%|████▎ | 7354/17285 [65:54:41<87:19:48, 31.66s/it] 43%|████▎ | 7355/17285 [65:55:07<82:24:03, 29.87s/it] 43%|████▎ | 7356/17285 [65:55:33<79:42:22, 28.90s/it] 43%|████▎ | 7357/17285 [65:56:03<80:49:46, 29.31s/it] 43%|████▎ | 7358/17285 [65:56:30<78:22:44, 28.42s/it] 43%|████▎ | 7359/17285 [65:57:07<85:45:29, 31.10s/it] 43%|████▎ | 7360/17285 [65:57:31<80:07:01, 29.06s/it] {'loss': 1.4642, 'learning_rate': 0.0001337081248358265, 'epoch': 1.28} + 43%|████▎ | 7360/17285 [65:57:31<80:07:01, 29.06s/it] 43%|████▎ | 7361/17285 [65:58:00<79:59:08, 29.02s/it] 43%|████▎ | 7362/17285 [65:58:32<82:24:08, 29.90s/it] 43%|████▎ | 7363/17285 [65:59:01<81:39:07, 29.63s/it] 43%|████▎ | 7364/17285 [65:59:41<89:41:02, 32.54s/it] 43%|████▎ | 7365/17285 [66:00:16<92:23:05, 33.53s/it] 43%|████▎ | 7366/17285 [66:00:44<87:12:16, 31.65s/it] 43%|████▎ | 7367/17285 [66:01:16<87:49:36, 31.88s/it] 43%|████▎ | 7368/17285 [66:01:56<94:02:21, 34.14s/it] 43%|████▎ | 7369/17285 [66:02:39<101:53:15, 36.99s/it] 43%|████▎ | 7370/17285 [66:03:11<97:36:09, 35.44s/it] {'loss': 1.4538, 'learning_rate': 0.0001335279333649408, 'epoch': 1.28} + 43%|████▎ | 7370/17285 [66:03:11<97:36:09, 35.44s/it] 43%|████▎ | 7371/17285 [66:03:42<93:46:58, 34.05s/it] 43%|████▎ | 7372/17285 [66:04:07<86:09:34, 31.29s/it] 43%|████▎ | 7373/17285 [66:04:37<84:56:58, 30.85s/it] 43%|████▎ | 7374/17285 [66:05:14<90:34:56, 32.90s/it] 43%|████▎ | 7375/17285 [66:05:50<92:51:22, 33.73s/it] 43%|████▎ | 7376/17285 [66:06:19<88:54:57, 32.30s/it] 43%|████▎ | 7377/17285 [66:06:49<87:16:07, 31.71s/it] 43%|████▎ | 7378/17285 [66:07:19<85:58:31, 31.24s/it] 43%|████▎ | 7379/17285 [66:07:52<87:03:53, 31.64s/it] 
43%|████▎ | 7380/17285 [66:08:26<89:05:37, 32.38s/it] {'loss': 1.4443, 'learning_rate': 0.00013334761916141064, 'epoch': 1.28} + 43%|████▎ | 7380/17285 [66:08:26<89:05:37, 32.38s/it] 43%|████▎ | 7381/17285 [66:09:02<91:53:00, 33.40s/it] 43%|████▎ | 7382/17285 [66:09:29<86:34:06, 31.47s/it] 43%|████▎ | 7383/17285 [66:10:02<88:21:05, 32.12s/it] 43%|████▎ | 7384/17285 [66:10:31<85:19:28, 31.02s/it] 43%|████▎ | 7385/17285 [66:10:57<81:39:38, 29.69s/it] 43%|████▎ | 7386/17285 [66:11:26<80:48:48, 29.39s/it] 43%|████▎ | 7387/17285 [66:11:54<79:28:34, 28.91s/it] 43%|████▎ | 7388/17285 [66:12:31<86:06:57, 31.32s/it] 43%|████▎ | 7389/17285 [66:12:59<83:53:41, 30.52s/it] 43%|████▎ | 7390/17285 [66:13:36<88:46:02, 32.30s/it] {'loss': 1.4769, 'learning_rate': 0.00013316718288529567, 'epoch': 1.28} + 43%|████▎ | 7390/17285 [66:13:36<88:46:02, 32.30s/it] 43%|████▎ | 7391/17285 [66:14:11<91:09:58, 33.17s/it] 43%|████▎ | 7392/17285 [66:14:40<87:53:16, 31.98s/it] 43%|████▎ | 7393/17285 [66:15:14<89:05:35, 32.42s/it] 43%|████▎ | 7394/17285 [66:15:41<84:52:33, 30.89s/it] 43%|████▎ | 7395/17285 [66:16:12<84:43:37, 30.84s/it] 43%|████▎ | 7396/17285 [66:16:43<84:44:34, 30.85s/it] 43%|████▎ | 7397/17285 [66:17:12<83:45:05, 30.49s/it][2023-08-25 18:12:33,877] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 43%|████▎ | 7398/17285 [66:17:56<94:43:40, 34.49s/it] 43%|████▎ | 7399/17285 [66:18:31<94:51:14, 34.54s/it] 43%|████▎ | 7400/17285 [66:19:09<97:57:34, 35.68s/it] {'loss': 1.4329, 'learning_rate': 0.00013300468641063172, 'epoch': 1.28} + 43%|████▎ | 7400/17285 [66:19:09<97:57:34, 35.68s/it] 43%|████▎ | 7401/17285 [66:19:36<90:40:58, 33.03s/it] 43%|████▎ | 7402/17285 [66:20:06<88:19:51, 32.18s/it] 43%|████▎ | 7403/17285 [66:20:37<87:30:23, 31.88s/it] 43%|████▎ | 7404/17285 [66:21:10<88:06:57, 32.10s/it] 43%|████▎ | 7405/17285 [66:21:43<88:30:29, 32.25s/it] 43%|████▎ | 7406/17285 [66:22:08<82:59:27, 30.24s/it] 43%|████▎ | 7407/17285 [66:22:36<81:21:05, 29.65s/it] 43%|████▎ | 7408/17285 [66:23:07<82:03:33, 29.91s/it] 43%|████▎ | 7409/17285 [66:23:33<78:44:20, 28.70s/it] 43%|████▎ | 7410/17285 [66:24:04<80:29:28, 29.34s/it] {'loss': 1.4563, 'learning_rate': 0.00013282402001666874, 'epoch': 1.29} + 43%|████▎ | 7410/17285 [66:24:04<80:29:28, 29.34s/it] 43%|████▎ | 7411/17285 [66:24:36<83:07:36, 30.31s/it] 43%|████▎ | 7412/17285 [66:25:07<83:39:33, 30.50s/it] 43%|████▎ | 7413/17285 [66:25:38<84:01:01, 30.64s/it] 43%|████▎ | 7414/17285 [66:26:08<83:44:04, 30.54s/it] 43%|████▎ | 7415/17285 [66:26:36<81:10:39, 29.61s/it] 43%|████▎ | 7416/17285 [66:27:04<79:46:01, 29.10s/it] 43%|████▎ | 7417/17285 [66:27:44<89:10:51, 32.53s/it] 43%|████▎ | 7418/17285 [66:28:10<83:37:50, 30.51s/it] 43%|████▎ | 7419/17285 [66:28:41<83:59:50, 30.65s/it] 43%|████▎ | 7420/17285 [66:29:07<79:40:46, 29.08s/it] {'loss': 1.487, 'learning_rate': 0.00013264323346681258, 'epoch': 1.29} + 43%|████▎ | 7420/17285 [66:29:07<79:40:46, 29.08s/it] 43%|████▎ | 7421/17285 [66:29:34<78:40:06, 28.71s/it] 43%|████▎ | 7422/17285 [66:30:00<76:23:27, 27.88s/it] 43%|████▎ | 7423/17285 [66:30:28<76:34:30, 27.95s/it] 43%|████▎ | 7424/17285 [66:30:59<78:43:07, 28.74s/it] 43%|████▎ | 7425/17285 [66:31:36<85:38:09, 31.27s/it] 43%|████▎ | 7426/17285 [66:32:07<85:38:31, 31.27s/it] 43%|████▎ | 
7427/17285 [66:32:41<87:15:22, 31.86s/it] 43%|████▎ | 7428/17285 [66:33:11<85:34:19, 31.25s/it] 43%|████▎ | 7429/17285 [66:33:41<84:35:50, 30.90s/it] 43%|████▎ | 7430/17285 [66:34:12<85:13:35, 31.13s/it] {'loss': 1.4135, 'learning_rate': 0.00013246232742285206, 'epoch': 1.29} + 43%|████▎ | 7430/17285 [66:34:14<85:13:35, 31.13s/it] 43%|████▎ | 7431/17285 [66:34:50<90:12:05, 32.95s/it] 43%|████▎ | 7432/17285 [66:35:33<98:55:02, 36.14s/it] 43%|████▎ | 7433/17285 [66:36:03<93:32:53, 34.18s/it] 43%|████▎ | 7434/17285 [66:36:33<90:25:06, 33.04s/it] 43%|████▎ | 7435/17285 [66:37:07<91:14:40, 33.35s/it] 43%|████▎ | 7436/17285 [66:37:41<91:16:48, 33.36s/it] 43%|████▎ | 7437/17285 [66:38:14<91:03:44, 33.29s/it] 43%|████▎ | 7438/17285 [66:38:41<85:46:10, 31.36s/it] 43%|████▎ | 7439/17285 [66:39:13<86:42:50, 31.71s/it] 43%|████▎ | 7440/17285 [66:39:41<83:35:06, 30.56s/it] {'loss': 1.485, 'learning_rate': 0.00013228130254701342, 'epoch': 1.29} + 43%|████▎ | 7440/17285 [66:39:44<83:35:06, 30.56s/it] 43%|████▎ | 7441/17285 [66:40:21<91:14:34, 33.37s/it] 43%|████▎ | 7442/17285 [66:40:52<89:46:03, 32.83s/it] 43%|████▎ | 7443/17285 [66:41:35<97:46:56, 35.77s/it] 43%|████▎ | 7444/17285 [66:42:13<99:16:59, 36.32s/it] 43%|████▎ | 7445/17285 [66:42:43<94:16:06, 34.49s/it] 43%|████▎ | 7446/17285 [66:43:16<93:29:46, 34.21s/it][2023-08-25 18:38:28,927] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 43%|████▎ | 7447/17285 [66:43:51<93:59:47, 34.40s/it] 43%|████▎ | 7448/17285 [66:44:20<89:03:25, 32.59s/it] 43%|████▎ | 7449/17285 [66:44:49<86:27:49, 31.65s/it] 43%|████▎ | 7450/17285 [66:45:21<87:02:27, 31.86s/it] {'loss': 1.4554, 'learning_rate': 0.0001321182791051834, 'epoch': 1.29} + 43%|████▎ | 7450/17285 [66:45:21<87:02:27, 31.86s/it] 43%|████▎ | 7451/17285 [66:45:55<88:41:37, 32.47s/it] 43%|████▎ | 7452/17285 [66:46:29<89:24:55, 32.74s/it] 43%|████▎ | 7453/17285 [66:46:56<84:53:03, 31.08s/it] 43%|████▎ | 7454/17285 [66:47:26<84:25:45, 30.92s/it] 43%|████▎ | 7455/17285 [66:48:00<86:52:43, 31.82s/it] 43%|████▎ | 7456/17285 [66:48:32<87:00:56, 31.87s/it] 43%|████▎ | 7457/17285 [66:49:08<90:17:28, 33.07s/it] 43%|████▎ | 7458/17285 [66:49:40<89:15:40, 32.70s/it] 43%|████▎ | 7459/17285 [66:50:18<93:23:44, 34.22s/it] 43%|████▎ | 7460/17285 [66:50:46<88:31:27, 32.44s/it] {'loss': 1.4543, 'learning_rate': 0.00013193703027476557, 'epoch': 1.29} + 43%|████▎ | 7460/17285 [66:50:46<88:31:27, 32.44s/it] 43%|████▎ | 7461/17285 [66:51:22<91:00:16, 33.35s/it] 43%|████▎ | 7462/17285 [66:51:53<89:28:45, 32.79s/it] 43%|████▎ | 7463/17285 [66:52:23<87:26:27, 32.05s/it] 43%|████▎ | 7464/17285 [66:52:50<83:13:58, 30.51s/it] 43%|████▎ | 7465/17285 [66:53:21<83:07:15, 30.47s/it] 43%|████▎ | 7466/17285 [66:53:53<84:39:50, 31.04s/it] 43%|████▎ | 7467/17285 [66:54:29<89:03:08, 32.65s/it] 43%|████▎ | 7468/17285 [66:54:54<82:39:11, 30.31s/it] 43%|████▎ | 7469/17285 [66:55:21<79:19:27, 29.09s/it] 43%|████▎ | 7470/17285 [66:55:55<83:56:48, 30.79s/it] {'loss': 1.4419, 'learning_rate': 0.00013175566453537692, 'epoch': 1.3} + 43%|████▎ | 7470/17285 [66:55:55<83:56:48, 30.79s/it] 43%|████▎ | 7471/17285 [66:56:26<83:46:56, 30.73s/it] 43%|████▎ | 7472/17285 [66:57:03<88:56:50, 32.63s/it] 43%|████▎ | 7473/17285 [66:57:37<90:04:21, 33.05s/it] 43%|████▎ | 7474/17285 [66:58:15<93:48:27, 34.42s/it] 43%|████▎ | 7475/17285 [66:58:49<94:01:49, 
34.51s/it] 43%|████▎ | 7476/17285 [66:59:20<90:36:24, 33.25s/it] 43%|████▎ | 7477/17285 [66:59:55<92:17:45, 33.88s/it] 43%|████▎ | 7478/17285 [67:00:25<88:48:01, 32.60s/it] 43%|████▎ | 7479/17285 [67:00:58<89:18:11, 32.79s/it] 43%|████▎ | 7480/17285 [67:01:32<90:07:01, 33.09s/it] {'loss': 1.4353, 'learning_rate': 0.0001315741825509265, 'epoch': 1.3} + 43%|████▎ | 7480/17285 [67:01:32<90:07:01, 33.09s/it] 43%|████▎ | 7481/17285 [67:02:05<90:32:06, 33.24s/it] 43%|████▎ | 7482/17285 [67:02:39<90:41:45, 33.31s/it] 43%|████▎ | 7483/17285 [67:03:09<88:22:33, 32.46s/it] 43%|████▎ | 7484/17285 [67:03:39<86:13:20, 31.67s/it] 43%|████▎ | 7485/17285 [67:04:12<87:42:29, 32.22s/it] 43%|████▎ | 7486/17285 [67:04:43<86:03:48, 31.62s/it] 43%|████▎ | 7487/17285 [67:05:17<88:05:18, 32.37s/it] 43%|████▎ | 7488/17285 [67:05:51<89:29:33, 32.88s/it] 43%|████▎ | 7489/17285 [67:06:20<86:19:35, 31.72s/it] 43%|████▎ | 7490/17285 [67:06:49<84:22:16, 31.01s/it] {'loss': 1.4382, 'learning_rate': 0.00013139258498574873, 'epoch': 1.3} + 43%|████▎ | 7490/17285 [67:06:49<84:22:16, 31.01s/it] 43%|████▎ | 7491/17285 [67:07:23<86:15:20, 31.71s/it] 43%|████▎ | 7492/17285 [67:07:55<87:08:32, 32.03s/it] 43%|████▎ | 7493/17285 [67:08:25<85:01:15, 31.26s/it] 43%|████▎ | 7494/17285 [67:09:00<87:53:34, 32.32s/it] 43%|████▎ | 7495/17285 [67:09:33<88:24:28, 32.51s/it] 43%|████▎ | 7496/17285 [67:10:07<89:35:39, 32.95s/it] 43%|████▎ | 7497/17285 [67:10:45<94:28:01, 34.74s/it] 43%|████▎ | 7498/17285 [67:11:12<87:38:53, 32.24s/it] 43%|████▎ | 7499/17285 [67:11:44<87:23:45, 32.15s/it] 43%|████▎ | 7500/17285 [67:12:11<83:30:59, 30.73s/it] {'loss': 1.4579, 'learning_rate': 0.00013121087250460132, 'epoch': 1.3} + 43%|████▎ | 7500/17285 [67:12:11<83:30:59, 30.73s/it] 43%|████▎ | 7501/17285 [67:12:46<86:34:35, 31.86s/it] 43%|████▎ | 7502/17285 [67:13:17<86:14:16, 31.73s/it] 43%|████▎ | 7503/17285 [67:13:50<87:26:11, 32.18s/it] 43%|████▎ | 7504/17285 [67:14:33<96:16:12, 35.43s/it] 43%|████▎ | 7505/17285 
[67:15:00<89:19:17, 32.88s/it] 43%|████▎ | 7506/17285 [67:15:41<96:02:00, 35.35s/it] 43%|████▎ | 7507/17285 [67:16:13<92:40:21, 34.12s/it] 43%|████▎ | 7508/17285 [67:16:41<87:42:28, 32.30s/it] 43%|████▎ | 7509/17285 [67:17:17<90:38:14, 33.38s/it] 43%|████▎ | 7510/17285 [67:17:52<92:08:05, 33.93s/it] {'loss': 1.4475, 'learning_rate': 0.00013102904577266255, 'epoch': 1.3} + 43%|████▎ | 7510/17285 [67:17:52<92:08:05, 33.93s/it] 43%|████▎ | 7511/17285 [67:18:31<96:14:05, 35.45s/it] 43%|████▎ | 7512/17285 [67:19:02<93:00:14, 34.26s/it] 43%|████▎ | 7513/17285 [67:19:36<92:51:31, 34.21s/it] 43%|████▎ | 7514/17285 [67:20:09<91:38:12, 33.76s/it] 43%|████▎ | 7515/17285 [67:20:38<87:23:33, 32.20s/it] 43%|████▎ | 7516/17285 [67:21:22<97:00:01, 35.75s/it] 43%|████▎ | 7517/17285 [67:21:51<91:50:00, 33.85s/it] 43%|████▎ | 7518/17285 [67:22:23<90:19:58, 33.30s/it] 44%|████▎ | 7519/17285 [67:22:55<88:59:18, 32.80s/it] 44%|████▎ | 7520/17285 [67:23:24<85:43:26, 31.60s/it] {'loss': 1.442, 'learning_rate': 0.00013084710545552893, 'epoch': 1.31} + 44%|████▎ | 7520/17285 [67:23:24<85:43:26, 31.60s/it] 44%|████▎ | 7521/17285 [67:23:49<80:59:29, 29.86s/it] 44%|████▎ | 7522/17285 [67:24:20<81:18:19, 29.98s/it] 44%|████▎ | 7523/17285 [67:24:52<83:09:44, 30.67s/it] 44%|████▎ | 7524/17285 [67:25:22<82:56:48, 30.59s/it] 44%|████▎ | 7525/17285 [67:25:57<86:15:41, 31.82s/it] 44%|████▎ | 7526/17285 [67:26:27<84:59:25, 31.35s/it] 44%|████▎ | 7527/17285 [67:27:06<90:53:24, 33.53s/it] 44%|████▎ | 7528/17285 [67:27:31<84:17:20, 31.10s/it] 44%|████▎ | 7529/17285 [67:27:57<80:08:37, 29.57s/it] 44%|████▎ | 7530/17285 [67:28:26<79:46:23, 29.44s/it] {'loss': 1.4578, 'learning_rate': 0.00013066505221921273, 'epoch': 1.31} + 44%|████▎ | 7530/17285 [67:28:26<79:46:23, 29.44s/it] 44%|████▎ | 7531/17285 [67:28:59<82:09:44, 30.32s/it] 44%|████▎ | 7532/17285 [67:29:24<78:18:59, 28.91s/it] 44%|████▎ | 7533/17285 [67:29:55<79:49:27, 29.47s/it] 44%|████▎ | 7534/17285 [67:30:28<82:34:41, 30.49s/it] 44%|████▎ | 
7535/17285 [67:31:01<84:35:13, 31.23s/it] 44%|████▎ | 7536/17285 [67:31:44<94:18:11, 34.82s/it] 44%|████▎ | 7537/17285 [67:32:13<88:57:47, 32.85s/it] 44%|████▎ | 7538/17285 [67:32:57<98:11:28, 36.27s/it] 44%|████▎ | 7539/17285 [67:33:30<96:03:37, 35.48s/it] 44%|████▎ | 7540/17285 [67:34:03<93:51:26, 34.67s/it] {'loss': 1.4778, 'learning_rate': 0.00013048288673013966, 'epoch': 1.31} + 44%|████▎ | 7540/17285 [67:34:03<93:51:26, 34.67s/it] 44%|████▎ | 7541/17285 [67:34:32<88:41:24, 32.77s/it] 44%|████▎ | 7542/17285 [67:35:12<95:09:59, 35.16s/it] 44%|████▎ | 7543/17285 [67:35:48<95:34:51, 35.32s/it] 44%|████▎ | 7544/17285 [67:36:21<93:37:38, 34.60s/it] 44%|████▎ | 7545/17285 [67:36:54<92:11:20, 34.07s/it] 44%|████▎ | 7546/17285 [67:37:22<87:51:23, 32.48s/it] 44%|████▎ | 7547/17285 [67:37:55<87:35:50, 32.38s/it] 44%|████▎ | 7548/17285 [67:38:28<88:45:27, 32.82s/it] 44%|████▎ | 7549/17285 [67:39:11<96:20:05, 35.62s/it] 44%|████▎ | 7550/17285 [67:39:43<93:38:56, 34.63s/it] {'loss': 1.4279, 'learning_rate': 0.00013030060965514632, 'epoch': 1.31} + 44%|████▎ | 7550/17285 [67:39:43<93:38:56, 34.63s/it] 44%|████▎ | 7551/17285 [67:40:10<87:19:25, 32.30s/it] 44%|████▎ | 7552/17285 [67:40:45<89:23:20, 33.06s/it] 44%|████▎ | 7553/17285 [67:41:16<87:51:48, 32.50s/it] 44%|████▎ | 7554/17285 [67:41:49<88:16:10, 32.66s/it] 44%|████▎ | 7555/17285 [67:42:27<92:52:04, 34.36s/it] 44%|████▎ | 7556/17285 [67:43:01<92:15:36, 34.14s/it] 44%|████▎ | 7557/17285 [67:43:39<95:52:44, 35.48s/it] 44%|████▎ | 7558/17285 [67:44:13<94:06:57, 34.83s/it] 44%|████▎ | 7559/17285 [67:44:53<98:18:44, 36.39s/it] 44%|████▎ | 7560/17285 [67:45:25<94:48:24, 35.10s/it] {'loss': 1.4175, 'learning_rate': 0.00013011822166147767, 'epoch': 1.31} + 44%|████▎ | 7560/17285 [67:45:25<94:48:24, 35.10s/it] 44%|████▎ | 7561/17285 [67:45:54<89:54:13, 33.28s/it] 44%|████▎ | 7562/17285 [67:46:22<85:36:49, 31.70s/it] 44%|████▍ | 7563/17285 [67:46:47<80:07:10, 29.67s/it] 44%|████▍ | 7564/17285 [67:47:23<85:11:11, 31.55s/it] 
44%|████▍ | 7565/17285 [67:47:53<84:20:07, 31.24s/it] 44%|████▍ | 7566/17285 [67:48:25<84:20:42, 31.24s/it] 44%|████▍ | 7567/17285 [67:49:02<89:16:47, 33.07s/it] 44%|████▍ | 7568/17285 [67:49:43<95:35:47, 35.42s/it] 44%|████▍ | 7569/17285 [67:50:14<92:04:36, 34.12s/it] 44%|████▍ | 7570/17285 [67:50:53<96:18:05, 35.69s/it] {'loss': 1.4537, 'learning_rate': 0.00012993572341678483, 'epoch': 1.31} + 44%|████▍ | 7570/17285 [67:50:53<96:18:05, 35.69s/it] 44%|████▍ | 7571/17285 [67:51:23<91:44:41, 34.00s/it] 44%|████▍ | 7572/17285 [67:51:52<87:24:27, 32.40s/it] 44%|████▍ | 7573/17285 [67:52:31<93:08:48, 34.53s/it] 44%|████▍ | 7574/17285 [67:53:04<91:40:40, 33.99s/it] 44%|████▍ | 7575/17285 [67:53:36<90:08:14, 33.42s/it] 44%|████▍ | 7576/17285 [67:54:10<90:21:02, 33.50s/it] 44%|████▍ | 7577/17285 [67:54:35<83:40:44, 31.03s/it] 44%|████▍ | 7578/17285 [67:55:12<88:24:52, 32.79s/it] 44%|████▍ | 7579/17285 [67:55:47<90:30:20, 33.57s/it] 44%|████▍ | 7580/17285 [67:56:18<87:54:31, 32.61s/it] {'loss': 1.473, 'learning_rate': 0.00012975311558912248, 'epoch': 1.32} + 44%|████▍ | 7580/17285 [67:56:18<87:54:31, 32.61s/it] 44%|████▍ | 7581/17285 [67:56:48<85:45:07, 31.81s/it] 44%|████▍ | 7582/17285 [67:57:20<85:53:42, 31.87s/it] 44%|████▍ | 7583/17285 [67:57:54<88:04:29, 32.68s/it] 44%|████▍ | 7584/17285 [67:58:35<94:45:08, 35.16s/it] 44%|████▍ | 7585/17285 [67:59:10<93:59:12, 34.88s/it] 44%|████▍ | 7586/17285 [67:59:44<93:52:42, 34.85s/it] 44%|████▍ | 7587/17285 [68:00:11<86:57:14, 32.28s/it] 44%|████▍ | 7588/17285 [68:00:49<92:11:33, 34.23s/it] 44%|████▍ | 7589/17285 [68:01:19<88:35:13, 32.89s/it] 44%|████▍ | 7590/17285 [68:02:00<94:38:31, 35.14s/it] {'loss': 1.4041, 'learning_rate': 0.00012957039884694638, 'epoch': 1.32} + 44%|████▍ | 7590/17285 [68:02:00<94:38:31, 35.14s/it] 44%|████▍ | 7591/17285 [68:02:31<91:20:12, 33.92s/it] 44%|████▍ | 7592/17285 [68:03:05<92:05:15, 34.20s/it] 44%|████▍ | 7593/17285 [68:03:35<88:26:36, 32.85s/it] 44%|████▍ | 7594/17285 [68:04:01<82:37:31, 
30.69s/it] 44%|████▍ | 7595/17285 [68:04:32<83:10:13, 30.90s/it] 44%|████▍ | 7596/17285 [68:05:10<88:52:01, 33.02s/it] 44%|████▍ | 7597/17285 [68:05:40<86:04:19, 31.98s/it] 44%|████▍ | 7598/17285 [68:06:06<81:03:21, 30.12s/it] 44%|████▍ | 7599/17285 [68:06:41<85:22:40, 31.73s/it] 44%|████▍ | 7600/17285 [68:07:14<86:01:45, 31.98s/it] {'loss': 1.4453, 'learning_rate': 0.00012938757385911104, 'epoch': 1.32} + 44%|████▍ | 7600/17285 [68:07:14<86:01:45, 31.98s/it] 44%|████▍ | 7601/17285 [68:07:45<85:51:21, 31.92s/it] 44%|████▍ | 7602/17285 [68:08:17<85:17:37, 31.71s/it] 44%|████▍ | 7603/17285 [68:08:57<92:02:09, 34.22s/it] 44%|████▍ | 7604/17285 [68:09:39<98:27:25, 36.61s/it] 44%|████▍ | 7605/17285 [68:10:05<90:05:38, 33.51s/it] 44%|████▍ | 7606/17285 [68:10:32<85:01:22, 31.62s/it] 44%|████▍ | 7607/17285 [68:11:01<82:45:00, 30.78s/it] 44%|████▍ | 7608/17285 [68:11:32<82:54:26, 30.84s/it] 44%|████▍ | 7609/17285 [68:12:05<84:45:45, 31.54s/it] 44%|████▍ | 7610/17285 [68:12:41<88:17:22, 32.85s/it] {'loss': 1.4795, 'learning_rate': 0.00012920464129486723, 'epoch': 1.32} + 44%|████▍ | 7610/17285 [68:12:41<88:17:22, 32.85s/it] 44%|████▍ | 7611/17285 [68:13:18<91:08:34, 33.92s/it][2023-08-25 20:08:28,853] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 44%|████▍ | 7612/17285 [68:13:51<90:50:14, 33.81s/it] 44%|████▍ | 7613/17285 [68:14:29<94:16:23, 35.09s/it] 44%|████▍ | 7614/17285 [68:15:00<90:51:29, 33.82s/it] 44%|████▍ | 7615/17285 [68:15:32<89:21:31, 33.27s/it] 44%|████▍ | 7616/17285 [68:15:57<82:33:14, 30.74s/it] 44%|████▍ | 7617/17285 [68:16:32<85:48:50, 31.95s/it] 44%|████▍ | 7618/17285 [68:17:04<86:26:08, 32.19s/it] 44%|████▍ | 7619/17285 [68:17:41<89:44:01, 33.42s/it] 44%|████▍ | 7620/17285 [68:18:06<83:28:52, 31.09s/it] {'loss': 1.4592, 'learning_rate': 0.00012903991056267166, 'epoch': 1.32} + 44%|████▍ | 7620/17285 [68:18:06<83:28:52, 31.09s/it] 44%|████▍ | 7621/17285 [68:18:36<82:21:26, 30.68s/it] 44%|████▍ | 7622/17285 [68:19:13<87:24:14, 32.56s/it] 44%|████▍ | 7623/17285 [68:19:45<86:36:55, 32.27s/it] 44%|████▍ | 7624/17285 [68:20:11<82:12:39, 30.63s/it] 44%|████▍ | 7625/17285 [68:20:39<79:57:03, 29.80s/it] 44%|████▍ | 7626/17285 [68:21:04<76:08:37, 28.38s/it] 44%|████▍ | 7627/17285 [68:21:37<79:30:29, 29.64s/it] 44%|████▍ | 7628/17285 [68:22:04<77:47:36, 29.00s/it] 44%|████▍ | 7629/17285 [68:22:31<75:47:45, 28.26s/it] 44%|████▍ | 7630/17285 [68:23:09<83:14:08, 31.04s/it] {'loss': 1.4767, 'learning_rate': 0.0001288567754484459, 'epoch': 1.32} + 44%|████▍ | 7630/17285 [68:23:09<83:14:08, 31.04s/it] 44%|████▍ | 7631/17285 [68:23:35<79:42:37, 29.72s/it] 44%|████▍ | 7632/17285 [68:24:05<79:56:28, 29.81s/it] 44%|████▍ | 7633/17285 [68:24:40<83:42:24, 31.22s/it] 44%|████▍ | 7634/17285 [68:25:08<81:29:23, 30.40s/it] 44%|████▍ | 7635/17285 [68:25:38<80:52:54, 30.17s/it] 44%|████▍ | 7636/17285 [68:26:05<78:36:35, 29.33s/it] 44%|████▍ | 7637/17285 [68:26:39<82:34:49, 30.81s/it] 44%|████▍ | 7638/17285 [68:27:07<80:14:26, 29.94s/it] 44%|████▍ | 7639/17285 [68:27:36<78:54:59, 29.45s/it] 44%|████▍ | 7640/17285 [68:28:08<81:07:00, 30.28s/it] {'loss': 1.4769, 'learning_rate': 0.00012867353470085696, 'epoch': 1.33} + 44%|████▍ | 7640/17285 [68:28:08<81:07:00, 30.28s/it] 44%|████▍ | 
7641/17285 [68:28:45<86:20:30, 32.23s/it] 44%|████▍ | 7642/17285 [68:29:12<82:39:13, 30.86s/it] 44%|████▍ | 7643/17285 [68:29:42<81:43:08, 30.51s/it] 44%|████▍ | 7644/17285 [68:30:11<80:29:28, 30.06s/it] 44%|████▍ | 7645/17285 [68:30:44<82:51:29, 30.94s/it] 44%|████▍ | 7646/17285 [68:31:16<83:41:06, 31.25s/it] 44%|████▍ | 7647/17285 [68:32:00<94:07:11, 35.16s/it] 44%|████▍ | 7648/17285 [68:32:28<88:09:59, 32.94s/it] 44%|████▍ | 7649/17285 [68:32:57<84:39:24, 31.63s/it] 44%|████▍ | 7650/17285 [68:33:29<85:10:33, 31.82s/it] {'loss': 1.4212, 'learning_rate': 0.00012849018899067748, 'epoch': 1.33} + 44%|████▍ | 7650/17285 [68:33:29<85:10:33, 31.82s/it] 44%|████▍ | 7651/17285 [68:34:05<88:14:03, 32.97s/it] 44%|████▍ | 7652/17285 [68:34:38<88:59:36, 33.26s/it] 44%|████▍ | 7653/17285 [68:35:14<91:06:36, 34.05s/it] 44%|████▍ | 7654/17285 [68:35:43<87:01:20, 32.53s/it] 44%|████▍ | 7655/17285 [68:36:09<81:40:54, 30.54s/it] 44%|████▍ | 7656/17285 [68:36:37<79:09:24, 29.59s/it] 44%|████▍ | 7657/17285 [68:37:14<85:12:00, 31.86s/it] 44%|████▍ | 7658/17285 [68:37:42<81:53:26, 30.62s/it] 44%|████▍ | 7659/17285 [68:38:15<83:47:27, 31.34s/it] 44%|████▍ | 7660/17285 [68:38:39<78:14:31, 29.26s/it] {'loss': 1.4932, 'learning_rate': 0.00012830673898906435, 'epoch': 1.33} + 44%|████▍ | 7660/17285 [68:38:39<78:14:31, 29.26s/it] 44%|████▍ | 7661/17285 [68:39:11<80:17:46, 30.04s/it] 44%|████▍ | 7662/17285 [68:39:36<76:11:50, 28.51s/it] 44%|████▍ | 7663/17285 [68:40:11<81:37:13, 30.54s/it] 44%|████▍ | 7664/17285 [68:40:44<83:48:32, 31.36s/it] 44%|████▍ | 7665/17285 [68:41:10<78:54:58, 29.53s/it] 44%|████▍ | 7666/17285 [68:41:37<77:19:48, 28.94s/it] 44%|████▍ | 7667/17285 [68:42:03<74:35:11, 27.92s/it] 44%|████▍ | 7668/17285 [68:42:31<75:10:05, 28.14s/it] 44%|████▍ | 7669/17285 [68:43:15<87:14:13, 32.66s/it] 44%|████▍ | 7670/17285 [68:43:45<85:37:33, 32.06s/it] {'loss': 1.4644, 'learning_rate': 0.00012812318536755622, 'epoch': 1.33} + 44%|████▍ | 7670/17285 [68:43:45<85:37:33, 32.06s/it] 
44%|████▍ | 7671/17285 [68:44:21<88:46:20, 33.24s/it] 44%|████▍ | 7672/17285 [68:44:49<84:46:34, 31.75s/it][2023-08-25 20:39:55,508] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 44%|████▍ | 7673/17285 [68:45:18<82:04:33, 30.74s/it] 44%|████▍ | 7674/17285 [68:45:52<84:42:34, 31.73s/it] 44%|████▍ | 7675/17285 [68:46:30<89:26:30, 33.51s/it] 44%|████▍ | 7676/17285 [68:47:01<88:01:17, 32.98s/it] 44%|████▍ | 7677/17285 [68:47:28<82:45:55, 31.01s/it] 44%|████▍ | 7678/17285 [68:47:54<78:45:02, 29.51s/it] 44%|████▍ | 7679/17285 [68:48:23<78:12:22, 29.31s/it] 44%|████▍ | 7680/17285 [68:48:55<81:00:37, 30.36s/it] {'loss': 1.4359, 'learning_rate': 0.00012795789906852118, 'epoch': 1.33} + 44%|████▍ | 7680/17285 [68:48:55<81:00:37, 30.36s/it] 44%|████▍ | 7681/17285 [68:49:31<85:24:49, 32.02s/it] 44%|████▍ | 7682/17285 [68:50:05<86:31:15, 32.44s/it] 44%|████▍ | 7683/17285 [68:50:46<93:15:44, 34.97s/it] 44%|████▍ | 7684/17285 [68:51:16<89:50:54, 33.69s/it] 44%|████▍ | 7685/17285 [68:51:43<84:30:29, 31.69s/it] 44%|████▍ | 7686/17285 [68:52:11<81:10:07, 30.44s/it] 44%|████▍ | 7687/17285 [68:52:36<77:05:39, 28.92s/it] 44%|████▍ | 7688/17285 [68:53:08<79:25:37, 29.79s/it] 44%|████▍ | 7689/17285 [68:53:42<82:41:42, 31.02s/it] 44%|████▍ | 7690/17285 [68:54:25<92:06:23, 34.56s/it] {'loss': 1.4672, 'learning_rate': 0.0001277741504206582, 'epoch': 1.33} + 44%|████▍ | 7690/17285 [68:54:25<92:06:23, 34.56s/it] 44%|████▍ | 7691/17285 [68:54:51<85:54:00, 32.23s/it] 45%|████▍ | 7692/17285 [68:55:25<86:42:56, 32.54s/it] 45%|████▍ | 7693/17285 [68:55:51<81:41:18, 30.66s/it] 45%|████▍ | 7694/17285 [68:56:24<83:31:03, 31.35s/it] 45%|████▍ | 7695/17285 [68:57:03<89:32:26, 33.61s/it] 45%|████▍ | 7696/17285 [68:57:30<84:31:26, 31.73s/it] 45%|████▍ | 7697/17285 [68:58:00<83:13:05, 31.25s/it] 45%|████▍ | 7698/17285 [68:58:25<77:54:27, 29.26s/it] 45%|████▍ | 7699/17285 [68:58:59<81:34:07, 30.63s/it] 45%|████▍ | 
7700/17285 [68:59:38<88:44:44, 33.33s/it] {'loss': 1.4161, 'learning_rate': 0.00012759030010249867, 'epoch': 1.34} + 45%|████▍ | 7700/17285 [68:59:38<88:44:44, 33.33s/it] 45%|████▍ | 7701/17285 [69:00:05<83:26:38, 31.34s/it] 45%|████▍ | 7702/17285 [69:00:32<79:35:45, 29.90s/it] 45%|████▍ | 7703/17285 [69:00:57<76:06:08, 28.59s/it] 45%|████▍ | 7704/17285 [69:01:33<82:13:31, 30.90s/it] 45%|████▍ | 7705/17285 [69:02:05<82:46:13, 31.10s/it] 45%|████▍ | 7706/17285 [69:02:30<77:49:07, 29.25s/it] 45%|████▍ | 7707/17285 [69:03:07<84:01:22, 31.58s/it] 45%|████▍ | 7708/17285 [69:03:42<86:42:29, 32.59s/it] 45%|████▍ | 7709/17285 [69:04:18<89:30:01, 33.65s/it] 45%|████▍ | 7710/17285 [69:04:44<83:12:50, 31.29s/it] {'loss': 1.4479, 'learning_rate': 0.00012740634878704655, 'epoch': 1.34} + 45%|████▍ | 7710/17285 [69:04:44<83:12:50, 31.29s/it] 45%|████▍ | 7711/17285 [69:05:23<89:36:03, 33.69s/it] 45%|████▍ | 7712/17285 [69:05:50<83:50:10, 31.53s/it] 45%|████▍ | 7713/17285 [69:06:16<79:45:18, 30.00s/it] 45%|████▍ | 7714/17285 [69:06:48<81:34:13, 30.68s/it] 45%|████▍ | 7715/17285 [69:07:18<80:45:55, 30.38s/it] 45%|████▍ | 7716/17285 [69:07:48<80:46:12, 30.39s/it] 45%|████▍ | 7717/17285 [69:08:19<80:44:54, 30.38s/it] 45%|████▍ | 7718/17285 [69:08:51<81:59:58, 30.86s/it] 45%|████▍ | 7719/17285 [69:09:23<82:53:53, 31.20s/it] 45%|████▍ | 7720/17285 [69:09:55<83:53:38, 31.58s/it] {'loss': 1.5016, 'learning_rate': 0.00012722229714767566, 'epoch': 1.34} + 45%|████▍ | 7720/17285 [69:09:55<83:53:38, 31.58s/it] 45%|████▍ | 7721/17285 [69:10:24<81:31:46, 30.69s/it] 45%|████▍ | 7722/17285 [69:10:50<78:18:21, 29.48s/it] 45%|████▍ | 7723/17285 [69:11:25<82:14:45, 30.96s/it] 45%|████▍ | 7724/17285 [69:11:56<82:18:41, 30.99s/it] 45%|████▍ | 7725/17285 [69:12:32<85:56:54, 32.37s/it] 45%|████▍ | 7726/17285 [69:12:58<81:21:26, 30.64s/it] 45%|████▍ | 7727/17285 [69:13:27<79:56:17, 30.11s/it] 45%|████▍ | 7728/17285 [69:13:53<77:02:55, 29.02s/it] 45%|████▍ | 7729/17285 [69:14:18<73:30:07, 27.69s/it] 
45%|████▍ | 7730/17285 [69:14:49<76:22:11, 28.77s/it] {'loss': 1.4459, 'learning_rate': 0.00012703814585812706, 'epoch': 1.34} + 45%|████▍ | 7730/17285 [69:14:49<76:22:11, 28.77s/it] 45%|████▍ | 7731/17285 [69:15:17<75:48:24, 28.56s/it] 45%|████▍ | 7732/17285 [69:15:48<77:12:58, 29.10s/it] 45%|████▍ | 7733/17285 [69:16:22<81:36:49, 30.76s/it] 45%|████▍ | 7734/17285 [69:16:53<81:29:01, 30.71s/it] 45%|████▍ | 7735/17285 [69:17:29<85:53:39, 32.38s/it] 45%|████▍ | 7736/17285 [69:17:56<81:06:31, 30.58s/it] 45%|████▍ | 7737/17285 [69:18:28<82:16:54, 31.02s/it] 45%|████▍ | 7738/17285 [69:19:00<82:54:43, 31.26s/it] 45%|████▍ | 7739/17285 [69:19:28<80:55:54, 30.52s/it] 45%|████▍ | 7740/17285 [69:20:01<82:24:13, 31.08s/it] {'loss': 1.4491, 'learning_rate': 0.00012685389559250655, 'epoch': 1.34} + 45%|████▍ | 7740/17285 [69:20:01<82:24:13, 31.08s/it] 45%|████▍ | 7741/17285 [69:20:34<84:13:29, 31.77s/it] 45%|████▍ | 7742/17285 [69:21:03<82:00:36, 30.94s/it] 45%|████▍ | 7743/17285 [69:21:37<84:31:59, 31.89s/it] 45%|████▍ | 7744/17285 [69:22:10<84:58:39, 32.06s/it] 45%|████▍ | 7745/17285 [69:22:41<84:06:39, 31.74s/it] 45%|████▍ | 7746/17285 [69:23:08<80:25:01, 30.35s/it] 45%|████▍ | 7747/17285 [69:23:39<80:51:55, 30.52s/it] 45%|████▍ | 7748/17285 [69:24:15<85:16:00, 32.19s/it] 45%|████▍ | 7749/17285 [69:24:44<82:54:57, 31.30s/it] 45%|████▍ | 7750/17285 [69:25:20<86:25:06, 32.63s/it] {'loss': 1.4229, 'learning_rate': 0.00012666954702528224, 'epoch': 1.35} + 45%|████▍ | 7750/17285 [69:25:20<86:25:06, 32.63s/it] 45%|████▍ | 7751/17285 [69:25:51<84:57:50, 32.08s/it] 45%|████▍ | 7752/17285 [69:26:27<88:45:46, 33.52s/it] 45%|████▍ | 7753/17285 [69:27:03<90:47:09, 34.29s/it] 45%|████▍ | 7754/17285 [69:27:31<85:02:03, 32.12s/it] 45%|████▍ | 7755/17285 [69:28:01<84:05:04, 31.76s/it] 45%|████▍ | 7756/17285 [69:28:31<82:24:12, 31.13s/it] 45%|████▍ | 7757/17285 [69:28:58<79:18:03, 29.96s/it] 45%|████▍ | 7758/17285 [69:29:30<80:31:01, 30.43s/it] 45%|████▍ | 7759/17285 [69:30:11<89:16:24, 
33.74s/it] 45%|████▍ | 7760/17285 [69:30:43<87:45:45, 33.17s/it] {'loss': 1.4286, 'learning_rate': 0.00012648510083128212, 'epoch': 1.35} + 45%|████▍ | 7760/17285 [69:30:43<87:45:45, 33.17s/it][2023-08-25 21:25:49,040] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 45%|████▍ | 7761/17285 [69:31:11<83:46:58, 31.67s/it] 45%|████▍ | 7762/17285 [69:31:41<82:14:56, 31.09s/it] 45%|████▍ | 7763/17285 [69:32:17<85:43:55, 32.41s/it] 45%|████▍ | 7764/17285 [69:32:42<80:22:40, 30.39s/it] 45%|████▍ | 7765/17285 [69:33:11<79:10:25, 29.94s/it] 45%|████▍ | 7766/17285 [69:33:41<79:21:38, 30.01s/it] 45%|████▍ | 7767/17285 [69:34:17<83:53:40, 31.73s/it] 45%|████▍ | 7768/17285 [69:34:57<90:07:31, 34.09s/it] 45%|████▍ | 7769/17285 [69:35:39<96:54:46, 36.66s/it] 45%|████▍ | 7770/17285 [69:36:13<94:18:01, 35.68s/it] {'loss': 1.4043, 'learning_rate': 0.00012631901634382203, 'epoch': 1.35} + 45%|████▍ | 7770/17285 [69:36:13<94:18:01, 35.68s/it] 45%|████▍ | 7771/17285 [69:36:41<88:03:55, 33.32s/it] 45%|████▍ | 7772/17285 [69:37:11<86:05:52, 32.58s/it] 45%|████▍ | 7773/17285 [69:37:44<85:54:00, 32.51s/it] 45%|████▍ | 7774/17285 [69:38:13<82:55:31, 31.39s/it] 45%|████▍ | 7775/17285 [69:38:40<79:53:42, 30.24s/it] 45%|████▍ | 7776/17285 [69:39:15<83:18:07, 31.54s/it] 45%|████▍ | 7777/17285 [69:39:50<86:37:10, 32.80s/it] 45%|████▍ | 7778/17285 [69:40:20<84:19:53, 31.93s/it] 45%|████▌ | 7779/17285 [69:40:51<82:57:56, 31.42s/it] 45%|████▌ | 7780/17285 [69:41:17<78:52:24, 29.87s/it] {'loss': 1.44, 'learning_rate': 0.00012613438651937683, 'epoch': 1.35} + 45%|████▌ | 7780/17285 [69:41:17<78:52:24, 29.87s/it] 45%|████▌ | 7781/17285 [69:41:45<77:24:22, 29.32s/it] 45%|████▌ | 7782/17285 [69:42:23<84:46:30, 32.12s/it] 45%|████▌ | 7783/17285 [69:42:59<87:37:23, 33.20s/it] 45%|████▌ | 7784/17285 [69:43:34<89:05:07, 33.76s/it] 45%|████▌ | 7785/17285 [69:44:08<89:24:19, 33.88s/it] 45%|████▌ | 7786/17285 
[69:44:46<92:00:14, 34.87s/it] 45%|████▌ | 7787/17285 [69:45:21<92:15:33, 34.97s/it] 45%|████▌ | 7788/17285 [69:45:53<90:24:30, 34.27s/it] 45%|████▌ | 7789/17285 [69:46:20<84:04:51, 31.88s/it] 45%|████▌ | 7790/17285 [69:47:01<91:31:55, 34.70s/it] {'loss': 1.4456, 'learning_rate': 0.00012594966102716905, 'epoch': 1.35} + 45%|████▌ | 7790/17285 [69:47:01<91:31:55, 34.70s/it] 45%|████▌ | 7791/17285 [69:47:31<87:49:12, 33.30s/it] 45%|████▌ | 7792/17285 [69:48:05<88:29:39, 33.56s/it] 45%|████▌ | 7793/17285 [69:48:46<93:52:30, 35.60s/it] 45%|████▌ | 7794/17285 [69:49:12<86:40:31, 32.88s/it] 45%|████▌ | 7795/17285 [69:49:39<81:55:40, 31.08s/it] 45%|████▌ | 7796/17285 [69:50:05<77:55:17, 29.56s/it] 45%|████▌ | 7797/17285 [69:50:35<78:33:53, 29.81s/it] 45%|████▌ | 7798/17285 [69:51:06<79:33:58, 30.19s/it] 45%|████▌ | 7799/17285 [69:51:40<81:59:22, 31.12s/it] 45%|████▌ | 7800/17285 [69:52:17<86:35:47, 32.87s/it] {'loss': 1.4206, 'learning_rate': 0.00012576484054340636, 'epoch': 1.35} + 45%|████▌ | 7800/17285 [69:52:17<86:35:47, 32.87s/it] 45%|████▌ | 7801/17285 [69:52:42<80:55:22, 30.72s/it] 45%|████▌ | 7802/17285 [69:53:15<82:26:32, 31.30s/it] 45%|████▌ | 7803/17285 [69:53:54<88:15:51, 33.51s/it] 45%|████▌ | 7804/17285 [69:54:27<88:17:31, 33.53s/it] 45%|████▌ | 7805/17285 [69:54:54<83:09:11, 31.58s/it] 45%|████▌ | 7806/17285 [69:55:35<90:43:00, 34.45s/it] 45%|████▌ | 7807/17285 [69:56:10<91:06:23, 34.60s/it] 45%|████▌ | 7808/17285 [69:56:42<88:42:01, 33.69s/it] 45%|████▌ | 7809/17285 [69:57:18<90:40:22, 34.45s/it] 45%|████▌ | 7810/17285 [69:57:50<88:30:11, 33.63s/it] {'loss': 1.4273, 'learning_rate': 0.00012557992574464428, 'epoch': 1.36} + 45%|████▌ | 7810/17285 [69:57:50<88:30:11, 33.63s/it] 45%|████▌ | 7811/17285 [69:58:20<85:46:58, 32.60s/it] 45%|████▌ | 7812/17285 [69:58:46<80:11:13, 30.47s/it] 45%|████▌ | 7813/17285 [69:59:11<75:46:43, 28.80s/it] 45%|████▌ | 7814/17285 [69:59:44<79:43:34, 30.30s/it] 45%|████▌ | 7815/17285 [70:00:09<75:38:03, 28.75s/it] 45%|████▌ | 
7816/17285 [70:00:40<76:54:03, 29.24s/it] 45%|████▌ | 7817/17285 [70:01:10<77:52:40, 29.61s/it] 45%|████▌ | 7818/17285 [70:01:42<79:39:22, 30.29s/it] 45%|████▌ | 7819/17285 [70:02:11<78:06:46, 29.71s/it] 45%|████▌ | 7820/17285 [70:02:36<74:53:44, 28.49s/it] {'loss': 1.4658, 'learning_rate': 0.00012539491730778355, 'epoch': 1.36} + 45%|████▌ | 7820/17285 [70:02:36<74:53:44, 28.49s/it] 45%|████▌ | 7821/17285 [70:03:13<81:41:46, 31.08s/it] 45%|████▌ | 7822/17285 [70:03:42<79:30:53, 30.25s/it] 45%|████▌ | 7823/17285 [70:04:09<77:20:01, 29.42s/it] 45%|████▌ | 7824/17285 [70:04:49<85:18:42, 32.46s/it] 45%|████▌ | 7825/17285 [70:05:26<88:53:04, 33.82s/it] 45%|████▌ | 7826/17285 [70:06:03<91:49:55, 34.95s/it] 45%|████▌ | 7827/17285 [70:06:34<88:37:41, 33.73s/it] 45%|████▌ | 7828/17285 [70:07:09<89:11:17, 33.95s/it] 45%|████▌ | 7829/17285 [70:07:38<85:24:06, 32.51s/it] 45%|████▌ | 7830/17285 [70:08:16<89:41:37, 34.15s/it] {'loss': 1.423, 'learning_rate': 0.0001252098159100676, 'epoch': 1.36} + 45%|████▌ | 7830/17285 [70:08:16<89:41:37, 34.15s/it] 45%|████▌ | 7831/17285 [70:08:46<86:58:29, 33.12s/it] 45%|████▌ | 7832/17285 [70:09:12<80:36:46, 30.70s/it] 45%|████▌ | 7833/17285 [70:09:40<78:50:38, 30.03s/it] 45%|████▌ | 7834/17285 [70:10:12<80:02:10, 30.49s/it] 45%|████▌ | 7835/17285 [70:10:43<80:51:28, 30.80s/it] 45%|████▌ | 7836/17285 [70:11:20<85:23:24, 32.53s/it] 45%|████▌ | 7837/17285 [70:11:47<81:08:16, 30.92s/it] 45%|████▌ | 7838/17285 [70:12:22<84:51:57, 32.34s/it] 45%|████▌ | 7839/17285 [70:13:00<88:36:40, 33.77s/it] 45%|████▌ | 7840/17285 [70:13:40<93:33:30, 35.66s/it] {'loss': 1.4591, 'learning_rate': 0.00012502462222908025, 'epoch': 1.36} + 45%|████▌ | 7840/17285 [70:13:40<93:33:30, 35.66s/it] 45%|████▌ | 7841/17285 [70:14:11<90:15:07, 34.40s/it] 45%|████▌ | 7842/17285 [70:14:46<90:15:07, 34.41s/it] 45%|████▌ | 7843/17285 [70:15:17<87:43:41, 33.45s/it] 45%|████▌ | 7844/17285 [70:15:42<81:17:02, 30.99s/it] 45%|████▌ | 7845/17285 [70:16:14<82:14:28, 31.36s/it] 
45%|████▌ | 7846/17285 [70:16:48<84:22:51, 32.18s/it] 45%|████▌ | 7847/17285 [70:17:22<85:29:50, 32.61s/it] 45%|████▌ | 7848/17285 [70:17:57<87:06:08, 33.23s/it] 45%|████▌ | 7849/17285 [70:18:24<82:22:47, 31.43s/it] 45%|████▌ | 7850/17285 [70:19:08<92:06:37, 35.15s/it] {'loss': 1.3987, 'learning_rate': 0.0001248393369427431, 'epoch': 1.36} + 45%|████▌ | 7850/17285 [70:19:08<92:06:37, 35.15s/it] 45%|████▌ | 7851/17285 [70:19:47<95:18:09, 36.37s/it] 45%|████▌ | 7852/17285 [70:20:18<91:03:32, 34.75s/it] 45%|████▌ | 7853/17285 [70:20:53<91:39:44, 34.99s/it] 45%|████▌ | 7854/17285 [70:21:23<87:14:10, 33.30s/it] 45%|████▌ | 7855/17285 [70:21:55<86:08:31, 32.89s/it] 45%|████▌ | 7856/17285 [70:22:22<81:41:40, 31.19s/it] 45%|████▌ | 7857/17285 [70:22:51<79:52:50, 30.50s/it] 45%|████▌ | 7858/17285 [70:23:24<81:44:55, 31.22s/it] 45%|████▌ | 7859/17285 [70:23:59<84:47:39, 32.38s/it] 45%|████▌ | 7860/17285 [70:24:37<89:21:43, 34.13s/it] {'loss': 1.4278, 'learning_rate': 0.00012465396072931307, 'epoch': 1.36} + 45%|████▌ | 7860/17285 [70:24:37<89:21:43, 34.13s/it] 45%|████▌ | 7861/17285 [70:25:11<89:12:03, 34.08s/it] 45%|████▌ | 7862/17285 [70:25:42<87:04:47, 33.27s/it] 45%|████▌ | 7863/17285 [70:26:10<82:39:10, 31.58s/it] 45%|████▌ | 7864/17285 [70:26:53<91:45:32, 35.06s/it] 46%|████▌ | 7865/17285 [70:27:26<89:37:19, 34.25s/it] 46%|████▌ | 7866/17285 [70:27:55<86:12:46, 32.95s/it] 46%|████▌ | 7867/17285 [70:28:29<86:51:25, 33.20s/it] 46%|████▌ | 7868/17285 [70:28:55<80:41:43, 30.85s/it] 46%|████▌ | 7869/17285 [70:29:27<81:59:07, 31.35s/it] 46%|████▌ | 7870/17285 [70:30:03<85:35:52, 32.73s/it] {'loss': 1.4273, 'learning_rate': 0.00012446849426737996, 'epoch': 1.37} + 46%|████▌ | 7870/17285 [70:30:03<85:35:52, 32.73s/it] 46%|████▌ | 7871/17285 [70:30:32<82:45:54, 31.65s/it] 46%|████▌ | 7872/17285 [70:31:17<93:01:02, 35.57s/it] 46%|████▌ | 7873/17285 [70:31:48<89:21:25, 34.18s/it] 46%|████▌ | 7874/17285 [70:32:23<90:27:24, 34.60s/it] 46%|████▌ | 7875/17285 [70:33:02<93:38:49, 
35.83s/it] 46%|████▌ | 7876/17285 [70:33:30<87:34:02, 33.50s/it] 46%|████▌ | 7877/17285 [70:34:00<84:36:50, 32.38s/it] 46%|████▌ | 7878/17285 [70:34:29<81:53:28, 31.34s/it] 46%|████▌ | 7879/17285 [70:35:00<81:50:14, 31.32s/it] 46%|████▌ | 7880/17285 [70:35:32<82:21:45, 31.53s/it] {'loss': 1.4464, 'learning_rate': 0.00012428293823586387, 'epoch': 1.37} + 46%|████▌ | 7880/17285 [70:35:32<82:21:45, 31.53s/it] 46%|████▌ | 7881/17285 [70:36:01<80:39:14, 30.88s/it] 46%|████▌ | 7882/17285 [70:36:39<85:37:35, 32.78s/it] 46%|████▌ | 7883/17285 [70:37:08<82:46:50, 31.70s/it] 46%|████▌ | 7884/17285 [70:37:38<81:11:10, 31.09s/it] 46%|████▌ | 7885/17285 [70:38:10<82:26:58, 31.58s/it] 46%|████▌ | 7886/17285 [70:38:40<81:07:36, 31.07s/it] 46%|████▌ | 7887/17285 [70:39:08<78:26:51, 30.05s/it] 46%|████▌ | 7888/17285 [70:39:37<77:26:50, 29.67s/it] 46%|████▌ | 7889/17285 [70:40:03<75:06:14, 28.78s/it] 46%|████▌ | 7890/17285 [70:40:40<81:32:58, 31.25s/it] {'loss': 1.4407, 'learning_rate': 0.00012409729331401288, 'epoch': 1.37} + 46%|████▌ | 7890/17285 [70:40:40<81:32:58, 31.25s/it] 46%|████▌ | 7891/17285 [70:41:18<86:14:32, 33.05s/it] 46%|████▌ | 7892/17285 [70:41:50<85:55:46, 32.93s/it] 46%|████▌ | 7893/17285 [70:42:28<90:01:37, 34.51s/it] 46%|████▌ | 7894/17285 [70:43:00<87:44:08, 33.63s/it] 46%|████▌ | 7895/17285 [70:43:27<82:47:55, 31.74s/it] 46%|████▌ | 7896/17285 [70:44:00<83:39:13, 32.08s/it] 46%|████▌ | 7897/17285 [70:44:27<79:08:47, 30.35s/it] 46%|████▌ | 7898/17285 [70:44:56<78:48:54, 30.23s/it] 46%|████▌ | 7899/17285 [70:45:29<80:42:28, 30.96s/it] 46%|████▌ | 7900/17285 [70:45:59<80:15:02, 30.78s/it] {'loss': 1.4192, 'learning_rate': 0.0001239115601814004, 'epoch': 1.37} + 46%|████▌ | 7900/17285 [70:45:59<80:15:02, 30.78s/it] 46%|████▌ | 7901/17285 [70:46:30<80:00:43, 30.70s/it] 46%|████▌ | 7902/17285 [70:46:57<77:01:38, 29.55s/it] 46%|████▌ | 7903/17285 [70:47:23<74:36:39, 28.63s/it] 46%|████▌ | 7904/17285 [70:47:49<71:54:55, 27.60s/it] 46%|████▌ | 7905/17285 
[70:48:18<73:16:20, 28.12s/it] 46%|████▌ | 7906/17285 [70:48:57<81:32:27, 31.30s/it] 46%|████▌ | 7907/17285 [70:49:32<84:28:03, 32.43s/it] 46%|████▌ | 7908/17285 [70:50:13<91:14:52, 35.03s/it] 46%|████▌ | 7909/17285 [70:50:50<92:55:52, 35.68s/it] 46%|████▌ | 7910/17285 [70:51:21<89:26:26, 34.35s/it] {'loss': 1.4327, 'learning_rate': 0.00012372573951792271, 'epoch': 1.37} + 46%|████▌ | 7910/17285 [70:51:21<89:26:26, 34.35s/it] 46%|████▌ | 7911/17285 [70:51:49<84:39:28, 32.51s/it] 46%|████▌ | 7912/17285 [70:52:20<83:10:28, 31.95s/it] 46%|████▌ | 7913/17285 [70:52:48<79:58:04, 30.72s/it] 46%|████▌ | 7914/17285 [70:53:16<78:04:50, 30.00s/it] 46%|████▌ | 7915/17285 [70:53:50<80:46:23, 31.03s/it][2023-08-25 22:48:56,337] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1 + 46%|████▌ | 7916/17285 [70:54:19<79:10:21, 30.42s/it] 46%|████▌ | 7917/17285 [70:55:00<87:37:49, 33.68s/it] 46%|████▌ | 7918/17285 [70:55:25<80:56:10, 31.11s/it] 46%|████▌ | 7919/17285 [70:55:53<78:36:23, 30.21s/it] 46%|████▌ | 7920/17285 [70:56:21<76:41:15, 29.48s/it] {'loss': 1.4286, 'learning_rate': 0.00012355842664409558, 'epoch': 1.37} + 46%|████▌ | 7920/17285 [70:56:21<76:41:15, 29.48s/it] 46%|████▌ | 7921/17285 [70:57:01<84:34:45, 32.52s/it] 46%|████▌ | 7922/17285 [70:57:38<88:45:13, 34.13s/it] 46%|████▌ | 7923/17285 [70:58:07<84:32:17, 32.51s/it] 46%|████▌ | 7924/17285 [70:58:39<83:39:04, 32.17s/it] 46%|████▌ | 7925/17285 [70:59:10<83:13:26, 32.01s/it] 46%|████▌ | 7926/17285 [70:59:46<85:59:19, 33.08s/it] 46%|████▌ | 7927/17285 [71:00:22<88:44:45, 34.14s/it] 46%|████▌ | 7928/17285 [71:00:49<82:38:45, 31.80s/it] 46%|████▌ | 7929/17285 [71:01:18<80:56:48, 31.15s/it] 46%|████▌ | 7930/17285 [71:01:46<77:52:57, 29.97s/it] {'loss': 1.4381, 'learning_rate': 0.00012337244154623397, 'epoch': 1.38} + 46%|████▌ | 7930/17285 [71:01:46<77:52:57, 29.97s/it] 46%|████▌ | 7931/17285 [71:02:20<81:06:23, 
31.21s/it] 46%|████▌ | 7932/17285 [71:02:54<83:23:57, 32.10s/it] 46%|████▌ | 7933/17285 [71:03:32<87:48:41, 33.80s/it] 46%|████▌ | 7934/17285 [71:04:04<86:55:02, 33.46s/it] 46%|████▌ | 7935/17285 [71:04:32<82:47:17, 31.88s/it] 46%|████▌ | 7936/17285 [71:05:08<85:23:53, 32.88s/it] 46%|████▌ | 7937/17285 [71:05:38<83:21:40, 32.10s/it] 46%|████▌ | 7938/17285 [71:06:13<85:42:08, 33.01s/it] 46%|████▌ | 7939/17285 [71:06:42<82:15:40, 31.69s/it] 46%|████▌ | 7940/17285 [71:07:09<79:01:46, 30.44s/it] {'loss': 1.4084, 'learning_rate': 0.0001231863708910095, 'epoch': 1.38} + 46%|████▌ | 7940/17285 [71:07:09<79:01:46, 30.44s/it] 46%|████▌ | 7941/17285 [71:07:36<76:13:01, 29.36s/it] 46%|████▌ | 7942/17285 [71:08:14<83:02:44, 32.00s/it] 46%|████▌ | 7943/17285 [71:08:45<82:10:15, 31.67s/it] 46%|████▌ | 7944/17285 [71:09:16<81:11:46, 31.29s/it] 46%|████▌ | 7945/17285 [71:09:47<81:22:28, 31.36s/it] 46%|████▌ | 7946/17285 [71:10:18<81:10:13, 31.29s/it] 46%|████▌ | 7947/17285 [71:10:52<83:17:26, 32.11s/it] 46%|████▌ | 7948/17285 [71:11:21<80:28:47, 31.03s/it] 46%|████▌ | 7949/17285 [71:11:50<78:52:10, 30.41s/it] 46%|████▌ | 7950/17285 [71:12:23<81:13:57, 31.33s/it] {'loss': 1.4431, 'learning_rate': 0.00012300021535955412, 'epoch': 1.38} + 46%|████▌ | 7950/17285 [71:12:23<81:13:57, 31.33s/it] 46%|████▌ | 7951/17285 [71:12:56<82:11:51, 31.70s/it] 46%|████▌ | 7952/17285 [71:13:20<76:40:04, 29.57s/it] 46%|████▌ | 7953/17285 [71:13:49<75:41:53, 29.20s/it] 46%|████▌ | 7954/17285 [71:14:23<79:48:34, 30.79s/it] 46%|████▌ | 7955/17285 [71:14:53<79:00:17, 30.48s/it] 46%|████▌ | 7956/17285 [71:15:28<82:18:54, 31.76s/it] 46%|████▌ | 7957/17285 [71:16:00<82:33:43, 31.86s/it] 46%|████▌ | 7958/17285 [71:16:28<79:48:10, 30.80s/it] 46%|████▌ | 7959/17285 [71:17:02<82:19:12, 31.78s/it] 46%|████▌ | 7960/17285 [71:17:28<78:02:16, 30.13s/it] {'loss': 1.4226, 'learning_rate': 0.0001228139756333103, 'epoch': 1.38} + 46%|████▌ | 7960/17285 [71:17:28<78:02:16, 30.13s/it] 46%|████▌ | 7961/17285 
[71:18:07<84:55:15, 32.79s/it] 46%|████▌ | 7962/17285 [71:18:36<81:47:53, 31.59s/it] 46%|████▌ | 7963/17285 [71:19:08<82:06:28, 31.71s/it] 46%|████▌ | 7964/17285 [71:19:39<81:23:37, 31.44s/it] 46%|████▌ | 7965/17285 [71:20:10<80:46:33, 31.20s/it] 46%|████▌ | 7966/17285 [71:20:46<85:08:32, 32.89s/it] 46%|████▌ | 7967/17285 [71:21:22<87:13:17, 33.70s/it] 46%|████▌ | 7968/17285 [71:21:50<82:33:04, 31.90s/it] 46%|████▌ | 7969/17285 [71:22:19<80:36:24, 31.15s/it] 46%|████▌ | 7970/17285 [71:22:48<79:05:49, 30.57s/it] {'loss': 1.3949, 'learning_rate': 0.00012262765239402884, 'epoch': 1.38} + 46%|████▌ | 7970/17285 [71:22:49<79:05:49, 30.57s/it] 46%|████▌ | 7971/17285 [71:23:19<79:08:07, 30.59s/it] 46%|████▌ | 7972/17285 [71:23:49<78:34:42, 30.37s/it] 46%|████▌ | 7973/17285 [71:24:17<77:02:37, 29.78s/it] 46%|████▌ | 7974/17285 [71:24:47<77:04:04, 29.80s/it] 46%|████▌ | 7975/17285 [71:25:25<83:25:46, 32.26s/it] 46%|████▌ | 7976/17285 [71:25:53<79:38:51, 30.80s/it] 46%|████▌ | 7977/17285 [71:26:25<80:54:29, 31.29s/it] 46%|████▌ | 7978/17285 [71:26:57<81:38:19, 31.58s/it] 46%|████▌ | 7979/17285 [71:27:35<86:27:14, 33.44s/it] 46%|████▌ | 7980/17285 [71:28:02<81:42:25, 31.61s/it] {'loss': 1.4388, 'learning_rate': 0.0001224412463237662, 'epoch': 1.38} + 46%|████▌ | 7980/17285 [71:28:02<81:42:25, 31.61s/it] 46%|████▌ | 7981/17285 [71:28:34<81:37:01, 31.58s/it] 46%|████▌ | 7982/17285 [71:29:12<87:00:51, 33.67s/it] 46%|████▌ | 7983/17285 [71:29:41<83:10:06, 32.19s/it] 46%|████▌ | 7984/17285 [71:30:07<78:37:42, 30.43s/it] 46%|████▌ | 7985/17285 [71:30:33<74:47:07, 28.95s/it] 46%|████▌ | 7986/17285 [71:31:07<78:36:29, 30.43s/it] 46%|████▌ | 7987/17285 [71:31:45<84:31:19, 32.73s/it] 46%|████▌ | 7988/17285 [71:32:16<83:09:24, 32.20s/it] 46%|████▌ | 7989/17285 [71:32:42<78:25:49, 30.37s/it] 46%|████▌ | 7990/17285 [71:33:13<78:35:21, 30.44s/it] {'loss': 1.4102, 'learning_rate': 0.00012225475810488206, 'epoch': 1.39} + 46%|████▌ | 7990/17285 [71:33:13<78:35:21, 30.44s/it][2023-08-25 
23:28:16,724] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 46%|████▌ | 7991/17285 [71:33:39<75:29:03, 29.24s/it] 46%|████▌ | 7992/17285 [71:34:06<73:24:08, 28.44s/it] 46%|████▌ | 7993/17285 [71:34:33<72:31:22, 28.10s/it] 46%|████▌ | 7994/17285 [71:35:01<72:43:19, 28.18s/it] 46%|████▋ | 7995/17285 [71:35:38<79:08:35, 30.67s/it] 46%|████▋ | 7996/17285 [71:36:09<79:16:52, 30.73s/it] 46%|████▋ | 7997/17285 [71:36:39<79:09:55, 30.68s/it] 46%|████▋ | 7998/17285 [71:37:18<85:05:50, 32.99s/it] 46%|████▋ | 7999/17285 [71:37:48<83:21:28, 32.32s/it] 46%|████▋ | 8000/17285 [71:38:19<82:01:05, 31.80s/it] {'loss': 1.4059, 'learning_rate': 0.00012208684903502762, 'epoch': 1.39} + 46%|████▋ | 8000/17285 [71:38:19<82:01:05, 31.80s/it][INFO|trainer.py:3081] 2023-08-25 23:32:56,627 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-25 23:32:56,628 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-25 23:32:56,628 >> Batch size = 2 + + 0%| | 0/33 [00:00> Deleting older checkpoint [20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-5000] due to args.save_total_limit +Save Peft Config at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-8000 +[INFO|tokenization_utils_base.py:2210] 2023-08-25 23:34:22,898 >> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-8000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-25 23:34:22,905 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-8000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-8000 +Save adapter model at 
20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-8000 + 46%|████▋ | 8001/17285 [71:40:27<156:44:44, 60.78s/it] 46%|████▋ | 8002/17285 [71:40:52<129:02:31, 50.04s/it] 46%|████▋ | 8003/17285 [71:41:26<116:44:22, 45.28s/it] 46%|████▋ | 8004/17285 [71:42:01<108:23:21, 42.04s/it] 46%|████▋ | 8005/17285 [71:42:34<101:23:42, 39.33s/it] 46%|████▋ | 8006/17285 [71:43:02<92:16:37, 35.80s/it] 46%|████▋ | 8007/17285 [71:43:27<84:09:23, 32.65s/it] 46%|████▋ | 8008/17285 [71:44:01<85:37:48, 33.23s/it] 46%|████▋ | 8009/17285 [71:44:33<84:27:45, 32.78s/it] 46%|████▋ | 8010/17285 [71:45:05<83:24:46, 32.38s/it] {'loss': 1.4513, 'learning_rate': 0.00012190020661473858, 'epoch': 1.39} + 46%|████▋ | 8010/17285 [71:45:05<83:24:46, 32.38s/it] 46%|████▋ | 8011/17285 [71:45:44<89:00:33, 34.55s/it] 46%|████▋ | 8012/17285 [71:46:12<83:46:10, 32.52s/it] 46%|████▋ | 8013/17285 [71:46:41<81:25:50, 31.62s/it] 46%|████▋ | 8014/17285 [71:47:06<76:01:02, 29.52s/it] 46%|████▋ | 8015/17285 [71:47:36<76:20:11, 29.65s/it] 46%|████▋ | 8016/17285 [71:48:11<80:36:29, 31.31s/it] 46%|████▋ | 8017/17285 [71:48:49<85:35:23, 33.25s/it] 46%|████▋ | 8018/17285 [71:49:21<84:46:11, 32.93s/it] 46%|████▋ | 8019/17285 [71:49:58<88:03:17, 34.21s/it] 46%|████▋ | 8020/17285 [71:50:27<83:40:44, 32.51s/it] {'loss': 1.4368, 'learning_rate': 0.00012171348402636268, 'epoch': 1.39} + 46%|████▋ | 8020/17285 [71:50:27<83:40:44, 32.51s/it] 46%|████▋ | 8021/17285 [71:50:54<79:05:47, 30.74s/it] 46%|████▋ | 8022/17285 [71:51:24<78:58:00, 30.69s/it] 46%|████▋ | 8023/17285 [71:51:58<81:33:23, 31.70s/it] 46%|████▋ | 8024/17285 [71:52:29<81:06:47, 31.53s/it] 46%|████▋ | 8025/17285 [71:52:55<76:47:03, 29.85s/it] 46%|████▋ | 8026/17285 [71:53:33<82:47:13, 32.19s/it] 46%|████▋ | 8027/17285 [71:54:09<86:03:11, 33.46s/it] 46%|████▋ | 8028/17285 [71:54:44<86:54:06, 33.80s/it] 46%|████▋ | 8029/17285 [71:55:14<83:54:31, 32.64s/it] 46%|████▋ | 8030/17285 [71:55:43<81:11:44, 31.58s/it] {'loss': 
1.4503, 'learning_rate': 0.00012152668195341832, 'epoch': 1.39} + 46%|████▋ | 8030/17285 [71:55:43<81:11:44, 31.58s/it] 46%|████▋ | 8031/17285 [71:56:17<83:13:24, 32.38s/it] 46%|████▋ | 8032/17285 [71:56:46<80:49:32, 31.45s/it] 46%|████▋ | 8033/17285 [71:57:15<78:50:57, 30.68s/it] 46%|████▋ | 8034/17285 [71:57:44<76:57:32, 29.95s/it] 46%|████▋ | 8035/17285 [71:58:21<82:23:02, 32.06s/it] 46%|████▋ | 8036/17285 [71:58:47<77:55:03, 30.33s/it] 46%|████▋ | 8037/17285 [71:59:21<80:57:57, 31.52s/it] 47%|████▋ | 8038/17285 [71:59:48<77:30:37, 30.18s/it] 47%|████▋ | 8039/17285 [72:00:23<80:55:29, 31.51s/it] 47%|████▋ | 8040/17285 [72:00:53<80:00:45, 31.16s/it] {'loss': 1.4039, 'learning_rate': 0.00012133980107971474, 'epoch': 1.4} + 47%|████▋ | 8040/17285 [72:00:53<80:00:45, 31.16s/it] 47%|████▋ | 8041/17285 [72:01:23<78:39:25, 30.63s/it] 47%|████▋ | 8042/17285 [72:01:52<77:42:39, 30.27s/it] 47%|████▋ | 8043/17285 [72:02:22<77:48:41, 30.31s/it] 47%|████▋ | 8044/17285 [72:02:57<80:57:27, 31.54s/it] 47%|████▋ | 8045/17285 [72:03:25<78:43:34, 30.67s/it] 47%|████▋ | 8046/17285 [72:04:03<83:48:19, 32.65s/it] 47%|████▋ | 8047/17285 [72:04:33<82:11:18, 32.03s/it] 47%|████▋ | 8048/17285 [72:05:04<81:19:30, 31.70s/it] 47%|████▋ | 8049/17285 [72:05:36<81:07:17, 31.62s/it] 47%|████▋ | 8050/17285 [72:06:10<82:50:46, 32.30s/it] {'loss': 1.4467, 'learning_rate': 0.00012115284208934969, 'epoch': 1.4} + 47%|████▋ | 8050/17285 [72:06:10<82:50:46, 32.30s/it] 47%|████▋ | 8051/17285 [72:06:48<87:14:30, 34.01s/it] 47%|████▋ | 8052/17285 [72:07:24<89:06:09, 34.74s/it] 47%|████▋ | 8053/17285 [72:08:00<90:05:53, 35.13s/it] 47%|████▋ | 8054/17285 [72:08:30<86:24:49, 33.70s/it] 47%|████▋ | 8055/17285 [72:09:04<86:00:56, 33.55s/it] 47%|████▋ | 8056/17285 [72:09:30<80:29:30, 31.40s/it] 47%|████▋ | 8057/17285 [72:09:56<76:43:42, 29.93s/it] 47%|████▋ | 8058/17285 [72:10:25<75:28:48, 29.45s/it] 47%|████▋ | 8059/17285 [72:10:51<73:18:35, 28.61s/it] 47%|████▋ | 8060/17285 [72:11:31<81:26:57, 31.79s/it] 
{'loss': 1.4028, 'learning_rate': 0.00012096580566670692, 'epoch': 1.4} + 47%|████▋ | 8060/17285 [72:11:31<81:26:57, 31.79s/it] 47%|████▋ | 8061/17285 [72:11:59<78:39:01, 30.70s/it] 47%|████▋ | 8062/17285 [72:12:32<80:14:42, 31.32s/it] 47%|████▋ | 8063/17285 [72:12:59<77:27:26, 30.24s/it] 47%|████▋ | 8064/17285 [72:13:35<81:39:15, 31.88s/it] 47%|████▋ | 8065/17285 [72:14:02<77:33:35, 30.28s/it] 47%|████▋ | 8066/17285 [72:14:46<88:10:22, 34.43s/it] 47%|████▋ | 8067/17285 [72:15:17<85:57:43, 33.57s/it] 47%|████▋ | 8068/17285 [72:15:43<79:47:23, 31.16s/it] 47%|████▋ | 8069/17285 [72:16:14<80:02:07, 31.26s/it] 47%|████▋ | 8070/17285 [72:16:44<78:44:14, 30.76s/it] {'loss': 1.4299, 'learning_rate': 0.00012077869249645357, 'epoch': 1.4} + 47%|████▋ | 8070/17285 [72:16:44<78:44:14, 30.76s/it] 47%|████▋ | 8071/17285 [72:17:18<81:00:03, 31.65s/it] 47%|████▋ | 8072/17285 [72:17:50<81:18:27, 31.77s/it] 47%|████▋ | 8073/17285 [72:18:36<92:10:33, 36.02s/it] 47%|████▋ | 8074/17285 [72:19:02<84:46:40, 33.13s/it] 47%|████▋ | 8075/17285 [72:19:33<82:51:09, 32.39s/it] 47%|████▋ | 8076/17285 [72:20:04<81:42:37, 31.94s/it] 47%|████▋ | 8077/17285 [72:20:36<82:00:34, 32.06s/it] 47%|████▋ | 8078/17285 [72:21:12<85:09:47, 33.30s/it] 47%|████▋ | 8079/17285 [72:21:48<87:04:27, 34.05s/it] 47%|████▋ | 8080/17285 [72:22:21<86:02:22, 33.65s/it] {'loss': 1.4264, 'learning_rate': 0.00012059150326353772, 'epoch': 1.4} + 47%|████▋ | 8080/17285 [72:22:21<86:02:22, 33.65s/it] 47%|████▋ | 8081/17285 [72:22:48<81:35:54, 31.92s/it] 47%|████▋ | 8082/17285 [72:23:33<90:59:14, 35.59s/it] 47%|████▋ | 8083/17285 [72:24:11<92:44:22, 36.28s/it] 47%|████▋ | 8084/17285 [72:24:42<89:18:58, 34.95s/it] 47%|████▋ | 8085/17285 [72:25:16<88:30:21, 34.63s/it] 47%|████▋ | 8086/17285 [72:25:41<81:03:03, 31.72s/it] 47%|████▋ | 8087/17285 [72:26:12<80:06:18, 31.35s/it] 47%|████▋ | 8088/17285 [72:26:44<80:37:40, 31.56s/it] 47%|████▋ | 8089/17285 [72:27:22<85:25:43, 33.44s/it] 47%|████▋ | 8090/17285 [72:27:50<81:29:06, 
31.90s/it] {'loss': 1.453, 'learning_rate': 0.00012040423865318591, 'epoch': 1.4} + 47%|████▋ | 8090/17285 [72:27:50<81:29:06, 31.90s/it] 47%|████▋ | 8091/17285 [72:28:33<90:05:49, 35.28s/it] 47%|████▋ | 8092/17285 [72:29:03<85:43:37, 33.57s/it] 47%|████▋ | 8093/17285 [72:29:32<82:43:02, 32.40s/it] 47%|████▋ | 8094/17285 [72:30:03<81:48:52, 32.05s/it] 47%|████▋ | 8095/17285 [72:30:32<78:58:31, 30.94s/it] 47%|████▋ | 8096/17285 [72:31:03<79:21:10, 31.09s/it] 47%|████▋ | 8097/17285 [72:31:34<79:00:33, 30.96s/it] 47%|████▋ | 8098/17285 [72:32:09<82:15:09, 32.23s/it] 47%|████▋ | 8099/17285 [72:32:37<78:33:45, 30.79s/it] 47%|████▋ | 8100/17285 [72:33:01<73:27:15, 28.79s/it] {'loss': 1.4452, 'learning_rate': 0.0001202168993509006, 'epoch': 1.41} + 47%|████▋ | 8100/17285 [72:33:01<73:27:15, 28.79s/it] 47%|████▋ | 8101/17285 [72:33:33<75:47:35, 29.71s/it] 47%|████▋ | 8102/17285 [72:34:04<76:55:37, 30.16s/it] 47%|████▋ | 8103/17285 [72:34:31<75:04:21, 29.43s/it] 47%|████▋ | 8104/17285 [72:35:02<76:04:06, 29.83s/it] 47%|████▋ | 8105/17285 [72:35:40<82:20:37, 32.29s/it] 47%|████▋ | 8106/17285 [72:36:17<85:56:22, 33.71s/it] 47%|████▋ | 8107/17285 [72:36:47<83:15:04, 32.65s/it] 47%|████▋ | 8108/17285 [72:37:20<82:52:43, 32.51s/it] 47%|████▋ | 8109/17285 [72:37:50<80:50:16, 31.72s/it] 47%|████▋ | 8110/17285 [72:38:28<85:45:29, 33.65s/it] {'loss': 1.4251, 'learning_rate': 0.00012002948604245768, 'epoch': 1.41} + 47%|████▋ | 8110/17285 [72:38:28<85:45:29, 33.65s/it] 47%|████▋ | 8111/17285 [72:39:08<90:45:19, 35.61s/it] 47%|████▋ | 8112/17285 [72:39:45<92:15:27, 36.21s/it] 47%|████▋ | 8113/17285 [72:40:16<88:00:05, 34.54s/it] 47%|████▋ | 8114/17285 [72:40:47<85:14:48, 33.46s/it] 47%|████▋ | 8115/17285 [72:41:13<79:16:52, 31.12s/it] 47%|████▋ | 8116/17285 [72:41:43<78:58:41, 31.01s/it] 47%|████▋ | 8117/17285 [72:42:17<80:58:53, 31.80s/it] 47%|████▋ | 8118/17285 [72:42:46<78:51:05, 30.97s/it] 47%|████▋ | 8119/17285 [72:43:18<79:35:36, 31.26s/it] 47%|████▋ | 8120/17285 
[72:43:49<79:30:03, 31.23s/it] {'loss': 1.4419, 'learning_rate': 0.00011984199941390392, 'epoch': 1.41} + 47%|████▋ | 8120/17285 [72:43:49<79:30:03, 31.23s/it] 47%|████▋ | 8121/17285 [72:44:23<81:27:24, 32.00s/it] 47%|████▋ | 8122/17285 [72:44:54<80:58:41, 31.82s/it] 47%|████▋ | 8123/17285 [72:45:26<80:30:47, 31.64s/it] 47%|████▋ | 8124/17285 [72:46:01<83:44:01, 32.90s/it] 47%|████▋ | 8125/17285 [72:46:34<83:38:07, 32.87s/it] 47%|████▋ | 8126/17285 [72:47:09<85:12:36, 33.49s/it] 47%|████▋ | 8127/17285 [72:47:45<87:16:17, 34.31s/it] 47%|████▋ | 8128/17285 [72:48:22<89:21:54, 35.13s/it] 47%|████▋ | 8129/17285 [72:48:54<86:24:13, 33.97s/it] 47%|████▋ | 8130/17285 [72:49:34<91:30:07, 35.98s/it] {'loss': 1.4453, 'learning_rate': 0.00011965444015155452, 'epoch': 1.41} + 47%|████▋ | 8130/17285 [72:49:34<91:30:07, 35.98s/it] 47%|████▋ | 8131/17285 [72:50:00<83:39:31, 32.90s/it] 47%|████▋ | 8132/17285 [72:50:40<89:06:30, 35.05s/it] 47%|████▋ | 8133/17285 [72:51:08<83:27:44, 32.83s/it] 47%|████▋ | 8134/17285 [72:51:38<81:12:38, 31.95s/it] 47%|████▋ | 8135/17285 [72:52:12<82:34:58, 32.49s/it] 47%|████▋ | 8136/17285 [72:52:47<85:07:23, 33.49s/it] 47%|████▋ | 8137/17285 [72:53:23<87:07:16, 34.28s/it] 47%|████▋ | 8138/17285 [72:53:56<86:07:04, 33.89s/it] 47%|████▋ | 8139/17285 [72:54:24<81:13:15, 31.97s/it] 47%|████▋ | 8140/17285 [72:54:58<82:49:46, 32.61s/it] {'loss': 1.4178, 'learning_rate': 0.00011946680894199054, 'epoch': 1.41} + 47%|████▋ | 8140/17285 [72:54:58<82:49:46, 32.61s/it] 47%|████▋ | 8141/17285 [72:55:34<85:06:51, 33.51s/it] 47%|████▋ | 8142/17285 [72:56:16<91:55:05, 36.19s/it] 47%|████▋ | 8143/17285 [72:56:45<86:03:51, 33.89s/it] 47%|████▋ | 8144/17285 [72:57:15<83:43:11, 32.97s/it] 47%|████▋ | 8145/17285 [72:57:47<82:38:10, 32.55s/it] 47%|████▋ | 8146/17285 [72:58:17<80:28:37, 31.70s/it] 47%|████▋ | 8147/17285 [72:58:45<77:31:57, 30.54s/it] 47%|████▋ | 8148/17285 [72:59:19<80:13:09, 31.61s/it] 47%|████▋ | 8149/17285 [72:59:45<76:13:05, 30.03s/it] 47%|████▋ | 
8150/17285 [73:00:15<75:54:46, 29.92s/it] {'loss': 1.4655, 'learning_rate': 0.00011927910647205644, 'epoch': 1.41} + 47%|████▋ | 8150/17285 [73:00:15<75:54:46, 29.92s/it] 47%|████▋ | 8151/17285 [73:00:50<79:50:57, 31.47s/it] 47%|████▋ | 8152/17285 [73:01:20<78:39:24, 31.00s/it] 47%|████▋ | 8153/17285 [73:01:48<76:57:13, 30.34s/it] 47%|████▋ | 8154/17285 [73:02:23<80:23:00, 31.69s/it] 47%|████▋ | 8155/17285 [73:02:53<78:29:11, 30.95s/it] 47%|████▋ | 8156/17285 [73:03:22<77:18:43, 30.49s/it] 47%|████▋ | 8157/17285 [73:03:59<82:17:11, 32.45s/it] 47%|████▋ | 8158/17285 [73:04:30<81:20:35, 32.08s/it] 47%|████▋ | 8159/17285 [73:05:01<80:03:51, 31.58s/it] 47%|████▋ | 8160/17285 [73:05:32<79:47:36, 31.48s/it] {'loss': 1.4289, 'learning_rate': 0.00011909133342885747, 'epoch': 1.42} + 47%|████▋ | 8160/17285 [73:05:32<79:47:36, 31.48s/it] 47%|████▋ | 8161/17285 [73:06:02<78:53:52, 31.13s/it] 47%|████▋ | 8162/17285 [73:06:29<75:58:06, 29.98s/it] 47%|████▋ | 8163/17285 [73:07:02<78:13:47, 30.87s/it] 47%|████▋ | 8164/17285 [73:07:29<75:13:49, 29.69s/it] 47%|████▋ | 8165/17285 [73:08:00<75:43:09, 29.89s/it] 47%|████▋ | 8166/17285 [73:08:31<76:31:35, 30.21s/it] 47%|████▋ | 8167/17285 [73:09:01<76:21:53, 30.15s/it] 47%|████▋ | 8168/17285 [73:09:33<77:49:50, 30.73s/it] 47%|████▋ | 8169/17285 [73:10:04<78:34:09, 31.03s/it] 47%|████▋ | 8170/17285 [73:10:35<78:16:23, 30.91s/it] {'loss': 1.4673, 'learning_rate': 0.00011890349049975729, 'epoch': 1.42} + 47%|████▋ | 8170/17285 [73:10:35<78:16:23, 30.91s/it] 47%|████▋ | 8171/17285 [73:11:06<77:56:53, 30.79s/it] 47%|████▋ | 8172/17285 [73:11:38<79:11:19, 31.28s/it] 47%|████▋ | 8173/17285 [73:12:09<79:13:32, 31.30s/it] 47%|████▋ | 8174/17285 [73:12:35<74:31:53, 29.45s/it] 47%|████▋ | 8175/17285 [73:13:06<75:58:13, 30.02s/it] 47%|████▋ | 8176/17285 [73:13:40<79:13:43, 31.31s/it] 47%|████▋ | 8177/17285 [73:14:12<79:29:20, 31.42s/it] 47%|████▋ | 8178/17285 [73:14:46<81:39:29, 32.28s/it] 47%|████▋ | 8179/17285 [73:15:21<83:54:36, 33.17s/it] 
47%|████▋ | 8180/17285 [73:15:56<84:40:59, 33.48s/it] {'loss': 1.4313, 'learning_rate': 0.00011871557837237537, 'epoch': 1.42} + 47%|████▋ | 8180/17285 [73:15:56<84:40:59, 33.48s/it] 47%|████▋ | 8181/17285 [73:16:26<82:05:27, 32.46s/it] 47%|████▋ | 8182/17285 [73:16:56<80:06:57, 31.68s/it] 47%|████▋ | 8183/17285 [73:17:35<85:58:58, 34.01s/it] 47%|████▋ | 8184/17285 [73:18:03<81:25:56, 32.21s/it] 47%|████▋ | 8185/17285 [73:18:46<89:19:47, 35.34s/it] 47%|████▋ | 8186/17285 [73:19:15<84:25:08, 33.40s/it] 47%|████▋ | 8187/17285 [73:19:39<77:34:00, 30.69s/it] 47%|████▋ | 8188/17285 [73:20:19<84:48:37, 33.56s/it] 47%|████▋ | 8189/17285 [73:20:49<81:46:51, 32.37s/it] 47%|████▋ | 8190/17285 [73:21:22<82:27:36, 32.64s/it] {'loss': 1.4631, 'learning_rate': 0.00011852759773458446, 'epoch': 1.42} + 47%|████▋ | 8190/17285 [73:21:22<82:27:36, 32.64s/it] 47%|████▋ | 8191/17285 [73:22:03<88:34:32, 35.06s/it] 47%|████▋ | 8192/17285 [73:22:40<89:53:29, 35.59s/it] 47%|████▋ | 8193/17285 [73:23:10<86:18:52, 34.18s/it] 47%|████▋ | 8194/17285 [73:23:39<81:49:29, 32.40s/it] 47%|████▋ | 8195/17285 [73:24:04<76:08:30, 30.16s/it] 47%|████▋ | 8196/17285 [73:24:38<79:11:58, 31.37s/it][2023-08-26 01:19:43,389] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 47%|████▋ | 8197/17285 [73:25:06<76:32:59, 30.32s/it] 47%|████▋ | 8198/17285 [73:25:37<77:24:07, 30.66s/it] 47%|████▋ | 8199/17285 [73:26:04<74:32:58, 29.54s/it] 47%|████▋ | 8200/17285 [73:26:49<85:49:47, 34.01s/it] {'loss': 1.4695, 'learning_rate': 0.00011835835715290196, 'epoch': 1.42} + 47%|████▋ | 8200/17285 [73:26:49<85:49:47, 34.01s/it] 47%|████▋ | 8201/17285 [73:27:18<82:16:40, 32.61s/it] 47%|████▋ | 8202/17285 [73:27:52<83:47:52, 33.21s/it] 47%|████▋ | 8203/17285 [73:28:18<78:19:16, 31.05s/it] 47%|████▋ | 8204/17285 [73:28:53<80:34:33, 31.94s/it] 47%|████▋ | 8205/17285 [73:29:29<84:01:18, 33.31s/it] 47%|████▋ | 8206/17285 [73:29:57<79:56:38, 31.70s/it] 47%|████▋ | 8207/17285 [73:30:28<79:48:01, 31.65s/it] 47%|████▋ | 8208/17285 [73:30:59<78:47:00, 31.25s/it] 47%|████▋ | 8209/17285 [73:31:28<77:31:54, 30.75s/it] 47%|████▋ | 8210/17285 [73:31:59<77:14:13, 30.64s/it] {'loss': 1.4487, 'learning_rate': 0.00011817024824131962, 'epoch': 1.42} + 47%|████▋ | 8210/17285 [73:31:59<77:14:13, 30.64s/it] 48%|████▊ | 8211/17285 [73:32:31<78:38:43, 31.20s/it] 48%|████▊ | 8212/17285 [73:33:05<80:27:18, 31.92s/it] 48%|████▊ | 8213/17285 [73:33:48<89:12:22, 35.40s/it] 48%|████▊ | 8214/17285 [73:34:37<99:18:13, 39.41s/it] 48%|████▊ | 8215/17285 [73:35:07<92:09:21, 36.58s/it] 48%|████▊ | 8216/17285 [73:35:40<89:08:26, 35.38s/it] 48%|████▊ | 8217/17285 [73:36:08<83:56:36, 33.33s/it] 48%|████▊ | 8218/17285 [73:36:34<78:35:07, 31.20s/it] 48%|████▊ | 8219/17285 [73:37:01<75:15:41, 29.89s/it] 48%|████▊ | 8220/17285 [73:37:35<78:24:06, 31.14s/it] {'loss': 1.4025, 'learning_rate': 0.00011798207281556853, 'epoch': 1.43} + 48%|████▊ | 8220/17285 [73:37:35<78:24:06, 31.14s/it] 48%|████▊ | 8221/17285 [73:38:12<82:42:42, 32.85s/it] 48%|████▊ | 8222/17285 [73:38:50<86:12:02, 34.24s/it] 48%|████▊ | 8223/17285 [73:39:17<81:08:01, 32.23s/it] 48%|████▊ | 8224/17285 [73:39:53<83:59:00, 33.37s/it] 48%|████▊ | 8225/17285 [73:40:24<82:14:45, 32.68s/it] 48%|████▊ | 
8226/17285 [73:41:00<84:31:25, 33.59s/it] 48%|████▊ | 8227/17285 [73:41:30<82:03:32, 32.61s/it] 48%|████▊ | 8228/17285 [73:42:00<79:57:37, 31.78s/it] 48%|████▊ | 8229/17285 [73:42:34<81:37:42, 32.45s/it] 48%|████▊ | 8230/17285 [73:43:01<76:59:49, 30.61s/it] {'loss': 1.4665, 'learning_rate': 0.00011779383156448527, 'epoch': 1.43} + 48%|████▊ | 8230/17285 [73:43:01<76:59:49, 30.61s/it] 48%|████▊ | 8231/17285 [73:43:32<77:46:48, 30.93s/it] 48%|████▊ | 8232/17285 [73:44:13<85:23:59, 33.96s/it] 48%|████▊ | 8233/17285 [73:44:47<84:59:58, 33.80s/it] 48%|████▊ | 8234/17285 [73:45:21<85:15:40, 33.91s/it] 48%|████▊ | 8235/17285 [73:45:59<88:22:42, 35.16s/it] 48%|████▊ | 8236/17285 [73:46:32<86:54:47, 34.58s/it] 48%|████▊ | 8237/17285 [73:47:02<83:14:40, 33.12s/it] 48%|████▊ | 8238/17285 [73:47:34<82:35:12, 32.86s/it] 48%|████▊ | 8239/17285 [73:48:03<79:17:49, 31.56s/it] 48%|████▊ | 8240/17285 [73:48:47<88:33:49, 35.25s/it] {'loss': 1.4005, 'learning_rate': 0.00011760552517714743, 'epoch': 1.43} + 48%|████▊ | 8240/17285 [73:48:47<88:33:49, 35.25s/it] 48%|████▊ | 8241/17285 [73:49:22<88:59:22, 35.42s/it] 48%|████▊ | 8242/17285 [73:49:49<82:21:52, 32.79s/it] 48%|████▊ | 8243/17285 [73:50:21<81:47:04, 32.56s/it] 48%|████▊ | 8244/17285 [73:50:51<79:43:28, 31.75s/it] 48%|████▊ | 8245/17285 [73:51:18<76:23:55, 30.42s/it] 48%|████▊ | 8246/17285 [73:51:46<74:18:23, 29.59s/it] 48%|████▊ | 8247/17285 [73:52:23<79:39:48, 31.73s/it] 48%|████▊ | 8248/17285 [73:52:49<75:18:49, 30.00s/it] 48%|████▊ | 8249/17285 [73:53:15<72:47:51, 29.00s/it] 48%|████▊ | 8250/17285 [73:53:44<72:37:13, 28.94s/it] {'loss': 1.4488, 'learning_rate': 0.00011741715434287097, 'epoch': 1.43} + 48%|████▊ | 8250/17285 [73:53:44<72:37:13, 28.94s/it] 48%|████▊ | 8251/17285 [73:54:19<77:17:53, 30.80s/it] 48%|████▊ | 8252/17285 [73:54:51<78:12:14, 31.17s/it] 48%|████▊ | 8253/17285 [73:55:15<72:59:59, 29.10s/it] 48%|████▊ | 8254/17285 [73:55:48<75:57:48, 30.28s/it] 48%|████▊ | 8255/17285 [73:56:28<82:47:35, 33.01s/it] 
48%|████▊ | 8256/17285 [73:56:58<80:43:24, 32.19s/it] 48%|████▊ | 8257/17285 [73:57:31<81:12:51, 32.38s/it] 48%|████▊ | 8258/17285 [73:58:05<82:47:09, 33.02s/it] 48%|████▊ | 8259/17285 [73:58:49<90:47:57, 36.22s/it] 48%|████▊ | 8260/17285 [73:59:20<86:52:15, 34.65s/it] {'loss': 1.4511, 'learning_rate': 0.00011722871975120782, 'epoch': 1.43} + 48%|████▊ | 8260/17285 [73:59:20<86:52:15, 34.65s/it] 48%|████▊ | 8261/17285 [74:00:00<90:45:50, 36.21s/it] 48%|████▊ | 8262/17285 [74:00:25<82:34:11, 32.94s/it] 48%|████▊ | 8263/17285 [74:01:01<84:21:33, 33.66s/it] 48%|████▊ | 8264/17285 [74:01:34<83:52:53, 33.47s/it] 48%|████▊ | 8265/17285 [74:02:04<81:18:59, 32.45s/it] 48%|████▊ | 8266/17285 [74:02:40<83:50:43, 33.47s/it] 48%|████▊ | 8267/17285 [74:03:06<78:18:56, 31.26s/it][2023-08-26 01:58:12,473] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 48%|████▊ | 8268/17285 [74:03:35<76:40:21, 30.61s/it] 48%|████▊ | 8269/17285 [74:04:05<76:34:58, 30.58s/it] 48%|████▊ | 8270/17285 [74:04:34<74:49:22, 29.88s/it] {'loss': 1.4192, 'learning_rate': 0.00011705907467624817, 'epoch': 1.44} + 48%|████▊ | 8270/17285 [74:04:34<74:49:22, 29.88s/it] 48%|████▊ | 8271/17285 [74:05:06<76:28:03, 30.54s/it] 48%|████▊ | 8272/17285 [74:05:31<72:31:13, 28.97s/it] 48%|████▊ | 8273/17285 [74:06:05<76:09:20, 30.42s/it] 48%|████▊ | 8274/17285 [74:06:35<76:13:44, 30.45s/it] 48%|████▊ | 8275/17285 [74:07:08<77:50:38, 31.10s/it] 48%|████▊ | 8276/17285 [74:07:51<86:44:51, 34.66s/it] 48%|████▊ | 8277/17285 [74:08:21<83:01:45, 33.18s/it] 48%|████▊ | 8278/17285 [74:08:51<81:05:27, 32.41s/it] 48%|████▊ | 8279/17285 [74:09:18<76:49:27, 30.71s/it] 48%|████▊ | 8280/17285 [74:09:48<76:19:21, 30.51s/it] {'loss': 1.4309, 'learning_rate': 0.00011687052084609971, 'epoch': 1.44} + 48%|████▊ | 8280/17285 [74:09:48<76:19:21, 30.51s/it] 48%|████▊ | 8281/17285 [74:10:14<73:18:15, 29.31s/it] 48%|████▊ | 8282/17285 [74:10:46<74:48:11, 
29.91s/it] 48%|████▊ | 8283/17285 [74:11:15<74:04:46, 29.63s/it] 48%|████▊ | 8284/17285 [74:11:49<77:40:57, 31.07s/it] 48%|████▊ | 8285/17285 [74:12:20<77:32:33, 31.02s/it] 48%|████▊ | 8286/17285 [74:12:49<75:35:16, 30.24s/it] 48%|████▊ | 8287/17285 [74:13:23<79:00:02, 31.61s/it] 48%|████▊ | 8288/17285 [74:13:56<79:43:44, 31.90s/it] 48%|████▊ | 8289/17285 [74:14:26<78:24:33, 31.38s/it] 48%|████▊ | 8290/17285 [74:15:04<82:59:58, 33.22s/it] {'loss': 1.4495, 'learning_rate': 0.0001166819052595759, 'epoch': 1.44} + 48%|████▊ | 8290/17285 [74:15:04<82:59:58, 33.22s/it] 48%|████▊ | 8291/17285 [74:15:34<81:16:21, 32.53s/it] 48%|████▊ | 8292/17285 [74:16:02<77:39:07, 31.09s/it] 48%|████▊ | 8293/17285 [74:16:38<80:52:31, 32.38s/it] 48%|████▊ | 8294/17285 [74:17:12<82:29:19, 33.03s/it] 48%|████▊ | 8295/17285 [74:17:40<78:48:07, 31.56s/it] 48%|████▊ | 8296/17285 [74:18:11<78:31:10, 31.45s/it] 48%|████▊ | 8297/17285 [74:18:36<73:37:53, 29.49s/it] 48%|████▊ | 8298/17285 [74:19:06<73:21:37, 29.39s/it] 48%|████▊ | 8299/17285 [74:19:38<75:55:03, 30.41s/it] 48%|████▊ | 8300/17285 [74:20:13<79:20:48, 31.79s/it] {'loss': 1.4375, 'learning_rate': 0.00011649322860712455, 'epoch': 1.44} + 48%|████▊ | 8300/17285 [74:20:13<79:20:48, 31.79s/it] 48%|████▊ | 8301/17285 [74:20:40<75:15:09, 30.15s/it] 48%|████▊ | 8302/17285 [74:21:08<74:05:23, 29.69s/it] 48%|████▊ | 8303/17285 [74:21:42<76:56:56, 30.84s/it] 48%|████▊ | 8304/17285 [74:22:13<77:09:09, 30.93s/it] 48%|████▊ | 8305/17285 [74:22:40<74:34:33, 29.90s/it] 48%|████▊ | 8306/17285 [74:23:20<81:50:37, 32.81s/it] 48%|████▊ | 8307/17285 [74:23:55<83:17:55, 33.40s/it] 48%|████▊ | 8308/17285 [74:24:21<78:09:40, 31.34s/it] 48%|████▊ | 8309/17285 [74:24:51<76:49:33, 30.81s/it] 48%|████▊ | 8310/17285 [74:25:16<72:41:23, 29.16s/it] {'loss': 1.4502, 'learning_rate': 0.00011630449157941714, 'epoch': 1.44} + 48%|████▊ | 8310/17285 [74:25:16<72:41:23, 29.16s/it] 48%|████▊ | 8311/17285 [74:25:41<69:06:33, 27.72s/it] 48%|████▊ | 8312/17285 
[74:26:05<66:54:25, 26.84s/it] 48%|████▊ | 8313/17285 [74:26:48<78:28:09, 31.49s/it] 48%|████▊ | 8314/17285 [74:27:16<76:17:04, 30.61s/it] 48%|████▊ | 8315/17285 [74:27:43<73:41:53, 29.58s/it] 48%|████▊ | 8316/17285 [74:28:11<71:49:28, 28.83s/it] 48%|████▊ | 8317/17285 [74:28:42<73:39:10, 29.57s/it] 48%|████▊ | 8318/17285 [74:29:16<77:23:02, 31.07s/it] 48%|████▊ | 8319/17285 [74:29:58<84:59:50, 34.13s/it] 48%|████▊ | 8320/17285 [74:30:31<84:25:30, 33.90s/it] {'loss': 1.4179, 'learning_rate': 0.00011611569486734603, 'epoch': 1.44} + 48%|████▊ | 8320/17285 [74:30:31<84:25:30, 33.90s/it] 48%|████▊ | 8321/17285 [74:30:58<79:13:46, 31.82s/it] 48%|████▊ | 8322/17285 [74:31:33<81:28:39, 32.73s/it] 48%|████▊ | 8323/17285 [74:32:03<79:28:41, 31.93s/it] 48%|████▊ | 8324/17285 [74:32:28<74:26:45, 29.91s/it] 48%|████▊ | 8325/17285 [74:32:54<71:44:21, 28.82s/it] 48%|████▊ | 8326/17285 [74:33:33<79:08:32, 31.80s/it] 48%|████▊ | 8327/17285 [74:34:05<78:59:06, 31.74s/it] 48%|████▊ | 8328/17285 [74:34:47<86:35:55, 34.81s/it] 48%|████▊ | 8329/17285 [74:35:15<81:27:07, 32.74s/it] 48%|████▊ | 8330/17285 [74:35:42<77:41:46, 31.23s/it] {'loss': 1.4581, 'learning_rate': 0.00011592683916202211, 'epoch': 1.45} + 48%|████▊ | 8330/17285 [74:35:42<77:41:46, 31.23s/it] 48%|████▊ | 8331/17285 [74:36:13<77:33:48, 31.18s/it] 48%|████▊ | 8332/17285 [74:36:40<74:04:24, 29.78s/it] 48%|████▊ | 8333/17285 [74:37:23<84:10:37, 33.85s/it] 48%|████▊ | 8334/17285 [74:37:54<81:43:15, 32.87s/it] 48%|████▊ | 8335/17285 [74:38:27<82:09:30, 33.05s/it] 48%|████▊ | 8336/17285 [74:38:59<81:26:55, 32.77s/it] 48%|████▊ | 8337/17285 [74:39:29<79:02:21, 31.80s/it] 48%|████▊ | 8338/17285 [74:40:01<79:16:01, 31.89s/it] 48%|████▊ | 8339/17285 [74:40:31<77:31:03, 31.19s/it] 48%|████▊ | 8340/17285 [74:41:09<83:10:11, 33.47s/it] {'loss': 1.4211, 'learning_rate': 0.00011573792515477222, 'epoch': 1.45} + 48%|████▊ | 8340/17285 [74:41:09<83:10:11, 33.47s/it] 48%|████▊ | 8341/17285 [74:41:47<86:33:21, 34.84s/it] 48%|████▊ | 
8342/17285 [74:42:17<82:35:35, 33.25s/it] 48%|████▊ | 8343/17285 [74:42:49<81:33:21, 32.83s/it] 48%|████▊ | 8344/17285 [74:43:21<81:05:21, 32.65s/it] 48%|████▊ | 8345/17285 [74:43:50<78:13:23, 31.50s/it] 48%|████▊ | 8346/17285 [74:44:18<75:45:02, 30.51s/it] 48%|████▊ | 8347/17285 [74:44:54<79:36:08, 32.06s/it] 48%|████▊ | 8348/17285 [74:45:23<77:38:52, 31.28s/it] 48%|████▊ | 8349/17285 [74:46:00<81:38:10, 32.89s/it] 48%|████▊ | 8350/17285 [74:46:34<82:50:02, 33.37s/it] {'loss': 1.4118, 'learning_rate': 0.00011554895353713662, 'epoch': 1.45} + 48%|████▊ | 8350/17285 [74:46:34<82:50:02, 33.37s/it] 48%|████▊ | 8351/17285 [74:47:11<85:00:12, 34.25s/it] 48%|████▊ | 8352/17285 [74:47:45<85:11:35, 34.33s/it] 48%|████▊ | 8353/17285 [74:48:20<85:10:00, 34.33s/it] 48%|████▊ | 8354/17285 [74:48:47<79:59:43, 32.25s/it] 48%|████▊ | 8355/17285 [74:49:25<84:20:02, 34.00s/it] 48%|████▊ | 8356/17285 [74:49:59<84:41:19, 34.14s/it] 48%|████▊ | 8357/17285 [74:50:35<85:23:26, 34.43s/it] 48%|████▊ | 8358/17285 [74:51:02<79:54:26, 32.22s/it] 48%|████▊ | 8359/17285 [74:51:42<85:47:15, 34.60s/it] 48%|████▊ | 8360/17285 [74:52:23<90:48:03, 36.63s/it] {'loss': 1.4308, 'learning_rate': 0.00011535992500086643, 'epoch': 1.45} + 48%|████▊ | 8360/17285 [74:52:23<90:48:03, 36.63s/it] 48%|████▊ | 8361/17285 [74:52:50<83:34:30, 33.71s/it] 48%|████▊ | 8362/17285 [74:53:30<88:22:42, 35.66s/it] 48%|████▊ | 8363/17285 [74:54:01<85:03:30, 34.32s/it] 48%|████▊ | 8364/17285 [74:54:27<78:32:55, 31.70s/it] 48%|████▊ | 8365/17285 [74:55:00<79:48:35, 32.21s/it] 48%|████▊ | 8366/17285 [74:55:27<75:40:37, 30.55s/it] 48%|████▊ | 8367/17285 [74:55:54<72:43:59, 29.36s/it] 48%|████▊ | 8368/17285 [74:56:30<78:05:26, 31.53s/it] 48%|████▊ | 8369/17285 [74:57:12<85:51:25, 34.67s/it] 48%|████▊ | 8370/17285 [74:57:43<83:11:30, 33.59s/it] {'loss': 1.429, 'learning_rate': 0.0001151708402379212, 'epoch': 1.45} + 48%|████▊ | 8370/17285 [74:57:43<83:11:30, 33.59s/it] 48%|████▊ | 8371/17285 [74:58:12<79:45:27, 32.21s/it] 
48%|████▊ | 8372/17285 [74:58:39<75:15:22, 30.40s/it] 48%|████▊ | 8373/17285 [74:59:14<78:42:01, 31.79s/it] 48%|████▊ | 8374/17285 [74:59:45<78:03:51, 31.54s/it] 48%|████▊ | 8375/17285 [75:00:13<76:07:54, 30.76s/it] 48%|████▊ | 8376/17285 [75:00:40<73:18:06, 29.62s/it] 48%|████▊ | 8377/17285 [75:01:17<78:30:59, 31.73s/it] 48%|████▊ | 8378/17285 [75:01:50<79:06:04, 31.97s/it] 48%|████▊ | 8379/17285 [75:02:20<77:39:16, 31.39s/it] 48%|████▊ | 8380/17285 [75:02:51<77:49:00, 31.46s/it] {'loss': 1.4262, 'learning_rate': 0.00011498169994046621, 'epoch': 1.45} + 48%|████▊ | 8380/17285 [75:02:51<77:49:00, 31.46s/it] 48%|████▊ | 8381/17285 [75:03:21<76:22:14, 30.88s/it] 48%|████▊ | 8382/17285 [75:03:52<76:45:42, 31.04s/it] 48%|████▊ | 8383/17285 [75:04:20<74:32:10, 30.14s/it] 49%|████▊ | 8384/17285 [75:04:55<77:39:37, 31.41s/it] 49%|████▊ | 8385/17285 [75:05:24<75:56:46, 30.72s/it] 49%|████▊ | 8386/17285 [75:06:00<79:44:37, 32.26s/it] 49%|████▊ | 8387/17285 [75:06:38<84:05:15, 34.02s/it] 49%|████▊ | 8388/17285 [75:07:09<81:47:41, 33.10s/it] 49%|████▊ | 8389/17285 [75:07:43<82:53:39, 33.55s/it] 49%|████▊ | 8390/17285 [75:08:14<81:06:27, 32.83s/it] {'loss': 1.4375, 'learning_rate': 0.00011479250480087011, 'epoch': 1.46} + 49%|████▊ | 8390/17285 [75:08:14<81:06:27, 32.83s/it] 49%|████▊ | 8391/17285 [75:08:49<82:31:26, 33.40s/it] 49%|████▊ | 8392/17285 [75:09:19<79:34:51, 32.22s/it] 49%|████▊ | 8393/17285 [75:09:44<74:52:26, 30.31s/it] 49%|████▊ | 8394/17285 [75:10:27<83:55:52, 33.98s/it] 49%|████▊ | 8395/17285 [75:10:53<77:42:42, 31.47s/it] 49%|████▊ | 8396/17285 [75:11:20<74:50:37, 30.31s/it] 49%|████▊ | 8397/17285 [75:12:01<82:43:31, 33.51s/it][2023-08-26 03:07:13,338] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 49%|████▊ | 8398/17285 [75:12:36<83:25:45, 33.80s/it] 49%|████▊ | 8399/17285 [75:13:15<87:43:21, 35.54s/it] 49%|████▊ | 8400/17285 [75:13:48<85:55:06, 34.81s/it] {'loss': 1.4508, 'learning_rate': 0.00011462218285760746, 'epoch': 1.46} + 49%|████▊ | 8400/17285 [75:13:48<85:55:06, 34.81s/it] 49%|████▊ | 8401/17285 [75:14:23<85:33:58, 34.67s/it] 49%|████▊ | 8402/17285 [75:14:51<80:45:41, 32.73s/it] 49%|████▊ | 8403/17285 [75:15:15<74:23:59, 30.16s/it] 49%|████▊ | 8404/17285 [75:15:57<82:54:32, 33.61s/it] 49%|████▊ | 8405/17285 [75:16:38<88:44:53, 35.98s/it] 49%|████▊ | 8406/17285 [75:17:11<86:17:50, 34.99s/it] 49%|████▊ | 8407/17285 [75:17:40<82:01:39, 33.26s/it] 49%|████▊ | 8408/17285 [75:18:13<81:45:42, 33.16s/it] 49%|████▊ | 8409/17285 [75:18:39<76:17:25, 30.94s/it] 49%|████▊ | 8410/17285 [75:19:10<76:39:02, 31.09s/it] {'loss': 1.4318, 'learning_rate': 0.00011443288542613578, 'epoch': 1.46} + 49%|████▊ | 8410/17285 [75:19:10<76:39:02, 31.09s/it] 49%|████▊ | 8411/17285 [75:19:40<75:51:32, 30.77s/it] 49%|████▊ | 8412/17285 [75:20:14<78:10:25, 31.72s/it] 49%|████▊ | 8413/17285 [75:20:41<74:23:11, 30.18s/it] 49%|████▊ | 8414/17285 [75:21:07<71:14:21, 28.91s/it] 49%|████▊ | 8415/17285 [75:21:40<74:13:12, 30.12s/it] 49%|████▊ | 8416/17285 [75:22:07<72:18:52, 29.35s/it] 49%|████▊ | 8417/17285 [75:22:41<75:38:25, 30.71s/it] 49%|████▊ | 8418/17285 [75:23:10<74:26:57, 30.23s/it] 49%|████▊ | 8419/17285 [75:23:40<73:44:55, 29.95s/it] 49%|████▊ | 8420/17285 [75:24:13<76:36:38, 31.11s/it] {'loss': 1.4416, 'learning_rate': 0.00011424353516151814, 'epoch': 1.46} + 49%|████▊ | 8420/17285 [75:24:13<76:36:38, 31.11s/it] 49%|████▊ | 8421/17285 [75:24:50<81:00:08, 32.90s/it] 49%|████▊ | 8422/17285 [75:25:24<81:25:15, 33.07s/it] 49%|████▊ | 8423/17285 [75:25:54<79:10:52, 32.17s/it] 49%|████▊ | 8424/17285 [75:26:27<80:02:09, 32.52s/it] 49%|████▊ | 8425/17285 [75:26:58<78:42:19, 31.98s/it] 49%|████▊ | 8426/17285 [75:27:30<78:22:49, 31.85s/it] 49%|████▉ | 
8427/17285 [75:28:02<78:31:20, 31.91s/it] 49%|████▉ | 8428/17285 [75:28:32<77:03:52, 31.32s/it] 49%|████▉ | 8429/17285 [75:29:04<78:07:59, 31.76s/it] 49%|████▉ | 8430/17285 [75:29:35<77:06:38, 31.35s/it] {'loss': 1.4296, 'learning_rate': 0.00011405413275689179, 'epoch': 1.46} + 49%|████▉ | 8430/17285 [75:29:35<77:06:38, 31.35s/it] 49%|████▉ | 8431/17285 [75:30:02<74:22:48, 30.24s/it] 49%|████▉ | 8432/17285 [75:30:32<73:50:07, 30.02s/it] 49%|████▉ | 8433/17285 [75:31:11<80:23:56, 32.70s/it] 49%|████▉ | 8434/17285 [75:31:39<76:45:05, 31.22s/it] 49%|████▉ | 8435/17285 [75:32:04<72:22:02, 29.44s/it] 49%|████▉ | 8436/17285 [75:32:37<75:01:10, 30.52s/it] 49%|████▉ | 8437/17285 [75:33:11<77:39:32, 31.60s/it] 49%|████▉ | 8438/17285 [75:33:42<77:14:50, 31.43s/it] 49%|████▉ | 8439/17285 [75:34:20<82:10:25, 33.44s/it] 49%|████▉ | 8440/17285 [75:35:02<88:07:46, 35.87s/it] {'loss': 1.4192, 'learning_rate': 0.0001138646789055848, 'epoch': 1.46} + 49%|████▉ | 8440/17285 [75:35:02<88:07:46, 35.87s/it] 49%|████▉ | 8441/17285 [75:35:31<83:11:20, 33.86s/it] 49%|████▉ | 8442/17285 [75:36:02<80:54:17, 32.94s/it] 49%|████▉ | 8443/17285 [75:36:32<78:33:09, 31.98s/it] 49%|████▉ | 8444/17285 [75:37:08<81:41:28, 33.26s/it] 49%|████▉ | 8445/17285 [75:37:43<83:01:04, 33.81s/it] 49%|████▉ | 8446/17285 [75:38:11<79:10:16, 32.25s/it] 49%|████▉ | 8447/17285 [75:38:41<77:17:11, 31.48s/it] 49%|████▉ | 8448/17285 [75:39:18<81:13:45, 33.09s/it] 49%|████▉ | 8449/17285 [75:39:44<75:39:16, 30.82s/it] 49%|████▉ | 8450/17285 [75:40:15<76:15:37, 31.07s/it] {'loss': 1.4411, 'learning_rate': 0.00011367517430111365, 'epoch': 1.47} + 49%|████▉ | 8450/17285 [75:40:15<76:15:37, 31.07s/it] 49%|████▉ | 8451/17285 [75:40:55<82:32:42, 33.64s/it] 49%|████▉ | 8452/17285 [75:41:28<82:21:16, 33.56s/it] 49%|████▉ | 8453/17285 [75:42:05<84:31:33, 34.45s/it] 49%|████▉ | 8454/17285 [75:42:39<84:04:00, 34.27s/it] 49%|████▉ | 8455/17285 [75:43:13<84:18:10, 34.37s/it] 49%|████▉ | 8456/17285 [75:43:41<79:24:24, 32.38s/it] 
49%|████▉ | 8457/17285 [75:44:08<75:20:03, 30.72s/it] 49%|████▉ | 8458/17285 [75:44:36<73:36:48, 30.02s/it] 49%|████▉ | 8459/17285 [75:45:17<81:33:42, 33.27s/it] 49%|████▉ | 8460/17285 [75:45:44<76:46:39, 31.32s/it] {'loss': 1.407, 'learning_rate': 0.0001134856196371805, 'epoch': 1.47} + 49%|████▉ | 8460/17285 [75:45:44<76:46:39, 31.32s/it] 49%|████▉ | 8461/17285 [75:46:26<84:29:45, 34.47s/it] 49%|████▉ | 8462/17285 [75:46:51<77:43:58, 31.72s/it] 49%|████▉ | 8463/17285 [75:47:24<78:40:26, 32.10s/it] 49%|████▉ | 8464/17285 [75:47:50<73:57:05, 30.18s/it] 49%|████▉ | 8465/17285 [75:48:20<73:52:11, 30.15s/it] 49%|████▉ | 8466/17285 [75:48:58<79:38:31, 32.51s/it] 49%|████▉ | 8467/17285 [75:49:24<74:48:04, 30.54s/it] 49%|████▉ | 8468/17285 [75:49:56<76:14:24, 31.13s/it] 49%|████▉ | 8469/17285 [75:50:38<83:53:58, 34.26s/it] 49%|████▉ | 8470/17285 [75:51:13<84:38:27, 34.57s/it] {'loss': 1.447, 'learning_rate': 0.00011329601560767078, 'epoch': 1.47} + 49%|████▉ | 8470/17285 [75:51:13<84:38:27, 34.57s/it] 49%|████▉ | 8471/17285 [75:51:59<92:45:16, 37.88s/it] 49%|████▉ | 8472/17285 [75:52:26<85:13:46, 34.82s/it] 49%|████▉ | 8473/17285 [75:52:52<78:26:45, 32.05s/it] 49%|████▉ | 8474/17285 [75:53:23<77:29:42, 31.66s/it] 49%|████▉ | 8475/17285 [75:54:00<81:37:54, 33.36s/it] 49%|████▉ | 8476/17285 [75:54:35<83:07:07, 33.97s/it] 49%|████▉ | 8477/17285 [75:55:09<82:50:33, 33.86s/it] 49%|████▉ | 8478/17285 [75:55:47<85:38:00, 35.00s/it] 49%|████▉ | 8479/17285 [75:56:20<84:36:11, 34.59s/it] 49%|████▉ | 8480/17285 [75:56:48<79:39:28, 32.57s/it] {'loss': 1.4344, 'learning_rate': 0.0001131063629066507, 'epoch': 1.47} + 49%|████▉ | 8480/17285 [75:56:48<79:39:28, 32.57s/it] 49%|████▉ | 8481/17285 [75:57:15<75:34:23, 30.90s/it] 49%|████▉ | 8482/17285 [75:57:57<83:29:52, 34.15s/it] 49%|████▉ | 8483/17285 [75:58:26<79:51:11, 32.66s/it] 49%|████▉ | 8484/17285 [75:58:58<79:07:10, 32.36s/it] 49%|████▉ | 8485/17285 [75:59:28<77:24:49, 31.67s/it] 49%|████▉ | 8486/17285 [76:00:03<80:17:55, 
32.85s/it] 49%|████▉ | 8487/17285 [76:00:33<78:00:30, 31.92s/it] 49%|████▉ | 8488/17285 [76:01:05<78:08:27, 31.98s/it] 49%|████▉ | 8489/17285 [76:01:37<78:13:19, 32.01s/it] 49%|████▉ | 8490/17285 [76:02:11<79:22:23, 32.49s/it] {'loss': 1.4774, 'learning_rate': 0.00011291666222836454, 'epoch': 1.47} + 49%|████▉ | 8490/17285 [76:02:11<79:22:23, 32.49s/it] 49%|████▉ | 8491/17285 [76:02:46<81:19:44, 33.29s/it] 49%|████▉ | 8492/17285 [76:03:20<82:06:55, 33.62s/it][2023-08-26 03:58:35,017] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 49%|████▉ | 8493/17285 [76:03:57<84:31:20, 34.61s/it] 49%|████▉ | 8494/17285 [76:04:25<79:42:41, 32.64s/it] 49%|████▉ | 8495/17285 [76:04:55<77:39:42, 31.81s/it] 49%|████▉ | 8496/17285 [76:05:26<77:13:40, 31.63s/it] 49%|████▉ | 8497/17285 [76:05:55<74:40:26, 30.59s/it] 49%|████▉ | 8498/17285 [76:06:25<74:28:51, 30.51s/it] 49%|████▉ | 8499/17285 [76:06:59<76:42:58, 31.43s/it] 49%|████▉ | 8500/17285 [76:07:24<72:00:14, 29.51s/it] {'loss': 1.4209, 'learning_rate': 0.00011274589117127904, 'epoch': 1.48} + 49%|████▉ | 8500/17285 [76:07:24<72:00:14, 29.51s/it] 49%|████▉ | 8501/17285 [76:07:56<73:53:36, 30.28s/it] 49%|████▉ | 8502/17285 [76:08:29<75:59:59, 31.15s/it] 49%|████▉ | 8503/17285 [76:08:57<73:52:51, 30.29s/it] 49%|████▉ | 8504/17285 [76:09:25<71:50:48, 29.46s/it] 49%|████▉ | 8505/17285 [76:09:51<69:23:05, 28.45s/it] 49%|████▉ | 8506/17285 [76:10:22<71:17:20, 29.23s/it] 49%|████▉ | 8507/17285 [76:10:53<72:47:57, 29.86s/it] 49%|████▉ | 8508/17285 [76:11:21<71:38:33, 29.39s/it] 49%|████▉ | 8509/17285 [76:11:54<73:50:23, 30.29s/it] 49%|████▉ | 8510/17285 [76:12:33<80:26:29, 33.00s/it] {'loss': 1.3907, 'learning_rate': 0.00011255610124945745, 'epoch': 1.48} + 49%|████▉ | 8510/17285 [76:12:33<80:26:29, 33.00s/it] 49%|████▉ | 8511/17285 [76:13:03<78:28:11, 32.20s/it] 49%|████▉ | 8512/17285 [76:13:33<76:40:39, 31.46s/it] 49%|████▉ | 8513/17285 
[76:14:09<79:56:20, 32.81s/it] 49%|████▉ | 8514/17285 [76:14:45<81:58:50, 33.65s/it] 49%|████▉ | 8515/17285 [76:15:14<78:29:34, 32.22s/it] 49%|████▉ | 8516/17285 [76:15:40<73:53:53, 30.34s/it] 49%|████▉ | 8517/17285 [76:16:16<78:35:16, 32.27s/it] 49%|████▉ | 8518/17285 [76:16:50<79:15:57, 32.55s/it] 49%|████▉ | 8519/17285 [76:17:22<79:26:34, 32.63s/it] 49%|████▉ | 8520/17285 [76:17:52<77:37:04, 31.88s/it] {'loss': 1.4373, 'learning_rate': 0.00011236626536466241, 'epoch': 1.48} + 49%|████▉ | 8520/17285 [76:17:52<77:37:04, 31.88s/it] 49%|████▉ | 8521/17285 [76:18:19<73:45:45, 30.30s/it] 49%|████▉ | 8522/17285 [76:19:01<82:27:16, 33.87s/it] 49%|████▉ | 8523/17285 [76:19:26<75:25:07, 30.99s/it] 49%|████▉ | 8524/17285 [76:19:56<74:49:07, 30.74s/it] 49%|████▉ | 8525/17285 [76:20:26<74:17:35, 30.53s/it] 49%|████▉ | 8526/17285 [76:20:56<74:20:02, 30.55s/it] 49%|████▉ | 8527/17285 [76:21:40<83:38:47, 34.38s/it] 49%|████▉ | 8528/17285 [76:22:11<81:42:07, 33.59s/it] 49%|████▉ | 8529/17285 [76:22:44<80:49:13, 33.23s/it] 49%|████▉ | 8530/17285 [76:23:12<76:49:11, 31.59s/it] {'loss': 1.4305, 'learning_rate': 0.00011217638421180883, 'epoch': 1.48} + 49%|████▉ | 8530/17285 [76:23:12<76:49:11, 31.59s/it] 49%|████▉ | 8531/17285 [76:23:41<75:33:25, 31.07s/it] 49%|████▉ | 8532/17285 [76:24:12<75:31:50, 31.06s/it] 49%|████▉ | 8533/17285 [76:24:40<72:51:39, 29.97s/it] 49%|████▉ | 8534/17285 [76:25:10<72:55:08, 30.00s/it] 49%|████▉ | 8535/17285 [76:25:37<70:41:36, 29.09s/it] 49%|████▉ | 8536/17285 [76:26:02<68:04:10, 28.01s/it] 49%|████▉ | 8537/17285 [76:26:36<72:10:29, 29.70s/it] 49%|████▉ | 8538/17285 [76:27:15<78:54:13, 32.47s/it] 49%|████▉ | 8539/17285 [76:27:49<80:21:24, 33.08s/it] 49%|████▉ | 8540/17285 [76:28:20<78:48:29, 32.44s/it] {'loss': 1.4338, 'learning_rate': 0.00011198645848597729, 'epoch': 1.48} + 49%|████▉ | 8540/17285 [76:28:20<78:48:29, 32.44s/it] 49%|████▉ | 8541/17285 [76:28:52<77:53:37, 32.07s/it] 49%|████▉ | 8542/17285 [76:29:22<76:44:34, 31.60s/it] 49%|████▉ | 
8543/17285 [76:29:48<72:21:12, 29.80s/it] 49%|████▉ | 8544/17285 [76:30:17<72:08:28, 29.71s/it] 49%|████▉ | 8545/17285 [76:30:54<76:56:02, 31.69s/it] 49%|████▉ | 8546/17285 [76:31:29<79:36:39, 32.80s/it] 49%|████▉ | 8547/17285 [76:32:04<81:28:56, 33.57s/it] 49%|████▉ | 8548/17285 [76:32:40<83:03:06, 34.22s/it] 49%|████▉ | 8549/17285 [76:33:12<81:41:07, 33.66s/it] 49%|████▉ | 8550/17285 [76:33:55<88:20:01, 36.41s/it] {'loss': 1.4363, 'learning_rate': 0.00011179648888241155, 'epoch': 1.48} + 49%|████▉ | 8550/17285 [76:33:55<88:20:01, 36.41s/it] 49%|████▉ | 8551/17285 [76:34:26<84:34:09, 34.86s/it] 49%|████▉ | 8552/17285 [76:35:09<90:29:21, 37.30s/it] 49%|████▉ | 8553/17285 [76:35:39<85:00:02, 35.04s/it] 49%|████▉ | 8554/17285 [76:36:26<93:22:49, 38.50s/it] 49%|████▉ | 8555/17285 [76:36:57<88:06:24, 36.33s/it] 49%|████▉ | 8556/17285 [76:37:30<85:34:59, 35.30s/it] 50%|████▉ | 8557/17285 [76:38:06<86:00:33, 35.48s/it] 50%|████▉ | 8558/17285 [76:38:35<81:15:52, 33.52s/it] 50%|████▉ | 8559/17285 [76:39:05<79:00:45, 32.60s/it] 50%|████▉ | 8560/17285 [76:39:47<85:18:12, 35.20s/it] {'loss': 1.4053, 'learning_rate': 0.00011160647609651597, 'epoch': 1.49} + 50%|████▉ | 8560/17285 [76:39:47<85:18:12, 35.20s/it] 50%|████▉ | 8561/17285 [76:40:17<82:11:20, 33.92s/it] 50%|████▉ | 8562/17285 [76:40:51<81:46:38, 33.75s/it] 50%|████▉ | 8563/17285 [76:41:19<77:51:54, 32.14s/it] 50%|████▉ | 8564/17285 [76:41:50<76:42:26, 31.66s/it] 50%|████▉ | 8565/17285 [76:42:26<80:13:05, 33.12s/it] 50%|████▉ | 8566/17285 [76:42:57<78:43:48, 32.51s/it] 50%|████▉ | 8567/17285 [76:43:36<83:21:55, 34.42s/it] 50%|████▉ | 8568/17285 [76:44:07<80:23:47, 33.20s/it] 50%|████▉ | 8569/17285 [76:44:36<77:55:10, 32.18s/it] 50%|████▉ | 8570/17285 [76:45:15<82:40:21, 34.15s/it] {'loss': 1.4481, 'learning_rate': 0.00011141642082385304, 'epoch': 1.49} + 50%|████▉ | 8570/17285 [76:45:15<82:40:21, 34.15s/it] 50%|████▉ | 8571/17285 [76:45:45<79:14:06, 32.73s/it] 50%|████▉ | 8572/17285 [76:46:17<78:40:34, 32.51s/it] 
50%|████▉ | 8573/17285 [76:46:51<80:06:38, 33.10s/it] 50%|████▉ | 8574/17285 [76:47:31<84:48:32, 35.05s/it] 50%|████▉ | 8575/17285 [76:48:05<83:58:51, 34.71s/it] 50%|████▉ | 8576/17285 [76:48:32<78:55:58, 32.63s/it] 50%|████▉ | 8577/17285 [76:49:11<83:03:44, 34.34s/it] 50%|████▉ | 8578/17285 [76:49:45<82:44:45, 34.21s/it] 50%|████▉ | 8579/17285 [76:50:21<84:13:06, 34.83s/it] 50%|████▉ | 8580/17285 [76:50:53<82:27:01, 34.10s/it] {'loss': 1.3928, 'learning_rate': 0.00011122632376014078, 'epoch': 1.49} + 50%|████▉ | 8580/17285 [76:50:53<82:27:01, 34.10s/it] 50%|████▉ | 8581/17285 [76:51:26<81:07:39, 33.55s/it] 50%|████▉ | 8582/17285 [76:51:55<77:58:56, 32.26s/it] 50%|████▉ | 8583/17285 [76:52:21<73:34:33, 30.44s/it] 50%|████▉ | 8584/17285 [76:52:49<71:47:50, 29.71s/it] 50%|████▉ | 8585/17285 [76:53:17<70:32:26, 29.19s/it] 50%|████▉ | 8586/17285 [76:53:53<75:43:50, 31.34s/it] 50%|████▉ | 8587/17285 [76:54:34<82:27:33, 34.13s/it] 50%|████▉ | 8588/17285 [76:54:59<75:47:16, 31.37s/it] 50%|████▉ | 8589/17285 [76:55:38<81:13:35, 33.63s/it] 50%|████▉ | 8590/17285 [76:56:13<82:29:01, 34.15s/it] {'loss': 1.3817, 'learning_rate': 0.00011103618560125007, 'epoch': 1.49} + 50%|████▉ | 8590/17285 [76:56:13<82:29:01, 34.15s/it] 50%|████▉ | 8591/17285 [76:56:41<77:58:35, 32.29s/it] 50%|████▉ | 8592/17285 [76:57:17<80:38:01, 33.39s/it] 50%|████▉ | 8593/17285 [76:57:52<81:56:16, 33.94s/it] 50%|████▉ | 8594/17285 [76:58:24<80:10:27, 33.21s/it] 50%|████▉ | 8595/17285 [76:58:53<77:03:27, 31.92s/it] 50%|████▉ | 8596/17285 [76:59:29<79:54:58, 33.11s/it] 50%|████▉ | 8597/17285 [77:00:02<79:50:02, 33.08s/it] 50%|████▉ | 8598/17285 [77:00:29<75:24:12, 31.25s/it] 50%|████▉ | 8599/17285 [77:00:59<75:09:59, 31.15s/it] 50%|████▉ | 8600/17285 [77:01:29<73:47:33, 30.59s/it] {'loss': 1.4496, 'learning_rate': 0.00011084600704320238, 'epoch': 1.49} + 50%|████▉ | 8600/17285 [77:01:29<73:47:33, 30.59s/it] 50%|████▉ | 8601/17285 [77:02:06<78:39:19, 32.61s/it] 50%|████▉ | 8602/17285 [77:02:36<76:50:14, 
31.86s/it] 50%|████▉ | 8603/17285 [77:03:06<75:26:26, 31.28s/it] 50%|████▉ | 8604/17285 [77:03:33<72:09:43, 29.93s/it] 50%|████▉ | 8605/17285 [77:04:05<73:35:22, 30.52s/it] 50%|████▉ | 8606/17285 [77:04:35<73:27:57, 30.47s/it] 50%|████▉ | 8607/17285 [77:05:02<70:56:51, 29.43s/it] 50%|████▉ | 8608/17285 [77:05:32<71:23:09, 29.62s/it] 50%|████▉ | 8609/17285 [77:06:07<75:06:54, 31.17s/it] 50%|████▉ | 8610/17285 [77:06:37<74:25:18, 30.88s/it] {'loss': 1.4582, 'learning_rate': 0.00011065578878216696, 'epoch': 1.49} + 50%|████▉ | 8610/17285 [77:06:37<74:25:18, 30.88s/it] 50%|████▉ | 8611/17285 [77:07:06<73:07:44, 30.35s/it] 50%|████▉ | 8612/17285 [77:07:48<81:06:15, 33.66s/it] 50%|████▉ | 8613/17285 [77:08:19<79:34:49, 33.04s/it] 50%|████▉ | 8614/17285 [77:08:46<74:54:44, 31.10s/it] 50%|████▉ | 8615/17285 [77:09:18<75:53:55, 31.52s/it] 50%|████▉ | 8616/17285 [77:09:44<71:43:11, 29.78s/it] 50%|████▉ | 8617/17285 [77:10:14<71:50:44, 29.84s/it] 50%|████▉ | 8618/17285 [77:10:53<78:45:48, 32.72s/it] 50%|████▉ | 8619/17285 [77:11:25<77:40:15, 32.27s/it] 50%|████▉ | 8620/17285 [77:11:51<73:25:01, 30.50s/it] {'loss': 1.451, 'learning_rate': 0.00011046553151445844, 'epoch': 1.5} + 50%|████▉ | 8620/17285 [77:11:51<73:25:01, 30.50s/it] 50%|████▉ | 8621/17285 [77:12:20<71:59:09, 29.91s/it] 50%|████▉ | 8622/17285 [77:12:58<78:05:55, 32.45s/it] 50%|████▉ | 8623/17285 [77:13:36<82:25:29, 34.26s/it] 50%|████▉ | 8624/17285 [77:14:03<76:47:23, 31.92s/it] 50%|████▉ | 8625/17285 [77:14:29<72:44:36, 30.24s/it] 50%|████▉ | 8626/17285 [77:14:59<72:15:34, 30.04s/it] 50%|████▉ | 8627/17285 [77:15:29<72:38:22, 30.20s/it] 50%|████▉ | 8628/17285 [77:15:55<69:34:24, 28.93s/it] 50%|████▉ | 8629/17285 [77:16:19<65:35:00, 27.28s/it] 50%|████▉ | 8630/17285 [77:16:46<65:41:36, 27.32s/it] {'loss': 1.469, 'learning_rate': 0.0001102752359365342, 'epoch': 1.5} + 50%|████▉ | 8630/17285 [77:16:46<65:41:36, 27.32s/it] 50%|████▉ | 8631/17285 [77:17:20<70:01:20, 29.13s/it] 50%|████▉ | 8632/17285 
[77:18:01<79:14:31, 32.97s/it] 50%|████▉ | 8633/17285 [77:18:30<75:58:44, 31.61s/it] 50%|████▉ | 8634/17285 [77:18:56<71:42:47, 29.84s/it] 50%|████▉ | 8635/17285 [77:19:30<74:44:43, 31.11s/it] 50%|████▉ | 8636/17285 [77:20:05<77:33:02, 32.28s/it] 50%|████▉ | 8637/17285 [77:20:31<73:21:42, 30.54s/it] 50%|████▉ | 8638/17285 [77:20:57<70:04:54, 29.18s/it] 50%|████▉ | 8639/17285 [77:21:24<68:15:34, 28.42s/it] 50%|████▉ | 8640/17285 [77:21:56<70:47:15, 29.48s/it] {'loss': 1.4299, 'learning_rate': 0.00011008490274499193, 'epoch': 1.5} + 50%|████▉ | 8640/17285 [77:21:56<70:47:15, 29.48s/it] 50%|████▉ | 8641/17285 [77:22:26<71:20:32, 29.71s/it] 50%|████▉ | 8642/17285 [77:22:57<72:21:45, 30.14s/it] 50%|█████ | 8643/17285 [77:23:28<72:32:28, 30.22s/it] 50%|█████ | 8644/17285 [77:23:58<72:42:35, 30.29s/it] 50%|█████ | 8645/17285 [77:24:29<73:12:18, 30.50s/it] 50%|█████ | 8646/17285 [77:25:05<77:12:23, 32.17s/it] 50%|█████ | 8647/17285 [77:25:32<73:28:15, 30.62s/it] 50%|█████ | 8648/17285 [77:26:00<71:50:04, 29.94s/it] 50%|█████ | 8649/17285 [77:26:28<69:57:41, 29.16s/it] 50%|█████ | 8650/17285 [77:27:00<72:05:56, 30.06s/it] {'loss': 1.4298, 'learning_rate': 0.00010989453263656697, 'epoch': 1.5} + 50%|█████ | 8650/17285 [77:27:00<72:05:56, 30.06s/it] 50%|█████ | 8651/17285 [77:27:30<71:56:04, 29.99s/it] 50%|█████ | 8652/17285 [77:27:58<70:42:54, 29.49s/it] 50%|█████ | 8653/17285 [77:28:35<75:50:23, 31.63s/it] 50%|█████ | 8654/17285 [77:29:05<75:07:35, 31.34s/it] 50%|█████ | 8655/17285 [77:29:42<79:10:37, 33.03s/it] 50%|█████ | 8656/17285 [77:30:11<76:20:54, 31.85s/it] 50%|█████ | 8657/17285 [77:30:39<73:33:43, 30.69s/it] 50%|█████ | 8658/17285 [77:31:07<71:15:25, 29.74s/it][2023-08-26 05:26:17,912] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 50%|█████ | 8659/17285 [77:31:40<73:46:12, 30.79s/it] 50%|█████ | 8660/17285 [77:32:16<77:37:12, 32.40s/it] {'loss': 1.4192, 'learning_rate': 0.00010972316855101048, 'epoch': 1.5} + 50%|█████ | 8660/17285 [77:32:16<77:37:12, 32.40s/it] 50%|█████ | 8661/17285 [77:32:50<78:07:56, 32.62s/it] 50%|█████ | 8662/17285 [77:33:27<81:38:20, 34.08s/it] 50%|█████ | 8663/17285 [77:33:57<78:20:10, 32.71s/it] 50%|█████ | 8664/17285 [77:34:30<78:36:42, 32.83s/it] 50%|█████ | 8665/17285 [77:34:58<75:29:35, 31.53s/it] 50%|█████ | 8666/17285 [77:35:31<76:36:26, 32.00s/it] 50%|█████ | 8667/17285 [77:36:05<77:56:43, 32.56s/it] 50%|█████ | 8668/17285 [77:36:33<74:21:21, 31.06s/it] 50%|█████ | 8669/17285 [77:36:58<70:26:03, 29.43s/it] 50%|█████ | 8670/17285 [77:37:27<70:11:51, 29.33s/it] {'loss': 1.4405, 'learning_rate': 0.00010953273022049615, 'epoch': 1.5} + 50%|█████ | 8670/17285 [77:37:27<70:11:51, 29.33s/it] 50%|█████ | 8671/17285 [77:38:04<75:24:33, 31.52s/it] 50%|█████ | 8672/17285 [77:38:41<79:36:46, 33.28s/it] 50%|█████ | 8673/17285 [77:39:07<74:00:52, 30.94s/it] 50%|█████ | 8674/17285 [77:39:37<73:24:59, 30.69s/it] 50%|█████ | 8675/17285 [77:40:07<72:41:15, 30.39s/it] 50%|█████ | 8676/17285 [77:40:40<75:02:55, 31.38s/it] 50%|█████ | 8677/17285 [77:41:11<74:22:23, 31.10s/it] 50%|█████ | 8678/17285 [77:41:46<77:12:14, 32.29s/it] 50%|█████ | 8679/17285 [77:42:12<72:34:31, 30.36s/it] 50%|█████ | 8680/17285 [77:42:39<70:14:16, 29.38s/it] {'loss': 1.4636, 'learning_rate': 0.00010934225699438665, 'epoch': 1.51} + 50%|█████ | 8680/17285 [77:42:39<70:14:16, 29.38s/it] 50%|█████ | 8681/17285 [77:43:13<73:26:00, 30.73s/it] 50%|█████ | 8682/17285 [77:43:43<73:22:18, 30.70s/it] 50%|█████ | 8683/17285 [77:44:16<74:53:18, 31.34s/it] 50%|█████ | 8684/17285 [77:44:49<75:42:24, 31.69s/it] 50%|█████ | 8685/17285 [77:45:26<79:26:32, 33.25s/it][2023-08-26 05:40:29,236] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 50%|█████ | 8686/17285 [77:45:52<74:12:42, 31.07s/it] 50%|█████ | 8687/17285 [77:46:18<71:11:24, 29.81s/it] 50%|█████ | 8688/17285 [77:46:43<67:13:12, 28.15s/it] 50%|█████ | 8689/17285 [77:47:12<68:06:51, 28.53s/it] 50%|█████ | 8690/17285 [77:47:40<67:37:19, 28.32s/it] {'loss': 1.4231, 'learning_rate': 0.00010917080183142705, 'epoch': 1.51} + 50%|█████ | 8690/17285 [77:47:40<67:37:19, 28.32s/it] 50%|█████ | 8691/17285 [77:48:08<67:35:15, 28.31s/it] 50%|█████ | 8692/17285 [77:48:41<70:54:50, 29.71s/it] 50%|█████ | 8693/17285 [77:49:17<74:56:42, 31.40s/it] 50%|█████ | 8694/17285 [77:49:54<79:03:03, 33.13s/it] 50%|█████ | 8695/17285 [77:50:25<77:27:10, 32.46s/it] 50%|█████ | 8696/17285 [77:50:56<77:02:01, 32.29s/it] 50%|█████ | 8697/17285 [77:51:35<81:35:44, 34.20s/it] 50%|█████ | 8698/17285 [77:52:05<78:48:37, 33.04s/it] 50%|█████ | 8699/17285 [77:52:39<78:57:29, 33.11s/it] 50%|█████ | 8700/17285 [77:53:15<81:23:05, 34.13s/it] {'loss': 1.4357, 'learning_rate': 0.00010898026422470837, 'epoch': 1.51} + 50%|█████ | 8700/17285 [77:53:15<81:23:05, 34.13s/it] 50%|█████ | 8701/17285 [77:53:49<81:05:58, 34.01s/it] 50%|█████ | 8702/17285 [77:54:17<76:39:39, 32.15s/it] 50%|█████ | 8703/17285 [77:54:51<78:01:51, 32.73s/it] 50%|█████ | 8704/17285 [77:55:23<77:19:50, 32.44s/it] 50%|█████ | 8705/17285 [77:56:09<86:59:16, 36.50s/it] 50%|█████ | 8706/17285 [77:56:37<80:57:58, 33.98s/it] 50%|█████ | 8707/17285 [77:57:11<81:28:37, 34.19s/it] 50%|█████ | 8708/17285 [77:57:46<81:35:27, 34.25s/it] 50%|█████ | 8709/17285 [77:58:17<79:05:43, 33.20s/it] 50%|█████ | 8710/17285 [77:58:49<78:50:29, 33.10s/it] {'loss': 1.4173, 'learning_rate': 0.00010878969374475633, 'epoch': 1.51} + 50%|█████ | 8710/17285 [77:58:49<78:50:29, 33.10s/it] 50%|█████ | 8711/17285 [77:59:25<80:43:05, 33.89s/it] 50%|█████ | 8712/17285 [77:59:57<79:30:41, 33.39s/it] 50%|█████ | 8713/17285 [78:00:35<82:35:05, 34.68s/it] 50%|█████ | 8714/17285 [78:01:04<78:17:51, 
32.89s/it] 50%|█████ | 8715/17285 [78:01:37<78:45:25, 33.08s/it] 50%|█████ | 8716/17285 [78:02:14<81:04:55, 34.06s/it] 50%|█████ | 8717/17285 [78:02:45<79:20:49, 33.34s/it] 50%|█████ | 8718/17285 [78:03:23<82:47:15, 34.79s/it] 50%|█████ | 8719/17285 [78:03:49<76:18:17, 32.07s/it] 50%|█████ | 8720/17285 [78:04:22<76:38:09, 32.21s/it] {'loss': 1.4286, 'learning_rate': 0.00010859909108917496, 'epoch': 1.51} + 50%|█████ | 8720/17285 [78:04:22<76:38:09, 32.21s/it] 50%|█████ | 8721/17285 [78:04:52<75:17:34, 31.65s/it] 50%|█████ | 8722/17285 [78:05:22<74:15:57, 31.22s/it] 50%|█████ | 8723/17285 [78:05:52<72:56:06, 30.67s/it] 50%|█████ | 8724/17285 [78:06:28<76:57:25, 32.36s/it] 50%|█████ | 8725/17285 [78:07:02<78:03:29, 32.83s/it] 50%|█████ | 8726/17285 [78:07:34<77:25:37, 32.57s/it] 50%|█████ | 8727/17285 [78:08:12<81:33:44, 34.31s/it] 50%|█████ | 8728/17285 [78:08:47<82:08:56, 34.56s/it] 51%|█████ | 8729/17285 [78:09:17<78:27:54, 33.01s/it] 51%|█████ | 8730/17285 [78:09:54<81:06:37, 34.13s/it] {'loss': 1.4365, 'learning_rate': 0.00010840845695568593, 'epoch': 1.52} + 51%|█████ | 8730/17285 [78:09:54<81:06:37, 34.13s/it] 51%|█████ | 8731/17285 [78:10:25<79:14:40, 33.35s/it] 51%|█████ | 8732/17285 [78:10:58<78:40:17, 33.11s/it] 51%|█████ | 8733/17285 [78:11:32<79:20:17, 33.40s/it] 51%|█████ | 8734/17285 [78:11:59<74:53:21, 31.53s/it] 51%|█████ | 8735/17285 [78:12:31<75:24:37, 31.75s/it] 51%|█████ | 8736/17285 [78:13:02<75:00:56, 31.59s/it] 51%|█████ | 8737/17285 [78:13:36<76:24:01, 32.18s/it] 51%|█████ | 8738/17285 [78:14:05<74:21:14, 31.32s/it] 51%|█████ | 8739/17285 [78:14:47<81:51:33, 34.48s/it] 51%|█████ | 8740/17285 [78:15:23<83:01:28, 34.98s/it] {'loss': 1.4292, 'learning_rate': 0.00010821779204212623, 'epoch': 1.52} + 51%|█████ | 8740/17285 [78:15:23<83:01:28, 34.98s/it] 51%|█████ | 8741/17285 [78:15:56<81:23:50, 34.30s/it] 51%|█████ | 8742/17285 [78:16:32<82:20:05, 34.70s/it] 51%|█████ | 8743/17285 [78:17:06<81:52:41, 34.51s/it] 51%|█████ | 8744/17285 
[78:17:33<77:07:08, 32.51s/it][2023-08-26 06:12:45,703] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 51%|█████ | 8745/17285 [78:18:08<78:34:40, 33.12s/it] 51%|█████ | 8746/17285 [78:18:54<87:49:29, 37.03s/it] 51%|█████ | 8747/17285 [78:19:19<79:14:00, 33.41s/it] 51%|█████ | 8748/17285 [78:19:51<77:50:15, 32.82s/it] 51%|█████ | 8749/17285 [78:20:22<77:10:00, 32.54s/it] 51%|█████ | 8750/17285 [78:20:56<77:36:32, 32.73s/it] {'loss': 1.4429, 'learning_rate': 0.00010804616787981517, 'epoch': 1.52} + 51%|█████ | 8750/17285 [78:20:56<77:36:32, 32.73s/it] 51%|█████ | 8751/17285 [78:21:25<74:52:20, 31.58s/it] 51%|█████ | 8752/17285 [78:21:59<77:05:36, 32.53s/it] 51%|█████ | 8753/17285 [78:22:34<78:39:43, 33.19s/it] 51%|█████ | 8754/17285 [78:23:00<73:42:45, 31.11s/it] 51%|█████ | 8755/17285 [78:23:36<77:07:34, 32.55s/it] 51%|█████ | 8756/17285 [78:24:01<71:49:50, 30.32s/it] 51%|█████ | 8757/17285 [78:24:37<75:37:53, 31.93s/it] 51%|█████ | 8758/17285 [78:25:12<78:06:05, 32.97s/it] 51%|█████ | 8759/17285 [78:25:48<80:04:41, 33.81s/it] 51%|█████ | 8760/17285 [78:26:21<79:05:01, 33.40s/it] {'loss': 1.43, 'learning_rate': 0.00010785544640706349, 'epoch': 1.52} + 51%|█████ | 8760/17285 [78:26:21<79:05:01, 33.40s/it] 51%|█████ | 8761/17285 [78:26:51<76:42:57, 32.40s/it] 51%|█████ | 8762/17285 [78:27:32<83:00:04, 35.06s/it] 51%|█████ | 8763/17285 [78:27:59<77:37:50, 32.79s/it] 51%|█████ | 8764/17285 [78:28:27<73:57:47, 31.25s/it] 51%|█████ | 8765/17285 [78:28:54<70:46:22, 29.90s/it] 51%|█████ | 8766/17285 [78:29:23<70:03:07, 29.60s/it] 51%|█████ | 8767/17285 [78:29:57<73:38:05, 31.12s/it] 51%|█████ | 8768/17285 [78:30:27<72:41:49, 30.73s/it] 51%|█████ | 8769/17285 [78:30:55<70:55:37, 29.98s/it] 51%|█████ | 8770/17285 [78:31:22<68:07:46, 28.80s/it] {'loss': 1.4428, 'learning_rate': 0.0001076646961785964, 'epoch': 1.52} + 51%|█████ | 8770/17285 [78:31:22<68:07:46, 28.80s/it] 51%|█████ | 
8771/17285 [78:31:52<69:36:26, 29.43s/it] 51%|█████ | 8772/17285 [78:32:23<70:06:36, 29.65s/it] 51%|█████ | 8773/17285 [78:32:49<68:01:41, 28.77s/it] 51%|█████ | 8774/17285 [78:33:20<69:06:37, 29.23s/it] 51%|█████ | 8775/17285 [78:33:54<72:34:56, 30.70s/it] 51%|█████ | 8776/17285 [78:34:25<73:00:17, 30.89s/it] 51%|█████ | 8777/17285 [78:34:58<74:11:21, 31.39s/it] 51%|█████ | 8778/17285 [78:35:25<71:22:06, 30.20s/it] 51%|█████ | 8779/17285 [78:36:01<75:32:28, 31.97s/it] 51%|█████ | 8780/17285 [78:36:28<71:34:43, 30.30s/it] {'loss': 1.4511, 'learning_rate': 0.0001074739178926758, 'epoch': 1.52} + 51%|█████ | 8780/17285 [78:36:28<71:34:43, 30.30s/it] 51%|█████ | 8781/17285 [78:36:55<69:46:27, 29.54s/it] 51%|█████ | 8782/17285 [78:37:23<68:19:16, 28.93s/it] 51%|█████ | 8783/17285 [78:37:59<73:47:50, 31.25s/it] 51%|█████ | 8784/17285 [78:38:36<77:43:53, 32.92s/it] 51%|█████ | 8785/17285 [78:39:09<77:40:25, 32.90s/it] 51%|█████ | 8786/17285 [78:39:44<79:11:54, 33.55s/it] 51%|█████ | 8787/17285 [78:40:16<78:07:58, 33.10s/it] 51%|█████ | 8788/17285 [78:40:46<75:55:32, 32.17s/it] 51%|█████ | 8789/17285 [78:41:21<77:37:20, 32.89s/it] 51%|█████ | 8790/17285 [78:41:54<77:36:15, 32.89s/it] {'loss': 1.4283, 'learning_rate': 0.00010728311224766634, 'epoch': 1.53} + 51%|█████ | 8790/17285 [78:41:54<77:36:15, 32.89s/it] 51%|█████ | 8791/17285 [78:42:30<79:42:07, 33.78s/it] 51%|█████ | 8792/17285 [78:42:59<76:52:08, 32.58s/it] 51%|█████ | 8793/17285 [78:43:34<78:31:35, 33.29s/it] 51%|█████ | 8794/17285 [78:44:09<79:17:56, 33.62s/it] 51%|█████ | 8795/17285 [78:44:47<82:55:49, 35.16s/it] 51%|█████ | 8796/17285 [78:45:23<83:22:50, 35.36s/it] 51%|█████ | 8797/17285 [78:45:56<81:13:40, 34.45s/it] 51%|█████ | 8798/17285 [78:46:28<80:04:06, 33.96s/it] 51%|█████ | 8799/17285 [78:47:03<80:47:45, 34.28s/it] 51%|█████ | 8800/17285 [78:47:42<83:51:59, 35.58s/it] {'loss': 1.4041, 'learning_rate': 0.00010709227994203286, 'epoch': 1.53} + 51%|█████ | 8800/17285 [78:47:42<83:51:59, 35.58s/it] 
51%|█████ | 8801/17285 [78:48:21<86:01:37, 36.50s/it] 51%|█████ | 8802/17285 [78:48:49<80:11:24, 34.03s/it] 51%|█████ | 8803/17285 [78:49:36<89:01:52, 37.79s/it] 51%|█████ | 8804/17285 [78:50:09<85:56:52, 36.48s/it] 51%|█████ | 8805/17285 [78:50:40<82:11:10, 34.89s/it] 51%|█████ | 8806/17285 [78:51:07<76:18:01, 32.40s/it] 51%|█████ | 8807/17285 [78:51:33<71:41:22, 30.44s/it] 51%|█████ | 8808/17285 [78:52:13<78:52:13, 33.49s/it] 51%|█████ | 8809/17285 [78:52:47<79:07:19, 33.61s/it] 51%|█████ | 8810/17285 [78:53:18<77:30:45, 32.93s/it] {'loss': 1.4574, 'learning_rate': 0.00010690142167433773, 'epoch': 1.53} + 51%|█████ | 8810/17285 [78:53:18<77:30:45, 32.93s/it] 51%|█████ | 8811/17285 [78:53:52<77:38:04, 32.98s/it] 51%|█████ | 8812/17285 [78:54:17<72:22:50, 30.75s/it] 51%|█████ | 8813/17285 [78:54:50<73:34:07, 31.26s/it] 51%|█████ | 8814/17285 [78:55:15<69:24:58, 29.50s/it] 51%|█████ | 8815/17285 [78:55:45<69:32:31, 29.56s/it] 51%|█████ | 8816/17285 [78:56:16<70:39:37, 30.04s/it] 51%|█████ | 8817/17285 [78:56:44<69:40:47, 29.62s/it] 51%|█████ | 8818/17285 [78:57:12<68:17:55, 29.04s/it] 51%|█████ | 8819/17285 [78:57:45<70:41:24, 30.06s/it] 51%|█████ | 8820/17285 [78:58:09<67:01:11, 28.50s/it] {'loss': 1.4421, 'learning_rate': 0.00010671053814323834, 'epoch': 1.53} + 51%|█████ | 8820/17285 [78:58:09<67:01:11, 28.50s/it] 51%|█████ | 8821/17285 [78:58:36<65:39:26, 27.93s/it] 51%|█████ | 8822/17285 [78:59:04<65:37:20, 27.91s/it] 51%|█████ | 8823/17285 [78:59:33<66:34:00, 28.32s/it] 51%|█████ | 8824/17285 [79:00:03<67:54:40, 28.89s/it] 51%|█████ | 8825/17285 [79:00:40<73:11:58, 31.15s/it] 51%|█████ | 8826/17285 [79:01:09<71:49:18, 30.57s/it] 51%|█████ | 8827/17285 [79:01:41<72:36:17, 30.90s/it] 51%|█████ | 8828/17285 [79:02:11<72:30:54, 30.87s/it] 51%|█████ | 8829/17285 [79:02:39<69:49:00, 29.72s/it] 51%|█████ | 8830/17285 [79:03:10<70:53:54, 30.19s/it] {'loss': 1.453, 'learning_rate': 0.00010651963004748471, 'epoch': 1.53} + 51%|█████ | 8830/17285 [79:03:10<70:53:54, 
30.19s/it] 51%|█████ | 8831/17285 [79:03:46<74:48:11, 31.85s/it] 51%|█████ | 8832/17285 [79:04:20<76:47:09, 32.70s/it] 51%|█████ | 8833/17285 [79:04:51<75:24:51, 32.12s/it] 51%|█████ | 8834/17285 [79:05:23<75:02:07, 31.96s/it] 51%|█████ | 8835/17285 [79:05:55<75:18:11, 32.08s/it] 51%|█████ | 8836/17285 [79:06:22<71:36:34, 30.51s/it] 51%|█████ | 8837/17285 [79:06:53<72:19:09, 30.82s/it] 51%|█████ | 8838/17285 [79:07:31<76:55:04, 32.78s/it] 51%|█████ | 8839/17285 [79:08:05<78:15:37, 33.36s/it] 51%|█████ | 8840/17285 [79:08:34<74:49:05, 31.89s/it] {'loss': 1.4239, 'learning_rate': 0.00010632869808591662, 'epoch': 1.53} + 51%|█████ | 8840/17285 [79:08:34<74:49:05, 31.89s/it] 51%|█████ | 8841/17285 [79:09:10<77:41:14, 33.12s/it] 51%|█████ | 8842/17285 [79:09:49<81:44:00, 34.85s/it] 51%|█████ | 8843/17285 [79:10:23<81:17:51, 34.67s/it] 51%|█████ | 8844/17285 [79:10:57<80:36:24, 34.38s/it] 51%|█████ | 8845/17285 [79:11:27<77:56:01, 33.24s/it] 51%|█████ | 8846/17285 [79:12:01<78:25:12, 33.45s/it] 51%|█████ | 8847/17285 [79:12:29<74:45:22, 31.89s/it] 51%|█████ | 8848/17285 [79:13:00<74:02:52, 31.60s/it] 51%|█████ | 8849/17285 [79:13:34<75:38:35, 32.28s/it] 51%|█████ | 8850/17285 [79:14:10<78:24:58, 33.47s/it] {'loss': 1.4069, 'learning_rate': 0.00010613774295746124, 'epoch': 1.54} + 51%|█████ | 8850/17285 [79:14:10<78:24:58, 33.47s/it] 51%|█████ | 8851/17285 [79:14:38<73:55:13, 31.55s/it] 51%|█████ | 8852/17285 [79:15:14<77:01:18, 32.88s/it] 51%|█████ | 8853/17285 [79:15:48<78:07:38, 33.36s/it] 51%|█████ | 8854/17285 [79:16:28<82:52:22, 35.39s/it] 51%|█████ | 8855/17285 [79:17:07<85:17:08, 36.42s/it] 51%|█████ | 8856/17285 [79:17:38<81:47:51, 34.94s/it] 51%|█████ | 8857/17285 [79:18:08<78:02:27, 33.34s/it] 51%|█████ | 8858/17285 [79:18:36<74:28:47, 31.82s/it] 51%|█████▏ | 8859/17285 [79:19:07<74:00:46, 31.62s/it] 51%|█████▏ | 8860/17285 [79:19:44<77:38:09, 33.17s/it] {'loss': 1.4335, 'learning_rate': 0.0001059467653611306, 'epoch': 1.54} + 51%|█████▏ | 8860/17285 
[79:19:44<77:38:09, 33.17s/it] 51%|█████▏ | 8861/17285 [79:20:23<81:12:20, 34.70s/it] 51%|█████▏ | 8862/17285 [79:20:54<78:55:09, 33.73s/it] 51%|█████▏ | 8863/17285 [79:21:29<79:39:50, 34.05s/it] 51%|█████▏ | 8864/17285 [79:21:54<73:28:47, 31.41s/it] 51%|█████▏ | 8865/17285 [79:22:22<71:00:31, 30.36s/it] 51%|█████▏ | 8866/17285 [79:22:57<74:16:50, 31.76s/it] 51%|█████▏ | 8867/17285 [79:23:32<76:40:35, 32.79s/it] 51%|█████▏ | 8868/17285 [79:24:06<77:37:14, 33.20s/it] 51%|█████▏ | 8869/17285 [79:24:34<73:57:43, 31.64s/it] 51%|█████▏ | 8870/17285 [79:25:01<70:35:23, 30.20s/it] {'loss': 1.4341, 'learning_rate': 0.00010575576599601895, 'epoch': 1.54} + 51%|█████▏ | 8870/17285 [79:25:01<70:35:23, 30.20s/it] 51%|█████▏ | 8871/17285 [79:25:31<70:05:36, 29.99s/it] 51%|█████▏ | 8872/17285 [79:26:05<72:47:13, 31.15s/it] 51%|█████▏ | 8873/17285 [79:26:37<73:39:20, 31.52s/it] 51%|█████▏ | 8874/17285 [79:27:08<73:27:41, 31.44s/it] 51%|█████▏ | 8875/17285 [79:27:49<79:59:43, 34.24s/it] 51%|█████▏ | 8876/17285 [79:28:19<77:22:58, 33.13s/it] 51%|█████▏ | 8877/17285 [79:28:50<75:52:24, 32.49s/it] 51%|█████▏ | 8878/17285 [79:29:22<75:04:20, 32.15s/it] 51%|█████▏ | 8879/17285 [79:29:52<73:28:52, 31.47s/it] 51%|█████▏ | 8880/17285 [79:30:24<74:12:11, 31.78s/it] {'loss': 1.3864, 'learning_rate': 0.00010556474556130025, 'epoch': 1.54} + 51%|█████▏ | 8880/17285 [79:30:24<74:12:11, 31.78s/it] 51%|█████▏ | 8881/17285 [79:30:57<75:12:13, 32.21s/it] 51%|█████▏ | 8882/17285 [79:31:31<76:03:45, 32.59s/it] 51%|█████▏ | 8883/17285 [79:31:59<72:54:29, 31.24s/it] 51%|█████▏ | 8884/17285 [79:32:33<74:50:35, 32.07s/it] 51%|█████▏ | 8885/17285 [79:33:06<75:42:39, 32.45s/it] 51%|█████▏ | 8886/17285 [79:33:37<74:10:41, 31.79s/it] 51%|█████▏ | 8887/17285 [79:34:02<69:36:56, 29.84s/it] 51%|█████▏ | 8888/17285 [79:34:37<73:30:16, 31.51s/it] 51%|█████▏ | 8889/17285 [79:35:07<72:33:41, 31.11s/it] 51%|█████▏ | 8890/17285 [79:35:38<71:52:49, 30.82s/it] {'loss': 1.4389, 'learning_rate': 0.00010537370475622554, 
'epoch': 1.54} + 51%|█████▏ | 8890/17285 [79:35:38<71:52:49, 30.82s/it] 51%|█████▏ | 8891/17285 [79:36:07<71:10:19, 30.52s/it] 51%|█████▏ | 8892/17285 [79:36:39<71:48:38, 30.80s/it] 51%|█████▏ | 8893/17285 [79:37:09<70:57:51, 30.44s/it] 51%|█████▏ | 8894/17285 [79:37:49<77:59:09, 33.46s/it] 51%|█████▏ | 8895/17285 [79:38:22<77:55:46, 33.44s/it] 51%|█████▏ | 8896/17285 [79:38:57<78:34:29, 33.72s/it] 51%|█████▏ | 8897/17285 [79:39:30<78:12:54, 33.57s/it] 51%|█████▏ | 8898/17285 [79:40:04<78:20:10, 33.62s/it] 51%|█████▏ | 8899/17285 [79:40:29<72:31:52, 31.14s/it] 51%|█████▏ | 8900/17285 [79:41:02<74:02:07, 31.79s/it] {'loss': 1.4895, 'learning_rate': 0.00010518264428012043, 'epoch': 1.54} + 51%|█████▏ | 8900/17285 [79:41:02<74:02:07, 31.79s/it] 51%|█████▏ | 8901/17285 [79:41:38<77:01:37, 33.07s/it] 52%|█████▏ | 8902/17285 [79:42:13<77:53:02, 33.45s/it] 52%|█████▏ | 8903/17285 [79:42:43<75:45:51, 32.54s/it] 52%|█████▏ | 8904/17285 [79:43:22<80:24:29, 34.54s/it] 52%|█████▏ | 8905/17285 [79:43:53<77:39:38, 33.36s/it] 52%|█████▏ | 8906/17285 [79:44:26<77:14:10, 33.18s/it] 52%|█████▏ | 8907/17285 [79:45:02<79:17:05, 34.07s/it] 52%|█████▏ | 8908/17285 [79:45:34<78:10:28, 33.60s/it] 52%|█████▏ | 8909/17285 [79:46:02<73:41:04, 31.67s/it] 52%|█████▏ | 8910/17285 [79:46:35<74:36:10, 32.07s/it] {'loss': 1.4252, 'learning_rate': 0.00010499156483238262, 'epoch': 1.55} + 52%|█████▏ | 8910/17285 [79:46:35<74:36:10, 32.07s/it] 52%|█████▏ | 8911/17285 [79:47:08<75:22:37, 32.40s/it] 52%|█████▏ | 8912/17285 [79:47:38<73:59:09, 31.81s/it] 52%|█████▏ | 8913/17285 [79:48:21<81:53:02, 35.21s/it] 52%|█████▏ | 8914/17285 [79:48:54<79:51:40, 34.34s/it] 52%|█████▏ | 8915/17285 [79:49:20<74:04:07, 31.86s/it] 52%|█████▏ | 8916/17285 [79:49:49<72:05:48, 31.01s/it] 52%|█████▏ | 8917/17285 [79:50:28<77:37:52, 33.40s/it] 52%|█████▏ | 8918/17285 [79:50:57<74:55:59, 32.24s/it] 52%|█████▏ | 8919/17285 [79:51:35<78:31:30, 33.79s/it] 52%|█████▏ | 8920/17285 [79:52:12<80:55:47, 34.83s/it] {'loss': 1.4495, 
'learning_rate': 0.00010480046711247918, 'epoch': 1.55} + 52%|█████▏ | 8920/17285 [79:52:12<80:55:47, 34.83s/it] 52%|█████▏ | 8921/17285 [79:52:38<75:00:02, 32.28s/it] 52%|█████▏ | 8922/17285 [79:53:17<79:17:26, 34.13s/it] 52%|█████▏ | 8923/17285 [79:53:48<77:03:25, 33.17s/it] 52%|█████▏ | 8924/17285 [79:54:22<78:06:45, 33.63s/it] 52%|█████▏ | 8925/17285 [79:54:57<78:35:23, 33.84s/it] 52%|█████▏ | 8926/17285 [79:55:23<73:01:33, 31.45s/it] 52%|█████▏ | 8927/17285 [79:55:57<75:16:53, 32.43s/it] 52%|█████▏ | 8928/17285 [79:56:28<74:00:14, 31.88s/it] 52%|█████▏ | 8929/17285 [79:56:59<73:23:49, 31.62s/it] 52%|█████▏ | 8930/17285 [79:57:26<70:13:52, 30.26s/it] {'loss': 1.4292, 'learning_rate': 0.00010460935181994404, 'epoch': 1.55} + 52%|█████▏ | 8930/17285 [79:57:26<70:13:52, 30.26s/it] 52%|█████▏ | 8931/17285 [79:58:01<73:47:37, 31.80s/it] 52%|█████▏ | 8932/17285 [79:58:31<72:27:19, 31.23s/it] 52%|█████▏ | 8933/17285 [79:59:01<71:19:58, 30.75s/it] 52%|█████▏ | 8934/17285 [79:59:36<74:15:14, 32.01s/it] 52%|█████▏ | 8935/17285 [80:00:12<77:24:26, 33.37s/it] 52%|█████▏ | 8936/17285 [80:00:50<80:29:56, 34.71s/it] 52%|█████▏ | 8937/17285 [80:01:22<78:37:33, 33.91s/it] 52%|█████▏ | 8938/17285 [80:01:49<73:21:13, 31.64s/it] 52%|█████▏ | 8939/17285 [80:02:15<69:27:52, 29.96s/it] 52%|█████▏ | 8940/17285 [80:02:50<73:16:06, 31.61s/it] {'loss': 1.405, 'learning_rate': 0.00010441821965437556, 'epoch': 1.55} + 52%|█████▏ | 8940/17285 [80:02:50<73:16:06, 31.61s/it] 52%|█████▏ | 8941/17285 [80:03:22<73:33:51, 31.74s/it] 52%|█████▏ | 8942/17285 [80:03:53<73:14:41, 31.61s/it] 52%|█████▏ | 8943/17285 [80:04:21<70:27:37, 30.41s/it] 52%|█████▏ | 8944/17285 [80:04:55<73:15:13, 31.62s/it] 52%|█████▏ | 8945/17285 [80:05:29<74:20:51, 32.09s/it] 52%|█████▏ | 8946/17285 [80:05:55<70:30:25, 30.44s/it] 52%|█████▏ | 8947/17285 [80:06:24<68:58:03, 29.78s/it] 52%|█████▏ | 8948/17285 [80:07:01<73:58:25, 31.94s/it] 52%|█████▏ | 8949/17285 [80:07:32<73:32:43, 31.76s/it] 52%|█████▏ | 8950/17285 
[80:08:07<75:54:45, 32.79s/it] {'loss': 1.4666, 'learning_rate': 0.00010422707131543377, 'epoch': 1.55} + 52%|█████▏ | 8950/17285 [80:08:07<75:54:45, 32.79s/it] 52%|█████▏ | 8951/17285 [80:08:37<73:38:29, 31.81s/it] 52%|█████▏ | 8952/17285 [80:09:07<72:59:39, 31.53s/it] 52%|█████▏ | 8953/17285 [80:09:34<69:41:59, 30.12s/it][2023-08-26 08:04:44,864] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 52%|█████▏ | 8954/17285 [80:10:07<71:38:50, 30.96s/it] 52%|█████▏ | 8955/17285 [80:10:38<71:14:12, 30.79s/it] 52%|█████▏ | 8956/17285 [80:11:04<68:31:12, 29.62s/it] 52%|█████▏ | 8957/17285 [80:11:39<71:46:57, 31.03s/it] 52%|█████▏ | 8958/17285 [80:12:11<72:26:43, 31.32s/it] 52%|█████▏ | 8959/17285 [80:12:36<68:29:45, 29.62s/it] 52%|█████▏ | 8960/17285 [80:13:10<71:18:49, 30.84s/it] {'loss': 1.4412, 'learning_rate': 0.00010405502456046876, 'epoch': 1.56} + 52%|█████▏ | 8960/17285 [80:13:10<71:18:49, 30.84s/it] 52%|█████▏ | 8961/17285 [80:13:42<71:44:37, 31.03s/it] 52%|█████▏ | 8962/17285 [80:14:15<73:23:49, 31.75s/it] 52%|█████▏ | 8963/17285 [80:14:40<68:57:52, 29.83s/it] 52%|█████▏ | 8964/17285 [80:15:15<71:57:16, 31.13s/it] 52%|█████▏ | 8965/17285 [80:15:45<71:36:15, 30.98s/it] 52%|█████▏ | 8966/17285 [80:16:14<70:21:13, 30.45s/it] 52%|█████▏ | 8967/17285 [80:16:46<71:29:50, 30.94s/it] 52%|█████▏ | 8968/17285 [80:17:28<78:49:17, 34.12s/it] 52%|█████▏ | 8969/17285 [80:17:56<74:21:41, 32.19s/it] 52%|█████▏ | 8970/17285 [80:18:24<71:50:17, 31.10s/it] {'loss': 1.3975, 'learning_rate': 0.0001038638474198912, 'epoch': 1.56} + 52%|█████▏ | 8970/17285 [80:18:24<71:50:17, 31.10s/it] 52%|█████▏ | 8971/17285 [80:18:51<69:00:04, 29.88s/it] 52%|█████▏ | 8972/17285 [80:19:20<68:26:51, 29.64s/it] 52%|█████▏ | 8973/17285 [80:19:46<65:50:56, 28.52s/it] 52%|█████▏ | 8974/17285 [80:20:31<77:04:56, 33.39s/it] 52%|█████▏ | 8975/17285 [80:21:01<74:53:11, 32.44s/it] 52%|█████▏ | 
8976/17285 [80:21:34<75:15:20, 32.61s/it] 52%|█████▏ | 8977/17285 [80:22:05<74:16:55, 32.19s/it] 52%|█████▏ | 8978/17285 [80:22:33<71:17:47, 30.90s/it] 52%|█████▏ | 8979/17285 [80:23:06<72:23:37, 31.38s/it] 52%|█████▏ | 8980/17285 [80:23:37<72:07:18, 31.26s/it] {'loss': 1.4423, 'learning_rate': 0.00010367265613528012, 'epoch': 1.56} + 52%|█████▏ | 8980/17285 [80:23:37<72:07:18, 31.26s/it] 52%|█████▏ | 8981/17285 [80:24:05<69:51:19, 30.28s/it] 52%|█████▏ | 8982/17285 [80:24:34<69:22:41, 30.08s/it] 52%|█████▏ | 8983/17285 [80:25:01<66:37:15, 28.89s/it] 52%|█████▏ | 8984/17285 [80:25:43<76:03:05, 32.98s/it] 52%|█████▏ | 8985/17285 [80:26:14<74:25:38, 32.28s/it] 52%|█████▏ | 8986/17285 [80:26:50<77:14:58, 33.51s/it] 52%|█████▏ | 8987/17285 [80:27:27<79:24:29, 34.45s/it] 52%|█████▏ | 8988/17285 [80:27:52<73:13:30, 31.77s/it] 52%|█████▏ | 8989/17285 [80:28:22<71:39:14, 31.09s/it] 52%|█████▏ | 8990/17285 [80:28:48<68:14:44, 29.62s/it] {'loss': 1.4614, 'learning_rate': 0.00010348145140651204, 'epoch': 1.56} + 52%|█████▏ | 8990/17285 [80:28:48<68:14:44, 29.62s/it] 52%|█████▏ | 8991/17285 [80:29:14<65:40:29, 28.51s/it] 52%|█████▏ | 8992/17285 [80:29:40<63:48:21, 27.70s/it] 52%|█████▏ | 8993/17285 [80:30:18<71:12:30, 30.92s/it] 52%|█████▏ | 8994/17285 [80:30:47<69:44:46, 30.28s/it] 52%|█████▏ | 8995/17285 [80:31:13<66:43:46, 28.98s/it] 52%|█████▏ | 8996/17285 [80:31:53<74:38:55, 32.42s/it] 52%|█████▏ | 8997/17285 [80:32:26<75:01:36, 32.59s/it] 52%|█████▏ | 8998/17285 [80:32:57<73:54:06, 32.10s/it] 52%|█████▏ | 8999/17285 [80:33:26<71:52:43, 31.23s/it] 52%|█████▏ | 9000/17285 [80:33:56<70:28:32, 30.62s/it] {'loss': 1.4521, 'learning_rate': 0.00010329023393351272, 'epoch': 1.56} + 52%|█████▏ | 9000/17285 [80:33:56<70:28:32, 30.62s/it][INFO|trainer.py:3081] 2023-08-26 08:28:33,355 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-26 08:28:33,355 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-26 08:28:33,355 >> Batch size = 2 + + 0%| | 0/33 [00:00> Deleting 
older checkpoint [20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-6000] due to args.save_total_limit +Save Peft Config at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-9000 +[INFO|tokenization_utils_base.py:2210] 2023-08-26 08:29:59,617 >> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-9000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-26 08:29:59,622 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-9000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-9000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-9000 + 52%|█████▏ | 9001/17285 [80:36:01<136:08:55, 59.17s/it] 52%|█████▏ | 9002/17285 [80:36:38<120:15:13, 52.27s/it] 52%|█████▏ | 9003/17285 [80:37:12<108:14:50, 47.05s/it] 52%|█████▏ | 9004/17285 [80:37:43<97:09:00, 42.23s/it] 52%|█████▏ | 9005/17285 [80:38:13<88:08:45, 38.32s/it] 52%|█████▏ | 9006/17285 [80:38:54<90:25:34, 39.32s/it] 52%|█████▏ | 9007/17285 [80:39:27<85:46:08, 37.30s/it] 52%|█████▏ | 9008/17285 [80:39:54<78:42:41, 34.23s/it] 52%|█████▏ | 9009/17285 [80:40:22<74:17:19, 32.32s/it] 52%|█████▏ | 9010/17285 [80:41:05<81:37:58, 35.51s/it] {'loss': 1.4428, 'learning_rate': 0.00010309900441625435, 'epoch': 1.56} + 52%|█████▏ | 9010/17285 [80:41:05<81:37:58, 35.51s/it] 52%|█████▏ | 9011/17285 [80:41:41<82:19:54, 35.82s/it] 52%|█████▏ | 9012/17285 [80:42:07<75:15:43, 32.75s/it] 52%|█████▏ | 9013/17285 [80:42:38<74:11:11, 32.29s/it] 52%|█████▏ | 9014/17285 [80:43:10<73:47:49, 32.12s/it] 52%|█████▏ | 9015/17285 [80:43:43<74:45:17, 32.54s/it] 52%|█████▏ | 9016/17285 [80:44:21<77:58:25, 33.95s/it] 52%|█████▏ | 9017/17285 
[80:44:56<79:15:47, 34.51s/it] 52%|█████▏ | 9018/17285 [80:45:30<78:37:58, 34.24s/it] 52%|█████▏ | 9019/17285 [80:46:03<77:45:34, 33.87s/it] 52%|█████▏ | 9020/17285 [80:46:30<73:10:59, 31.88s/it] {'loss': 1.4417, 'learning_rate': 0.0001029077635547535, 'epoch': 1.57} + 52%|█████▏ | 9020/17285 [80:46:30<73:10:59, 31.88s/it] 52%|█████▏ | 9021/17285 [80:47:04<74:45:23, 32.57s/it] 52%|█████▏ | 9022/17285 [80:47:30<70:08:49, 30.56s/it] 52%|█████▏ | 9023/17285 [80:48:06<73:29:21, 32.02s/it] 52%|█████▏ | 9024/17285 [80:48:34<70:36:06, 30.77s/it] 52%|█████▏ | 9025/17285 [80:49:10<74:40:30, 32.55s/it] 52%|█████▏ | 9026/17285 [80:49:41<73:26:00, 32.01s/it] 52%|█████▏ | 9027/17285 [80:50:10<71:15:50, 31.07s/it] 52%|█████▏ | 9028/17285 [80:50:42<72:05:21, 31.43s/it] 52%|█████▏ | 9029/17285 [80:51:10<69:30:11, 30.31s/it] 52%|█████▏ | 9030/17285 [80:51:46<73:09:57, 31.91s/it] {'loss': 1.4228, 'learning_rate': 0.00010271651204906811, 'epoch': 1.57} + 52%|█████▏ | 9030/17285 [80:51:46<73:09:57, 31.91s/it] 52%|█████▏ | 9031/17285 [80:52:25<78:08:44, 34.08s/it] 52%|█████▏ | 9032/17285 [80:52:59<78:23:31, 34.19s/it] 52%|█████▏ | 9033/17285 [80:53:36<79:53:29, 34.85s/it] 52%|█████▏ | 9034/17285 [80:54:07<77:44:22, 33.92s/it] 52%|█████▏ | 9035/17285 [80:54:38<75:40:07, 33.02s/it] 52%|█████▏ | 9036/17285 [80:55:10<74:33:33, 32.54s/it] 52%|█████▏ | 9037/17285 [80:55:37<71:13:47, 31.09s/it] 52%|█████▏ | 9038/17285 [80:56:03<67:30:24, 29.47s/it] 52%|█████▏ | 9039/17285 [80:56:38<71:10:32, 31.07s/it] 52%|█████▏ | 9040/17285 [80:57:11<72:24:50, 31.62s/it] {'loss': 1.4302, 'learning_rate': 0.0001025252505992951, 'epoch': 1.57} + 52%|█████▏ | 9040/17285 [80:57:11<72:24:50, 31.62s/it] 52%|█████▏ | 9041/17285 [80:57:45<74:20:14, 32.46s/it] 52%|█████▏ | 9042/17285 [80:58:21<76:29:43, 33.41s/it] 52%|█████▏ | 9043/17285 [80:58:52<75:08:41, 32.82s/it] 52%|█████▏ | 9044/17285 [80:59:24<74:20:56, 32.48s/it] 52%|█████▏ | 9045/17285 [80:59:58<75:48:07, 33.12s/it] 52%|█████▏ | 9046/17285 
[81:00:27<72:22:32, 31.62s/it] 52%|█████▏ | 9047/17285 [81:00:57<71:27:22, 31.23s/it] 52%|█████▏ | 9048/17285 [81:01:32<74:25:08, 32.52s/it] 52%|█████▏ | 9049/17285 [81:01:59<70:08:44, 30.66s/it] 52%|█████▏ | 9050/17285 [81:02:30<70:45:01, 30.93s/it] {'loss': 1.4544, 'learning_rate': 0.00010233397990556775, 'epoch': 1.57} + 52%|█████▏ | 9050/17285 [81:02:30<70:45:01, 30.93s/it] 52%|█████▏ | 9051/17285 [81:03:05<73:32:07, 32.15s/it] 52%|█████▏ | 9052/17285 [81:03:34<71:13:41, 31.15s/it] 52%|█████▏ | 9053/17285 [81:04:06<72:00:00, 31.49s/it] 52%|█████▏ | 9054/17285 [81:04:43<75:16:37, 32.92s/it] 52%|█████▏ | 9055/17285 [81:05:15<74:57:01, 32.79s/it] 52%|█████▏ | 9056/17285 [81:05:45<73:01:40, 31.95s/it][2023-08-26 09:00:50,957] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, but hysteresis is 2. Reducing hysteresis to 1 + 52%|█████▏ | 9057/17285 [81:06:13<70:23:43, 30.80s/it] 52%|█████▏ | 9058/17285 [81:06:40<67:19:05, 29.46s/it] 52%|█████▏ | 9059/17285 [81:07:14<70:47:27, 30.98s/it] 52%|█████▏ | 9060/17285 [81:07:47<72:24:43, 31.69s/it] {'loss': 1.4378, 'learning_rate': 0.0001021618289563197, 'epoch': 1.57} + 52%|█████▏ | 9060/17285 [81:07:47<72:24:43, 31.69s/it][2023-08-26 09:02:55,931] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 262144, reducing to 131072 + 52%|█████▏ | 9061/17285 [81:08:18<71:45:37, 31.41s/it] 52%|█████▏ | 9062/17285 [81:08:52<73:19:19, 32.10s/it] 52%|█████▏ | 9063/17285 [81:09:32<78:40:31, 34.45s/it] 52%|█████▏ | 9064/17285 [81:09:56<71:28:02, 31.30s/it] 52%|█████▏ | 9065/17285 [81:10:31<74:00:27, 32.41s/it] 52%|█████▏ | 9066/17285 [81:11:03<73:30:50, 32.20s/it] 52%|█████▏ | 9067/17285 [81:11:32<71:30:18, 31.32s/it] 52%|█████▏ | 9068/17285 [81:12:05<72:46:20, 31.88s/it] 52%|█████▏ | 9069/17285 [81:12:41<75:14:51, 32.97s/it] 52%|█████▏ | 9070/17285 [81:13:27<84:18:17, 36.94s/it] {'loss': 1.4494, 'learning_rate': 0.00010198967159704729, 'epoch': 1.57} + 52%|█████▏ | 9070/17285 [81:13:27<84:18:17, 36.94s/it] 52%|█████▏ | 9071/17285 [81:14:01<82:46:14, 36.28s/it] 52%|█████▏ | 9072/17285 [81:14:32<79:05:06, 34.67s/it] 52%|█████▏ | 9073/17285 [81:15:09<80:20:15, 35.22s/it] 52%|█████▏ | 9074/17285 [81:15:40<77:47:28, 34.11s/it] 53%|█████▎ | 9075/17285 [81:16:13<76:35:56, 33.59s/it] 53%|█████▎ | 9076/17285 [81:16:41<73:03:22, 32.04s/it] 53%|█████▎ | 9077/17285 [81:17:09<69:50:37, 30.63s/it] 53%|█████▎ | 9078/17285 [81:17:36<67:41:16, 29.69s/it] 53%|█████▎ | 9079/17285 [81:18:02<65:03:25, 28.54s/it] 53%|█████▎ | 9080/17285 [81:18:34<67:45:56, 29.73s/it] {'loss': 1.4467, 'learning_rate': 0.00010179837874523537, 'epoch': 1.58} + 53%|█████▎ | 9080/17285 [81:18:34<67:45:56, 29.73s/it] 53%|█████▎ | 9081/17285 [81:19:05<68:27:41, 30.04s/it] 53%|█████▎ | 9082/17285 [81:19:37<69:33:36, 30.53s/it] 53%|█████▎ | 9083/17285 [81:20:09<70:41:24, 31.03s/it] 53%|█████▎ | 9084/17285 [81:20:42<71:56:25, 31.58s/it] 53%|█████▎ | 9085/17285 [81:21:21<77:21:10, 33.96s/it] 53%|█████▎ | 9086/17285 [81:22:03<82:37:33, 36.28s/it] 53%|█████▎ | 9087/17285 [81:22:38<81:58:05, 35.99s/it] 53%|█████▎ | 9088/17285 [81:23:07<77:12:22, 33.91s/it] 53%|█████▎ | 9089/17285 [81:23:33<71:40:17, 31.48s/it] 53%|█████▎ | 9090/17285 [81:24:03<70:34:06, 31.00s/it] {'loss': 1.4443, 'learning_rate': 
0.00010160707931026259, 'epoch': 1.58} + 53%|█████▎ | 9090/17285 [81:24:03<70:34:06, 31.00s/it] 53%|█████▎ | 9091/17285 [81:24:35<71:10:45, 31.27s/it][2023-08-26 09:19:49,149] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 53%|█████▎ | 9092/17285 [81:25:11<74:40:28, 32.81s/it] 53%|█████▎ | 9093/17285 [81:25:42<72:49:10, 32.00s/it] 53%|█████▎ | 9094/17285 [81:26:16<74:44:41, 32.85s/it] 53%|█████▎ | 9095/17285 [81:27:00<81:44:51, 35.93s/it] 53%|█████▎ | 9096/17285 [81:27:30<78:04:04, 34.32s/it] 53%|█████▎ | 9097/17285 [81:28:04<78:03:57, 34.32s/it] 53%|█████▎ | 9098/17285 [81:28:31<72:56:34, 32.07s/it] 53%|█████▎ | 9099/17285 [81:29:08<76:18:57, 33.56s/it] 53%|█████▎ | 9100/17285 [81:29:47<79:45:52, 35.08s/it] {'loss': 1.4272, 'learning_rate': 0.00010143490476895921, 'epoch': 1.58} + 53%|█████▎ | 9100/17285 [81:29:47<79:45:52, 35.08s/it] 53%|█████▎ | 9101/17285 [81:30:14<74:36:46, 32.82s/it] 53%|█████▎ | 9102/17285 [81:30:53<78:38:03, 34.59s/it] 53%|█████▎ | 9103/17285 [81:31:25<76:23:32, 33.61s/it] 53%|█████▎ | 9104/17285 [81:31:49<70:23:40, 30.98s/it] 53%|█████▎ | 9105/17285 [81:32:21<71:11:09, 31.33s/it] 53%|█████▎ | 9106/17285 [81:32:52<70:20:48, 30.96s/it] 53%|█████▎ | 9107/17285 [81:33:21<69:13:22, 30.47s/it] 53%|█████▎ | 9108/17285 [81:33:51<68:58:28, 30.37s/it] 53%|█████▎ | 9109/17285 [81:34:23<70:02:59, 30.84s/it] 53%|█████▎ | 9110/17285 [81:34:57<72:30:44, 31.93s/it] {'loss': 1.4492, 'learning_rate': 0.000101243594755249, 'epoch': 1.58} + 53%|█████▎ | 9110/17285 [81:34:57<72:30:44, 31.93s/it] 53%|█████▎ | 9111/17285 [81:35:26<70:17:50, 30.96s/it] 53%|█████▎ | 9112/17285 [81:36:03<74:01:03, 32.60s/it] 53%|█████▎ | 9113/17285 [81:36:30<70:39:51, 31.13s/it] 53%|█████▎ | 9114/17285 [81:36:56<66:46:58, 29.42s/it] 53%|█████▎ | 9115/17285 [81:37:28<68:26:42, 30.16s/it] 53%|█████▎ | 9116/17285 [81:37:57<67:47:11, 29.87s/it] 53%|█████▎ | 9117/17285 [81:38:25<66:41:35, 29.39s/it] 
53%|█████▎ | 9118/17285 [81:39:02<71:36:41, 31.57s/it] 53%|█████▎ | 9119/17285 [81:39:39<75:26:17, 33.26s/it] 53%|█████▎ | 9120/17285 [81:40:07<72:02:58, 31.77s/it] {'loss': 1.4289, 'learning_rate': 0.00010105228018922502, 'epoch': 1.58} + 53%|█████▎ | 9120/17285 [81:40:07<72:02:58, 31.77s/it] 53%|█████▎ | 9121/17285 [81:40:42<74:11:02, 32.71s/it] 53%|█████▎ | 9122/17285 [81:41:25<81:10:51, 35.80s/it] 53%|█████▎ | 9123/17285 [81:41:52<75:22:53, 33.25s/it] 53%|█████▎ | 9124/17285 [81:42:21<72:04:25, 31.79s/it] 53%|█████▎ | 9125/17285 [81:42:52<71:53:47, 31.72s/it] 53%|█████▎ | 9126/17285 [81:43:22<70:20:00, 31.03s/it] 53%|█████▎ | 9127/17285 [81:43:54<71:17:48, 31.46s/it] 53%|█████▎ | 9128/17285 [81:44:32<75:43:10, 33.42s/it] 53%|█████▎ | 9129/17285 [81:45:10<78:27:27, 34.63s/it] 53%|█████▎ | 9130/17285 [81:45:42<76:40:48, 33.85s/it] {'loss': 1.3822, 'learning_rate': 0.00010086096177121504, 'epoch': 1.58} + 53%|█████▎ | 9130/17285 [81:45:42<76:40:48, 33.85s/it] 53%|█████▎ | 9131/17285 [81:46:15<76:02:03, 33.57s/it] 53%|█████▎ | 9132/17285 [81:46:44<73:00:07, 32.23s/it] 53%|█████▎ | 9133/17285 [81:47:16<72:59:51, 32.24s/it] 53%|█████▎ | 9134/17285 [81:47:56<78:13:31, 34.55s/it] 53%|█████▎ | 9135/17285 [81:48:30<78:03:54, 34.48s/it] 53%|█████▎ | 9136/17285 [81:49:00<74:32:56, 32.93s/it] 53%|█████▎ | 9137/17285 [81:49:29<72:11:26, 31.90s/it] 53%|█████▎ | 9138/17285 [81:50:03<73:17:54, 32.39s/it] 53%|█████▎ | 9139/17285 [81:50:35<73:14:19, 32.37s/it] 53%|█████▎ | 9140/17285 [81:51:05<71:48:16, 31.74s/it] {'loss': 1.413, 'learning_rate': 0.00010066964020156091, 'epoch': 1.59} + 53%|█████▎ | 9140/17285 [81:51:05<71:48:16, 31.74s/it] 53%|█████▎ | 9141/17285 [81:51:38<72:38:03, 32.11s/it] 53%|█████▎ | 9142/17285 [81:52:11<72:53:02, 32.22s/it] 53%|█████▎ | 9143/17285 [81:52:41<71:22:03, 31.56s/it] 53%|█████▎ | 9144/17285 [81:53:12<71:26:30, 31.59s/it] 53%|█████▎ | 9145/17285 [81:53:42<69:54:34, 30.92s/it] 53%|█████▎ | 9146/17285 [81:54:10<68:03:41, 30.10s/it] 53%|█████▎ | 
9147/17285 [81:54:45<71:19:58, 31.56s/it] 53%|█████▎ | 9148/17285 [81:55:13<69:21:22, 30.68s/it] 53%|█████▎ | 9149/17285 [81:55:54<76:16:49, 33.75s/it] 53%|█████▎ | 9150/17285 [81:56:33<79:37:14, 35.23s/it] {'loss': 1.4419, 'learning_rate': 0.000100478316180616, 'epoch': 1.59} + 53%|█████▎ | 9150/17285 [81:56:33<79:37:14, 35.23s/it] 53%|█████▎ | 9151/17285 [81:57:13<82:45:57, 36.63s/it] 53%|█████▎ | 9152/17285 [81:57:43<78:12:36, 34.62s/it] 53%|█████▎ | 9153/17285 [81:58:17<77:45:44, 34.43s/it] 53%|█████▎ | 9154/17285 [81:58:50<77:05:54, 34.14s/it] 53%|█████▎ | 9155/17285 [81:59:28<79:38:09, 35.26s/it] 53%|█████▎ | 9156/17285 [82:00:01<78:12:34, 34.64s/it] 53%|█████▎ | 9157/17285 [82:00:40<81:01:24, 35.89s/it] 53%|█████▎ | 9158/17285 [82:01:16<80:41:49, 35.75s/it] 53%|█████▎ | 9159/17285 [82:01:44<75:24:10, 33.41s/it] 53%|█████▎ | 9160/17285 [82:02:09<69:55:15, 30.98s/it] {'loss': 1.3911, 'learning_rate': 0.00010028699040874277, 'epoch': 1.59} + 53%|█████▎ | 9160/17285 [82:02:09<69:55:15, 30.98s/it] 53%|█████▎ | 9161/17285 [82:02:36<67:17:49, 29.82s/it] 53%|█████▎ | 9162/17285 [82:03:05<66:33:06, 29.49s/it] 53%|█████▎ | 9163/17285 [82:03:37<68:19:17, 30.28s/it] 53%|█████▎ | 9164/17285 [82:04:12<71:43:35, 31.80s/it] 53%|█████▎ | 9165/17285 [82:04:45<72:26:42, 32.12s/it] 53%|█████▎ | 9166/17285 [82:05:21<74:42:47, 33.13s/it] 53%|█████▎ | 9167/17285 [82:05:50<72:25:24, 32.12s/it] 53%|█████▎ | 9168/17285 [82:06:17<68:30:48, 30.39s/it] 53%|█████▎ | 9169/17285 [82:06:52<71:31:02, 31.72s/it] 53%|█████▎ | 9170/17285 [82:07:22<70:53:01, 31.45s/it] {'loss': 1.4321, 'learning_rate': 0.00010009566358630991, 'epoch': 1.59} + 53%|█████▎ | 9170/17285 [82:07:22<70:53:01, 31.45s/it] 53%|█████▎ | 9171/17285 [82:07:53<70:31:46, 31.29s/it] 53%|█████▎ | 9172/17285 [82:08:23<69:43:56, 30.94s/it] 53%|█████▎ | 9173/17285 [82:08:51<67:42:23, 30.05s/it] 53%|█████▎ | 9174/17285 [82:09:20<66:49:26, 29.66s/it] 53%|█████▎ | 9175/17285 [82:09:55<70:10:52, 31.15s/it] 53%|█████▎ | 9176/17285 
[82:10:31<73:18:19, 32.54s/it] 53%|█████▎ | 9177/17285 [82:11:08<76:20:38, 33.90s/it] 53%|█████▎ | 9178/17285 [82:11:38<74:03:46, 32.89s/it] 53%|█████▎ | 9179/17285 [82:12:09<72:37:21, 32.25s/it] 53%|█████▎ | 9180/17285 [82:12:49<77:47:58, 34.56s/it] {'loss': 1.4358, 'learning_rate': 9.990433641369012e-05, 'epoch': 1.59} + 53%|█████▎ | 9180/17285 [82:12:49<77:47:58, 34.56s/it] 53%|█████▎ | 9181/17285 [82:13:19<75:06:54, 33.37s/it] 53%|█████▎ | 9182/17285 [82:13:50<73:02:44, 32.45s/it] 53%|█████▎ | 9183/17285 [82:14:18<70:13:24, 31.20s/it] 53%|█████▎ | 9184/17285 [82:14:58<75:54:19, 33.73s/it] 53%|█████▎ | 9185/17285 [82:15:37<80:00:36, 35.56s/it] 53%|█████▎ | 9186/17285 [82:16:05<74:53:41, 33.29s/it] 53%|█████▎ | 9187/17285 [82:16:32<70:35:49, 31.38s/it] 53%|█████▎ | 9188/17285 [82:17:14<77:38:07, 34.52s/it] 53%|█████▎ | 9189/17285 [82:17:48<77:20:47, 34.39s/it] 53%|█████▎ | 9190/17285 [82:18:17<73:13:00, 32.56s/it] {'loss': 1.4307, 'learning_rate': 9.971300959125727e-05, 'epoch': 1.6} + 53%|█████▎ | 9190/17285 [82:18:17<73:13:00, 32.56s/it] 53%|█████▎ | 9191/17285 [82:18:45<70:31:36, 31.37s/it] 53%|█████▎ | 9192/17285 [82:19:19<72:21:39, 32.19s/it] 53%|█████▎ | 9193/17285 [82:19:54<73:59:25, 32.92s/it] 53%|█████▎ | 9194/17285 [82:20:24<71:48:13, 31.95s/it] 53%|█████▎ | 9195/17285 [82:20:53<69:53:46, 31.10s/it] 53%|█████▎ | 9196/17285 [82:21:20<67:32:40, 30.06s/it] 53%|█████▎ | 9197/17285 [82:21:53<69:00:10, 30.71s/it] 53%|█████▎ | 9198/17285 [82:22:21<67:29:39, 30.05s/it] 53%|█████▎ | 9199/17285 [82:22:58<72:21:00, 32.21s/it] 53%|█████▎ | 9200/17285 [82:23:42<80:21:24, 35.78s/it] {'loss': 1.4235, 'learning_rate': 9.952168381938401e-05, 'epoch': 1.6} + 53%|█████▎ | 9200/17285 [82:23:42<80:21:24, 35.78s/it] 53%|█████▎ | 9201/17285 [82:24:14<77:31:48, 34.53s/it] 53%|█████▎ | 9202/17285 [82:24:45<75:05:49, 33.45s/it] 53%|█████▎ | 9203/17285 [82:25:19<75:32:38, 33.65s/it] 53%|█████▎ | 9204/17285 [82:25:57<78:09:12, 34.82s/it] 53%|█████▎ | 9205/17285 [82:26:25<73:56:15, 
32.94s/it] 53%|█████▎ | 9206/17285 [82:26:56<72:22:20, 32.25s/it] 53%|█████▎ | 9207/17285 [82:27:28<72:32:42, 32.33s/it] 53%|█████▎ | 9208/17285 [82:28:05<75:29:49, 33.65s/it] 53%|█████▎ | 9209/17285 [82:28:49<82:18:09, 36.69s/it] 53%|█████▎ | 9210/17285 [82:29:17<76:28:47, 34.10s/it] {'loss': 1.4123, 'learning_rate': 9.933035979843912e-05, 'epoch': 1.6} + 53%|█████▎ | 9210/17285 [82:29:17<76:28:47, 34.10s/it] 53%|█████▎ | 9211/17285 [82:29:49<75:22:29, 33.61s/it] 53%|█████▎ | 9212/17285 [82:30:36<84:01:31, 37.47s/it] 53%|█████▎ | 9213/17285 [82:31:12<83:01:56, 37.03s/it] 53%|█████▎ | 9214/17285 [82:31:45<80:24:38, 35.87s/it] 53%|█████▎ | 9215/17285 [82:32:25<82:56:31, 37.00s/it] 53%|█████▎ | 9216/17285 [82:32:56<79:24:22, 35.43s/it] 53%|█████▎ | 9217/17285 [82:33:35<81:43:45, 36.47s/it] 53%|█████▎ | 9218/17285 [82:34:01<74:43:58, 33.35s/it] 53%|█████▎ | 9219/17285 [82:34:27<69:18:49, 30.94s/it] 53%|█████▎ | 9220/17285 [82:34:58<69:29:26, 31.02s/it] {'loss': 1.4267, 'learning_rate': 9.913903822878499e-05, 'epoch': 1.6} + 53%|█████▎ | 9220/17285 [82:34:58<69:29:26, 31.02s/it] 53%|█████▎ | 9221/17285 [82:35:37<74:52:10, 33.42s/it] 53%|█████▎ | 9222/17285 [82:36:13<76:56:53, 34.36s/it] 53%|█████▎ | 9223/17285 [82:36:50<78:40:54, 35.13s/it] 53%|█████▎ | 9224/17285 [82:37:27<79:55:19, 35.69s/it] 53%|█████▎ | 9225/17285 [82:37:53<72:52:03, 32.55s/it] 53%|█████▎ | 9226/17285 [82:38:22<70:55:50, 31.69s/it] 53%|█████▎ | 9227/17285 [82:38:58<73:54:29, 33.02s/it] 53%|█████▎ | 9228/17285 [82:39:41<80:08:36, 35.81s/it] 53%|█████▎ | 9229/17285 [82:40:16<79:57:23, 35.73s/it] 53%|█████▎ | 9230/17285 [82:40:48<77:20:03, 34.56s/it] {'loss': 1.4151, 'learning_rate': 9.8947719810775e-05, 'epoch': 1.6} + 53%|█████▎ | 9230/17285 [82:40:48<77:20:03, 34.56s/it] 53%|█████▎ | 9231/17285 [82:41:21<76:25:09, 34.16s/it] 53%|█████▎ | 9232/17285 [82:42:06<83:14:58, 37.22s/it] 53%|█████▎ | 9233/17285 [82:42:38<79:38:10, 35.60s/it] 53%|█████▎ | 9234/17285 [82:43:13<79:28:46, 35.54s/it] 53%|█████▎ 
| 9235/17285 [82:43:49<79:48:56, 35.69s/it] 53%|█████▎ | 9236/17285 [82:44:19<75:49:47, 33.92s/it] 53%|█████▎ | 9237/17285 [82:44:55<77:03:21, 34.47s/it] 53%|█████▎ | 9238/17285 [82:45:30<77:43:49, 34.77s/it] 53%|█████▎ | 9239/17285 [82:46:00<74:37:59, 33.39s/it] 53%|█████▎ | 9240/17285 [82:46:26<69:37:30, 31.16s/it] {'loss': 1.4215, 'learning_rate': 9.875640524475103e-05, 'epoch': 1.6} + 53%|█████▎ | 9240/17285 [82:46:26<69:37:30, 31.16s/it] 53%|█████▎ | 9241/17285 [82:47:07<76:07:08, 34.07s/it] 53%|█████▎ | 9242/17285 [82:47:46<79:36:46, 35.63s/it] 53%|█████▎ | 9243/17285 [82:48:17<76:27:17, 34.23s/it] 53%|█████▎ | 9244/17285 [82:48:55<78:43:04, 35.24s/it] 53%|█████▎ | 9245/17285 [82:49:32<79:52:27, 35.76s/it] 53%|█████▎ | 9246/17285 [82:50:03<76:36:36, 34.31s/it] 53%|█████▎ | 9247/17285 [82:50:36<76:12:51, 34.13s/it] 54%|█████▎ | 9248/17285 [82:51:09<75:13:14, 33.69s/it] 54%|█████▎ | 9249/17285 [82:51:42<74:52:58, 33.55s/it] 54%|█████▎ | 9250/17285 [82:52:16<75:01:24, 33.61s/it] {'loss': 1.3905, 'learning_rate': 9.856509523104083e-05, 'epoch': 1.61} + 54%|█████▎ | 9250/17285 [82:52:16<75:01:24, 33.61s/it] 54%|█████▎ | 9251/17285 [82:53:01<82:39:05, 37.04s/it] 54%|█████▎ | 9252/17285 [82:53:35<80:20:07, 36.00s/it] 54%|█████▎ | 9253/17285 [82:53:59<72:44:43, 32.60s/it] 54%|█████▎ | 9254/17285 [82:54:32<72:39:32, 32.57s/it] 54%|█████▎ | 9255/17285 [82:55:03<71:55:56, 32.25s/it] 54%|█████▎ | 9256/17285 [82:55:39<73:54:12, 33.14s/it] 54%|█████▎ | 9257/17285 [82:56:08<71:44:08, 32.17s/it] 54%|█████▎ | 9258/17285 [82:56:39<70:50:54, 31.77s/it] 54%|█████▎ | 9259/17285 [82:57:12<71:24:10, 32.03s/it] 54%|█████▎ | 9260/17285 [82:57:45<72:08:38, 32.36s/it] {'loss': 1.4147, 'learning_rate': 9.83737904699555e-05, 'epoch': 1.61} + 54%|█████▎ | 9260/17285 [82:57:45<72:08:38, 32.36s/it] 54%|█████▎ | 9261/17285 [82:58:25<77:13:54, 34.65s/it][2023-08-26 10:53:36,373] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 54%|█████▎ | 9262/17285 [82:58:59<76:31:55, 34.34s/it] 54%|█████▎ | 9263/17285 [82:59:25<71:15:28, 31.98s/it] 54%|█████▎ | 9264/17285 [82:59:53<68:32:52, 30.77s/it] 54%|█████▎ | 9265/17285 [83:00:25<69:03:06, 31.00s/it] 54%|█████▎ | 9266/17285 [83:00:57<69:49:47, 31.35s/it] 54%|█████▎ | 9267/17285 [83:01:31<71:44:55, 32.21s/it] 54%|█████▎ | 9268/17285 [83:02:04<72:02:31, 32.35s/it] 54%|█████▎ | 9269/17285 [83:02:36<72:18:41, 32.48s/it] 54%|█████▎ | 9270/17285 [83:03:03<68:19:22, 30.69s/it] {'loss': 1.4225, 'learning_rate': 9.820162125476466e-05, 'epoch': 1.61} + 54%|█████▎ | 9270/17285 [83:03:03<68:19:22, 30.69s/it] 54%|█████▎ | 9271/17285 [83:03:41<73:03:36, 32.82s/it][2023-08-26 10:58:43,233] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 54%|█████▎ | 9272/17285 [83:04:06<67:40:39, 30.41s/it] 54%|█████▎ | 9273/17285 [83:04:38<69:13:38, 31.11s/it] 54%|█████▎ | 9274/17285 [83:05:06<66:41:58, 29.97s/it] 54%|█████▎ | 9275/17285 [83:05:34<65:38:41, 29.50s/it] 54%|█████▎ | 9276/17285 [83:06:03<65:15:47, 29.34s/it] 54%|█████▎ | 9277/17285 [83:06:36<68:01:35, 30.58s/it] 54%|█████▎ | 9278/17285 [83:07:08<68:26:56, 30.78s/it] 54%|█████▎ | 9279/17285 [83:07:40<69:38:32, 31.32s/it] 54%|█████▎ | 9280/17285 [83:08:15<71:38:22, 32.22s/it] {'loss': 1.4273, 'learning_rate': 9.802945737193441e-05, 'epoch': 1.61} + 54%|█████▎ | 9280/17285 [83:08:15<71:38:22, 32.22s/it] 54%|█████▎ | 9281/17285 [83:08:56<77:29:57, 34.86s/it] 54%|█████▎ | 9282/17285 [83:09:27<75:01:26, 33.75s/it] 54%|█████▎ | 9283/17285 [83:09:53<70:10:49, 31.57s/it] 54%|█████▎ | 9284/17285 [83:10:32<75:10:22, 33.82s/it] 54%|█████▎ | 9285/17285 [83:11:07<76:02:55, 34.22s/it] 54%|█████▎ | 9286/17285 [83:11:40<75:09:38, 33.83s/it] 54%|█████▎ | 9287/17285 [83:12:08<70:58:47, 31.95s/it] 54%|█████▎ | 9288/17285 [83:12:36<68:27:47, 30.82s/it] 54%|█████▎ | 
9289/17285 [83:13:02<65:17:16, 29.39s/it] 54%|█████▎ | 9290/17285 [83:13:30<64:29:41, 29.04s/it] {'loss': 1.4644, 'learning_rate': 9.783817104368033e-05, 'epoch': 1.61} + 54%|█████▎ | 9290/17285 [83:13:30<64:29:41, 29.04s/it] 54%|█████▍ | 9291/17285 [83:14:10<71:23:45, 32.15s/it] 54%|█████▍ | 9292/17285 [83:14:39<69:07:45, 31.14s/it] 54%|█████▍ | 9293/17285 [83:15:07<67:10:13, 30.26s/it] 54%|█████▍ | 9294/17285 [83:15:38<67:41:51, 30.50s/it] 54%|█████▍ | 9295/17285 [83:16:09<68:10:19, 30.72s/it] 54%|█████▍ | 9296/17285 [83:16:47<73:00:38, 32.90s/it] 54%|█████▍ | 9297/17285 [83:17:13<68:36:08, 30.92s/it] 54%|█████▍ | 9298/17285 [83:17:40<65:54:12, 29.70s/it] 54%|█████▍ | 9299/17285 [83:18:05<62:15:56, 28.07s/it] 54%|█████▍ | 9300/17285 [83:18:41<67:47:00, 30.56s/it] {'loss': 1.4026, 'learning_rate': 9.764689262903611e-05, 'epoch': 1.61} + 54%|█████▍ | 9300/17285 [83:18:41<67:47:00, 30.56s/it] 54%|█████▍ | 9301/17285 [83:19:13<68:57:46, 31.10s/it] 54%|█████▍ | 9302/17285 [83:19:42<67:14:42, 30.32s/it] 54%|█████▍ | 9303/17285 [83:20:10<65:32:06, 29.56s/it] 54%|█████▍ | 9304/17285 [83:20:41<66:30:44, 30.00s/it] 54%|█████▍ | 9305/17285 [83:21:10<66:07:51, 29.83s/it] 54%|█████▍ | 9306/17285 [83:21:42<67:51:56, 30.62s/it] 54%|█████▍ | 9307/17285 [83:22:15<69:11:30, 31.22s/it] 54%|█████▍ | 9308/17285 [83:22:45<68:06:21, 30.74s/it] 54%|█████▍ | 9309/17285 [83:23:11<65:08:37, 29.40s/it] 54%|█████▍ | 9310/17285 [83:23:44<67:50:40, 30.63s/it] {'loss': 1.4154, 'learning_rate': 9.74556228281972e-05, 'epoch': 1.62} + 54%|█████▍ | 9310/17285 [83:23:44<67:50:40, 30.63s/it] 54%|█████▍ | 9311/17285 [83:24:19<70:24:39, 31.79s/it] 54%|█████▍ | 9312/17285 [83:24:49<69:08:30, 31.22s/it] 54%|█████▍ | 9313/17285 [83:25:22<70:31:01, 31.84s/it] 54%|█████▍ | 9314/17285 [83:25:58<73:00:12, 32.97s/it] 54%|█████▍ | 9315/17285 [83:26:28<70:54:43, 32.03s/it] 54%|█████▍ | 9316/17285 [83:26:58<69:41:10, 31.48s/it] 54%|█████▍ | 9317/17285 [83:27:30<70:04:08, 31.66s/it] 54%|█████▍ | 9318/17285 
[83:28:02<70:30:58, 31.86s/it] 54%|█████▍ | 9319/17285 [83:28:27<65:44:26, 29.71s/it] 54%|█████▍ | 9320/17285 [83:28:55<64:53:17, 29.33s/it] {'loss': 1.438, 'learning_rate': 9.726436234132755e-05, 'epoch': 1.62} + 54%|█████▍ | 9320/17285 [83:28:55<64:53:17, 29.33s/it] 54%|█████▍ | 9321/17285 [83:29:34<71:06:41, 32.14s/it] 54%|█████▍ | 9322/17285 [83:30:04<69:38:54, 31.49s/it] 54%|█████▍ | 9323/17285 [83:30:45<76:03:43, 34.39s/it] 54%|█████▍ | 9324/17285 [83:31:14<72:36:05, 32.83s/it] 54%|█████▍ | 9325/17285 [83:31:40<67:59:45, 30.75s/it] 54%|█████▍ | 9326/17285 [83:32:16<71:21:03, 32.27s/it] 54%|█████▍ | 9327/17285 [83:32:44<68:45:23, 31.10s/it] 54%|█████▍ | 9328/17285 [83:33:12<66:27:10, 30.07s/it] 54%|█████▍ | 9329/17285 [83:33:40<64:49:04, 29.33s/it] 54%|█████▍ | 9330/17285 [83:34:16<69:17:29, 31.36s/it] {'loss': 1.4191, 'learning_rate': 9.707311186855684e-05, 'epoch': 1.62} + 54%|█████▍ | 9330/17285 [83:34:16<69:17:29, 31.36s/it] 54%|█████▍ | 9331/17285 [83:34:43<66:50:40, 30.25s/it] 54%|█████▍ | 9332/17285 [83:35:13<66:23:30, 30.05s/it] 54%|█████▍ | 9333/17285 [83:35:42<65:21:33, 29.59s/it] 54%|█████▍ | 9334/17285 [83:36:12<65:38:40, 29.72s/it] 54%|█████▍ | 9335/17285 [83:36:47<69:15:31, 31.36s/it] 54%|█████▍ | 9336/17285 [83:37:15<67:12:05, 30.43s/it] 54%|█████▍ | 9337/17285 [83:37:40<63:32:33, 28.78s/it] 54%|█████▍ | 9338/17285 [83:38:16<68:20:10, 30.96s/it] 54%|█████▍ | 9339/17285 [83:38:50<70:06:47, 31.77s/it] 54%|█████▍ | 9340/17285 [83:39:21<70:06:07, 31.76s/it] {'loss': 1.4027, 'learning_rate': 9.68818721099783e-05, 'epoch': 1.62} + 54%|█████▍ | 9340/17285 [83:39:21<70:06:07, 31.76s/it] 54%|█████▍ | 9341/17285 [83:39:54<70:27:14, 31.93s/it] 54%|█████▍ | 9342/17285 [83:40:29<72:37:05, 32.91s/it] 54%|█████▍ | 9343/17285 [83:40:55<67:47:02, 30.73s/it] 54%|█████▍ | 9344/17285 [83:41:23<66:34:35, 30.18s/it] 54%|█████▍ | 9345/17285 [83:41:50<64:25:28, 29.21s/it] 54%|█████▍ | 9346/17285 [83:42:27<69:24:17, 31.47s/it] 54%|█████▍ | 9347/17285 [83:42:53<65:20:32, 
29.63s/it] 54%|█████▍ | 9348/17285 [83:43:22<65:16:19, 29.61s/it] 54%|█████▍ | 9349/17285 [83:43:48<62:50:11, 28.50s/it] 54%|█████▍ | 9350/17285 [83:44:14<61:23:09, 27.85s/it] {'loss': 1.4297, 'learning_rate': 9.669064376564584e-05, 'epoch': 1.62} + 54%|█████▍ | 9350/17285 [83:44:14<61:23:09, 27.85s/it] 54%|█████▍ | 9351/17285 [83:44:49<66:00:35, 29.95s/it] 54%|█████▍ | 9352/17285 [83:45:24<69:17:14, 31.44s/it] 54%|█████▍ | 9353/17285 [83:46:01<72:59:41, 33.13s/it] 54%|█████▍ | 9354/17285 [83:46:35<73:18:37, 33.28s/it] 54%|█████▍ | 9355/17285 [83:47:12<75:55:08, 34.47s/it] 54%|█████▍ | 9356/17285 [83:47:41<72:23:27, 32.87s/it] 54%|█████▍ | 9357/17285 [83:48:18<74:53:16, 34.01s/it] 54%|█████▍ | 9358/17285 [83:48:52<75:10:55, 34.14s/it] 54%|█████▍ | 9359/17285 [83:49:24<73:38:01, 33.44s/it] 54%|█████▍ | 9360/17285 [83:49:59<74:47:56, 33.98s/it] {'loss': 1.4164, 'learning_rate': 9.64994275355716e-05, 'epoch': 1.62} + 54%|█████▍ | 9360/17285 [83:49:59<74:47:56, 33.98s/it] 54%|█████▍ | 9361/17285 [83:50:31<73:16:33, 33.29s/it] 54%|█████▍ | 9362/17285 [83:51:04<73:16:13, 33.29s/it] 54%|█████▍ | 9363/17285 [83:51:35<71:39:31, 32.56s/it] 54%|█████▍ | 9364/17285 [83:52:09<72:41:26, 33.04s/it] 54%|█████▍ | 9365/17285 [83:52:36<68:41:27, 31.22s/it] 54%|█████▍ | 9366/17285 [83:53:07<68:03:31, 30.94s/it] 54%|█████▍ | 9367/17285 [83:53:40<69:53:04, 31.77s/it] 54%|█████▍ | 9368/17285 [83:54:15<71:52:39, 32.68s/it] 54%|█████▍ | 9369/17285 [83:54:53<75:33:59, 34.37s/it] 54%|█████▍ | 9370/17285 [83:55:30<76:50:15, 34.95s/it] {'loss': 1.3995, 'learning_rate': 9.630822411972336e-05, 'epoch': 1.63} + 54%|█████▍ | 9370/17285 [83:55:30<76:50:15, 34.95s/it] 54%|█████▍ | 9371/17285 [83:55:59<72:58:18, 33.19s/it] 54%|█████▍ | 9372/17285 [83:56:38<76:56:15, 35.00s/it] 54%|█████▍ | 9373/17285 [83:57:09<74:15:20, 33.79s/it] 54%|█████▍ | 9374/17285 [83:57:49<78:18:32, 35.64s/it] 54%|█████▍ | 9375/17285 [83:58:21<76:01:48, 34.60s/it] 54%|█████▍ | 9376/17285 [83:58:54<74:50:18, 34.06s/it] 
54%|█████▍ | 9377/17285 [83:59:22<70:43:00, 32.19s/it] 54%|█████▍ | 9378/17285 [83:59:54<70:34:47, 32.13s/it] 54%|█████▍ | 9379/17285 [84:00:27<71:33:42, 32.59s/it] 54%|█████▍ | 9380/17285 [84:01:05<74:47:28, 34.06s/it] {'loss': 1.4077, 'learning_rate': 9.611703421802204e-05, 'epoch': 1.63} + 54%|█████▍ | 9380/17285 [84:01:05<74:47:28, 34.06s/it] 54%|█████▍ | 9381/17285 [84:01:32<70:29:37, 32.11s/it] 54%|█████▍ | 9382/17285 [84:02:08<72:29:35, 33.02s/it] 54%|█████▍ | 9383/17285 [84:02:36<69:07:45, 31.49s/it] 54%|█████▍ | 9384/17285 [84:03:12<72:40:39, 33.11s/it] 54%|█████▍ | 9385/17285 [84:03:41<69:52:29, 31.84s/it] 54%|█████▍ | 9386/17285 [84:04:13<69:44:30, 31.79s/it] 54%|█████▍ | 9387/17285 [84:04:47<71:07:49, 32.42s/it] 54%|█████▍ | 9388/17285 [84:05:23<73:43:47, 33.61s/it] 54%|█████▍ | 9389/17285 [84:05:54<71:37:24, 32.66s/it] 54%|█████▍ | 9390/17285 [84:06:25<71:04:33, 32.41s/it] {'loss': 1.4506, 'learning_rate': 9.592585853033905e-05, 'epoch': 1.63} + 54%|█████▍ | 9390/17285 [84:06:26<71:04:33, 32.41s/it] 54%|█████▍ | 9391/17285 [84:06:50<66:05:56, 30.14s/it] 54%|█████▍ | 9392/17285 [84:07:19<64:56:12, 29.62s/it] 54%|█████▍ | 9393/17285 [84:07:50<66:12:55, 30.20s/it] 54%|█████▍ | 9394/17285 [84:08:21<66:45:41, 30.46s/it] 54%|█████▍ | 9395/17285 [84:08:52<66:33:16, 30.37s/it] 54%|█████▍ | 9396/17285 [84:09:20<65:05:52, 29.71s/it] 54%|█████▍ | 9397/17285 [84:09:59<71:36:52, 32.68s/it] 54%|█████▍ | 9398/17285 [84:10:30<70:00:23, 31.95s/it] 54%|█████▍ | 9399/17285 [84:11:01<69:35:57, 31.77s/it] 54%|█████▍ | 9400/17285 [84:11:34<70:31:50, 32.20s/it] {'loss': 1.4236, 'learning_rate': 9.573469775649374e-05, 'epoch': 1.63} + 54%|█████▍ | 9400/17285 [84:11:34<70:31:50, 32.20s/it] 54%|█████▍ | 9401/17285 [84:12:03<68:36:23, 31.33s/it] 54%|█████▍ | 9402/17285 [84:12:39<71:04:45, 32.46s/it] 54%|█████▍ | 9403/17285 [84:13:05<66:51:08, 30.53s/it] 54%|█████▍ | 9404/17285 [84:13:48<75:14:18, 34.37s/it] 54%|█████▍ | 9405/17285 [84:14:30<80:23:24, 36.73s/it] 54%|█████▍ | 
9406/17285 [84:15:10<82:21:59, 37.63s/it] 54%|█████▍ | 9407/17285 [84:15:44<80:12:24, 36.65s/it] 54%|█████▍ | 9408/17285 [84:16:18<78:21:47, 35.81s/it] 54%|█████▍ | 9409/17285 [84:16:49<75:06:46, 34.33s/it] 54%|█████▍ | 9410/17285 [84:17:15<69:22:29, 31.71s/it] {'loss': 1.3765, 'learning_rate': 9.554355259625092e-05, 'epoch': 1.63} + 54%|█████▍ | 9410/17285 [84:17:15<69:22:29, 31.71s/it] 54%|█████▍ | 9411/17285 [84:17:45<68:45:43, 31.44s/it] 54%|█████▍ | 9412/17285 [84:18:10<64:18:42, 29.41s/it] 54%|█████▍ | 9413/17285 [84:18:42<65:53:39, 30.13s/it] 54%|█████▍ | 9414/17285 [84:19:10<64:35:33, 29.54s/it] 54%|█████▍ | 9415/17285 [84:19:50<71:42:46, 32.80s/it] 54%|█████▍ | 9416/17285 [84:20:22<71:02:38, 32.50s/it] 54%|█████▍ | 9417/17285 [84:20:55<71:06:53, 32.54s/it] 54%|█████▍ | 9418/17285 [84:21:25<69:29:29, 31.80s/it] 54%|█████▍ | 9419/17285 [84:21:53<67:12:06, 30.76s/it] 54%|█████▍ | 9420/17285 [84:22:27<69:16:45, 31.71s/it] {'loss': 1.4404, 'learning_rate': 9.535242374931823e-05, 'epoch': 1.63} + 54%|█████▍ | 9420/17285 [84:22:27<69:16:45, 31.71s/it] 55%|█████▍ | 9421/17285 [84:22:57<68:18:59, 31.27s/it] 55%|█████▍ | 9422/17285 [84:23:34<71:54:06, 32.92s/it] 55%|█████▍ | 9423/17285 [84:24:05<70:14:53, 32.17s/it] 55%|█████▍ | 9424/17285 [84:24:38<71:09:59, 32.59s/it] 55%|█████▍ | 9425/17285 [84:25:14<73:25:33, 33.63s/it] 55%|█████▍ | 9426/17285 [84:25:46<72:06:11, 33.03s/it] 55%|█████▍ | 9427/17285 [84:26:16<70:11:41, 32.16s/it] 55%|█████▍ | 9428/17285 [84:26:50<71:25:02, 32.72s/it] 55%|█████▍ | 9429/17285 [84:27:23<71:39:27, 32.84s/it] 55%|█████▍ | 9430/17285 [84:28:00<74:14:24, 34.02s/it] {'loss': 1.4332, 'learning_rate': 9.516131191534359e-05, 'epoch': 1.64} + 55%|█████▍ | 9430/17285 [84:28:00<74:14:24, 34.02s/it] 55%|█████▍ | 9431/17285 [84:28:39<77:51:03, 35.68s/it] 55%|█████▍ | 9432/17285 [84:29:23<82:54:22, 38.01s/it] 55%|█████▍ | 9433/17285 [84:29:57<80:21:58, 36.85s/it] 55%|█████▍ | 9434/17285 [84:30:33<79:45:42, 36.57s/it] 55%|█████▍ | 9435/17285 
[84:31:01<74:20:39, 34.09s/it] 55%|█████▍ | 9436/17285 [84:31:44<79:57:12, 36.67s/it] 55%|█████▍ | 9437/17285 [84:32:17<77:35:01, 35.59s/it] 55%|█████▍ | 9438/17285 [84:32:46<73:27:53, 33.70s/it] 55%|█████▍ | 9439/17285 [84:33:13<69:07:22, 31.72s/it] 55%|█████▍ | 9440/17285 [84:33:42<66:54:42, 30.71s/it] {'loss': 1.4082, 'learning_rate': 9.497021779391265e-05, 'epoch': 1.64} + 55%|█████▍ | 9440/17285 [84:33:42<66:54:42, 30.71s/it] 55%|█████▍ | 9441/17285 [84:34:10<64:58:10, 29.82s/it] 55%|█████▍ | 9442/17285 [84:34:50<71:46:33, 32.95s/it] 55%|█████▍ | 9443/17285 [84:35:20<70:06:53, 32.19s/it] 55%|█████▍ | 9444/17285 [84:35:51<69:02:13, 31.70s/it] 55%|█████▍ | 9445/17285 [84:36:26<71:10:16, 32.68s/it] 55%|█████▍ | 9446/17285 [84:36:54<68:09:17, 31.30s/it] 55%|█████▍ | 9447/17285 [84:37:29<70:42:06, 32.47s/it] 55%|█████▍ | 9448/17285 [84:38:00<69:45:59, 32.05s/it] 55%|█████▍ | 9449/17285 [84:38:29<67:30:33, 31.01s/it] 55%|█████▍ | 9450/17285 [84:39:02<68:55:26, 31.67s/it] {'loss': 1.4542, 'learning_rate': 9.477914208454618e-05, 'epoch': 1.64} + 55%|█████▍ | 9450/17285 [84:39:02<68:55:26, 31.67s/it] 55%|█████▍ | 9451/17285 [84:39:28<65:23:43, 30.05s/it] 55%|█████▍ | 9452/17285 [84:40:03<68:21:27, 31.42s/it] 55%|█████▍ | 9453/17285 [84:40:34<67:59:29, 31.25s/it] 55%|█████▍ | 9454/17285 [84:41:02<65:51:55, 30.28s/it] 55%|█████▍ | 9455/17285 [84:41:26<61:45:33, 28.40s/it] 55%|█████▍ | 9456/17285 [84:41:52<60:28:55, 27.81s/it] 55%|█████▍ | 9457/17285 [84:42:25<63:50:26, 29.36s/it] 55%|█████▍ | 9458/17285 [84:43:01<68:13:25, 31.38s/it] 55%|█████▍ | 9459/17285 [84:43:27<64:55:01, 29.86s/it] 55%|█████▍ | 9460/17285 [84:44:05<69:37:47, 32.03s/it] {'loss': 1.4153, 'learning_rate': 9.458808548669762e-05, 'epoch': 1.64} + 55%|█████▍ | 9460/17285 [84:44:05<69:37:47, 32.03s/it] 55%|█████▍ | 9461/17285 [84:44:38<70:13:10, 32.31s/it] 55%|█████▍ | 9462/17285 [84:45:10<70:34:42, 32.48s/it] 55%|█████▍ | 9463/17285 [84:45:43<70:32:30, 32.47s/it] 55%|█████▍ | 9464/17285 
[84:46:12<68:37:33, 31.59s/it] 55%|█████▍ | 9465/17285 [84:46:52<73:50:06, 33.99s/it] 55%|█████▍ | 9466/17285 [84:47:22<71:26:58, 32.90s/it] 55%|█████▍ | 9467/17285 [84:47:48<67:00:01, 30.85s/it] 55%|█████▍ | 9468/17285 [84:48:24<70:10:34, 32.32s/it] 55%|█████▍ | 9469/17285 [84:48:51<66:37:02, 30.68s/it] 55%|█████▍ | 9470/17285 [84:49:17<63:15:32, 29.14s/it] {'loss': 1.4286, 'learning_rate': 9.439704869975043e-05, 'epoch': 1.64} + 55%|█████▍ | 9470/17285 [84:49:17<63:15:32, 29.14s/it] 55%|█████▍ | 9471/17285 [84:49:53<67:57:48, 31.31s/it] 55%|█████▍ | 9472/17285 [84:50:17<63:22:23, 29.20s/it][2023-08-26 12:45:23,903] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, but hysteresis is 2. Reducing hysteresis to 1 + 55%|█████▍ | 9473/17285 [84:50:46<63:15:50, 29.15s/it] 55%|█████▍ | 9474/17285 [84:51:11<60:16:53, 27.78s/it] 55%|█████▍ | 9475/17285 [84:51:50<67:23:18, 31.06s/it][2023-08-26 12:46:52,873] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, reducing to 131072 + 55%|█████▍ | 9476/17285 [84:52:15<63:52:16, 29.45s/it] 55%|█████▍ | 9477/17285 [84:52:55<70:50:28, 32.66s/it][2023-08-26 12:48:13,061] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 55%|█████▍ | 9478/17285 [84:53:35<75:37:03, 34.87s/it] 55%|█████▍ | 9479/17285 [84:54:02<70:07:16, 32.34s/it] 55%|█████▍ | 9480/17285 [84:54:38<72:45:23, 33.56s/it] {'loss': 1.3663, 'learning_rate': 9.426333511085766e-05, 'epoch': 1.65} + 55%|█████▍ | 9480/17285 [84:54:38<72:45:23, 33.56s/it] 55%|█████▍ | 9481/17285 [84:55:14<73:57:32, 34.12s/it] 55%|█████▍ | 9482/17285 [84:55:49<74:43:10, 34.47s/it] 55%|█████▍ | 9483/17285 [84:56:27<76:49:13, 35.45s/it] 55%|█████▍ | 9484/17285 [84:57:05<78:29:45, 36.22s/it] 55%|█████▍ | 9485/17285 [84:57:29<71:03:08, 32.79s/it] 55%|█████▍ | 9486/17285 [84:58:05<72:43:30, 33.57s/it] 55%|█████▍ | 9487/17285 [84:58:32<68:23:07, 31.57s/it] 55%|█████▍ | 9488/17285 [84:59:01<66:57:31, 30.92s/it] 55%|█████▍ | 9489/17285 [84:59:36<69:16:48, 31.99s/it] 55%|█████▍ | 9490/17285 [85:00:07<68:39:08, 31.71s/it] {'loss': 1.389, 'learning_rate': 9.407233360732119e-05, 'epoch': 1.65} + 55%|█████▍ | 9490/17285 [85:00:07<68:39:08, 31.71s/it] 55%|█████▍ | 9491/17285 [85:00:42<70:46:20, 32.69s/it] 55%|█████▍ | 9492/17285 [85:01:12<69:08:18, 31.94s/it] 55%|█████▍ | 9493/17285 [85:01:52<74:12:42, 34.29s/it] 55%|█████▍ | 9494/17285 [85:02:32<77:56:23, 36.01s/it] 55%|█████▍ | 9495/17285 [85:03:02<74:27:13, 34.41s/it] 55%|█████▍ | 9496/17285 [85:03:27<67:57:47, 31.41s/it] 55%|█████▍ | 9497/17285 [85:03:55<66:07:02, 30.56s/it] 55%|█████▍ | 9498/17285 [85:04:26<66:22:51, 30.69s/it] 55%|█████▍ | 9499/17285 [85:04:57<66:09:59, 30.59s/it] 55%|█████▍ | 9500/17285 [85:05:33<69:38:38, 32.21s/it] {'loss': 1.4096, 'learning_rate': 9.388135380265187e-05, 'epoch': 1.65} + 55%|█████▍ | 9500/17285 [85:05:33<69:38:38, 32.21s/it] 55%|█████▍ | 9501/17285 [85:06:05<69:40:52, 32.23s/it] 55%|█████▍ | 9502/17285 [85:06:31<65:54:09, 30.48s/it] 55%|█████▍ | 9503/17285 [85:07:14<73:54:30, 34.19s/it] 55%|█████▍ | 9504/17285 [85:07:57<79:44:29, 36.89s/it] 55%|█████▍ | 9505/17285 [85:08:28<75:52:36, 35.11s/it] 55%|█████▍ | 9506/17285 
[85:08:55<70:05:59, 32.44s/it] 55%|█████▌ | 9507/17285 [85:09:28<70:25:32, 32.60s/it] 55%|█████▌ | 9508/17285 [85:09:57<68:35:25, 31.75s/it] 55%|█████▌ | 9509/17285 [85:10:32<70:35:46, 32.68s/it] 55%|█████▌ | 9510/17285 [85:11:07<72:04:51, 33.38s/it] {'loss': 1.4111, 'learning_rate': 9.369039639595209e-05, 'epoch': 1.65} + 55%|█████▌ | 9510/17285 [85:11:07<72:04:51, 33.38s/it] 55%|█████▌ | 9511/17285 [85:11:39<71:07:50, 32.94s/it] 55%|█████▌ | 9512/17285 [85:12:12<70:50:29, 32.81s/it] 55%|█████▌ | 9513/17285 [85:12:43<69:52:46, 32.37s/it] 55%|█████▌ | 9514/17285 [85:13:17<70:51:32, 32.83s/it] 55%|█████▌ | 9515/17285 [85:13:50<70:55:41, 32.86s/it] 55%|█████▌ | 9516/17285 [85:14:27<73:56:39, 34.26s/it] 55%|█████▌ | 9517/17285 [85:15:00<73:11:23, 33.92s/it] 55%|█████▌ | 9518/17285 [85:15:35<73:43:30, 34.17s/it] 55%|█████▌ | 9519/17285 [85:16:00<67:33:26, 31.32s/it] 55%|█████▌ | 9520/17285 [85:16:30<66:48:21, 30.97s/it] {'loss': 1.3776, 'learning_rate': 9.349946208624212e-05, 'epoch': 1.65} + 55%|█████▌ | 9520/17285 [85:16:30<66:48:21, 30.97s/it] 55%|█████▌ | 9521/17285 [85:17:05<69:21:21, 32.16s/it] 55%|█████▌ | 9522/17285 [85:17:41<72:07:37, 33.45s/it] 55%|█████▌ | 9523/17285 [85:18:11<69:51:31, 32.40s/it] 55%|█████▌ | 9524/17285 [85:18:50<74:04:53, 34.36s/it] 55%|█████▌ | 9525/17285 [85:19:17<69:17:39, 32.15s/it] 55%|█████▌ | 9526/17285 [85:19:52<70:47:30, 32.85s/it] 55%|█████▌ | 9527/17285 [85:20:22<69:07:54, 32.08s/it] 55%|█████▌ | 9528/17285 [85:20:50<66:34:55, 30.90s/it] 55%|█████▌ | 9529/17285 [85:21:31<73:00:23, 33.89s/it] 55%|█████▌ | 9530/17285 [85:21:57<67:45:49, 31.46s/it] {'loss': 1.4375, 'learning_rate': 9.330855157245775e-05, 'epoch': 1.65} + 55%|█████▌ | 9530/17285 [85:21:57<67:45:49, 31.46s/it] 55%|█████▌ | 9531/17285 [85:22:29<68:13:39, 31.68s/it] 55%|█████▌ | 9532/17285 [85:23:00<67:44:42, 31.46s/it] 55%|█████▌ | 9533/17285 [85:23:25<63:50:36, 29.65s/it] 55%|█████▌ | 9534/17285 [85:24:06<70:52:56, 32.92s/it] 55%|█████▌ | 9535/17285 
[85:24:33<67:01:52, 31.14s/it] 55%|█████▌ | 9536/17285 [85:25:15<74:19:08, 34.53s/it] 55%|█████▌ | 9537/17285 [85:25:43<70:05:47, 32.57s/it] 55%|█████▌ | 9538/17285 [85:26:18<71:32:23, 33.24s/it] 55%|█████▌ | 9539/17285 [85:26:57<75:21:06, 35.02s/it] 55%|█████▌ | 9540/17285 [85:27:28<72:34:22, 33.73s/it] {'loss': 1.4131, 'learning_rate': 9.31176655534477e-05, 'epoch': 1.66} + 55%|█████▌ | 9540/17285 [85:27:28<72:34:22, 33.73s/it] 55%|█████▌ | 9541/17285 [85:28:01<71:50:14, 33.40s/it] 55%|█████▌ | 9542/17285 [85:28:28<67:40:08, 31.46s/it] 55%|█████▌ | 9543/17285 [85:29:03<70:25:27, 32.75s/it] 55%|█████▌ | 9544/17285 [85:29:30<66:27:54, 30.91s/it] 55%|█████▌ | 9545/17285 [85:30:06<69:44:45, 32.44s/it] 55%|█████▌ | 9546/17285 [85:30:33<66:20:11, 30.86s/it] 55%|█████▌ | 9547/17285 [85:31:02<64:49:27, 30.16s/it] 55%|█████▌ | 9548/17285 [85:31:33<65:46:31, 30.61s/it] 55%|█████▌ | 9549/17285 [85:32:10<69:42:42, 32.44s/it] 55%|█████▌ | 9550/17285 [85:32:41<68:57:56, 32.10s/it] {'loss': 1.3917, 'learning_rate': 9.292680472797101e-05, 'epoch': 1.66} + 55%|█████▌ | 9550/17285 [85:32:41<68:57:56, 32.10s/it] 55%|█████▌ | 9551/17285 [85:33:07<64:37:33, 30.08s/it] 55%|█████▌ | 9552/17285 [85:33:36<64:23:27, 29.98s/it] 55%|█████▌ | 9553/17285 [85:34:08<65:34:43, 30.53s/it] 55%|█████▌ | 9554/17285 [85:34:38<65:20:55, 30.43s/it] 55%|█████▌ | 9555/17285 [85:35:17<70:46:22, 32.96s/it] 55%|█████▌ | 9556/17285 [85:35:49<70:13:25, 32.71s/it] 55%|█████▌ | 9557/17285 [85:36:25<71:58:16, 33.53s/it] 55%|█████▌ | 9558/17285 [85:36:55<69:30:13, 32.38s/it] 55%|█████▌ | 9559/17285 [85:37:26<69:03:25, 32.18s/it] 55%|█████▌ | 9560/17285 [85:37:51<64:23:10, 30.01s/it] {'loss': 1.4346, 'learning_rate': 9.273596979469446e-05, 'epoch': 1.66} + 55%|█████▌ | 9560/17285 [85:37:51<64:23:10, 30.01s/it] 55%|█████▌ | 9561/17285 [85:38:21<64:31:16, 30.07s/it] 55%|█████▌ | 9562/17285 [85:39:02<71:22:55, 33.27s/it] 55%|█████▌ | 9563/17285 [85:39:34<70:15:52, 32.76s/it] 55%|█████▌ | 9564/17285 
[85:40:09<71:38:51, 33.41s/it] 55%|█████▌ | 9565/17285 [85:40:45<73:48:36, 34.42s/it] 55%|█████▌ | 9566/17285 [85:41:21<74:30:09, 34.75s/it] 55%|█████▌ | 9567/17285 [85:41:48<69:23:10, 32.36s/it] 55%|█████▌ | 9568/17285 [85:42:14<65:42:04, 30.65s/it] 55%|█████▌ | 9569/17285 [85:42:48<67:37:37, 31.55s/it] 55%|█████▌ | 9570/17285 [85:43:26<71:35:52, 33.41s/it] {'loss': 1.4056, 'learning_rate': 9.254516145219005e-05, 'epoch': 1.66} + 55%|█████▌ | 9570/17285 [85:43:26<71:35:52, 33.41s/it] 55%|█████▌ | 9571/17285 [85:43:55<68:44:28, 32.08s/it] 55%|█████▌ | 9572/17285 [85:44:33<72:36:08, 33.89s/it] 55%|█████▌ | 9573/17285 [85:44:59<67:14:59, 31.39s/it] 55%|█████▌ | 9574/17285 [85:45:37<71:54:50, 33.57s/it] 55%|█████▌ | 9575/17285 [85:46:09<70:57:07, 33.13s/it] 55%|█████▌ | 9576/17285 [85:46:44<71:50:21, 33.55s/it] 55%|█████▌ | 9577/17285 [85:47:19<73:05:31, 34.14s/it] 55%|█████▌ | 9578/17285 [85:47:49<70:32:56, 32.95s/it] 55%|█████▌ | 9579/17285 [85:48:20<68:42:59, 32.10s/it] 55%|█████▌ | 9580/17285 [85:48:53<69:30:00, 32.47s/it] {'loss': 1.4045, 'learning_rate': 9.235438039893248e-05, 'epoch': 1.66} + 55%|█████▌ | 9580/17285 [85:48:53<69:30:00, 32.47s/it] 55%|█████▌ | 9581/17285 [85:49:23<68:15:13, 31.89s/it] 55%|█████▌ | 9582/17285 [85:50:02<72:23:45, 33.83s/it] 55%|█████▌ | 9583/17285 [85:50:32<69:58:57, 32.71s/it] 55%|█████▌ | 9584/17285 [85:50:59<66:22:38, 31.03s/it] 55%|█████▌ | 9585/17285 [85:51:28<65:03:23, 30.42s/it] 55%|█████▌ | 9586/17285 [85:51:58<64:55:39, 30.36s/it] 55%|█████▌ | 9587/17285 [85:52:25<62:18:42, 29.14s/it] 55%|█████▌ | 9588/17285 [85:52:55<63:26:45, 29.67s/it] 55%|█████▌ | 9589/17285 [85:53:27<64:38:15, 30.24s/it] 55%|█████▌ | 9590/17285 [85:53:52<61:25:41, 28.74s/it] {'loss': 1.4248, 'learning_rate': 9.216362733329655e-05, 'epoch': 1.66} + 55%|█████▌ | 9590/17285 [85:53:52<61:25:41, 28.74s/it] 55%|█████▌ | 9591/17285 [85:54:36<70:56:28, 33.19s/it] 55%|█████▌ | 9592/17285 [85:55:03<67:09:55, 31.43s/it] 55%|█████▌ | 9593/17285 
[85:55:34<66:41:47, 31.22s/it] 56%|█████▌ | 9594/17285 [85:56:02<64:32:14, 30.21s/it] 56%|█████▌ | 9595/17285 [85:56:32<64:15:02, 30.08s/it] 56%|█████▌ | 9596/17285 [85:57:01<63:37:57, 29.79s/it] 56%|█████▌ | 9597/17285 [85:57:39<69:08:37, 32.38s/it] 56%|█████▌ | 9598/17285 [85:58:10<68:25:52, 32.05s/it] 56%|█████▌ | 9599/17285 [85:58:43<69:04:30, 32.35s/it] 56%|█████▌ | 9600/17285 [85:59:15<68:40:25, 32.17s/it] {'loss': 1.4291, 'learning_rate': 9.197290295355454e-05, 'epoch': 1.67} + 56%|█████▌ | 9600/17285 [85:59:15<68:40:25, 32.17s/it] 56%|█████▌ | 9601/17285 [85:59:52<71:39:54, 33.58s/it] 56%|█████▌ | 9602/17285 [86:00:30<74:11:17, 34.76s/it] 56%|█████▌ | 9603/17285 [86:01:07<76:12:12, 35.71s/it] 56%|█████▌ | 9604/17285 [86:01:36<71:28:00, 33.50s/it] 56%|█████▌ | 9605/17285 [86:02:07<70:02:26, 32.83s/it] 56%|█████▌ | 9606/17285 [86:02:40<70:06:23, 32.87s/it] 56%|█████▌ | 9607/17285 [86:03:10<68:16:11, 32.01s/it] 56%|█████▌ | 9608/17285 [86:03:42<68:21:43, 32.06s/it] 56%|█████▌ | 9609/17285 [86:04:09<65:07:30, 30.54s/it] 56%|█████▌ | 9610/17285 [86:04:49<70:58:11, 33.29s/it] {'loss': 1.4162, 'learning_rate': 9.17822079578738e-05, 'epoch': 1.67} + 56%|█████▌ | 9610/17285 [86:04:49<70:58:11, 33.29s/it] 56%|█████▌ | 9611/17285 [86:05:20<69:24:14, 32.56s/it] 56%|█████▌ | 9612/17285 [86:05:50<67:54:32, 31.86s/it] 56%|█████▌ | 9613/17285 [86:06:15<63:16:00, 29.69s/it] 56%|█████▌ | 9614/17285 [86:06:47<65:13:15, 30.61s/it] 56%|█████▌ | 9615/17285 [86:07:18<64:59:43, 30.51s/it] 56%|█████▌ | 9616/17285 [86:07:46<63:36:49, 29.86s/it] 56%|█████▌ | 9617/17285 [86:08:17<64:11:02, 30.13s/it] 56%|█████▌ | 9618/17285 [86:08:50<66:03:25, 31.02s/it] 56%|█████▌ | 9619/17285 [86:09:20<65:12:19, 30.62s/it] 56%|█████▌ | 9620/17285 [86:09:54<67:23:03, 31.65s/it] {'loss': 1.4189, 'learning_rate': 9.159154304431409e-05, 'epoch': 1.67} + 56%|█████▌ | 9620/17285 [86:09:54<67:23:03, 31.65s/it] 56%|█████▌ | 9621/17285 [86:10:23<65:46:01, 30.89s/it] 56%|█████▌ | 9622/17285 
[86:10:49<62:32:50, 29.38s/it] 56%|█████▌ | 9623/17285 [86:11:19<62:58:23, 29.59s/it] 56%|█████▌ | 9624/17285 [86:11:47<62:29:52, 29.37s/it] 56%|█████▌ | 9625/17285 [86:12:26<68:21:00, 32.12s/it] 56%|█████▌ | 9626/17285 [86:12:56<67:12:09, 31.59s/it] 56%|█████▌ | 9627/17285 [86:13:24<64:39:11, 30.39s/it] 56%|█████▌ | 9628/17285 [86:13:53<63:45:00, 29.97s/it] 56%|█████▌ | 9629/17285 [86:14:19<61:05:01, 28.72s/it] 56%|█████▌ | 9630/17285 [86:14:55<65:40:18, 30.88s/it] {'loss': 1.4231, 'learning_rate': 9.140090891082506e-05, 'epoch': 1.67} + 56%|█████▌ | 9630/17285 [86:14:55<65:40:18, 30.88s/it] 56%|█████▌ | 9631/17285 [86:15:25<65:27:29, 30.79s/it] 56%|█████▌ | 9632/17285 [86:15:59<67:21:02, 31.68s/it] 56%|█████▌ | 9633/17285 [86:16:30<67:03:10, 31.55s/it] 56%|█████▌ | 9634/17285 [86:17:10<72:29:06, 34.11s/it] 56%|█████▌ | 9635/17285 [86:17:37<67:34:14, 31.80s/it] 56%|█████▌ | 9636/17285 [86:18:11<69:20:00, 32.63s/it] 56%|█████▌ | 9637/17285 [86:18:42<68:10:44, 32.09s/it] 56%|█████▌ | 9638/17285 [86:19:22<73:22:44, 34.54s/it] 56%|█████▌ | 9639/17285 [86:19:59<74:21:09, 35.01s/it] 56%|█████▌ | 9640/17285 [86:20:25<68:48:58, 32.41s/it] {'loss': 1.4008, 'learning_rate': 9.121030625524365e-05, 'epoch': 1.67} + 56%|█████▌ | 9640/17285 [86:20:25<68:48:58, 32.41s/it] 56%|█████▌ | 9641/17285 [86:20:56<68:04:27, 32.06s/it][2023-08-26 14:16:13,017] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 56%|█████▌ | 9642/17285 [86:21:35<72:38:00, 34.21s/it] 56%|█████▌ | 9643/17285 [86:22:13<74:58:58, 35.32s/it] 56%|█████▌ | 9644/17285 [86:22:47<73:50:15, 34.79s/it] 56%|█████▌ | 9645/17285 [86:23:21<73:12:16, 34.49s/it] 56%|█████▌ | 9646/17285 [86:23:53<72:04:34, 33.97s/it] 56%|█████▌ | 9647/17285 [86:24:24<70:09:17, 33.07s/it] 56%|█████▌ | 9648/17285 [86:24:53<67:41:12, 31.91s/it] 56%|█████▌ | 9649/17285 [86:25:19<63:54:46, 30.13s/it] 56%|█████▌ | 9650/17285 [86:25:51<64:29:18, 30.41s/it] {'loss': 1.4321, 'learning_rate': 9.103879135550087e-05, 'epoch': 1.67} + 56%|█████▌ | 9650/17285 [86:25:51<64:29:18, 30.41s/it] 56%|█████▌ | 9651/17285 [86:26:20<63:40:40, 30.03s/it] 56%|█████▌ | 9652/17285 [86:26:49<63:08:18, 29.78s/it] 56%|█████▌ | 9653/17285 [86:27:16<61:16:40, 28.90s/it] 56%|█████▌ | 9654/17285 [86:27:53<66:18:10, 31.28s/it] 56%|█████▌ | 9655/17285 [86:28:29<69:19:39, 32.71s/it] 56%|█████▌ | 9656/17285 [86:29:03<70:41:46, 33.36s/it] 56%|█████▌ | 9657/17285 [86:29:30<66:02:22, 31.17s/it] 56%|█████▌ | 9658/17285 [86:30:03<67:16:47, 31.76s/it] 56%|█████▌ | 9659/17285 [86:30:31<65:14:11, 30.80s/it] 56%|█████▌ | 9660/17285 [86:31:05<66:48:04, 31.54s/it] {'loss': 1.4719, 'learning_rate': 9.084825043007008e-05, 'epoch': 1.68} + 56%|█████▌ | 9660/17285 [86:31:05<66:48:04, 31.54s/it] 56%|█████▌ | 9661/17285 [86:31:35<65:50:39, 31.09s/it] 56%|█████▌ | 9662/17285 [86:32:07<66:56:03, 31.61s/it] 56%|█████▌ | 9663/17285 [86:32:38<66:13:20, 31.28s/it] 56%|█████▌ | 9664/17285 [86:33:18<71:44:01, 33.89s/it] 56%|█████▌ | 9665/17285 [86:33:48<69:18:28, 32.74s/it] 56%|█████▌ | 9666/17285 [86:34:20<69:05:11, 32.64s/it] 56%|█████▌ | 9667/17285 [86:34:52<68:37:28, 32.43s/it] 56%|█████▌ | 9668/17285 [86:35:21<66:27:43, 31.41s/it] 56%|█████▌ | 9669/17285 [86:35:53<66:44:33, 31.55s/it] 56%|█████▌ | 9670/17285 [86:36:25<67:01:24, 31.69s/it] {'loss': 1.4252, 'learning_rate': 9.065774300561337e-05, 'epoch': 1.68} + 56%|█████▌ | 9670/17285 [86:36:25<67:01:24, 
31.69s/it] 56%|█████▌ | 9671/17285 [86:37:06<72:48:35, 34.43s/it] 56%|█████▌ | 9672/17285 [86:37:40<72:38:48, 34.35s/it] 56%|█████▌ | 9673/17285 [86:38:11<70:32:21, 33.36s/it] 56%|█████▌ | 9674/17285 [86:38:47<72:09:22, 34.13s/it] 56%|█████▌ | 9675/17285 [86:39:16<69:05:34, 32.69s/it] 56%|█████▌ | 9676/17285 [86:39:43<65:18:31, 30.90s/it] 56%|█████▌ | 9677/17285 [86:40:11<63:07:52, 29.87s/it] 56%|█████▌ | 9678/17285 [86:40:40<62:52:15, 29.75s/it] 56%|█████▌ | 9679/17285 [86:41:10<62:42:59, 29.68s/it] 56%|█████▌ | 9680/17285 [86:41:41<63:31:18, 30.07s/it] {'loss': 1.4116, 'learning_rate': 9.04672697795039e-05, 'epoch': 1.68} + 56%|█████▌ | 9680/17285 [86:41:41<63:31:18, 30.07s/it] 56%|█████▌ | 9681/17285 [86:42:16<66:33:18, 31.51s/it] 56%|█████▌ | 9682/17285 [86:42:50<68:31:10, 32.44s/it] 56%|█████▌ | 9683/17285 [86:43:25<70:16:53, 33.28s/it] 56%|█████▌ | 9684/17285 [86:43:59<70:20:49, 33.32s/it] 56%|█████▌ | 9685/17285 [86:44:25<65:38:14, 31.09s/it][2023-08-26 14:39:34,277] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 56%|█████▌ | 9686/17285 [86:44:57<66:09:35, 31.34s/it] 56%|█████▌ | 9687/17285 [86:45:26<64:55:25, 30.76s/it] 56%|█████▌ | 9688/17285 [86:46:01<67:33:05, 32.01s/it] 56%|█████▌ | 9689/17285 [86:46:31<66:30:31, 31.52s/it] 56%|█████▌ | 9690/17285 [86:47:06<68:25:11, 32.43s/it] {'loss': 1.421, 'learning_rate': 9.029587369187029e-05, 'epoch': 1.68} + 56%|█████▌ | 9690/17285 [86:47:06<68:25:11, 32.43s/it] 56%|█████▌ | 9691/17285 [86:47:46<73:01:24, 34.62s/it] 56%|█████▌ | 9692/17285 [86:48:18<71:48:35, 34.05s/it] 56%|█████▌ | 9693/17285 [86:48:50<70:01:44, 33.21s/it] 56%|█████▌ | 9694/17285 [86:49:15<65:13:35, 30.93s/it] 56%|█████▌ | 9695/17285 [86:49:47<65:53:54, 31.26s/it] 56%|█████▌ | 9696/17285 [86:50:16<64:10:33, 30.44s/it] 56%|█████▌ | 9697/17285 [86:50:42<61:27:12, 29.16s/it] 56%|█████▌ | 9698/17285 [86:51:13<62:50:54, 29.82s/it] 56%|█████▌ | 9699/17285 [86:51:48<66:00:54, 31.33s/it] 56%|█████▌ | 9700/17285 [86:52:21<67:01:58, 31.82s/it] {'loss': 1.4574, 'learning_rate': 9.010546736343308e-05, 'epoch': 1.68} + 56%|█████▌ | 9700/17285 [86:52:21<67:01:58, 31.82s/it] 56%|█████▌ | 9701/17285 [86:52:56<68:42:23, 32.61s/it] 56%|█████▌ | 9702/17285 [86:53:30<69:33:52, 33.03s/it] 56%|█████▌ | 9703/17285 [86:54:00<67:56:27, 32.26s/it] 56%|█████▌ | 9704/17285 [86:54:27<64:39:05, 30.70s/it] 56%|█████▌ | 9705/17285 [86:54:59<65:12:18, 30.97s/it] 56%|█████▌ | 9706/17285 [86:55:27<63:21:20, 30.09s/it] 56%|█████▌ | 9707/17285 [86:55:56<62:45:59, 29.82s/it] 56%|█████▌ | 9708/17285 [86:56:22<60:43:01, 28.85s/it] 56%|█████▌ | 9709/17285 [86:56:56<63:41:16, 30.26s/it] 56%|█████▌ | 9710/17285 [86:57:24<62:14:45, 29.58s/it] {'loss': 1.3972, 'learning_rate': 8.991509725500809e-05, 'epoch': 1.69} + 56%|█████▌ | 9710/17285 [86:57:24<62:14:45, 29.58s/it] 56%|█████▌ | 9711/17285 [86:57:53<61:56:52, 29.44s/it] 56%|█████▌ | 9712/17285 [86:58:24<63:01:26, 29.96s/it] 56%|█████▌ | 9713/17285 [86:58:51<60:59:25, 29.00s/it] 56%|█████▌ | 9714/17285 
[86:59:16<58:20:28, 27.74s/it] 56%|█████▌ | 9715/17285 [86:59:45<59:25:10, 28.26s/it] 56%|█████▌ | 9716/17285 [87:00:19<63:07:27, 30.02s/it] 56%|█████▌ | 9717/17285 [87:00:50<63:35:12, 30.25s/it] 56%|█████▌ | 9718/17285 [87:01:21<64:06:06, 30.50s/it] 56%|█████▌ | 9719/17285 [87:02:01<69:53:27, 33.26s/it] 56%|█████▌ | 9720/17285 [87:02:31<67:58:53, 32.35s/it] {'loss': 1.3886, 'learning_rate': 8.972476406346583e-05, 'epoch': 1.69} + 56%|█████▌ | 9720/17285 [87:02:31<67:58:53, 32.35s/it] 56%|█████▌ | 9721/17285 [87:02:59<65:11:30, 31.03s/it] 56%|█████▌ | 9722/17285 [87:03:28<64:04:25, 30.50s/it] 56%|█████▋ | 9723/17285 [87:04:02<65:58:29, 31.41s/it] 56%|█████▋ | 9724/17285 [87:04:30<64:01:33, 30.48s/it] 56%|█████▋ | 9725/17285 [87:05:00<63:34:24, 30.27s/it] 56%|█████▋ | 9726/17285 [87:05:26<60:45:28, 28.94s/it] 56%|█████▋ | 9727/17285 [87:05:58<62:37:48, 29.83s/it] 56%|█████▋ | 9728/17285 [87:06:28<62:59:10, 30.01s/it] 56%|█████▋ | 9729/17285 [87:07:00<64:18:51, 30.64s/it] 56%|█████▋ | 9730/17285 [87:07:26<61:13:00, 29.17s/it] {'loss': 1.4333, 'learning_rate': 8.953446848554158e-05, 'epoch': 1.69} + 56%|█████▋ | 9730/17285 [87:07:26<61:13:00, 29.17s/it] 56%|█████▋ | 9731/17285 [87:08:00<64:27:25, 30.72s/it] 56%|█████▋ | 9732/17285 [87:08:33<65:44:30, 31.33s/it] 56%|█████▋ | 9733/17285 [87:09:15<72:13:45, 34.43s/it] 56%|█████▋ | 9734/17285 [87:09:42<67:49:41, 32.34s/it] 56%|█████▋ | 9735/17285 [87:10:10<64:45:01, 30.87s/it] 56%|█████▋ | 9736/17285 [87:10:35<61:15:31, 29.21s/it] 56%|█████▋ | 9737/17285 [87:11:15<67:47:19, 32.33s/it] 56%|█████▋ | 9738/17285 [87:11:48<68:36:21, 32.73s/it] 56%|█████▋ | 9739/17285 [87:12:23<69:59:06, 33.39s/it] 56%|█████▋ | 9740/17285 [87:12:58<70:33:34, 33.67s/it] {'loss': 1.3839, 'learning_rate': 8.934421121783305e-05, 'epoch': 1.69} + 56%|█████▋ | 9740/17285 [87:12:58<70:33:34, 33.67s/it] 56%|█████▋ | 9741/17285 [87:13:32<71:04:56, 33.92s/it] 56%|█████▋ | 9742/17285 [87:13:58<66:03:30, 31.53s/it] 56%|█████▋ | 9743/17285 
[87:14:26<63:32:47, 30.33s/it] 56%|█████▋ | 9744/17285 [87:14:56<63:49:26, 30.47s/it] 56%|█████▋ | 9745/17285 [87:15:32<67:12:57, 32.09s/it] 56%|█████▋ | 9746/17285 [87:15:58<63:09:57, 30.16s/it] 56%|█████▋ | 9747/17285 [87:16:23<59:41:17, 28.51s/it] 56%|█████▋ | 9748/17285 [87:17:00<65:23:48, 31.24s/it] 56%|█████▋ | 9749/17285 [87:17:32<65:30:17, 31.29s/it] 56%|█████▋ | 9750/17285 [87:18:05<67:01:19, 32.02s/it] {'loss': 1.4411, 'learning_rate': 8.915399295679763e-05, 'epoch': 1.69} + 56%|█████▋ | 9750/17285 [87:18:05<67:01:19, 32.02s/it] 56%|█████▋ | 9751/17285 [87:18:40<68:57:18, 32.95s/it] 56%|█████▋ | 9752/17285 [87:19:13<68:26:43, 32.71s/it] 56%|█████▋ | 9753/17285 [87:19:47<69:29:12, 33.21s/it] 56%|█████▋ | 9754/17285 [87:20:15<66:00:33, 31.55s/it] 56%|█████▋ | 9755/17285 [87:20:47<66:30:24, 31.80s/it] 56%|█████▋ | 9756/17285 [87:21:25<70:32:30, 33.73s/it] 56%|█████▋ | 9757/17285 [87:21:57<69:09:21, 33.07s/it] 56%|█████▋ | 9758/17285 [87:22:29<68:47:32, 32.90s/it] 56%|█████▋ | 9759/17285 [87:23:03<69:17:21, 33.14s/it] 56%|█████▋ | 9760/17285 [87:23:33<66:59:16, 32.05s/it] {'loss': 1.4303, 'learning_rate': 8.896381439874992e-05, 'epoch': 1.69} + 56%|█████▋ | 9760/17285 [87:23:33<66:59:16, 32.05s/it] 56%|█████▋ | 9761/17285 [87:24:08<69:24:55, 33.21s/it] 56%|█████▋ | 9762/17285 [87:24:39<67:44:22, 32.42s/it] 56%|█████▋ | 9763/17285 [87:25:07<65:03:43, 31.14s/it] 56%|█████▋ | 9764/17285 [87:25:39<65:26:53, 31.33s/it] 56%|█████▋ | 9765/17285 [87:26:08<64:08:58, 30.71s/it] 56%|█████▋ | 9766/17285 [87:26:39<64:24:41, 30.84s/it] 57%|█████▋ | 9767/17285 [87:27:10<64:04:10, 30.68s/it] 57%|█████▋ | 9768/17285 [87:27:48<68:47:53, 32.95s/it] 57%|█████▋ | 9769/17285 [87:28:15<65:10:48, 31.22s/it] 57%|█████▋ | 9770/17285 [87:28:47<65:36:04, 31.43s/it] {'loss': 1.453, 'learning_rate': 8.877367623985927e-05, 'epoch': 1.7} + 57%|█████▋ | 9770/17285 [87:28:47<65:36:04, 31.43s/it] 57%|█████▋ | 9771/17285 [87:29:19<66:03:32, 31.65s/it] 57%|█████▋ | 9772/17285 [87:29:54<68:00:33, 
32.59s/it] 57%|█████▋ | 9773/17285 [87:30:29<69:25:33, 33.27s/it] 57%|█████▋ | 9774/17285 [87:31:01<68:59:13, 33.07s/it] 57%|█████▋ | 9775/17285 [87:31:34<68:36:05, 32.88s/it] 57%|█████▋ | 9776/17285 [87:32:09<70:09:04, 33.63s/it] 57%|█████▋ | 9777/17285 [87:32:38<66:51:53, 32.06s/it] 57%|█████▋ | 9778/17285 [87:33:14<69:32:49, 33.35s/it] 57%|█████▋ | 9779/17285 [87:33:51<72:00:04, 34.53s/it] 57%|█████▋ | 9780/17285 [87:34:18<66:56:02, 32.11s/it] {'loss': 1.4066, 'learning_rate': 8.858357917614699e-05, 'epoch': 1.7} + 57%|█████▋ | 9780/17285 [87:34:18<66:56:02, 32.11s/it] 57%|█████▋ | 9781/17285 [87:34:49<66:17:35, 31.80s/it] 57%|█████▋ | 9782/17285 [87:35:24<68:40:58, 32.95s/it] 57%|█████▋ | 9783/17285 [87:35:58<69:06:12, 33.16s/it] 57%|█████▋ | 9784/17285 [87:36:31<68:56:33, 33.09s/it] 57%|█████▋ | 9785/17285 [87:37:06<70:11:57, 33.70s/it] 57%|█████▋ | 9786/17285 [87:37:37<68:26:36, 32.86s/it] 57%|█████▋ | 9787/17285 [87:38:13<70:23:35, 33.80s/it] 57%|█████▋ | 9788/17285 [87:38:44<68:36:48, 32.95s/it] 57%|█████▋ | 9789/17285 [87:39:18<69:00:42, 33.14s/it] 57%|█████▋ | 9790/17285 [87:39:44<64:36:47, 31.04s/it] {'loss': 1.3924, 'learning_rate': 8.839352390348404e-05, 'epoch': 1.7} + 57%|█████▋ | 9790/17285 [87:39:44<64:36:47, 31.04s/it] 57%|█████▋ | 9791/17285 [87:40:22<69:05:38, 33.19s/it] 57%|█████▋ | 9792/17285 [87:40:55<68:55:47, 33.12s/it] 57%|█████▋ | 9793/17285 [87:41:28<69:10:31, 33.24s/it] 57%|█████▋ | 9794/17285 [87:41:54<64:28:40, 30.99s/it] 57%|█████▋ | 9795/17285 [87:42:24<63:56:34, 30.73s/it] 57%|█████▋ | 9796/17285 [87:42:56<64:41:52, 31.10s/it] 57%|█████▋ | 9797/17285 [87:43:25<63:03:11, 30.31s/it] 57%|█████▋ | 9798/17285 [87:43:55<63:17:28, 30.43s/it] 57%|█████▋ | 9799/17285 [87:44:27<64:12:39, 30.88s/it] 57%|█████▋ | 9800/17285 [87:44:59<64:27:32, 31.00s/it] {'loss': 1.3878, 'learning_rate': 8.820351111758849e-05, 'epoch': 1.7} + 57%|█████▋ | 9800/17285 [87:44:59<64:27:32, 31.00s/it] 57%|█████▋ | 9801/17285 [87:45:25<61:52:45, 29.77s/it] 
57%|█████▋ | 9802/17285 [87:46:10<70:59:54, 34.16s/it] 57%|█████▋ | 9803/17285 [87:46:37<66:34:00, 32.03s/it] 57%|█████▋ | 9804/17285 [87:47:06<64:35:10, 31.08s/it] 57%|█████▋ | 9805/17285 [87:47:33<62:08:00, 29.90s/it] 57%|█████▋ | 9806/17285 [87:48:07<64:41:37, 31.14s/it] 57%|█████▋ | 9807/17285 [87:48:41<66:42:55, 32.12s/it] 57%|█████▋ | 9808/17285 [87:49:15<67:31:00, 32.51s/it] 57%|█████▋ | 9809/17285 [87:49:41<63:24:18, 30.53s/it] 57%|█████▋ | 9810/17285 [87:50:10<62:52:49, 30.28s/it] {'loss': 1.3849, 'learning_rate': 8.801354151402274e-05, 'epoch': 1.7} + 57%|█████▋ | 9810/17285 [87:50:10<62:52:49, 30.28s/it] 57%|█████▋ | 9811/17285 [87:50:49<67:54:25, 32.71s/it] 57%|█████▋ | 9812/17285 [87:51:28<71:50:12, 34.61s/it] 57%|█████▋ | 9813/17285 [87:51:59<69:27:24, 33.46s/it] 57%|█████▋ | 9814/17285 [87:52:29<67:21:21, 32.46s/it] 57%|█████▋ | 9815/17285 [87:52:59<65:59:12, 31.80s/it] 57%|█████▋ | 9816/17285 [87:53:46<75:38:03, 36.46s/it] 57%|█████▋ | 9817/17285 [87:54:25<76:55:52, 37.09s/it] 57%|█████▋ | 9818/17285 [87:54:53<71:09:09, 34.30s/it] 57%|█████▋ | 9819/17285 [87:55:23<68:46:28, 33.16s/it] 57%|█████▋ | 9820/17285 [87:55:51<65:11:35, 31.44s/it] {'loss': 1.4135, 'learning_rate': 8.782361578819118e-05, 'epoch': 1.7} + 57%|█████▋ | 9820/17285 [87:55:51<65:11:35, 31.44s/it] 57%|█████▋ | 9821/17285 [87:56:24<66:30:52, 32.08s/it] 57%|█████▋ | 9822/17285 [87:57:00<68:40:47, 33.13s/it] 57%|█████▋ | 9823/17285 [87:57:33<68:45:27, 33.17s/it] 57%|█████▋ | 9824/17285 [87:58:00<64:34:54, 31.16s/it] 57%|█████▋ | 9825/17285 [87:58:29<63:48:15, 30.79s/it] 57%|█████▋ | 9826/17285 [87:59:10<70:03:11, 33.81s/it] 57%|█████▋ | 9827/17285 [87:59:50<73:50:49, 35.65s/it] 57%|█████▋ | 9828/17285 [88:00:23<72:02:04, 34.78s/it] 57%|█████▋ | 9829/17285 [88:01:02<74:30:24, 35.97s/it] 57%|█████▋ | 9830/17285 [88:01:31<70:37:00, 34.10s/it] {'loss': 1.4187, 'learning_rate': 8.763373463533758e-05, 'epoch': 1.71} + 57%|█████▋ | 9830/17285 [88:01:31<70:37:00, 34.10s/it] 57%|█████▋ | 
9831/17285 [88:02:08<71:53:25, 34.72s/it] 57%|█████▋ | 9832/17285 [88:02:46<74:17:22, 35.88s/it] 57%|█████▋ | 9833/17285 [88:03:27<77:16:03, 37.33s/it][2023-08-26 15:58:34,034] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 57%|█████▋ | 9834/17285 [88:03:56<72:20:28, 34.95s/it] 57%|█████▋ | 9835/17285 [88:04:30<71:42:04, 34.65s/it][2023-08-26 15:59:34,276] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 57%|█████▋ | 9836/17285 [88:04:57<66:30:45, 32.14s/it] 57%|█████▋ | 9837/17285 [88:05:28<66:11:50, 32.00s/it] 57%|█████▋ | 9838/17285 [88:05:55<62:42:54, 30.32s/it] 57%|█████▋ | 9839/17285 [88:06:23<61:43:42, 29.84s/it] 57%|█████▋ | 9840/17285 [88:06:54<62:11:51, 30.08s/it] {'loss': 1.4153, 'learning_rate': 8.748186227269857e-05, 'epoch': 1.71} + 57%|█████▋ | 9840/17285 [88:06:54<62:11:51, 30.08s/it] 57%|█████▋ | 9841/17285 [88:07:29<65:32:14, 31.69s/it] 57%|█████▋ | 9842/17285 [88:08:03<66:47:39, 32.31s/it] 57%|█████▋ | 9843/17285 [88:08:31<63:50:22, 30.88s/it] 57%|█████▋ | 9844/17285 [88:09:06<66:33:25, 32.20s/it] 57%|█████▋ | 9845/17285 [88:09:39<66:47:56, 32.32s/it] 57%|█████▋ | 9846/17285 [88:10:05<63:10:53, 30.58s/it] 57%|█████▋ | 9847/17285 [88:10:40<65:53:48, 31.89s/it] 57%|█████▋ | 9848/17285 [88:11:08<63:40:27, 30.82s/it] 57%|█████▋ | 9849/17285 [88:11:33<59:47:06, 28.94s/it] 57%|█████▋ | 9850/17285 [88:12:01<59:20:06, 28.73s/it] {'loss': 1.4083, 'learning_rate': 8.729206310269713e-05, 'epoch': 1.71} + 57%|█████▋ | 9850/17285 [88:12:01<59:20:06, 28.73s/it] 57%|█████▋ | 9851/17285 [88:12:34<61:57:08, 30.00s/it] 57%|█████▋ | 9852/17285 [88:13:12<66:47:35, 32.35s/it] 57%|█████▋ | 9853/17285 [88:13:46<67:57:09, 32.92s/it] 57%|█████▋ | 9854/17285 [88:14:21<69:14:04, 33.54s/it] 57%|█████▋ | 9855/17285 [88:14:56<69:42:25, 33.77s/it] 57%|█████▋ | 9856/17285 
[88:15:28<68:43:47, 33.31s/it] 57%|█████▋ | 9857/17285 [88:15:54<64:18:09, 31.16s/it] 57%|█████▋ | 9858/17285 [88:16:26<65:04:27, 31.54s/it] 57%|█████▋ | 9859/17285 [88:17:02<67:31:43, 32.74s/it] 57%|█████▋ | 9860/17285 [88:17:28<63:09:11, 30.62s/it] {'loss': 1.4055, 'learning_rate': 8.710231045148006e-05, 'epoch': 1.71} + 57%|█████▋ | 9860/17285 [88:17:28<63:09:11, 30.62s/it] 57%|█████▋ | 9861/17285 [88:18:00<64:11:42, 31.13s/it] 57%|█████▋ | 9862/17285 [88:18:31<64:16:00, 31.17s/it] 57%|█████▋ | 9863/17285 [88:19:05<66:02:00, 32.03s/it] 57%|█████▋ | 9864/17285 [88:19:40<67:41:44, 32.84s/it] 57%|█████▋ | 9865/17285 [88:20:13<67:36:27, 32.80s/it] 57%|█████▋ | 9866/17285 [88:20:49<69:51:57, 33.90s/it] 57%|█████▋ | 9867/17285 [88:21:20<68:12:56, 33.11s/it] 57%|█████▋ | 9868/17285 [88:21:55<69:03:03, 33.52s/it] 57%|█████▋ | 9869/17285 [88:22:21<64:15:56, 31.20s/it] 57%|█████▋ | 9870/17285 [88:23:00<69:16:12, 33.63s/it] {'loss': 1.42, 'learning_rate': 8.691260501365754e-05, 'epoch': 1.71} + 57%|█████▋ | 9870/17285 [88:23:00<69:16:12, 33.63s/it] 57%|█████▋ | 9871/17285 [88:23:27<65:02:46, 31.58s/it] 57%|█████▋ | 9872/17285 [88:24:02<67:07:08, 32.60s/it] 57%|█████▋ | 9873/17285 [88:24:34<66:44:54, 32.42s/it] 57%|█████▋ | 9874/17285 [88:25:04<65:21:56, 31.75s/it] 57%|█████▋ | 9875/17285 [88:25:30<61:34:01, 29.91s/it] 57%|█████▋ | 9876/17285 [88:26:00<62:11:02, 30.21s/it] 57%|█████▋ | 9877/17285 [88:26:31<62:15:48, 30.26s/it] 57%|█████▋ | 9878/17285 [88:26:59<61:03:31, 29.68s/it] 57%|█████▋ | 9879/17285 [88:27:31<62:23:40, 30.33s/it] 57%|█████▋ | 9880/17285 [88:28:00<61:38:38, 29.97s/it] {'loss': 1.438, 'learning_rate': 8.672294748366692e-05, 'epoch': 1.71} + 57%|█████▋ | 9880/17285 [88:28:00<61:38:38, 29.97s/it] 57%|█████▋ | 9881/17285 [88:28:27<59:59:30, 29.17s/it] 57%|█████▋ | 9882/17285 [88:29:00<61:58:18, 30.14s/it] 57%|█████▋ | 9883/17285 [88:29:32<63:01:25, 30.65s/it] 57%|█████▋ | 9884/17285 [88:29:59<61:08:41, 29.74s/it] 57%|█████▋ | 9885/17285 [88:30:40<68:12:37, 
33.18s/it] 57%|█████▋ | 9886/17285 [88:31:05<62:50:15, 30.57s/it] 57%|█████▋ | 9887/17285 [88:31:44<68:10:25, 33.17s/it] 57%|█████▋ | 9888/17285 [88:32:15<66:57:18, 32.59s/it] 57%|█████▋ | 9889/17285 [88:33:01<75:10:54, 36.59s/it] 57%|█████▋ | 9890/17285 [88:33:35<73:05:57, 35.59s/it] {'loss': 1.4077, 'learning_rate': 8.653333855577024e-05, 'epoch': 1.72} + 57%|█████▋ | 9890/17285 [88:33:35<73:05:57, 35.59s/it] 57%|█████▋ | 9891/17285 [88:34:01<67:13:35, 32.73s/it] 57%|█████▋ | 9892/17285 [88:34:39<70:27:39, 34.31s/it] 57%|█████▋ | 9893/17285 [88:35:19<74:03:20, 36.07s/it] 57%|█████▋ | 9894/17285 [88:35:44<67:35:04, 32.92s/it] 57%|█████▋ | 9895/17285 [88:36:22<70:27:51, 34.33s/it] 57%|█████▋ | 9896/17285 [88:37:03<74:23:11, 36.24s/it] 57%|█████▋ | 9897/17285 [88:37:36<72:39:01, 35.40s/it] 57%|█████▋ | 9898/17285 [88:38:07<69:38:47, 33.94s/it] 57%|█████▋ | 9899/17285 [88:38:40<69:31:07, 33.88s/it] 57%|█████▋ | 9900/17285 [88:39:10<66:56:41, 32.63s/it] {'loss': 1.4448, 'learning_rate': 8.634377892405157e-05, 'epoch': 1.72} + 57%|█████▋ | 9900/17285 [88:39:10<66:56:41, 32.63s/it] 57%|█████▋ | 9901/17285 [88:39:37<63:04:13, 30.75s/it] 57%|█████▋ | 9902/17285 [88:40:18<69:22:18, 33.83s/it] 57%|█████▋ | 9903/17285 [88:40:53<70:27:30, 34.36s/it] 57%|█████▋ | 9904/17285 [88:41:26<69:25:58, 33.87s/it] 57%|█████▋ | 9905/17285 [88:41:59<69:12:49, 33.76s/it] 57%|█████▋ | 9906/17285 [88:42:31<68:08:00, 33.24s/it] 57%|█████▋ | 9907/17285 [88:43:02<66:40:02, 32.53s/it] 57%|█████▋ | 9908/17285 [88:43:38<68:22:21, 33.37s/it] 57%|█████▋ | 9909/17285 [88:44:08<66:23:12, 32.40s/it] 57%|█████▋ | 9910/17285 [88:44:40<66:20:34, 32.38s/it] {'loss': 1.3921, 'learning_rate': 8.615426928241457e-05, 'epoch': 1.72} + 57%|█████▋ | 9910/17285 [88:44:40<66:20:34, 32.38s/it] 57%|█████▋ | 9911/17285 [88:45:08<63:39:17, 31.08s/it] 57%|█████▋ | 9912/17285 [88:45:35<61:05:38, 29.83s/it] 57%|█████▋ | 9913/17285 [88:46:15<67:15:48, 32.85s/it] 57%|█████▋ | 9914/17285 [88:46:52<69:39:58, 34.02s/it] 
57%|█████▋ | 9915/17285 [88:47:19<65:30:10, 32.00s/it] 57%|█████▋ | 9916/17285 [88:47:57<69:09:02, 33.78s/it] 57%|█████▋ | 9917/17285 [88:48:25<65:35:36, 32.05s/it] 57%|█████▋ | 9918/17285 [88:49:01<68:02:48, 33.25s/it] 57%|█████▋ | 9919/17285 [88:49:30<65:22:26, 31.95s/it] 57%|█████▋ | 9920/17285 [88:50:06<67:57:02, 33.21s/it] {'loss': 1.4282, 'learning_rate': 8.596481032457986e-05, 'epoch': 1.72} + 57%|█████▋ | 9920/17285 [88:50:06<67:57:02, 33.21s/it] 57%|█████▋ | 9921/17285 [88:50:36<65:50:37, 32.19s/it] 57%|█████▋ | 9922/17285 [88:51:02<62:11:25, 30.41s/it] 57%|█████▋ | 9923/17285 [88:51:33<62:33:05, 30.59s/it] 57%|█████▋ | 9924/17285 [88:52:04<62:33:57, 30.60s/it] 57%|█████▋ | 9925/17285 [88:52:41<66:43:17, 32.64s/it] 57%|█████▋ | 9926/17285 [88:53:11<65:09:37, 31.88s/it] 57%|█████▋ | 9927/17285 [88:53:39<62:40:55, 30.67s/it] 57%|█████▋ | 9928/17285 [88:54:12<64:13:06, 31.42s/it] 57%|█████▋ | 9929/17285 [88:54:38<60:36:14, 29.66s/it] 57%|█████▋ | 9930/17285 [88:55:04<58:28:37, 28.62s/it] {'loss': 1.4084, 'learning_rate': 8.577540274408256e-05, 'epoch': 1.72} + 57%|█████▋ | 9930/17285 [88:55:04<58:28:37, 28.62s/it] 57%|█████▋ | 9931/17285 [88:55:41<63:37:17, 31.14s/it] 57%|█████▋ | 9932/17285 [88:56:21<69:06:10, 33.83s/it] 57%|█████▋ | 9933/17285 [88:56:52<67:26:42, 33.03s/it] 57%|█████▋ | 9934/17285 [88:57:26<68:09:25, 33.38s/it] 57%|█████▋ | 9935/17285 [88:58:02<69:10:54, 33.88s/it] 57%|█████▋ | 9936/17285 [88:58:36<69:47:48, 34.19s/it] 57%|█████▋ | 9937/17285 [88:59:04<65:50:37, 32.26s/it] 57%|█████▋ | 9938/17285 [88:59:34<64:31:41, 31.62s/it] 58%|█████▊ | 9939/17285 [89:00:03<63:00:30, 30.88s/it] 58%|█████▊ | 9940/17285 [89:00:43<68:14:51, 33.45s/it] {'loss': 1.4007, 'learning_rate': 8.558604723426972e-05, 'epoch': 1.73} + 58%|█████▊ | 9940/17285 [89:00:43<68:14:51, 33.45s/it] 58%|█████▊ | 9941/17285 [89:01:11<65:08:49, 31.93s/it] 58%|█████▊ | 9942/17285 [89:01:45<65:57:16, 32.34s/it] 58%|█████▊ | 9943/17285 [89:02:18<66:32:59, 32.63s/it] 58%|█████▊ | 
9944/17285 [89:02:49<65:53:19, 32.31s/it] 58%|█████▊ | 9945/17285 [89:03:19<64:00:24, 31.39s/it] 58%|█████▊ | 9946/17285 [89:03:46<61:46:24, 30.30s/it] 58%|█████▊ | 9947/17285 [89:04:24<66:21:54, 32.56s/it] 58%|█████▊ | 9948/17285 [89:04:55<65:20:29, 32.06s/it] 58%|█████▊ | 9949/17285 [89:05:22<61:55:13, 30.39s/it] 58%|█████▊ | 9950/17285 [89:05:57<64:53:12, 31.85s/it] {'loss': 1.4371, 'learning_rate': 8.539674448829775e-05, 'epoch': 1.73} + 58%|█████▊ | 9950/17285 [89:05:57<64:53:12, 31.85s/it] 58%|█████▊ | 9951/17285 [89:06:33<67:17:23, 33.03s/it] 58%|█████▊ | 9952/17285 [89:06:59<63:25:08, 31.13s/it] 58%|█████▊ | 9953/17285 [89:07:27<61:21:15, 30.12s/it] 58%|█████▊ | 9954/17285 [89:07:59<62:34:48, 30.73s/it] 58%|█████▊ | 9955/17285 [89:08:39<67:52:32, 33.34s/it] 58%|█████▊ | 9956/17285 [89:09:09<65:46:38, 32.31s/it] 58%|█████▊ | 9957/17285 [89:09:45<68:02:04, 33.42s/it] 58%|█████▊ | 9958/17285 [89:10:24<71:41:47, 35.23s/it] 58%|█████▊ | 9959/17285 [89:11:07<76:23:46, 37.54s/it] 58%|█████▊ | 9960/17285 [89:11:39<72:44:39, 35.75s/it] {'loss': 1.4304, 'learning_rate': 8.520749519912991e-05, 'epoch': 1.73} + 58%|█████▊ | 9960/17285 [89:11:39<72:44:39, 35.75s/it] 58%|█████▊ | 9961/17285 [89:12:10<70:04:25, 34.44s/it] 58%|█████▊ | 9962/17285 [89:12:44<69:32:39, 34.19s/it] 58%|█████▊ | 9963/17285 [89:13:12<65:53:08, 32.39s/it] 58%|█████▊ | 9964/17285 [89:13:42<64:35:16, 31.76s/it] 58%|█████▊ | 9965/17285 [89:14:17<66:10:37, 32.55s/it] 58%|█████▊ | 9966/17285 [89:14:47<64:43:32, 31.84s/it] 58%|█████▊ | 9967/17285 [89:15:21<66:25:22, 32.68s/it] 58%|█████▊ | 9968/17285 [89:15:55<67:10:00, 33.05s/it] 58%|█████▊ | 9969/17285 [89:16:25<65:27:18, 32.21s/it] 58%|█████▊ | 9970/17285 [89:16:56<64:38:31, 31.81s/it] {'loss': 1.3866, 'learning_rate': 8.501830005953381e-05, 'epoch': 1.73} + 58%|█████▊ | 9970/17285 [89:16:56<64:38:31, 31.81s/it] 58%|█████▊ | 9971/17285 [89:17:31<66:28:08, 32.72s/it] 58%|█████▊ | 9972/17285 [89:17:58<62:42:10, 30.87s/it] 58%|█████▊ | 9973/17285 
[89:18:28<62:07:46, 30.59s/it] 58%|█████▊ | 9974/17285 [89:19:07<67:17:41, 33.14s/it] 58%|█████▊ | 9975/17285 [89:19:54<75:52:55, 37.37s/it] 58%|█████▊ | 9976/17285 [89:20:22<69:51:24, 34.41s/it] 58%|█████▊ | 9977/17285 [89:20:49<65:42:11, 32.37s/it] 58%|█████▊ | 9978/17285 [89:21:23<66:48:33, 32.92s/it] 58%|█████▊ | 9979/17285 [89:21:54<65:12:04, 32.13s/it] 58%|█████▊ | 9980/17285 [89:22:31<68:12:13, 33.61s/it] {'loss': 1.4156, 'learning_rate': 8.482915976207883e-05, 'epoch': 1.73} + 58%|█████▊ | 9980/17285 [89:22:31<68:12:13, 33.61s/it] 58%|█████▊ | 9981/17285 [89:23:13<73:37:09, 36.29s/it] 58%|█████▊ | 9982/17285 [89:23:46<71:18:10, 35.15s/it] 58%|█████▊ | 9983/17285 [89:24:18<69:19:58, 34.18s/it] 58%|█████▊ | 9984/17285 [89:24:43<64:03:17, 31.58s/it] 58%|█████▊ | 9985/17285 [89:25:09<60:45:11, 29.96s/it] 58%|█████▊ | 9986/17285 [89:25:36<58:51:53, 29.03s/it] 58%|█████▊ | 9987/17285 [89:26:04<57:50:26, 28.53s/it] 58%|█████▊ | 9988/17285 [89:26:38<61:33:30, 30.37s/it] 58%|█████▊ | 9989/17285 [89:27:07<60:22:44, 29.79s/it] 58%|█████▊ | 9990/17285 [89:27:33<58:01:29, 28.63s/it] {'loss': 1.4735, 'learning_rate': 8.464007499913359e-05, 'epoch': 1.73} + 58%|█████▊ | 9990/17285 [89:27:33<58:01:29, 28.63s/it] 58%|█████▊ | 9991/17285 [89:28:09<62:31:39, 30.86s/it] 58%|█████▊ | 9992/17285 [89:28:46<66:20:06, 32.74s/it] 58%|█████▊ | 9993/17285 [89:29:23<68:44:31, 33.94s/it] 58%|█████▊ | 9994/17285 [89:29:52<65:49:43, 32.50s/it] 58%|█████▊ | 9995/17285 [89:30:36<73:03:17, 36.08s/it] 58%|█████▊ | 9996/17285 [89:31:18<76:18:06, 37.69s/it] 58%|█████▊ | 9997/17285 [89:31:55<75:58:47, 37.53s/it] 58%|█████▊ | 9998/17285 [89:32:24<71:12:49, 35.18s/it] 58%|█████▊ | 9999/17285 [89:33:01<72:00:16, 35.58s/it] 58%|█████▊ | 10000/17285 [89:33:31<68:44:28, 33.97s/it] {'loss': 1.3907, 'learning_rate': 8.445104646286339e-05, 'epoch': 1.74} + 58%|█████▊ | 10000/17285 [89:33:31<68:44:28, 33.97s/it][INFO|trainer.py:3081] 2023-08-26 17:28:08,828 >> ***** Running Evaluation ***** 
+[INFO|trainer.py:3083] 2023-08-26 17:28:08,828 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-26 17:28:08,828 >> Batch size = 2 + + 0%| | 0/33 [00:00> Deleting older checkpoint [20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-7000] due to args.save_total_limit +Save Peft Config at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-10000 +[INFO|tokenization_utils_base.py:2210] 2023-08-26 17:29:34,104 >> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-10000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-26 17:29:34,108 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-10000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-10000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-10000 + 58%|█████▊ | 10001/17285 [89:35:31<120:53:12, 59.75s/it] 58%|█████▊ | 10002/17285 [89:36:02<103:10:37, 51.00s/it] 58%|█████▊ | 10003/17285 [89:36:27<87:37:54, 43.32s/it] 58%|█████▊ | 10004/17285 [89:36:59<80:29:11, 39.80s/it] 58%|█████▊ | 10005/17285 [89:37:24<71:40:02, 35.44s/it] 58%|█████▊ | 10006/17285 [89:37:56<69:56:40, 34.59s/it][2023-08-26 17:33:10,551] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 58%|█████▊ | 10007/17285 [89:38:33<71:02:06, 35.14s/it] 58%|█████▊ | 10008/17285 [89:39:03<68:06:20, 33.69s/it] 58%|█████▊ | 10009/17285 [89:39:44<72:29:53, 35.87s/it] 58%|█████▊ | 10010/17285 [89:40:15<69:18:19, 34.30s/it] {'loss': 1.3785, 'learning_rate': 8.428096942593624e-05, 'epoch': 1.74} + 58%|█████▊ | 10010/17285 [89:40:15<69:18:19, 34.30s/it] 58%|█████▊ | 10011/17285 [89:40:45<66:42:23, 33.01s/it] 58%|█████▊ | 10012/17285 [89:41:21<68:51:16, 34.08s/it] 58%|█████▊ | 10013/17285 [89:41:49<65:11:37, 32.27s/it] 58%|█████▊ | 10014/17285 [89:42:25<66:57:46, 33.15s/it] 58%|█████▊ | 10015/17285 [89:42:55<65:12:45, 32.29s/it] 58%|█████▊ | 10016/17285 [89:43:28<65:51:47, 32.62s/it] 58%|█████▊ | 10017/17285 [89:44:04<67:54:58, 33.64s/it] 58%|█████▊ | 10018/17285 [89:44:32<64:23:06, 31.90s/it] 58%|█████▊ | 10019/17285 [89:45:01<62:25:46, 30.93s/it] 58%|█████▊ | 10020/17285 [89:45:27<59:41:45, 29.58s/it] {'loss': 1.4134, 'learning_rate': 8.409204962652496e-05, 'epoch': 1.74} + 58%|█████▊ | 10020/17285 [89:45:27<59:41:45, 29.58s/it] 58%|█████▊ | 10021/17285 [89:45:59<60:53:41, 30.18s/it] 58%|█████▊ | 10022/17285 [89:46:36<64:58:09, 32.20s/it] 58%|█████▊ | 10023/17285 [89:47:07<64:19:03, 31.88s/it] 58%|█████▊ | 10024/17285 [89:47:35<61:54:28, 30.69s/it] 58%|█████▊ | 10025/17285 [89:48:08<63:32:21, 31.51s/it] 58%|█████▊ | 10026/17285 [89:48:43<65:26:42, 32.46s/it][2023-08-26 17:43:57,691] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 58%|█████▊ | 10027/17285 [89:49:20<68:15:44, 33.86s/it] 58%|█████▊ | 10028/17285 [89:49:53<67:41:44, 33.58s/it] 58%|█████▊ | 10029/17285 [89:50:32<70:48:24, 35.13s/it] 58%|█████▊ | 10030/17285 [89:51:06<70:24:35, 34.94s/it] {'loss': 1.3956, 'learning_rate': 8.392207157637791e-05, 'epoch': 1.74} + 58%|█████▊ | 10030/17285 [89:51:06<70:24:35, 34.94s/it] 58%|█████▊ | 10031/17285 [89:51:38<68:22:23, 33.93s/it] 58%|█████▊ | 10032/17285 [89:52:09<66:29:38, 33.00s/it] 58%|█████▊ | 10033/17285 [89:52:37<63:34:29, 31.56s/it] 58%|█████▊ | 10034/17285 [89:53:07<62:44:37, 31.15s/it] 58%|█████▊ | 10035/17285 [89:53:35<60:58:29, 30.28s/it] 58%|█████▊ | 10036/17285 [89:54:10<63:28:52, 31.53s/it] 58%|█████▊ | 10037/17285 [89:54:51<69:21:02, 34.45s/it] 58%|█████▊ | 10038/17285 [89:55:26<69:26:23, 34.49s/it] 58%|█████▊ | 10039/17285 [89:55:58<68:19:33, 33.95s/it] 58%|█████▊ | 10040/17285 [89:56:31<67:34:12, 33.58s/it] {'loss': 1.3801, 'learning_rate': 8.373326301036039e-05, 'epoch': 1.74} + 58%|█████▊ | 10040/17285 [89:56:31<67:34:12, 33.58s/it] 58%|█████▊ | 10041/17285 [89:57:04<67:02:48, 33.32s/it] 58%|█████▊ | 10042/17285 [89:57:40<69:08:30, 34.37s/it] 58%|█████▊ | 10043/17285 [89:58:19<71:33:33, 35.57s/it] 58%|█████▊ | 10044/17285 [89:58:52<69:57:26, 34.78s/it] 58%|█████▊ | 10045/17285 [89:59:22<67:06:42, 33.37s/it] 58%|█████▊ | 10046/17285 [89:59:57<68:21:46, 34.00s/it] 58%|█████▊ | 10047/17285 [90:00:22<62:58:55, 31.33s/it] 58%|█████▊ | 10048/17285 [90:00:49<60:06:55, 29.90s/it] 58%|█████▊ | 10049/17285 [90:01:28<65:25:25, 32.55s/it] 58%|█████▊ | 10050/17285 [90:01:57<63:24:15, 31.55s/it] {'loss': 1.3976, 'learning_rate': 8.354451399050185e-05, 'epoch': 1.74} + 58%|█████▊ | 10050/17285 [90:01:57<63:24:15, 31.55s/it] 58%|█████▊ | 10051/17285 [90:02:30<64:05:34, 31.90s/it] 58%|█████▊ | 10052/17285 [90:03:13<70:51:48, 35.27s/it] 58%|█████▊ | 10053/17285 [90:03:40<66:18:43, 33.01s/it] 58%|█████▊ | 10054/17285 [90:04:09<63:27:34, 
31.59s/it] 58%|█████▊ | 10055/17285 [90:04:46<66:35:50, 33.16s/it] 58%|█████▊ | 10056/17285 [90:05:19<66:25:55, 33.08s/it] 58%|█████▊ | 10057/17285 [90:05:51<66:17:55, 33.02s/it] 58%|█████▊ | 10058/17285 [90:06:27<68:01:29, 33.89s/it] 58%|█████▊ | 10059/17285 [90:07:02<68:42:03, 34.23s/it] 58%|█████▊ | 10060/17285 [90:07:36<68:20:39, 34.05s/it] {'loss': 1.4065, 'learning_rate': 8.335582520773848e-05, 'epoch': 1.75} + 58%|█████▊ | 10060/17285 [90:07:36<68:20:39, 34.05s/it] 58%|█████▊ | 10061/17285 [90:08:02<63:40:50, 31.73s/it] 58%|█████▊ | 10062/17285 [90:08:27<59:11:25, 29.50s/it] 58%|█████▊ | 10063/17285 [90:09:00<61:17:11, 30.55s/it] 58%|█████▊ | 10064/17285 [90:09:35<64:23:27, 32.10s/it] 58%|█████▊ | 10065/17285 [90:10:07<64:06:48, 31.97s/it] 58%|█████▊ | 10066/17285 [90:10:33<60:30:30, 30.17s/it] 58%|█████▊ | 10067/17285 [90:11:09<64:05:32, 31.97s/it] 58%|█████▊ | 10068/17285 [90:11:39<62:52:19, 31.36s/it] 58%|█████▊ | 10069/17285 [90:12:09<62:09:15, 31.01s/it] 58%|█████▊ | 10070/17285 [90:12:40<62:05:02, 30.98s/it] {'loss': 1.4058, 'learning_rate': 8.316719735278616e-05, 'epoch': 1.75} + 58%|█████▊ | 10070/17285 [90:12:40<62:05:02, 30.98s/it] 58%|█████▊ | 10071/17285 [90:13:08<60:22:39, 30.13s/it] 58%|█████▊ | 10072/17285 [90:13:42<62:23:00, 31.14s/it] 58%|█████▊ | 10073/17285 [90:14:15<63:39:39, 31.78s/it] 58%|█████▊ | 10074/17285 [90:14:45<62:26:53, 31.18s/it] 58%|█████▊ | 10075/17285 [90:15:13<60:28:13, 30.19s/it] 58%|█████▊ | 10076/17285 [90:15:42<60:06:17, 30.01s/it] 58%|█████▊ | 10077/17285 [90:16:26<68:22:15, 34.15s/it] 58%|█████▊ | 10078/17285 [90:16:54<64:52:26, 32.41s/it] 58%|█████▊ | 10079/17285 [90:17:30<67:02:46, 33.50s/it] 58%|█████▊ | 10080/17285 [90:18:02<65:54:18, 32.93s/it] {'loss': 1.3986, 'learning_rate': 8.29786311161376e-05, 'epoch': 1.75} + 58%|█████▊ | 10080/17285 [90:18:02<65:54:18, 32.93s/it] 58%|█████▊ | 10081/17285 [90:18:33<64:33:41, 32.26s/it] 58%|█████▊ | 10082/17285 [90:19:03<63:18:19, 31.64s/it] 58%|█████▊ | 10083/17285 
[90:19:37<64:27:23, 32.22s/it] 58%|█████▊ | 10084/17285 [90:20:07<63:14:37, 31.62s/it] 58%|█████▊ | 10085/17285 [90:20:46<67:47:30, 33.90s/it] 58%|█████▊ | 10086/17285 [90:21:23<69:32:40, 34.78s/it] 58%|█████▊ | 10087/17285 [90:21:56<68:43:56, 34.38s/it] 58%|█████▊ | 10088/17285 [90:22:24<64:56:14, 32.48s/it] 58%|█████▊ | 10089/17285 [90:22:50<60:45:46, 30.40s/it] 58%|█████▊ | 10090/17285 [90:23:26<64:04:05, 32.06s/it] {'loss': 1.3801, 'learning_rate': 8.279012718806004e-05, 'epoch': 1.75} + 58%|█████▊ | 10090/17285 [90:23:26<64:04:05, 32.06s/it] 58%|█████▊ | 10091/17285 [90:24:01<65:43:49, 32.89s/it] 58%|█████▊ | 10092/17285 [90:24:36<67:29:10, 33.78s/it] 58%|█████▊ | 10093/17285 [90:25:04<63:42:36, 31.89s/it] 58%|█████▊ | 10094/17285 [90:25:41<66:30:10, 33.29s/it] 58%|█████▊ | 10095/17285 [90:26:09<63:48:22, 31.95s/it] 58%|█████▊ | 10096/17285 [90:26:42<64:09:44, 32.13s/it] 58%|█████▊ | 10097/17285 [90:27:13<63:48:21, 31.96s/it] 58%|█████▊ | 10098/17285 [90:27:43<62:11:21, 31.15s/it] 58%|█████▊ | 10099/17285 [90:28:13<61:37:06, 30.87s/it] 58%|█████▊ | 10100/17285 [90:28:43<61:03:23, 30.59s/it] {'loss': 1.4051, 'learning_rate': 8.260168625859259e-05, 'epoch': 1.75} + 58%|█████▊ | 10100/17285 [90:28:43<61:03:23, 30.59s/it] 58%|█████▊ | 10101/17285 [90:29:15<61:54:15, 31.02s/it] 58%|█████▊ | 10102/17285 [90:29:54<66:53:47, 33.53s/it] 58%|█████▊ | 10103/17285 [90:30:28<67:13:49, 33.70s/it] 58%|█████▊ | 10104/17285 [90:30:55<62:57:36, 31.56s/it] 58%|█████▊ | 10105/17285 [90:31:30<65:19:12, 32.75s/it] 58%|█████▊ | 10106/17285 [90:32:11<70:01:29, 35.11s/it] 58%|█████▊ | 10107/17285 [90:32:37<64:14:00, 32.22s/it] 58%|█████▊ | 10108/17285 [90:33:06<62:32:10, 31.37s/it] 58%|█████▊ | 10109/17285 [90:33:38<63:12:12, 31.71s/it] 58%|█████▊ | 10110/17285 [90:34:17<67:32:25, 33.89s/it] {'loss': 1.4178, 'learning_rate': 8.241330901754376e-05, 'epoch': 1.75} + 58%|█████▊ | 10110/17285 [90:34:17<67:32:25, 33.89s/it] 58%|█████▊ | 10111/17285 [90:34:50<66:31:32, 33.38s/it] 59%|█████▊ 
| 10112/17285 [90:35:23<66:29:26, 33.37s/it] 59%|█████▊ | 10113/17285 [90:35:58<67:34:37, 33.92s/it] 59%|█████▊ | 10114/17285 [90:36:28<64:57:39, 32.61s/it] 59%|█████▊ | 10115/17285 [90:36:58<63:35:37, 31.93s/it] 59%|█████▊ | 10116/17285 [90:37:26<60:59:45, 30.63s/it] 59%|█████▊ | 10117/17285 [90:37:59<62:42:15, 31.49s/it] 59%|█████▊ | 10118/17285 [90:38:31<62:53:53, 31.59s/it] 59%|█████▊ | 10119/17285 [90:39:00<61:22:35, 30.83s/it] 59%|█████▊ | 10120/17285 [90:39:34<63:24:18, 31.86s/it] {'loss': 1.3994, 'learning_rate': 8.222499615448894e-05, 'epoch': 1.76} + 59%|█████▊ | 10120/17285 [90:39:34<63:24:18, 31.86s/it] 59%|█████▊ | 10121/17285 [90:40:09<65:09:26, 32.74s/it] 59%|█████▊ | 10122/17285 [90:40:43<66:00:10, 33.17s/it] 59%|█████▊ | 10123/17285 [90:41:18<66:50:12, 33.60s/it] 59%|█████▊ | 10124/17285 [90:41:52<66:53:46, 33.63s/it] 59%|█████▊ | 10125/17285 [90:42:26<67:12:10, 33.79s/it] 59%|█████▊ | 10126/17285 [90:42:56<64:53:07, 32.63s/it] 59%|█████▊ | 10127/17285 [90:43:30<65:50:23, 33.11s/it] 59%|█████▊ | 10128/17285 [90:44:07<68:05:28, 34.25s/it] 59%|█████▊ | 10129/17285 [90:44:39<66:43:52, 33.57s/it] 59%|█████▊ | 10130/17285 [90:45:15<68:19:52, 34.38s/it] {'loss': 1.3948, 'learning_rate': 8.203674835876778e-05, 'epoch': 1.76} + 59%|█████▊ | 10130/17285 [90:45:15<68:19:52, 34.38s/it] 59%|█████▊ | 10131/17285 [90:45:48<67:18:18, 33.87s/it] 59%|█████▊ | 10132/17285 [90:46:24<68:57:47, 34.71s/it] 59%|█████▊ | 10133/17285 [90:46:52<64:33:03, 32.49s/it] 59%|█████▊ | 10134/17285 [90:47:26<65:43:14, 33.09s/it] 59%|█████▊ | 10135/17285 [90:47:58<65:04:50, 32.77s/it] 59%|█████▊ | 10136/17285 [90:48:31<64:54:12, 32.68s/it] 59%|█████▊ | 10137/17285 [90:48:58<61:30:35, 30.98s/it] 59%|█████▊ | 10138/17285 [90:49:31<62:53:14, 31.68s/it] 59%|█████▊ | 10139/17285 [90:50:10<66:58:50, 33.74s/it] 59%|█████▊ | 10140/17285 [90:50:38<63:37:57, 32.06s/it] {'loss': 1.412, 'learning_rate': 8.184856631948184e-05, 'epoch': 1.76} + 59%|█████▊ | 10140/17285 [90:50:38<63:37:57, 
32.06s/it] 59%|█████▊ | 10141/17285 [90:51:14<66:02:47, 33.28s/it] 59%|█████▊ | 10142/17285 [90:51:47<65:47:04, 33.15s/it] 59%|█████▊ | 10143/17285 [90:52:17<63:57:54, 32.24s/it] 59%|█████▊ | 10144/17285 [90:52:54<66:42:43, 33.63s/it] 59%|█████▊ | 10145/17285 [90:53:24<64:48:48, 32.68s/it] 59%|█████▊ | 10146/17285 [90:53:57<65:05:52, 32.83s/it] 59%|█████▊ | 10147/17285 [90:54:26<62:32:36, 31.54s/it] 59%|█████▊ | 10148/17285 [90:54:53<60:01:10, 30.27s/it] 59%|█████▊ | 10149/17285 [90:55:24<60:26:55, 30.50s/it] 59%|█████▊ | 10150/17285 [90:55:50<57:43:26, 29.12s/it] {'loss': 1.4018, 'learning_rate': 8.16604507254919e-05, 'epoch': 1.76} + 59%|█████▊ | 10150/17285 [90:55:50<57:43:26, 29.12s/it] 59%|█████▊ | 10151/17285 [90:56:17<56:38:06, 28.58s/it] 59%|█████▊ | 10152/17285 [90:56:44<55:14:35, 27.88s/it] 59%|█████▊ | 10153/17285 [90:57:14<56:43:22, 28.63s/it] 59%|█████▊ | 10154/17285 [90:57:50<60:59:18, 30.79s/it] 59%|█████▉ | 10155/17285 [90:58:24<63:02:18, 31.83s/it] 59%|█████▉ | 10156/17285 [90:58:53<61:29:18, 31.05s/it] 59%|█████▉ | 10157/17285 [90:59:24<61:05:38, 30.86s/it] 59%|█████▉ | 10158/17285 [90:59:53<60:06:23, 30.36s/it] 59%|█████▉ | 10159/17285 [91:00:28<62:37:44, 31.64s/it] 59%|█████▉ | 10160/17285 [91:00:57<61:21:51, 31.01s/it] {'loss': 1.395, 'learning_rate': 8.147240226541555e-05, 'epoch': 1.76} + 59%|█████▉ | 10160/17285 [91:00:57<61:21:51, 31.01s/it] 59%|█████▉ | 10161/17285 [91:01:31<63:01:32, 31.85s/it] 59%|█████▉ | 10162/17285 [91:02:07<65:44:04, 33.22s/it] 59%|█████▉ | 10163/17285 [91:02:42<66:40:33, 33.70s/it] 59%|█████▉ | 10164/17285 [91:03:08<61:56:05, 31.31s/it] 59%|█████▉ | 10165/17285 [91:03:40<62:36:41, 31.66s/it] 59%|█████▉ | 10166/17285 [91:04:09<60:52:36, 30.78s/it] 59%|█████▉ | 10167/17285 [91:04:45<63:45:50, 32.25s/it] 59%|█████▉ | 10168/17285 [91:05:11<60:25:53, 30.57s/it] 59%|█████▉ | 10169/17285 [91:05:43<61:10:04, 30.94s/it] 59%|█████▉ | 10170/17285 [91:06:18<63:28:38, 32.12s/it] {'loss': 1.4177, 'learning_rate': 
8.128442162762465e-05, 'epoch': 1.77} + 59%|█████▉ | 10170/17285 [91:06:18<63:28:38, 32.12s/it] 59%|█████▉ | 10171/17285 [91:06:49<62:36:20, 31.68s/it] 59%|█████▉ | 10172/17285 [91:07:20<62:15:27, 31.51s/it] 59%|█████▉ | 10173/17285 [91:07:50<61:11:45, 30.98s/it] 59%|█████▉ | 10174/17285 [91:08:17<58:59:45, 29.87s/it] 59%|█████▉ | 10175/17285 [91:08:44<57:09:28, 28.94s/it] 59%|█████▉ | 10176/17285 [91:09:17<59:45:52, 30.26s/it] 59%|█████▉ | 10177/17285 [91:09:44<57:47:40, 29.27s/it] 59%|█████▉ | 10178/17285 [91:10:19<60:56:48, 30.87s/it] 59%|█████▉ | 10179/17285 [91:10:46<58:55:21, 29.85s/it] 59%|█████▉ | 10180/17285 [91:11:24<63:48:57, 32.33s/it] {'loss': 1.4684, 'learning_rate': 8.109650950024272e-05, 'epoch': 1.77} + 59%|█████▉ | 10180/17285 [91:11:24<63:48:57, 32.33s/it] 59%|█████▉ | 10181/17285 [91:12:10<71:55:18, 36.45s/it] 59%|█████▉ | 10182/17285 [91:12:46<71:12:44, 36.09s/it] 59%|█████▉ | 10183/17285 [91:13:12<65:47:38, 33.35s/it] 59%|█████▉ | 10184/17285 [91:13:42<63:35:19, 32.24s/it] 59%|█████▉ | 10185/17285 [91:14:21<67:31:50, 34.24s/it] 59%|█████▉ | 10186/17285 [91:14:57<68:33:45, 34.77s/it] 59%|█████▉ | 10187/17285 [91:15:31<67:55:56, 34.45s/it] 59%|█████▉ | 10188/17285 [91:15:59<64:03:31, 32.49s/it] 59%|█████▉ | 10189/17285 [91:16:31<64:05:22, 32.51s/it] 59%|█████▉ | 10190/17285 [91:17:05<65:01:16, 32.99s/it] {'loss': 1.3745, 'learning_rate': 8.090866657114254e-05, 'epoch': 1.77} + 59%|█████▉ | 10190/17285 [91:17:05<65:01:16, 32.99s/it] 59%|█████▉ | 10191/17285 [91:17:37<63:58:00, 32.46s/it] 59%|█████▉ | 10192/17285 [91:18:07<62:52:52, 31.91s/it] 59%|█████▉ | 10193/17285 [91:18:39<63:02:25, 32.00s/it] 59%|█████▉ | 10194/17285 [91:19:16<65:37:08, 33.31s/it] 59%|█████▉ | 10195/17285 [91:19:43<61:58:04, 31.46s/it] 59%|█████▉ | 10196/17285 [91:20:17<63:34:43, 32.29s/it] 59%|█████▉ | 10197/17285 [91:20:48<62:44:23, 31.87s/it] 59%|█████▉ | 10198/17285 [91:21:19<62:04:38, 31.53s/it] 59%|█████▉ | 10199/17285 [91:21:44<58:26:00, 29.69s/it] 59%|█████▉ |
10200/17285 [91:22:19<61:38:46, 31.32s/it] {'loss': 1.4243, 'learning_rate': 8.072089352794361e-05, 'epoch': 1.77} + 59%|█████▉ | 10200/17285 [91:22:19<61:38:46, 31.32s/it] 59%|█████▉ | 10201/17285 [91:23:01<67:53:24, 34.50s/it] 59%|█████▉ | 10202/17285 [91:23:37<68:28:51, 34.81s/it] 59%|█████▉ | 10203/17285 [91:24:04<64:02:53, 32.56s/it] 59%|█████▉ | 10204/17285 [91:24:29<59:45:43, 30.38s/it] 59%|█████▉ | 10205/17285 [91:25:05<63:04:28, 32.07s/it] 59%|█████▉ | 10206/17285 [91:25:37<62:47:16, 31.93s/it] 59%|█████▉ | 10207/17285 [91:26:10<63:21:49, 32.23s/it] 59%|█████▉ | 10208/17285 [91:26:37<60:25:37, 30.74s/it][2023-08-26 19:21:49,045] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 59%|█████▉ | 10209/17285 [91:27:11<62:28:35, 31.79s/it] 59%|█████▉ | 10210/17285 [91:27:39<59:55:08, 30.49s/it] {'loss': 1.3984, 'learning_rate': 8.055195810962145e-05, 'epoch': 1.77} + 59%|█████▉ | 10210/17285 [91:27:39<59:55:08, 30.49s/it] 59%|█████▉ | 10211/17285 [91:28:13<62:15:09, 31.68s/it] 59%|█████▉ | 10212/17285 [91:28:39<58:53:00, 29.97s/it] 59%|█████▉ | 10213/17285 [91:29:13<61:08:08, 31.12s/it] 59%|█████▉ | 10214/17285 [91:29:49<64:02:03, 32.60s/it] 59%|█████▉ | 10215/17285 [91:30:20<63:17:04, 32.22s/it] 59%|█████▉ | 10216/17285 [91:30:54<63:51:25, 32.52s/it] 59%|█████▉ | 10217/17285 [91:31:32<67:01:54, 34.14s/it] 59%|█████▉ | 10218/17285 [91:32:04<66:02:05, 33.64s/it] 59%|█████▉ | 10219/17285 [91:32:41<68:13:11, 34.76s/it] 59%|█████▉ | 10220/17285 [91:33:19<69:53:51, 35.62s/it] {'loss': 1.4003, 'learning_rate': 8.036431974310813e-05, 'epoch': 1.77} + 59%|█████▉ | 10220/17285 [91:33:19<69:53:51, 35.62s/it] 59%|█████▉ | 10221/17285 [91:33:44<63:50:02, 32.53s/it] 59%|█████▉ | 10222/17285 [91:34:14<62:21:17, 31.78s/it] 59%|█████▉ | 10223/17285 [91:34:45<61:29:01, 31.34s/it] 59%|█████▉ | 10224/17285 [91:35:18<62:26:31, 31.84s/it] 59%|█████▉ | 10225/17285 
[91:35:50<62:34:42, 31.91s/it] 59%|█████▉ | 10226/17285 [91:36:17<59:32:57, 30.37s/it] 59%|█████▉ | 10227/17285 [91:36:42<56:28:12, 28.80s/it] 59%|█████▉ | 10228/17285 [91:37:13<57:52:19, 29.52s/it] 59%|█████▉ | 10229/17285 [91:37:48<61:12:21, 31.23s/it] 59%|█████▉ | 10230/17285 [91:38:23<63:22:21, 32.34s/it] {'loss': 1.4197, 'learning_rate': 8.017675325513676e-05, 'epoch': 1.78} + 59%|█████▉ | 10230/17285 [91:38:23<63:22:21, 32.34s/it] 59%|█████▉ | 10231/17285 [91:38:54<62:30:52, 31.90s/it][2023-08-26 19:34:13,625] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 59%|█████▉ | 10232/17285 [91:39:36<68:25:30, 34.93s/it] 59%|█████▉ | 10233/17285 [91:40:05<65:01:53, 33.20s/it] 59%|█████▉ | 10234/17285 [91:40:37<64:02:06, 32.69s/it] 59%|█████▉ | 10235/17285 [91:41:12<65:43:51, 33.56s/it] 59%|█████▉ | 10236/17285 [91:41:44<64:31:47, 32.96s/it] 59%|█████▉ | 10237/17285 [91:42:11<61:15:33, 31.29s/it] 59%|█████▉ | 10238/17285 [91:42:51<66:09:58, 33.80s/it] 59%|█████▉ | 10239/17285 [91:43:30<69:06:31, 35.31s/it] 59%|█████▉ | 10240/17285 [91:44:02<67:30:27, 34.50s/it] {'loss': 1.389, 'learning_rate': 8.000800543960246e-05, 'epoch': 1.78} + 59%|█████▉ | 10240/17285 [91:44:02<67:30:27, 34.50s/it] 59%|█████▉ | 10241/17285 [91:44:33<65:12:02, 33.32s/it] 59%|█████▉ | 10242/17285 [91:45:00<61:23:24, 31.38s/it] 59%|█████▉ | 10243/17285 [91:45:30<60:37:20, 30.99s/it] 59%|█████▉ | 10244/17285 [91:46:02<61:21:59, 31.38s/it] 59%|█████▉ | 10245/17285 [91:46:33<61:01:19, 31.20s/it] 59%|█████▉ | 10246/17285 [91:47:03<60:40:08, 31.03s/it] 59%|█████▉ | 10247/17285 [91:47:37<62:03:47, 31.75s/it] 59%|█████▉ | 10248/17285 [91:48:06<60:31:56, 30.97s/it] 59%|█████▉ | 10249/17285 [91:48:34<59:00:43, 30.19s/it] 59%|█████▉ | 10250/17285 [91:49:14<64:40:18, 33.09s/it] {'loss': 1.3928, 'learning_rate': 7.9820577412245e-05, 'epoch': 1.78} + 59%|█████▉ | 10250/17285 [91:49:14<64:40:18, 33.09s/it] 59%|█████▉ | 
10251/17285 [91:49:45<63:16:27, 32.38s/it] 59%|█████▉ | 10252/17285 [91:50:19<64:18:03, 32.91s/it] 59%|█████▉ | 10253/17285 [91:50:51<63:25:31, 32.47s/it] 59%|█████▉ | 10254/17285 [91:51:16<59:16:15, 30.35s/it] 59%|█████▉ | 10255/17285 [91:51:50<61:37:09, 31.55s/it] 59%|█████▉ | 10256/17285 [91:52:26<64:01:12, 32.79s/it] 59%|█████▉ | 10257/17285 [91:53:02<66:07:21, 33.87s/it] 59%|█████▉ | 10258/17285 [91:53:38<67:07:13, 34.39s/it] 59%|█████▉ | 10259/17285 [91:54:07<64:01:50, 32.81s/it] 59%|█████▉ | 10260/17285 [91:54:36<61:56:22, 31.74s/it] {'loss': 1.4402, 'learning_rate': 7.96332232538574e-05, 'epoch': 1.78} + 59%|█████▉ | 10260/17285 [91:54:36<61:56:22, 31.74s/it] 59%|█████▉ | 10261/17285 [91:55:16<66:23:46, 34.03s/it] 59%|█████▉ | 10262/17285 [91:55:43<62:15:13, 31.91s/it] 59%|█████▉ | 10263/17285 [91:56:16<62:59:57, 32.30s/it] 59%|█████▉ | 10264/17285 [91:56:44<60:19:32, 30.93s/it] 59%|█████▉ | 10265/17285 [91:57:13<59:22:35, 30.45s/it] 59%|█████▉ | 10266/17285 [91:57:44<59:28:53, 30.51s/it] 59%|█████▉ | 10267/17285 [91:58:19<62:19:40, 31.97s/it] 59%|█████▉ | 10268/17285 [91:58:50<61:39:31, 31.63s/it] 59%|█████▉ | 10269/17285 [91:59:15<57:41:01, 29.60s/it] 59%|█████▉ | 10270/17285 [91:59:51<61:34:29, 31.60s/it] {'loss': 1.4059, 'learning_rate': 7.94459436502699e-05, 'epoch': 1.78} + 59%|█████▉ | 10270/17285 [91:59:51<61:34:29, 31.60s/it] 59%|█████▉ | 10271/17285 [92:00:25<63:01:11, 32.35s/it] 59%|█████▉ | 10272/17285 [92:01:02<65:53:13, 33.82s/it] 59%|█████▉ | 10273/17285 [92:01:28<60:56:57, 31.29s/it] 59%|█████▉ | 10274/17285 [92:02:03<63:10:47, 32.44s/it] 59%|█████▉ | 10275/17285 [92:02:28<58:39:36, 30.13s/it] 59%|█████▉ | 10276/17285 [92:03:01<60:32:51, 31.10s/it] 59%|█████▉ | 10277/17285 [92:03:36<62:54:39, 32.32s/it] 59%|█████▉ | 10278/17285 [92:04:11<64:38:46, 33.21s/it] 59%|█████▉ | 10279/17285 [92:04:49<67:20:49, 34.61s/it] 59%|█████▉ | 10280/17285 [92:05:16<62:38:37, 32.19s/it] {'loss': 1.4186, 'learning_rate': 7.925873928703986e-05, 'epoch': 1.78} + 
 59%|█████▉ | 10280/17285 [92:05:16<62:38:37, 32.19s/it] 59%|█████▉ | 10281/17285 [92:05:47<61:56:03, 31.83s/it] 59%|█████▉ | 10282/17285 [92:06:23<64:15:55, 33.04s/it] 59%|█████▉ | 10283/17285 [92:06:57<64:58:37, 33.41s/it] 59%|█████▉ | 10284/17285 [92:07:21<59:45:16, 30.73s/it] 60%|█████▉ | 10285/17285 [92:07:56<62:00:44, 31.89s/it] 60%|█████▉ | 10286/17285 [92:08:35<66:05:29, 33.99s/it] 60%|█████▉ | 10287/17285 [92:09:08<65:36:02, 33.75s/it] 60%|█████▉ | 10288/17285 [92:09:41<65:03:41, 33.47s/it] 60%|█████▉ | 10289/17285 [92:10:14<64:33:17, 33.22s/it] 60%|█████▉ | 10290/17285 [92:10:45<63:42:42, 32.79s/it] {'loss': 1.4009, 'learning_rate': 7.90716108494492e-05, 'epoch': 1.79} + 60%|█████▉ | 10290/17285 [92:10:45<63:42:42, 32.79s/it] 60%|█████▉ | 10291/17285 [92:11:16<62:37:00, 32.23s/it] 60%|█████▉ | 10292/17285 [92:11:42<58:53:06, 30.31s/it] 60%|█████▉ | 10293/17285 [92:12:15<60:06:22, 30.95s/it] 60%|█████▉ | 10294/17285 [92:12:43<58:42:15, 30.23s/it] 60%|█████▉ | 10295/17285 [92:13:13<58:44:57, 30.26s/it] 60%|█████▉ | 10296/17285 [92:13:44<58:42:20, 30.24s/it] 60%|█████▉ | 10297/17285 [92:14:15<59:25:57, 30.62s/it] 60%|█████▉ | 10298/17285 [92:14:40<56:10:30, 28.94s/it] 60%|█████▉ | 10299/17285 [92:15:05<53:46:22, 27.71s/it] 60%|█████▉ | 10300/17285 [92:15:32<53:08:53, 27.39s/it] {'loss': 1.4537, 'learning_rate': 7.888455902250194e-05, 'epoch': 1.79} + 60%|█████▉ | 10300/17285 [92:15:32<53:08:53, 27.39s/it] 60%|█████▉ | 10301/17285 [92:15:58<52:34:05, 27.10s/it] 60%|█████▉ | 10302/17285 [92:16:24<51:44:51, 26.68s/it] 60%|█████▉ | 10303/17285 [92:16:58<55:52:09, 28.81s/it] 60%|█████▉ | 10304/17285 [92:17:35<60:43:18, 31.31s/it] 60%|█████▉ | 10305/17285 [92:18:07<61:16:04, 31.60s/it] 60%|█████▉ | 10306/17285 [92:18:33<57:53:39, 29.86s/it] 60%|█████▉ | 10307/17285 [92:19:05<59:04:23, 30.48s/it] 60%|█████▉ | 10308/17285 [92:19:42<62:58:43, 32.50s/it] 60%|█████▉ | 10309/17285 [92:20:13<61:58:42, 31.98s/it] 60%|█████▉ | 10310/17285 [92:20:46<63:00:32, 32.52s/it]
{'loss': 1.4237, 'learning_rate': 7.869758449092155e-05, 'epoch': 1.79} + 60%|█████▉ | 10310/17285 [92:20:46<63:00:32, 32.52s/it] 60%|█████▉ | 10311/17285 [92:21:12<59:09:22, 30.54s/it] 60%|█████▉ | 10312/17285 [92:21:42<58:24:22, 30.15s/it] 60%|█████▉ | 10313/17285 [92:22:17<61:18:51, 31.66s/it] 60%|█████▉ | 10314/17285 [92:22:43<57:58:58, 29.94s/it] 60%|█████▉ | 10315/17285 [92:23:12<57:45:20, 29.83s/it] 60%|█████▉ | 10316/17285 [92:23:47<60:25:18, 31.21s/it] 60%|█████▉ | 10317/17285 [92:24:15<58:28:31, 30.21s/it] 60%|█████▉ | 10318/17285 [92:24:43<57:41:01, 29.81s/it] 60%|█████▉ | 10319/17285 [92:25:23<63:23:49, 32.76s/it] 60%|█████▉ | 10320/17285 [92:25:49<59:34:29, 30.79s/it] {'loss': 1.4158, 'learning_rate': 7.851068793914867e-05, 'epoch': 1.79} + 60%|█████▉ | 10320/17285 [92:25:49<59:34:29, 30.79s/it] 60%|█████▉ | 10321/17285 [92:26:22<60:38:34, 31.35s/it] 60%|█████▉ | 10322/17285 [92:26:53<60:21:00, 31.20s/it] 60%|█████▉ | 10323/17285 [92:27:24<60:21:19, 31.21s/it] 60%|█████▉ | 10324/17285 [92:27:57<61:16:06, 31.69s/it] 60%|█████▉ | 10325/17285 [92:28:31<62:42:24, 32.43s/it] 60%|█████▉ | 10326/17285 [92:29:08<65:12:27, 33.73s/it] 60%|█████▉ | 10327/17285 [92:29:50<70:16:01, 36.36s/it] 60%|█████▉ | 10328/17285 [92:30:16<64:11:43, 33.22s/it] 60%|█████▉ | 10329/17285 [92:30:41<59:14:50, 30.66s/it] 60%|█████▉ | 10330/17285 [92:31:15<61:08:42, 31.65s/it] {'loss': 1.3977, 'learning_rate': 7.832387005133845e-05, 'epoch': 1.79} + 60%|█████▉ | 10330/17285 [92:31:15<61:08:42, 31.65s/it] 60%|█████▉ | 10331/17285 [92:31:49<62:29:36, 32.35s/it] 60%|█████▉ | 10332/17285 [92:32:17<60:19:41, 31.24s/it] 60%|█████▉ | 10333/17285 [92:32:56<64:25:25, 33.36s/it] 60%|█████▉ | 10334/17285 [92:33:31<65:39:28, 34.00s/it] 60%|█████▉ | 10335/17285 [92:34:03<64:30:44, 33.42s/it] 60%|█████▉ | 10336/17285 [92:34:47<70:35:59, 36.58s/it] 60%|█████▉ | 10337/17285 [92:35:13<64:21:21, 33.35s/it] 60%|█████▉ | 10338/17285 [92:35:43<62:24:16, 32.34s/it] 60%|█████▉ | 10339/17285 
[92:36:12<60:17:50, 31.25s/it] 60%|█████▉ | 10340/17285 [92:36:40<58:42:04, 30.43s/it] {'loss': 1.4253, 'learning_rate': 7.813713151135805e-05, 'epoch': 1.79} + 60%|█████▉ | 10340/17285 [92:36:40<58:42:04, 30.43s/it] 60%|█████▉ | 10341/17285 [92:37:07<56:18:43, 29.19s/it] 60%|█████▉ | 10342/17285 [92:37:35<55:47:56, 28.93s/it] 60%|█████▉ | 10343/17285 [92:38:05<56:29:28, 29.30s/it] 60%|█████▉ | 10344/17285 [92:38:39<59:15:23, 30.73s/it] 60%|█████▉ | 10345/17285 [92:39:13<60:58:23, 31.63s/it] 60%|█████▉ | 10346/17285 [92:39:52<65:12:52, 33.83s/it] 60%|█████▉ | 10347/17285 [92:40:19<61:34:58, 31.95s/it] 60%|█████▉ | 10348/17285 [92:40:50<60:47:03, 31.54s/it] 60%|█████▉ | 10349/17285 [92:41:16<57:39:47, 29.93s/it] 60%|█████▉ | 10350/17285 [92:41:48<58:31:05, 30.38s/it] {'loss': 1.3966, 'learning_rate': 7.795047300278422e-05, 'epoch': 1.8} + 60%|█████▉ | 10350/17285 [92:41:48<58:31:05, 30.38s/it] 60%|█████▉ | 10351/17285 [92:42:17<57:41:15, 29.95s/it] 60%|█████▉ | 10352/17285 [92:42:49<58:54:28, 30.59s/it] 60%|█████▉ | 10353/17285 [92:43:23<61:19:42, 31.85s/it] 60%|█████▉ | 10354/17285 [92:43:55<61:20:25, 31.86s/it] 60%|█████▉ | 10355/17285 [92:44:25<60:13:26, 31.29s/it] 60%|█████▉ | 10356/17285 [92:44:58<61:14:38, 31.82s/it] 60%|█████▉ | 10357/17285 [92:45:28<60:15:09, 31.31s/it] 60%|█████▉ | 10358/17285 [92:46:05<63:33:01, 33.03s/it] 60%|█████▉ | 10359/17285 [92:46:39<64:00:28, 33.27s/it] 60%|█████▉ | 10360/17285 [92:47:05<59:51:03, 31.11s/it] {'loss': 1.4211, 'learning_rate': 7.776389520890071e-05, 'epoch': 1.8} + 60%|█████▉ | 10360/17285 [92:47:05<59:51:03, 31.11s/it] 60%|█████▉ | 10361/17285 [92:47:35<58:48:32, 30.58s/it] 60%|█████▉ | 10362/17285 [92:48:06<59:03:58, 30.71s/it] 60%|█████▉ | 10363/17285 [92:48:35<58:08:04, 30.23s/it] 60%|█████▉ | 10364/17285 [92:49:08<59:39:59, 31.04s/it] 60%|█████▉ | 10365/17285 [92:49:46<63:48:24, 33.19s/it] 60%|█████▉ | 10366/17285 [92:50:22<65:31:30, 34.09s/it] 60%|█████▉ | 10367/17285 [92:50:53<63:49:39, 33.21s/it] 60%|█████▉
| 10368/17285 [92:51:21<60:26:54, 31.46s/it] 60%|█████▉ | 10369/17285 [92:52:00<65:08:07, 33.91s/it] 60%|█████▉ | 10370/17285 [92:52:39<67:49:31, 35.31s/it] {'loss': 1.3777, 'learning_rate': 7.757739881269582e-05, 'epoch': 1.8} + 60%|█████▉ | 10370/17285 [92:52:39<67:49:31, 35.31s/it] 60%|██████ | 10371/17285 [92:53:14<67:25:11, 35.10s/it] 60%|██████ | 10372/17285 [92:53:42<63:32:48, 33.09s/it] 60%|██████ | 10373/17285 [92:54:18<64:58:19, 33.84s/it] 60%|██████ | 10374/17285 [92:54:49<63:25:53, 33.04s/it] 60%|██████ | 10375/17285 [92:55:27<66:12:44, 34.50s/it] 60%|██████ | 10376/17285 [92:55:55<62:44:28, 32.69s/it] 60%|██████ | 10377/17285 [92:56:29<63:18:50, 33.00s/it] 60%|██████ | 10378/17285 [92:56:58<60:52:32, 31.73s/it] 60%|██████ | 10379/17285 [92:57:37<65:03:06, 33.91s/it] 60%|██████ | 10380/17285 [92:58:13<66:30:08, 34.67s/it] {'loss': 1.3922, 'learning_rate': 7.739098449685987e-05, 'epoch': 1.8} + 60%|██████ | 10380/17285 [92:58:13<66:30:08, 34.67s/it] 60%|██████ | 10381/17285 [92:58:48<66:26:07, 34.64s/it] 60%|██████ | 10382/17285 [92:59:26<68:27:26, 35.70s/it] 60%|██████ | 10383/17285 [92:59:51<62:21:14, 32.52s/it] 60%|██████ | 10384/17285 [93:00:23<62:12:31, 32.45s/it] 60%|██████ | 10385/17285 [93:00:54<61:30:33, 32.09s/it] 60%|██████ | 10386/17285 [93:01:30<63:34:43, 33.18s/it] 60%|██████ | 10387/17285 [93:02:08<66:09:28, 34.53s/it] 60%|██████ | 10388/17285 [93:02:39<64:08:17, 33.48s/it] 60%|██████ | 10389/17285 [93:03:13<64:18:49, 33.57s/it] 60%|██████ | 10390/17285 [93:03:44<62:49:24, 32.80s/it] {'loss': 1.3884, 'learning_rate': 7.720465294378272e-05, 'epoch': 1.8} + 60%|██████ | 10390/17285 [93:03:44<62:49:24, 32.80s/it] 60%|██████ | 10391/17285 [93:04:13<60:38:27, 31.67s/it] 60%|██████ | 10392/17285 [93:04:46<61:36:09, 32.17s/it] 60%|██████ | 10393/17285 [93:05:16<60:38:19, 31.67s/it] 60%|██████ | 10394/17285 [93:05:42<57:02:46, 29.80s/it] 60%|██████ | 10395/17285 [93:06:14<58:13:15, 30.42s/it] 60%|██████ | 10396/17285 [93:06:45<58:27:20, 30.55s/it] 
60%|██████ | 10397/17285 [93:07:20<61:14:55, 32.01s/it] 60%|██████ | 10398/17285 [93:08:03<67:23:40, 35.23s/it] 60%|██████ | 10399/17285 [93:08:42<69:40:17, 36.42s/it] 60%|██████ | 10400/17285 [93:09:14<66:50:39, 34.95s/it] {'loss': 1.4122, 'learning_rate': 7.70184048355513e-05, 'epoch': 1.81} + 60%|██████ | 10400/17285 [93:09:14<66:50:39, 34.95s/it] 60%|██████ | 10401/17285 [93:09:43<63:54:48, 33.42s/it] 60%|██████ | 10402/17285 [93:10:19<65:15:19, 34.13s/it] 60%|██████ | 10403/17285 [93:10:49<62:58:37, 32.94s/it] 60%|██████ | 10404/17285 [93:11:16<59:29:21, 31.12s/it] 60%|██████ | 10405/17285 [93:11:47<59:22:51, 31.07s/it] 60%|██████ | 10406/17285 [93:12:15<57:38:47, 30.17s/it] 60%|██████ | 10407/17285 [93:12:43<56:14:34, 29.44s/it] 60%|██████ | 10408/17285 [93:13:13<56:36:23, 29.63s/it] 60%|██████ | 10409/17285 [93:13:52<61:55:32, 32.42s/it] 60%|██████ | 10410/17285 [93:14:22<60:33:51, 31.71s/it] {'loss': 1.403, 'learning_rate': 7.683224085394702e-05, 'epoch': 1.81} + 60%|██████ | 10410/17285 [93:14:22<60:33:51, 31.71s/it] 60%|██████ | 10411/17285 [93:15:01<64:53:19, 33.98s/it] 60%|██████ | 10412/17285 [93:15:33<63:50:18, 33.44s/it] 60%|██████ | 10413/17285 [93:15:59<59:07:56, 30.98s/it] 60%|██████ | 10414/17285 [93:16:28<57:57:36, 30.37s/it] 60%|██████ | 10415/17285 [93:16:55<55:56:48, 29.32s/it] 60%|██████ | 10416/17285 [93:17:25<56:35:03, 29.66s/it] 60%|██████ | 10417/17285 [93:18:01<60:05:22, 31.50s/it] 60%|██████ | 10418/17285 [93:18:27<56:52:22, 29.82s/it] 60%|██████ | 10419/17285 [93:19:06<62:31:08, 32.78s/it] 60%|██████ | 10420/17285 [93:19:35<59:53:02, 31.40s/it] {'loss': 1.433, 'learning_rate': 7.664616168044339e-05, 'epoch': 1.81} + 60%|██████ | 10420/17285 [93:19:35<59:53:02, 31.40s/it] 60%|██████ | 10421/17285 [93:20:07<60:13:41, 31.59s/it] 60%|██████ | 10422/17285 [93:20:36<59:04:45, 30.99s/it] 60%|██████ | 10423/17285 [93:21:14<63:09:19, 33.13s/it] 60%|██████ | 10424/17285 [93:21:41<59:42:17, 31.33s/it] 60%|██████ | 10425/17285 [93:22:12<59:12:50, 
31.07s/it] 60%|██████ | 10426/17285 [93:22:37<55:43:35, 29.25s/it] 60%|██████ | 10427/17285 [93:23:11<58:31:38, 30.72s/it] 60%|██████ | 10428/17285 [93:23:40<57:16:45, 30.07s/it] 60%|██████ | 10429/17285 [93:24:11<58:11:10, 30.55s/it] 60%|██████ | 10430/17285 [93:24:43<58:47:58, 30.88s/it] {'loss': 1.3929, 'learning_rate': 7.646016799620345e-05, 'epoch': 1.81} + 60%|██████ | 10430/17285 [93:24:43<58:47:58, 30.88s/it] 60%|██████ | 10431/17285 [93:25:16<60:17:02, 31.66s/it] 60%|██████ | 10432/17285 [93:25:44<57:43:53, 30.33s/it] 60%|██████ | 10433/17285 [93:26:24<63:41:56, 33.47s/it] 60%|██████ | 10434/17285 [93:27:08<69:13:48, 36.38s/it][2023-08-26 21:22:16,451] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, but hysteresis is 2. Reducing hysteresis to 1 + 60%|██████ | 10435/17285 [93:27:39<66:16:07, 34.83s/it][2023-08-26 21:22:50,513] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, reducing to 131072 + 60%|██████ | 10436/17285 [93:28:13<65:49:19, 34.60s/it] 60%|██████ | 10437/17285 [93:28:42<62:54:05, 33.07s/it] 60%|██████ | 10438/17285 [93:29:19<64:43:20, 34.03s/it] 60%|██████ | 10439/17285 [93:29:53<64:52:06, 34.11s/it] 60%|██████ | 10440/17285 [93:30:39<71:53:16, 37.81s/it] {'loss': 1.4136, 'learning_rate': 7.631143505862324e-05, 'epoch': 1.81} + 60%|██████ | 10440/17285 [93:30:39<71:53:16, 37.81s/it] 60%|██████ | 10441/17285 [93:31:11<68:35:59, 36.08s/it][2023-08-26 21:26:19,264] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 60%|██████ | 10442/17285 [93:31:42<65:13:26, 34.31s/it] 60%|██████ | 10443/17285 [93:32:20<67:37:44, 35.58s/it] 60%|██████ | 10444/17285 [93:32:58<69:11:42, 36.41s/it] 60%|██████ | 10445/17285 [93:33:33<68:17:05, 35.94s/it] 60%|██████ | 10446/17285 [93:34:02<64:10:25, 33.78s/it] 60%|██████ | 10447/17285 [93:34:35<63:44:15, 33.56s/it] 60%|██████ | 10448/17285 [93:35:22<71:28:38, 37.64s/it] 60%|██████ | 10449/17285 [93:35:51<66:40:26, 35.11s/it] 60%|██████ | 10450/17285 [93:36:22<64:06:34, 33.77s/it] {'loss': 1.3835, 'learning_rate': 7.614417685784577e-05, 'epoch': 1.81} + 60%|██████ | 10450/17285 [93:36:22<64:06:34, 33.77s/it] 60%|██████ | 10451/17285 [93:36:59<65:49:14, 34.67s/it] 60%|██████ | 10452/17285 [93:37:30<64:01:16, 33.73s/it] 60%|██████ | 10453/17285 [93:38:05<64:33:47, 34.02s/it] 60%|██████ | 10454/17285 [93:38:32<60:40:23, 31.98s/it] 60%|██████ | 10455/17285 [93:39:07<61:59:02, 32.67s/it] 60%|██████ | 10456/17285 [93:39:42<63:21:42, 33.40s/it] 60%|██████ | 10457/17285 [93:40:07<58:58:23, 31.09s/it] 61%|██████ | 10458/17285 [93:40:40<59:36:14, 31.43s/it] 61%|██████ | 10459/17285 [93:41:13<60:35:38, 31.96s/it] 61%|██████ | 10460/17285 [93:41:54<65:36:28, 34.61s/it] {'loss': 1.3911, 'learning_rate': 7.595841739456996e-05, 'epoch': 1.82} + 61%|██████ | 10460/17285 [93:41:54<65:36:28, 34.61s/it] 61%|██████ | 10461/17285 [93:42:25<63:54:20, 33.71s/it] 61%|██████ | 10462/17285 [93:42:50<58:56:52, 31.10s/it] 61%|██████ | 10463/17285 [93:43:18<56:52:41, 30.01s/it] 61%|██████ | 10464/17285 [93:43:44<54:53:51, 28.97s/it] 61%|██████ | 10465/17285 [93:44:21<59:34:23, 31.45s/it] 61%|██████ | 10466/17285 [93:44:46<55:29:17, 29.29s/it] 61%|██████ | 10467/17285 [93:45:14<54:53:15, 28.98s/it] 61%|██████ | 10468/17285 [93:45:44<55:33:34, 29.34s/it] 61%|██████ | 10469/17285 [93:46:22<60:18:44, 31.86s/it] 61%|██████ | 10470/17285 [93:46:59<63:17:20, 33.43s/it] {'loss': 1.3981, 'learning_rate': 7.577274593812058e-05, 'epoch': 
1.82} + 61%|██████ | 10470/17285 [93:46:59<63:17:20, 33.43s/it] 61%|██████ | 10471/17285 [93:47:36<65:21:50, 34.53s/it] 61%|██████ | 10472/17285 [93:48:02<60:23:28, 31.91s/it] 61%|██████ | 10473/17285 [93:48:37<62:17:22, 32.92s/it] 61%|██████ | 10474/17285 [93:49:09<61:39:50, 32.59s/it] 61%|██████ | 10475/17285 [93:49:34<57:29:11, 30.39s/it] 61%|██████ | 10476/17285 [93:50:16<64:00:15, 33.84s/it] 61%|██████ | 10477/17285 [93:50:49<63:21:56, 33.51s/it] 61%|██████ | 10478/17285 [93:51:15<59:18:00, 31.36s/it] 61%|██████ | 10479/17285 [93:51:51<61:40:09, 32.62s/it] 61%|██████ | 10480/17285 [93:52:30<65:31:25, 34.66s/it] {'loss': 1.4022, 'learning_rate': 7.558716316816814e-05, 'epoch': 1.82} + 61%|██████ | 10480/17285 [93:52:30<65:31:25, 34.66s/it] 61%|██████ | 10481/17285 [93:52:58<61:23:43, 32.48s/it] 61%|██████ | 10482/17285 [93:53:30<61:13:20, 32.40s/it] 61%|██████ | 10483/17285 [93:53:56<57:37:58, 30.50s/it] 61%|██████ | 10484/17285 [93:54:32<60:56:55, 32.26s/it] 61%|██████ | 10485/17285 [93:55:01<58:53:45, 31.18s/it] 61%|██████ | 10486/17285 [93:55:28<56:37:30, 29.98s/it] 61%|██████ | 10487/17285 [93:56:06<61:05:19, 32.35s/it] 61%|██████ | 10488/17285 [93:56:41<62:25:58, 33.07s/it] 61%|██████ | 10489/17285 [93:57:11<60:37:38, 32.12s/it] 61%|██████ | 10490/17285 [93:57:42<60:16:54, 31.94s/it] {'loss': 1.4368, 'learning_rate': 7.54016697640586e-05, 'epoch': 1.82} + 61%|██████ | 10490/17285 [93:57:42<60:16:54, 31.94s/it] 61%|██████ | 10491/17285 [93:58:19<62:58:23, 33.37s/it] 61%|██████ | 10492/17285 [93:58:44<58:12:51, 30.85s/it] 61%|██████ | 10493/17285 [93:59:10<55:40:57, 29.51s/it] 61%|██████ | 10494/17285 [93:59:46<59:06:00, 31.33s/it] 61%|██████ | 10495/17285 [94:00:12<56:04:10, 29.73s/it] 61%|██████ | 10496/17285 [94:00:42<56:15:29, 29.83s/it] 61%|██████ | 10497/17285 [94:01:19<60:18:50, 31.99s/it] 61%|██████ | 10498/17285 [94:01:52<61:07:18, 32.42s/it] 61%|██████ | 10499/17285 [94:02:20<58:09:47, 30.86s/it] 61%|██████ | 10500/17285 [94:02:50<57:50:14, 
30.69s/it] {'loss': 1.4221, 'learning_rate': 7.521626640481061e-05, 'epoch': 1.82} + 61%|██████ | 10500/17285 [94:02:50<57:50:14, 30.69s/it] 61%|██████ | 10501/17285 [94:03:24<59:33:09, 31.60s/it] 61%|██████ | 10502/17285 [94:03:54<58:40:54, 31.14s/it] 61%|██████ | 10503/17285 [94:04:34<63:38:44, 33.78s/it] 61%|██████ | 10504/17285 [94:05:10<65:23:17, 34.71s/it] 61%|██████ | 10505/17285 [94:05:37<60:53:07, 32.33s/it] 61%|██████ | 10506/17285 [94:06:10<60:55:57, 32.36s/it] 61%|██████ | 10507/17285 [94:06:38<58:38:33, 31.15s/it] 61%|██████ | 10508/17285 [94:07:15<61:44:53, 32.80s/it] 61%|██████ | 10509/17285 [94:07:49<62:38:30, 33.28s/it] 61%|██████ | 10510/17285 [94:08:22<62:23:57, 33.16s/it] {'loss': 1.4189, 'learning_rate': 7.503095376911342e-05, 'epoch': 1.82} + 61%|██████ | 10510/17285 [94:08:22<62:23:57, 33.16s/it] 61%|██████ | 10511/17285 [94:09:03<66:39:48, 35.43s/it] 61%|██████ | 10512/17285 [94:09:28<60:45:03, 32.29s/it] 61%|██████ | 10513/17285 [94:10:02<62:07:42, 33.03s/it] 61%|██████ | 10514/17285 [94:10:33<60:40:28, 32.26s/it] 61%|██████ | 10515/17285 [94:11:11<63:46:39, 33.91s/it] 61%|██████ | 10516/17285 [94:11:36<58:48:24, 31.28s/it] 61%|██████ | 10517/17285 [94:12:03<56:40:10, 30.14s/it] 61%|██████ | 10518/17285 [94:12:36<58:25:01, 31.08s/it] 61%|██████ | 10519/17285 [94:13:05<56:45:46, 30.20s/it] 61%|██████ | 10520/17285 [94:13:33<55:40:31, 29.63s/it] {'loss': 1.4389, 'learning_rate': 7.484573253532406e-05, 'epoch': 1.83} + 61%|██████ | 10520/17285 [94:13:33<55:40:31, 29.63s/it] 61%|██████ | 10521/17285 [94:14:07<58:21:25, 31.06s/it] 61%|██████ | 10522/17285 [94:14:38<58:06:45, 30.93s/it] 61%|██████ | 10523/17285 [94:15:09<58:24:24, 31.10s/it] 61%|██████ | 10524/17285 [94:15:36<55:44:57, 29.68s/it] 61%|██████ | 10525/17285 [94:16:07<56:27:21, 30.07s/it] 61%|██████ | 10526/17285 [94:16:43<59:58:22, 31.94s/it] 61%|██████ | 10527/17285 [94:17:26<65:51:48, 35.09s/it] 61%|██████ | 10528/17285 [94:17:52<61:12:01, 32.61s/it] 61%|██████ | 10529/17285 
[94:18:23<59:50:22, 31.89s/it] 61%|██████ | 10530/17285 [94:18:58<62:05:20, 33.09s/it] {'loss': 1.3616, 'learning_rate': 7.466060338146498e-05, 'epoch': 1.83} + 61%|██████ | 10530/17285 [94:18:58<62:05:20, 33.09s/it] 61%|██████ | 10531/17285 [94:19:26<59:09:33, 31.53s/it] 61%|██████ | 10532/17285 [94:19:57<58:32:48, 31.21s/it] 61%|██████ | 10533/17285 [94:20:27<58:00:03, 30.92s/it] 61%|██████ | 10534/17285 [94:21:02<60:19:39, 32.17s/it] 61%|██████ | 10535/17285 [94:21:34<60:20:54, 32.19s/it] 61%|██████ | 10536/17285 [94:22:11<62:37:35, 33.41s/it] 61%|██████ | 10537/17285 [94:22:42<61:24:47, 32.76s/it] 61%|██████ | 10538/17285 [94:23:10<58:55:12, 31.44s/it] 61%|██████ | 10539/17285 [94:23:39<57:15:36, 30.56s/it] 61%|██████ | 10540/17285 [94:24:13<59:10:59, 31.59s/it] {'loss': 1.4295, 'learning_rate': 7.447556698522156e-05, 'epoch': 1.83} + 61%|██████ | 10540/17285 [94:24:13<59:10:59, 31.59s/it] 61%|██████ | 10541/17285 [94:24:47<60:47:00, 32.45s/it] 61%|██████ | 10542/17285 [94:25:26<64:34:19, 34.47s/it] 61%|██████ | 10543/17285 [94:25:58<62:51:57, 33.57s/it] 61%|██████ | 10544/17285 [94:26:27<60:13:50, 32.17s/it] 61%|██████ | 10545/17285 [94:26:53<56:43:46, 30.30s/it] 61%|██████ | 10546/17285 [94:27:25<57:59:29, 30.98s/it] 61%|██████ | 10547/17285 [94:28:01<60:27:52, 32.31s/it] 61%|██████ | 10548/17285 [94:28:45<67:25:01, 36.03s/it] 61%|██████ | 10549/17285 [94:29:20<66:23:12, 35.48s/it] 61%|██████ | 10550/17285 [94:29:50<63:26:07, 33.91s/it] {'loss': 1.3832, 'learning_rate': 7.429062402393965e-05, 'epoch': 1.83} + 61%|██████ | 10550/17285 [94:29:50<63:26:07, 33.91s/it] 61%|██████ | 10551/17285 [94:30:31<67:23:07, 36.02s/it] 61%|██████ | 10552/17285 [94:30:59<62:48:21, 33.58s/it] 61%|██████ | 10553/17285 [94:31:29<61:03:03, 32.65s/it] 61%|██████ | 10554/17285 [94:32:12<66:48:44, 35.73s/it] 61%|██████ | 10555/17285 [94:32:45<65:14:08, 34.90s/it] 61%|██████ | 10556/17285 [94:33:25<67:56:10, 36.35s/it] 61%|██████ | 10557/17285 [94:33:52<62:49:19, 33.61s/it] 61%|██████ 
| 10558/17285 [94:34:23<61:38:37, 32.99s/it] 61%|██████ | 10559/17285 [94:34:50<58:15:28, 31.18s/it] 61%|██████ | 10560/17285 [94:35:25<59:56:01, 32.08s/it] {'loss': 1.3762, 'learning_rate': 7.410577517462307e-05, 'epoch': 1.83} + 61%|██████ | 10560/17285 [94:35:25<59:56:01, 32.08s/it] 61%|██████ | 10561/17285 [94:36:04<63:48:27, 34.16s/it] 61%|██████ | 10562/17285 [94:36:35<62:11:24, 33.30s/it] 61%|██████ | 10563/17285 [94:37:10<63:27:17, 33.98s/it] 61%|██████ | 10564/17285 [94:37:46<64:28:23, 34.53s/it] 61%|██████ | 10565/17285 [94:38:19<63:18:47, 33.92s/it] 61%|██████ | 10566/17285 [94:38:49<61:26:12, 32.92s/it] 61%|██████ | 10567/17285 [94:39:23<61:59:27, 33.22s/it] 61%|██████ | 10568/17285 [94:39:59<63:28:51, 34.02s/it] 61%|██████ | 10569/17285 [94:40:34<63:54:58, 34.26s/it] 61%|██████ | 10570/17285 [94:41:10<65:04:47, 34.89s/it] {'loss': 1.4565, 'learning_rate': 7.392102111393116e-05, 'epoch': 1.83} + 61%|██████ | 10570/17285 [94:41:10<65:04:47, 34.89s/it] 61%|██████ | 10571/17285 [94:41:37<60:40:57, 32.54s/it] 61%|██████ | 10572/17285 [94:42:07<58:51:15, 31.56s/it] 61%|██████ | 10573/17285 [94:42:34<56:11:36, 30.14s/it] 61%|██████ | 10574/17285 [94:43:04<56:20:22, 30.22s/it] 61%|██████ | 10575/17285 [94:43:38<58:18:16, 31.28s/it] 61%|██████ | 10576/17285 [94:44:10<59:02:58, 31.69s/it] 61%|██████ | 10577/17285 [94:44:42<59:16:03, 31.81s/it] 61%|██████ | 10578/17285 [94:45:20<62:21:25, 33.47s/it] 61%|██████ | 10579/17285 [94:45:50<60:46:00, 32.62s/it] 61%|██████ | 10580/17285 [94:46:21<59:25:36, 31.91s/it] {'loss': 1.4553, 'learning_rate': 7.373636251817615e-05, 'epoch': 1.84} + 61%|██████ | 10580/17285 [94:46:21<59:25:36, 31.91s/it] 61%|██████ | 10581/17285 [94:46:46<55:33:41, 29.84s/it] 61%|██████ | 10582/17285 [94:47:13<54:00:27, 29.01s/it] 61%|██████ | 10583/17285 [94:47:40<53:12:20, 28.58s/it] 61%|██████ | 10584/17285 [94:48:12<55:10:24, 29.64s/it] 61%|██████ | 10585/17285 [94:48:43<55:36:13, 29.88s/it] 61%|██████ | 10586/17285 [94:49:14<56:19:19, 
30.27s/it] 61%|██████ | 10587/17285 [94:49:55<62:09:11, 33.41s/it] 61%|██████▏ | 10588/17285 [94:50:28<62:04:08, 33.37s/it] 61%|██████▏ | 10589/17285 [94:50:59<60:53:05, 32.73s/it] 61%|██████▏ | 10590/17285 [94:51:24<56:38:55, 30.46s/it] {'loss': 1.4171, 'learning_rate': 7.355180006332097e-05, 'epoch': 1.84} + 61%|██████▏ | 10590/17285 [94:51:24<56:38:55, 30.46s/it] 61%|██████▏ | 10591/17285 [94:51:59<58:46:50, 31.61s/it] 61%|██████▏ | 10592/17285 [94:52:29<57:46:45, 31.08s/it] 61%|██████▏ | 10593/17285 [94:52:57<56:13:20, 30.25s/it] 61%|██████▏ | 10594/17285 [94:53:34<59:49:55, 32.19s/it] 61%|██████▏ | 10595/17285 [94:54:14<64:08:26, 34.52s/it] 61%|██████▏ | 10596/17285 [94:54:39<59:00:59, 31.76s/it] 61%|██████▏ | 10597/17285 [94:55:08<57:34:49, 30.99s/it] 61%|██████▏ | 10598/17285 [94:55:36<55:53:31, 30.09s/it] 61%|██████▏ | 10599/17285 [94:56:14<60:06:05, 32.36s/it] 61%|██████▏ | 10600/17285 [94:56:43<58:31:50, 31.52s/it] {'loss': 1.3761, 'learning_rate': 7.336733442497654e-05, 'epoch': 1.84} + 61%|██████▏ | 10600/17285 [94:56:43<58:31:50, 31.52s/it] 61%|██████▏ | 10601/17285 [94:57:08<54:42:32, 29.47s/it] 61%|██████▏ | 10602/17285 [94:57:39<55:35:46, 29.95s/it] 61%|██████▏ | 10603/17285 [94:58:07<54:41:37, 29.47s/it] 61%|██████▏ | 10604/17285 [94:58:37<54:53:18, 29.58s/it] 61%|██████▏ | 10605/17285 [94:59:07<55:11:23, 29.74s/it] 61%|██████▏ | 10606/17285 [94:59:41<57:15:54, 30.87s/it] 61%|██████▏ | 10607/17285 [95:00:07<54:47:59, 29.54s/it] 61%|██████▏ | 10608/17285 [95:00:42<57:48:36, 31.17s/it] 61%|██████▏ | 10609/17285 [95:01:15<58:40:56, 31.64s/it] 61%|██████▏ | 10610/17285 [95:01:44<57:20:30, 30.93s/it] {'loss': 1.3928, 'learning_rate': 7.318296627839935e-05, 'epoch': 1.84} + 61%|██████▏ | 10610/17285 [95:01:44<57:20:30, 30.93s/it] 61%|██████▏ | 10611/17285 [95:02:14<56:30:29, 30.48s/it] 61%|██████▏ | 10612/17285 [95:02:41<54:49:31, 29.58s/it] 61%|██████▏ | 10613/17285 [95:03:16<57:36:07, 31.08s/it] 61%|██████▏ | 10614/17285 [95:03:43<55:14:25, 29.81s/it] 
61%|██████▏ | 10615/17285 [95:04:17<57:37:50, 31.11s/it] 61%|██████▏ | 10616/17285 [95:04:55<61:25:29, 33.16s/it] 61%|██████▏ | 10617/17285 [95:05:26<60:37:35, 32.73s/it] 61%|██████▏ | 10618/17285 [95:06:00<60:58:18, 32.92s/it] 61%|██████▏ | 10619/17285 [95:06:31<59:50:37, 32.32s/it] 61%|██████▏ | 10620/17285 [95:07:11<64:28:01, 34.82s/it] {'loss': 1.3833, 'learning_rate': 7.299869629848908e-05, 'epoch': 1.84} + 61%|██████▏ | 10620/17285 [95:07:11<64:28:01, 34.82s/it] 61%|██████▏ | 10621/17285 [95:07:41<61:47:19, 33.38s/it] 61%|██████▏ | 10622/17285 [95:08:09<58:48:04, 31.77s/it] 61%|██████▏ | 10623/17285 [95:08:37<56:13:58, 30.39s/it] 61%|██████▏ | 10624/17285 [95:09:08<57:02:20, 30.83s/it] 61%|██████▏ | 10625/17285 [95:09:39<56:44:30, 30.67s/it] 61%|██████▏ | 10626/17285 [95:10:10<56:58:45, 30.80s/it] 61%|██████▏ | 10627/17285 [95:10:41<57:13:27, 30.94s/it] 61%|██████▏ | 10628/17285 [95:11:09<55:39:30, 30.10s/it] 61%|██████▏ | 10629/17285 [95:11:41<56:41:06, 30.66s/it] 61%|██████▏ | 10630/17285 [95:12:05<52:54:28, 28.62s/it] {'loss': 1.3991, 'learning_rate': 7.281452515978599e-05, 'epoch': 1.84} + 61%|██████▏ | 10630/17285 [95:12:05<52:54:28, 28.62s/it] 62%|██████▏ | 10631/17285 [95:12:34<53:00:06, 28.68s/it] 62%|██████▏ | 10632/17285 [95:13:11<57:28:46, 31.10s/it] 62%|██████▏ | 10633/17285 [95:13:37<54:41:35, 29.60s/it] 62%|██████▏ | 10634/17285 [95:14:05<53:43:54, 29.08s/it] 62%|██████▏ | 10635/17285 [95:14:45<59:47:59, 32.37s/it] 62%|██████▏ | 10636/17285 [95:15:16<58:59:02, 31.94s/it] 62%|██████▏ | 10637/17285 [95:15:40<54:41:31, 29.62s/it] 62%|██████▏ | 10638/17285 [95:16:10<54:52:31, 29.72s/it] 62%|██████▏ | 10639/17285 [95:16:39<54:53:34, 29.73s/it] 62%|██████▏ | 10640/17285 [95:17:13<57:08:55, 30.96s/it] {'loss': 1.4247, 'learning_rate': 7.263045353646861e-05, 'epoch': 1.85} + 62%|██████▏ | 10640/17285 [95:17:13<57:08:55, 30.96s/it] 62%|██████▏ | 10641/17285 [95:17:42<56:00:08, 30.34s/it] 62%|██████▏ | 10642/17285 [95:18:15<57:12:11, 31.00s/it] 
 62%|██████▏ | 10643/17285 [95:18:41<54:49:41, 29.72s/it] 62%|██████▏ | 10644/17285 [95:19:08<53:18:58, 28.90s/it] 62%|██████▏ | 10645/17285 [95:19:37<53:16:06, 28.88s/it] 62%|██████▏ | 10646/17285 [95:20:09<54:37:42, 29.62s/it][2023-08-26 23:15:17,941] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, but hysteresis is 2. Reducing hysteresis to 1 + 62%|██████▏ | 10647/17285 [95:20:40<55:43:37, 30.22s/it] 62%|██████▏ | 10648/17285 [95:21:09<54:59:21, 29.83s/it][2023-08-26 23:16:17,806] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, reducing to 131072 + 62%|██████▏ | 10649/17285 [95:21:40<55:36:28, 30.17s/it] 62%|██████▏ | 10650/17285 [95:22:11<55:58:32, 30.37s/it] {'loss': 1.473, 'learning_rate': 7.248326834170777e-05, 'epoch': 1.85} + 62%|██████▏ | 10650/17285 [95:22:11<55:58:32, 30.37s/it] 62%|██████▏ | 10651/17285 [95:22:46<58:32:08, 31.76s/it] 62%|██████▏ | 10652/17285 [95:23:21<60:07:49, 32.64s/it] 62%|██████▏ | 10653/17285 [95:23:58<62:54:02, 34.14s/it] 62%|██████▏ | 10654/17285 [95:24:37<65:21:21, 35.48s/it] 62%|██████▏ | 10655/17285 [95:25:12<65:02:15, 35.31s/it] 62%|██████▏ | 10656/17285 [95:25:48<65:23:39, 35.51s/it] 62%|██████▏ | 10657/17285 [95:26:25<66:16:30, 36.00s/it] 62%|██████▏ | 10658/17285 [95:27:02<66:47:01, 36.28s/it] 62%|██████▏ | 10659/17285 [95:27:40<67:47:13, 36.83s/it] 62%|██████▏ | 10660/17285 [95:28:10<63:50:44, 34.69s/it] {'loss': 1.3753, 'learning_rate': 7.229937754384992e-05, 'epoch': 1.85} + 62%|██████▏ | 10660/17285 [95:28:10<63:50:44, 34.69s/it] 62%|██████▏ | 10661/17285 [95:28:47<65:27:40, 35.58s/it] 62%|██████▏ | 10662/17285 [95:29:13<59:51:20, 32.54s/it] 62%|██████▏ | 10663/17285 [95:29:44<58:59:27, 32.07s/it] 62%|██████▏ | 10664/17285 [95:30:12<56:43:40, 30.84s/it] 62%|██████▏ | 10665/17285 [95:30:47<59:11:00, 32.18s/it] 62%|██████▏ | 10666/17285 [95:31:18<58:33:37, 31.85s/it] 62%|██████▏ | 
10667/17285 [95:31:47<56:58:01, 30.99s/it] 62%|██████▏ | 10668/17285 [95:32:18<57:06:36, 31.07s/it] 62%|██████▏ | 10669/17285 [95:32:45<54:47:53, 29.82s/it] 62%|██████▏ | 10670/17285 [95:33:19<56:46:24, 30.90s/it] {'loss': 1.41, 'learning_rate': 7.211558814713165e-05, 'epoch': 1.85} + 62%|██████▏ | 10670/17285 [95:33:19<56:46:24, 30.90s/it] 62%|██████▏ | 10671/17285 [95:33:54<58:56:15, 32.08s/it] 62%|██████▏ | 10672/17285 [95:34:25<58:28:04, 31.83s/it] 62%|██████▏ | 10673/17285 [95:34:55<57:24:16, 31.25s/it] 62%|██████▏ | 10674/17285 [95:35:21<54:56:30, 29.92s/it] 62%|██████▏ | 10675/17285 [95:35:46<51:46:20, 28.20s/it] 62%|██████▏ | 10676/17285 [95:36:18<54:15:11, 29.55s/it][2023-08-26 23:31:34,068] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 62%|██████▏ | 10677/17285 [95:36:56<58:54:02, 32.09s/it] 62%|██████▏ | 10678/17285 [95:37:37<63:31:38, 34.61s/it] 62%|██████▏ | 10679/17285 [95:38:06<60:44:31, 33.10s/it] 62%|██████▏ | 10680/17285 [95:38:35<58:14:00, 31.74s/it] {'loss': 1.4259, 'learning_rate': 7.195026494412065e-05, 'epoch': 1.85} + 62%|██████▏ | 10680/17285 [95:38:35<58:14:00, 31.74s/it] 62%|██████▏ | 10681/17285 [95:39:04<56:28:15, 30.78s/it] 62%|██████▏ | 10682/17285 [95:39:39<58:51:21, 32.09s/it] 62%|██████▏ | 10683/17285 [95:40:06<56:26:18, 30.78s/it] 62%|██████▏ | 10684/17285 [95:40:45<60:46:07, 33.14s/it] 62%|██████▏ | 10685/17285 [95:41:16<59:27:32, 32.43s/it] 62%|██████▏ | 10686/17285 [95:41:50<60:30:45, 33.01s/it] 62%|██████▏ | 10687/17285 [95:42:19<58:14:05, 31.77s/it] 62%|██████▏ | 10688/17285 [95:42:53<59:37:45, 32.54s/it] 62%|██████▏ | 10689/17285 [95:43:26<59:25:19, 32.43s/it] 62%|██████▏ | 10690/17285 [95:43:59<59:47:08, 32.64s/it] {'loss': 1.4327, 'learning_rate': 7.176667006277049e-05, 'epoch': 1.86} + 62%|██████▏ | 10690/17285 [95:43:59<59:47:08, 32.64s/it] 62%|██████▏ | 10691/17285 [95:44:34<61:03:39, 33.34s/it] 62%|██████▏ | 10692/17285 
[95:45:05<59:59:27, 32.76s/it] 62%|██████▏ | 10693/17285 [95:45:41<61:54:17, 33.81s/it] 62%|██████▏ | 10694/17285 [95:46:13<60:48:27, 33.21s/it] 62%|██████▏ | 10695/17285 [95:46:44<59:18:43, 32.40s/it] 62%|██████▏ | 10696/17285 [95:47:12<56:50:46, 31.06s/it] 62%|██████▏ | 10697/17285 [95:47:38<54:01:52, 29.53s/it] 62%|██████▏ | 10698/17285 [95:48:23<62:55:27, 34.39s/it] 62%|██████▏ | 10699/17285 [95:48:55<61:18:07, 33.51s/it] 62%|██████▏ | 10700/17285 [95:49:34<64:31:28, 35.28s/it] {'loss': 1.397, 'learning_rate': 7.158317853259342e-05, 'epoch': 1.86} + 62%|██████▏ | 10700/17285 [95:49:34<64:31:28, 35.28s/it] 62%|██████▏ | 10701/17285 [95:50:09<64:02:16, 35.01s/it] 62%|██████▏ | 10702/17285 [95:50:46<65:20:16, 35.73s/it] 62%|██████▏ | 10703/17285 [95:51:21<65:09:02, 35.63s/it] 62%|██████▏ | 10704/17285 [95:51:58<65:27:04, 35.80s/it] 62%|██████▏ | 10705/17285 [95:52:32<64:52:08, 35.49s/it] 62%|██████▏ | 10706/17285 [95:53:08<64:41:47, 35.40s/it] 62%|██████▏ | 10707/17285 [95:53:38<61:52:19, 33.86s/it] 62%|██████▏ | 10708/17285 [95:54:06<58:45:20, 32.16s/it] 62%|██████▏ | 10709/17285 [95:54:39<59:08:21, 32.38s/it] 62%|██████▏ | 10710/17285 [95:55:11<59:13:41, 32.43s/it] {'loss': 1.3827, 'learning_rate': 7.13997910252802e-05, 'epoch': 1.86} + 62%|██████▏ | 10710/17285 [95:55:11<59:13:41, 32.43s/it] 62%|██████▏ | 10711/17285 [95:55:41<57:45:13, 31.63s/it] 62%|██████▏ | 10712/17285 [95:56:23<63:18:35, 34.67s/it] 62%|██████▏ | 10713/17285 [95:56:50<59:22:41, 32.53s/it] 62%|██████▏ | 10714/17285 [95:57:20<57:39:13, 31.59s/it] 62%|██████▏ | 10715/17285 [95:57:48<55:38:33, 30.49s/it] 62%|██████▏ | 10716/17285 [95:58:25<59:03:21, 32.36s/it] 62%|██████▏ | 10717/17285 [95:58:59<60:05:06, 32.93s/it] 62%|██████▏ | 10718/17285 [95:59:27<57:26:40, 31.49s/it] 62%|██████▏ | 10719/17285 [95:59:57<56:36:24, 31.04s/it] 62%|██████▏ | 10720/17285 [96:00:28<56:22:59, 30.92s/it] {'loss': 1.417, 'learning_rate': 7.121650821214074e-05, 'epoch': 1.86} + 62%|██████▏ | 10720/17285 
[96:00:28<56:22:59, 30.92s/it] 62%|██████▏ | 10721/17285 [96:01:03<58:41:34, 32.19s/it] 62%|██████▏ | 10722/17285 [96:01:32<56:57:46, 31.25s/it] 62%|██████▏ | 10723/17285 [96:02:03<56:54:54, 31.22s/it] 62%|██████▏ | 10724/17285 [96:02:46<63:15:44, 34.71s/it] 62%|██████▏ | 10725/17285 [96:03:19<62:21:35, 34.22s/it] 62%|██████▏ | 10726/17285 [96:03:53<62:21:45, 34.23s/it] 62%|██████▏ | 10727/17285 [96:04:20<58:16:19, 31.99s/it] 62%|██████▏ | 10728/17285 [96:04:53<58:50:58, 32.31s/it] 62%|██████▏ | 10729/17285 [96:05:37<65:05:23, 35.74s/it] 62%|██████▏ | 10730/17285 [96:06:14<65:55:14, 36.20s/it] {'loss': 1.4142, 'learning_rate': 7.103333076410166e-05, 'epoch': 1.86} + 62%|██████▏ | 10730/17285 [96:06:14<65:55:14, 36.20s/it] 62%|██████▏ | 10731/17285 [96:06:46<63:52:36, 35.09s/it] 62%|██████▏ | 10732/17285 [96:07:36<71:43:05, 39.40s/it] 62%|██████▏ | 10733/17285 [96:08:08<67:37:51, 37.16s/it] 62%|██████▏ | 10734/17285 [96:08:35<62:25:07, 34.30s/it] 62%|██████▏ | 10735/17285 [96:09:08<61:24:24, 33.75s/it] 62%|██████▏ | 10736/17285 [96:09:47<64:21:44, 35.38s/it] 62%|██████▏ | 10737/17285 [96:10:14<59:59:31, 32.98s/it] 62%|██████▏ | 10738/17285 [96:10:50<61:36:53, 33.88s/it] 62%|██████▏ | 10739/17285 [96:11:20<59:02:13, 32.47s/it] 62%|██████▏ | 10740/17285 [96:11:50<57:46:31, 31.78s/it] {'loss': 1.4047, 'learning_rate': 7.085025935170397e-05, 'epoch': 1.86} + 62%|██████▏ | 10740/17285 [96:11:50<57:46:31, 31.78s/it] 62%|██████▏ | 10741/17285 [96:12:18<55:53:50, 30.75s/it] 62%|██████▏ | 10742/17285 [96:12:48<55:29:20, 30.53s/it] 62%|██████▏ | 10743/17285 [96:13:17<54:26:51, 29.96s/it] 62%|██████▏ | 10744/17285 [96:13:53<57:50:23, 31.83s/it] 62%|██████▏ | 10745/17285 [96:14:22<56:01:39, 30.84s/it] 62%|██████▏ | 10746/17285 [96:14:52<55:58:10, 30.81s/it] 62%|██████▏ | 10747/17285 [96:15:32<60:45:24, 33.45s/it] 62%|██████▏ | 10748/17285 [96:16:07<61:32:53, 33.90s/it] 62%|██████▏ | 10749/17285 [96:16:33<57:16:58, 31.55s/it] 62%|██████▏ | 10750/17285 [96:16:59<54:27:13, 
30.00s/it] {'loss': 1.4153, 'learning_rate': 7.066729464510045e-05, 'epoch': 1.87} + 62%|██████▏ | 10750/17285 [96:16:59<54:27:13, 30.00s/it] 62%|██████▏ | 10751/17285 [96:17:30<54:55:41, 30.26s/it] 62%|██████▏ | 10752/17285 [96:18:00<54:46:06, 30.18s/it] 62%|██████▏ | 10753/17285 [96:18:35<57:05:40, 31.47s/it] 62%|██████▏ | 10754/17285 [96:19:00<53:49:51, 29.67s/it] 62%|██████▏ | 10755/17285 [96:19:24<50:47:01, 28.00s/it] 62%|██████▏ | 10756/17285 [96:19:59<54:34:23, 30.09s/it] 62%|██████▏ | 10757/17285 [96:20:28<53:57:28, 29.76s/it] 62%|██████▏ | 10758/17285 [96:21:01<55:22:53, 30.55s/it] 62%|██████▏ | 10759/17285 [96:21:38<59:12:24, 32.66s/it] 62%|██████▏ | 10760/17285 [96:22:10<58:45:24, 32.42s/it] {'loss': 1.3896, 'learning_rate': 7.04844373140533e-05, 'epoch': 1.87} + 62%|██████▏ | 10760/17285 [96:22:10<58:45:24, 32.42s/it] 62%|██████▏ | 10761/17285 [96:22:35<54:46:46, 30.23s/it] 62%|██████▏ | 10762/17285 [96:23:06<54:58:18, 30.34s/it] 62%|██████▏ | 10763/17285 [96:23:39<56:18:48, 31.08s/it] 62%|██████▏ | 10764/17285 [96:24:12<57:44:58, 31.88s/it] 62%|██████▏ | 10765/17285 [96:24:43<57:00:55, 31.48s/it] 62%|██████▏ | 10766/17285 [96:25:20<59:59:10, 33.13s/it] 62%|██████▏ | 10767/17285 [96:25:52<59:42:15, 32.98s/it] 62%|██████▏ | 10768/17285 [96:26:25<59:44:20, 33.00s/it] 62%|██████▏ | 10769/17285 [96:27:07<64:17:31, 35.52s/it] 62%|██████▏ | 10770/17285 [96:27:36<61:03:01, 33.73s/it] {'loss': 1.4255, 'learning_rate': 7.030168802793164e-05, 'epoch': 1.87} + 62%|██████▏ | 10770/17285 [96:27:37<61:03:01, 33.73s/it] 62%|██████▏ | 10771/17285 [96:28:06<58:59:25, 32.60s/it] 62%|██████▏ | 10772/17285 [96:28:46<62:56:04, 34.79s/it] 62%|██████▏ | 10773/17285 [96:29:12<57:48:55, 31.96s/it] 62%|██████▏ | 10774/17285 [96:29:45<58:21:15, 32.26s/it] 62%|██████▏ | 10775/17285 [96:30:20<59:49:18, 33.08s/it] 62%|██████▏ | 10776/17285 [96:30:56<61:43:35, 34.14s/it] 62%|██████▏ | 10777/17285 [96:31:25<59:04:25, 32.68s/it] 62%|██████▏ | 10778/17285 [96:31:55<57:12:16, 31.65s/it] 
62%|██████▏ | 10779/17285 [96:32:20<53:47:43, 29.77s/it] 62%|██████▏ | 10780/17285 [96:32:59<58:58:59, 32.64s/it] {'loss': 1.4286, 'learning_rate': 7.011904745570912e-05, 'epoch': 1.87} + 62%|██████▏ | 10780/17285 [96:32:59<58:58:59, 32.64s/it] 62%|██████▏ | 10781/17285 [96:33:32<58:45:50, 32.53s/it] 62%|██████▏ | 10782/17285 [96:33:56<54:17:52, 30.06s/it] 62%|██████▏ | 10783/17285 [96:34:27<54:39:25, 30.26s/it] 62%|██████▏ | 10784/17285 [96:34:57<54:42:18, 30.29s/it] 62%|██████▏ | 10785/17285 [96:35:23<52:18:34, 28.97s/it] 62%|██████▏ | 10786/17285 [96:35:58<55:41:15, 30.85s/it] 62%|██████▏ | 10787/17285 [96:36:40<61:21:37, 33.99s/it] 62%|██████▏ | 10788/17285 [96:37:07<57:46:32, 32.01s/it] 62%|██████▏ | 10789/17285 [96:37:36<56:24:37, 31.26s/it] 62%|██████▏ | 10790/17285 [96:38:12<58:52:36, 32.63s/it] {'loss': 1.4337, 'learning_rate': 6.993651626596138e-05, 'epoch': 1.87} + 62%|██████▏ | 10790/17285 [96:38:12<58:52:36, 32.63s/it] 62%|██████▏ | 10791/17285 [96:38:46<59:22:54, 32.92s/it] 62%|██████▏ | 10792/17285 [96:39:14<56:53:38, 31.54s/it] 62%|██████▏ | 10793/17285 [96:39:55<61:57:44, 34.36s/it] 62%|██████▏ | 10794/17285 [96:40:27<60:36:03, 33.61s/it] 62%|██████▏ | 10795/17285 [96:41:00<60:16:55, 33.44s/it] 62%|██████▏ | 10796/17285 [96:41:28<57:29:10, 31.89s/it] 62%|██████▏ | 10797/17285 [96:42:03<58:45:00, 32.60s/it] 62%|██████▏ | 10798/17285 [96:42:33<57:39:01, 31.99s/it] 62%|██████▏ | 10799/17285 [96:43:16<63:15:01, 35.11s/it] 62%|██████▏ | 10800/17285 [96:43:49<62:25:54, 34.66s/it] {'loss': 1.3943, 'learning_rate': 6.97540951268637e-05, 'epoch': 1.87} + 62%|██████▏ | 10800/17285 [96:43:49<62:25:54, 34.66s/it] 62%|██████▏ | 10801/17285 [96:44:13<56:34:51, 31.41s/it] 62%|██████▏ | 10802/17285 [96:44:55<62:15:41, 34.57s/it] 62%|██████▏ | 10803/17285 [96:45:22<58:20:23, 32.40s/it] 63%|██████▎ | 10804/17285 [96:45:49<55:08:00, 30.62s/it] 63%|██████▎ | 10805/17285 [96:46:23<57:08:49, 31.75s/it] 63%|██████▎ | 10806/17285 [96:46:51<55:14:16, 30.69s/it] 63%|██████▎ 
| 10807/17285 [96:47:26<57:13:07, 31.80s/it] 63%|██████▎ | 10808/17285 [96:47:57<56:52:55, 31.62s/it] 63%|██████▎ | 10809/17285 [96:48:28<56:33:33, 31.44s/it] 63%|██████▎ | 10810/17285 [96:49:01<57:33:03, 32.00s/it] {'loss': 1.4138, 'learning_rate': 6.95717847061885e-05, 'epoch': 1.88} + 63%|██████▎ | 10810/17285 [96:49:01<57:33:03, 32.00s/it] 63%|██████▎ | 10811/17285 [96:49:35<58:26:34, 32.50s/it] 63%|██████▎ | 10812/17285 [96:50:06<57:55:13, 32.21s/it] 63%|██████▎ | 10813/17285 [96:50:34<55:22:26, 30.80s/it] 63%|██████▎ | 10814/17285 [96:51:12<59:21:48, 33.03s/it] 63%|██████▎ | 10815/17285 [96:51:43<58:08:06, 32.35s/it] 63%|██████▎ | 10816/17285 [96:52:11<55:55:11, 31.12s/it] 63%|██████▎ | 10817/17285 [96:52:48<58:50:39, 32.75s/it] 63%|██████▎ | 10818/17285 [96:53:15<55:52:41, 31.11s/it] 63%|██████▎ | 10819/17285 [96:53:49<57:35:29, 32.06s/it] 63%|██████▎ | 10820/17285 [96:54:19<56:05:53, 31.24s/it] {'loss': 1.3604, 'learning_rate': 6.938958567130285e-05, 'epoch': 1.88} + 63%|██████▎ | 10820/17285 [96:54:19<56:05:53, 31.24s/it] 63%|██████▎ | 10821/17285 [96:54:55<58:57:02, 32.83s/it] 63%|██████▎ | 10822/17285 [96:55:32<60:53:56, 33.92s/it] 63%|██████▎ | 10823/17285 [96:56:14<65:31:10, 36.50s/it] 63%|██████▎ | 10824/17285 [96:56:41<60:24:26, 33.66s/it] 63%|██████▎ | 10825/17285 [96:57:19<62:36:30, 34.89s/it] 63%|██████▎ | 10826/17285 [96:57:54<62:47:39, 35.00s/it] 63%|██████▎ | 10827/17285 [96:58:20<58:01:47, 32.35s/it] 63%|██████▎ | 10828/17285 [96:58:53<57:57:28, 32.31s/it] 63%|██████▎ | 10829/17285 [96:59:21<55:34:44, 30.99s/it] 63%|██████▎ | 10830/17285 [96:59:55<57:28:18, 32.05s/it] {'loss': 1.4096, 'learning_rate': 6.920749868916618e-05, 'epoch': 1.88} + 63%|██████▎ | 10830/17285 [96:59:55<57:28:18, 32.05s/it] 63%|██████▎ | 10831/17285 [97:00:24<56:03:43, 31.27s/it] 63%|██████▎ | 10832/17285 [97:01:00<58:18:14, 32.53s/it] 63%|██████▎ | 10833/17285 [97:01:39<61:53:20, 34.53s/it] 63%|██████▎ | 10834/17285 [97:02:12<61:03:53, 34.08s/it] 63%|██████▎ | 
10835/17285 [97:02:39<57:09:57, 31.91s/it] 63%|██████▎ | 10836/17285 [97:03:12<57:59:10, 32.37s/it] 63%|██████▎ | 10837/17285 [97:03:45<58:18:38, 32.56s/it] 63%|██████▎ | 10838/17285 [97:04:18<58:03:45, 32.42s/it] 63%|██████▎ | 10839/17285 [97:04:55<60:52:44, 34.00s/it] 63%|██████▎ | 10840/17285 [97:05:22<56:51:54, 31.76s/it] {'loss': 1.3915, 'learning_rate': 6.902552442632765e-05, 'epoch': 1.88} + 63%|██████▎ | 10840/17285 [97:05:22<56:51:54, 31.76s/it] 63%|██████▎ | 10841/17285 [97:05:51<55:37:36, 31.08s/it] 63%|██████▎ | 10842/17285 [97:06:23<55:50:13, 31.20s/it] 63%|██████▎ | 10843/17285 [97:06:50<53:47:58, 30.07s/it] 63%|██████▎ | 10844/17285 [97:07:26<56:45:07, 31.72s/it] 63%|██████▎ | 10845/17285 [97:07:56<55:57:38, 31.28s/it] 63%|██████▎ | 10846/17285 [97:08:32<58:19:09, 32.61s/it] 63%|██████▎ | 10847/17285 [97:08:58<54:49:16, 30.65s/it] 63%|██████▎ | 10848/17285 [97:09:32<56:38:13, 31.68s/it] 63%|██████▎ | 10849/17285 [97:10:10<59:51:28, 33.48s/it] 63%|██████▎ | 10850/17285 [97:10:41<58:46:11, 32.88s/it] {'loss': 1.3946, 'learning_rate': 6.88436635489238e-05, 'epoch': 1.88} + 63%|██████▎ | 10850/17285 [97:10:41<58:46:11, 32.88s/it] 63%|██████▎ | 10851/17285 [97:11:22<63:20:30, 35.44s/it] 63%|██████▎ | 10852/17285 [97:11:58<63:18:55, 35.43s/it] 63%|██████▎ | 10853/17285 [97:12:30<61:30:07, 34.42s/it] 63%|██████▎ | 10854/17285 [97:12:58<57:57:14, 32.44s/it] 63%|██████▎ | 10855/17285 [97:13:23<53:58:24, 30.22s/it] 63%|██████▎ | 10856/17285 [97:13:56<55:47:04, 31.24s/it] 63%|██████▎ | 10857/17285 [97:14:26<54:52:06, 30.73s/it][2023-08-27 01:09:28,334] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 63%|██████▎ | 10858/17285 [97:14:51<51:38:27, 28.93s/it] 63%|██████▎ | 10859/17285 [97:15:27<55:36:54, 31.16s/it] 63%|██████▎ | 10860/17285 [97:15:56<54:39:23, 30.62s/it] {'loss': 1.3984, 'learning_rate': 6.868008625403449e-05, 'epoch': 1.88} + 63%|██████▎ | 10860/17285 [97:15:56<54:39:23, 30.62s/it] 63%|██████▎ | 10861/17285 [97:16:29<55:53:46, 31.32s/it][2023-08-27 01:11:39,294] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 63%|██████▎ | 10862/17285 [97:17:02<56:23:15, 31.60s/it] 63%|██████▎ | 10863/17285 [97:17:32<55:32:46, 31.14s/it] 63%|██████▎ | 10864/17285 [97:18:06<57:05:19, 32.01s/it] 63%|██████▎ | 10865/17285 [97:18:37<56:33:12, 31.71s/it] 63%|██████▎ | 10866/17285 [97:19:11<58:03:51, 32.56s/it] 63%|██████▎ | 10867/17285 [97:19:39<55:35:55, 31.19s/it] 63%|██████▎ | 10868/17285 [97:20:10<55:14:57, 31.00s/it] 63%|██████▎ | 10869/17285 [97:20:43<56:22:07, 31.63s/it] 63%|██████▎ | 10870/17285 [97:21:24<61:29:50, 34.51s/it] {'loss': 1.4222, 'learning_rate': 6.851660182560898e-05, 'epoch': 1.89} + 63%|██████▎ | 10870/17285 [97:21:24<61:29:50, 34.51s/it] 63%|██████▎ | 10871/17285 [97:22:04<64:33:45, 36.24s/it] 63%|██████▎ | 10872/17285 [97:22:31<59:17:40, 33.29s/it] 63%|██████▎ | 10873/17285 [97:23:04<59:20:32, 33.32s/it] 63%|██████▎ | 10874/17285 [97:23:31<55:43:53, 31.30s/it] 63%|██████▎ | 10875/17285 [97:24:07<58:10:23, 32.67s/it] 63%|██████▎ | 10876/17285 [97:24:40<58:45:44, 33.01s/it] 63%|██████▎ | 10877/17285 [97:25:23<63:54:32, 35.90s/it] 63%|██████▎ | 10878/17285 [97:25:57<62:43:55, 35.25s/it] 63%|██████▎ | 10879/17285 [97:26:26<59:34:17, 33.48s/it] 63%|██████▎ | 10880/17285 [97:26:54<56:45:52, 31.91s/it] {'loss': 1.43, 'learning_rate': 6.833506196772657e-05, 'epoch': 1.89} + 63%|██████▎ | 10880/17285 [97:26:54<56:45:52, 31.91s/it] 63%|██████▎ | 10881/17285 [97:27:25<56:17:23, 31.64s/it] 63%|██████▎ | 10882/17285 [97:27:55<55:15:41, 
31.07s/it] 63%|██████▎ | 10883/17285 [97:28:28<56:22:57, 31.71s/it] 63%|██████▎ | 10884/17285 [97:29:04<58:16:20, 32.77s/it] 63%|██████▎ | 10885/17285 [97:29:33<56:22:07, 31.71s/it] 63%|██████▎ | 10886/17285 [97:30:00<53:47:12, 30.26s/it] 63%|██████▎ | 10887/17285 [97:30:34<56:04:40, 31.55s/it] 63%|██████▎ | 10888/17285 [97:31:05<55:46:38, 31.39s/it] 63%|██████▎ | 10889/17285 [97:31:36<55:22:02, 31.16s/it] 63%|██████▎ | 10890/17285 [97:32:07<55:25:26, 31.20s/it] {'loss': 1.3911, 'learning_rate': 6.815363802279173e-05, 'epoch': 1.89} + 63%|██████▎ | 10890/17285 [97:32:07<55:25:26, 31.20s/it] 63%|██████▎ | 10891/17285 [97:32:43<57:38:07, 32.45s/it] 63%|██████▎ | 10892/17285 [97:33:10<55:02:42, 31.00s/it] 63%|██████▎ | 10893/17285 [97:33:40<54:32:10, 30.72s/it] 63%|██████▎ | 10894/17285 [97:34:16<57:21:43, 32.31s/it] 63%|██████▎ | 10895/17285 [97:34:46<56:09:16, 31.64s/it] 63%|██████▎ | 10896/17285 [97:35:28<61:22:01, 34.58s/it] 63%|██████▎ | 10897/17285 [97:35:59<59:31:47, 33.55s/it] 63%|██████▎ | 10898/17285 [97:36:36<61:08:10, 34.46s/it] 63%|██████▎ | 10899/17285 [97:37:02<57:06:38, 32.20s/it] 63%|██████▎ | 10900/17285 [97:37:30<54:46:34, 30.88s/it] {'loss': 1.4225, 'learning_rate': 6.797233065492654e-05, 'epoch': 1.89} + 63%|██████▎ | 10900/17285 [97:37:30<54:46:34, 30.88s/it] 63%|██████▎ | 10901/17285 [97:38:02<55:24:30, 31.25s/it] 63%|██████▎ | 10902/17285 [97:38:31<54:15:39, 30.60s/it] 63%|██████▎ | 10903/17285 [97:39:06<56:27:54, 31.85s/it] 63%|██████▎ | 10904/17285 [97:39:42<58:22:25, 32.93s/it] 63%|██████▎ | 10905/17285 [97:40:13<57:39:43, 32.54s/it] 63%|██████▎ | 10906/17285 [97:40:53<61:20:47, 34.62s/it] 63%|██████▎ | 10907/17285 [97:41:25<60:08:05, 33.94s/it] 63%|██████▎ | 10908/17285 [97:41:55<58:05:56, 32.80s/it] 63%|██████▎ | 10909/17285 [97:42:24<55:57:11, 31.59s/it] 63%|██████▎ | 10910/17285 [97:42:59<57:50:06, 32.66s/it] {'loss': 1.4097, 'learning_rate': 6.779114052782636e-05, 'epoch': 1.89} + 63%|██████▎ | 10910/17285 [97:42:59<57:50:06, 32.66s/it] 
63%|██████▎ | 10911/17285 [97:43:37<60:23:13, 34.11s/it] 63%|██████▎ | 10912/17285 [97:44:10<59:51:27, 33.81s/it] 63%|██████▎ | 10913/17285 [97:44:45<60:40:54, 34.28s/it] 63%|██████▎ | 10914/17285 [97:45:23<62:35:08, 35.36s/it] 63%|██████▎ | 10915/17285 [97:45:53<59:40:39, 33.73s/it] 63%|██████▎ | 10916/17285 [97:46:20<55:57:11, 31.63s/it] 63%|██████▎ | 10917/17285 [97:46:54<57:22:08, 32.43s/it] 63%|██████▎ | 10918/17285 [97:47:32<60:06:32, 33.99s/it] 63%|██████▎ | 10919/17285 [97:48:15<64:52:36, 36.69s/it] 63%|██████▎ | 10920/17285 [97:48:43<60:37:51, 34.29s/it] {'loss': 1.3827, 'learning_rate': 6.761006830475733e-05, 'epoch': 1.9} + 63%|██████▎ | 10920/17285 [97:48:43<60:37:51, 34.29s/it] 63%|██████▎ | 10921/17285 [97:49:20<62:02:26, 35.10s/it] 63%|██████▎ | 10922/17285 [97:49:48<58:10:04, 32.91s/it] 63%|██████▎ | 10923/17285 [97:50:19<57:03:29, 32.29s/it] 63%|██████▎ | 10924/17285 [97:50:50<56:38:38, 32.06s/it] 63%|██████▎ | 10925/17285 [97:51:33<61:57:23, 35.07s/it] 63%|██████▎ | 10926/17285 [97:52:04<59:56:08, 33.93s/it] 63%|██████▎ | 10927/17285 [97:52:37<59:42:59, 33.81s/it] 63%|██████▎ | 10928/17285 [97:53:04<56:07:10, 31.78s/it] 63%|██████▎ | 10929/17285 [97:53:38<56:54:31, 32.23s/it] 63%|██████▎ | 10930/17285 [97:54:13<58:23:27, 33.08s/it] {'loss': 1.4585, 'learning_rate': 6.742911464855399e-05, 'epoch': 1.9} + 63%|██████▎ | 10930/17285 [97:54:13<58:23:27, 33.08s/it] 63%|██████▎ | 10931/17285 [97:54:48<59:22:38, 33.64s/it] 63%|██████▎ | 10932/17285 [97:55:23<60:02:24, 34.02s/it] 63%|██████▎ | 10933/17285 [97:55:57<60:20:49, 34.20s/it] 63%|██████▎ | 10934/17285 [97:56:33<60:56:50, 34.55s/it] 63%|██████▎ | 10935/17285 [97:57:00<56:55:59, 32.28s/it] 63%|██████▎ | 10936/17285 [97:57:32<57:11:26, 32.43s/it] 63%|██████▎ | 10937/17285 [97:58:01<55:05:58, 31.25s/it] 63%|██████▎ | 10938/17285 [97:58:33<55:24:18, 31.43s/it] 63%|██████▎ | 10939/17285 [97:59:02<54:18:34, 30.81s/it] 63%|██████▎ | 10940/17285 [97:59:33<54:17:26, 30.80s/it] {'loss': 1.4062, 
'learning_rate': 6.724828022161692e-05, 'epoch': 1.9} + 63%|██████▎ | 10940/17285 [97:59:33<54:17:26, 30.80s/it] 63%|██████▎ | 10941/17285 [98:00:10<57:47:32, 32.80s/it] 63%|██████▎ | 10942/17285 [98:00:50<61:20:53, 34.82s/it] 63%|██████▎ | 10943/17285 [98:01:30<64:23:55, 36.56s/it] 63%|██████▎ | 10944/17285 [98:01:56<58:47:32, 33.38s/it] 63%|██████▎ | 10945/17285 [98:02:27<57:10:39, 32.47s/it] 63%|██████▎ | 10946/17285 [98:02:58<56:24:39, 32.04s/it] 63%|██████▎ | 10947/17285 [98:03:32<57:42:08, 32.78s/it] 63%|██████▎ | 10948/17285 [98:03:58<53:47:22, 30.56s/it] 63%|██████▎ | 10949/17285 [98:04:26<52:48:48, 30.01s/it] 63%|██████▎ | 10950/17285 [98:05:05<57:06:54, 32.46s/it] {'loss': 1.4324, 'learning_rate': 6.706756568591013e-05, 'epoch': 1.9} + 63%|██████▎ | 10950/17285 [98:05:05<57:06:54, 32.46s/it] 63%|██████▎ | 10951/17285 [98:05:36<56:23:03, 32.05s/it] 63%|██████▎ | 10952/17285 [98:06:02<53:38:01, 30.49s/it] 63%|██████▎ | 10953/17285 [98:06:32<53:01:26, 30.15s/it] 63%|██████▎ | 10954/17285 [98:07:06<55:04:04, 31.31s/it] 63%|██████▎ | 10955/17285 [98:07:39<56:04:50, 31.89s/it] 63%|██████▎ | 10956/17285 [98:08:12<56:49:59, 32.33s/it] 63%|██████▎ | 10957/17285 [98:08:47<57:51:54, 32.92s/it] 63%|██████▎ | 10958/17285 [98:09:16<55:56:58, 31.83s/it] 63%|██████▎ | 10959/17285 [98:09:46<55:10:05, 31.40s/it] 63%|██████▎ | 10960/17285 [98:10:17<54:33:29, 31.05s/it] {'loss': 1.3791, 'learning_rate': 6.68869717029588e-05, 'epoch': 1.9} + 63%|██████▎ | 10960/17285 [98:10:17<54:33:29, 31.05s/it] 63%|██████▎ | 10961/17285 [98:10:50<55:38:49, 31.68s/it] 63%|██████▎ | 10962/17285 [98:11:23<56:10:00, 31.98s/it] 63%|██████▎ | 10963/17285 [98:11:54<56:06:52, 31.95s/it] 63%|██████▎ | 10964/17285 [98:12:25<55:23:17, 31.55s/it] 63%|██████▎ | 10965/17285 [98:13:07<61:06:00, 34.80s/it] 63%|██████▎ | 10966/17285 [98:13:35<57:28:32, 32.74s/it] 63%|██████▎ | 10967/17285 [98:14:05<55:52:23, 31.84s/it] 63%|██████▎ | 10968/17285 [98:14:35<54:44:38, 31.20s/it] 63%|██████▎ | 10969/17285 
[98:15:04<53:47:49, 30.66s/it] 63%|██████▎ | 10970/17285 [98:15:44<58:48:25, 33.52s/it] {'loss': 1.4147, 'learning_rate': 6.670649893384692e-05, 'epoch': 1.9} + 63%|██████▎ | 10970/17285 [98:15:44<58:48:25, 33.52s/it] 63%|██████▎ | 10971/17285 [98:16:23<61:42:01, 35.18s/it] 63%|██████▎ | 10972/17285 [98:16:54<59:17:45, 33.81s/it] 63%|██████▎ | 10973/17285 [98:17:24<57:20:42, 32.71s/it] 63%|██████▎ | 10974/17285 [98:17:56<56:50:27, 32.42s/it] 63%|██████▎ | 10975/17285 [98:18:22<53:37:30, 30.59s/it] 64%|██████▎ | 10976/17285 [98:18:54<54:25:35, 31.06s/it] 64%|██████▎ | 10977/17285 [98:19:25<54:05:07, 30.87s/it] 64%|██████▎ | 10978/17285 [98:19:54<52:59:43, 30.25s/it] 64%|██████▎ | 10979/17285 [98:20:33<57:45:09, 32.97s/it] 64%|██████▎ | 10980/17285 [98:21:04<56:51:28, 32.46s/it] {'loss': 1.4273, 'learning_rate': 6.652614803921461e-05, 'epoch': 1.91} + 64%|██████▎ | 10980/17285 [98:21:04<56:51:28, 32.46s/it] 64%|██████▎ | 10981/17285 [98:21:36<56:15:28, 32.13s/it] 64%|██████▎ | 10982/17285 [98:22:09<56:58:39, 32.54s/it] 64%|██████▎ | 10983/17285 [98:22:41<56:23:09, 32.21s/it] 64%|██████▎ | 10984/17285 [98:23:09<54:28:45, 31.13s/it] 64%|██████▎ | 10985/17285 [98:23:45<57:03:57, 32.61s/it] 64%|██████▎ | 10986/17285 [98:24:17<56:46:44, 32.45s/it] 64%|██████▎ | 10987/17285 [98:24:55<59:18:28, 33.90s/it] 64%|██████▎ | 10988/17285 [98:25:23<56:23:27, 32.24s/it] 64%|██████▎ | 10989/17285 [98:26:01<59:21:23, 33.94s/it] 64%|██████▎ | 10990/17285 [98:26:27<55:30:15, 31.74s/it] {'loss': 1.3976, 'learning_rate': 6.634591967925598e-05, 'epoch': 1.91} + 64%|██████▎ | 10990/17285 [98:26:27<55:30:15, 31.74s/it] 64%|██████▎ | 10991/17285 [98:27:02<56:43:01, 32.44s/it] 64%|██████▎ | 10992/17285 [98:27:32<55:30:42, 31.76s/it] 64%|██████▎ | 10993/17285 [98:28:06<56:48:19, 32.50s/it] 64%|██████▎ | 10994/17285 [98:28:39<56:51:25, 32.54s/it] 64%|██████▎ | 10995/17285 [98:29:10<56:21:25, 32.26s/it] 64%|██████▎ | 10996/17285 [98:29:49<59:41:48, 34.17s/it] 64%|██████▎ | 10997/17285 
[98:30:16<56:12:28, 32.18s/it] 64%|██████▎ | 10998/17285 [98:30:53<58:40:30, 33.60s/it] 64%|██████▎ | 10999/17285 [98:31:29<59:54:17, 34.31s/it] 64%|██████▎ | 11000/17285 [98:32:05<60:36:48, 34.72s/it] {'loss': 1.3826, 'learning_rate': 6.616581451371651e-05, 'epoch': 1.91} + 64%|██████▎ | 11000/17285 [98:32:05<60:36:48, 34.72s/it][INFO|trainer.py:3081] 2023-08-27 02:26:42,557 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-27 02:26:42,559 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-27 02:26:42,559 >> Batch size = 2 + + 0%| | 0/33 [00:00> Deleting older checkpoint [20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-8000] due to args.save_total_limit +Save Peft Config at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-11000 +[INFO|tokenization_utils_base.py:2210] 2023-08-27 02:28:09,016 >> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-11000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-27 02:28:09,021 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-11000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-11000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-11000 + 64%|██████▎ | 11001/17285 [98:34:12<108:51:09, 62.36s/it] 64%|██████▎ | 11002/17285 [98:34:42<92:07:00, 52.78s/it] 64%|██████▎ | 11003/17285 [98:35:09<78:25:08, 44.94s/it] 64%|██████▎ | 11004/17285 [98:35:43<72:56:49, 41.81s/it] 64%|██████▎ | 11005/17285 [98:36:13<66:22:17, 38.05s/it] 64%|██████▎ | 11006/17285 [98:36:47<64:19:25, 36.88s/it] 64%|██████▎ | 11007/17285 [98:37:17<60:49:18, 34.88s/it] 64%|██████▎ | 11008/17285 [98:37:49<59:13:00, 33.96s/it] 
64%|██████▎ | 11009/17285 [98:38:16<55:52:37, 32.05s/it] 64%|██████▎ | 11010/17285 [98:38:48<55:32:29, 31.86s/it] {'loss': 1.4273, 'learning_rate': 6.598583320189075e-05, 'epoch': 1.91} + 64%|██████▎ | 11010/17285 [98:38:48<55:32:29, 31.86s/it] 64%|██████▎ | 11011/17285 [98:39:21<56:25:01, 32.37s/it] 64%|██████▎ | 11012/17285 [98:39:57<58:08:12, 33.36s/it] 64%|██████▎ | 11013/17285 [98:40:24<54:39:59, 31.38s/it] 64%|██████▎ | 11014/17285 [98:40:52<53:17:02, 30.59s/it] 64%|██████▎ | 11015/17285 [98:41:19<51:06:46, 29.35s/it] 64%|██████▎ | 11016/17285 [98:41:46<50:05:44, 28.77s/it] 64%|██████▎ | 11017/17285 [98:42:21<53:07:27, 30.51s/it] 64%|██████▎ | 11018/17285 [98:42:51<52:54:48, 30.40s/it] 64%|██████▎ | 11019/17285 [98:43:20<52:20:09, 30.07s/it] 64%|██████▍ | 11020/17285 [98:43:47<50:46:46, 29.18s/it] {'loss': 1.3904, 'learning_rate': 6.580597640261978e-05, 'epoch': 1.91} + 64%|██████▍ | 11020/17285 [98:43:47<50:46:46, 29.18s/it] 64%|██████▍ | 11021/17285 [98:44:19<51:45:44, 29.75s/it] 64%|██████▍ | 11022/17285 [98:44:48<51:37:22, 29.67s/it] 64%|██████▍ | 11023/17285 [98:45:17<50:59:03, 29.31s/it] 64%|██████▍ | 11024/17285 [98:45:50<53:21:01, 30.68s/it] 64%|██████▍ | 11025/17285 [98:46:18<51:54:09, 29.85s/it] 64%|██████▍ | 11026/17285 [98:46:50<52:42:43, 30.32s/it] 64%|██████▍ | 11027/17285 [98:47:22<53:47:38, 30.95s/it] 64%|██████▍ | 11028/17285 [98:47:48<51:03:41, 29.38s/it] 64%|██████▍ | 11029/17285 [98:48:18<51:42:07, 29.75s/it] 64%|██████▍ | 11030/17285 [98:48:47<51:06:19, 29.41s/it] {'loss': 1.3824, 'learning_rate': 6.562624477428905e-05, 'epoch': 1.91} + 64%|██████▍ | 11030/17285 [98:48:47<51:06:19, 29.41s/it] 64%|██████▍ | 11031/17285 [98:49:24<55:15:38, 31.81s/it] 64%|██████▍ | 11032/17285 [98:50:04<59:28:39, 34.24s/it] 64%|██████▍ | 11033/17285 [98:50:41<60:53:55, 35.07s/it] 64%|██████▍ | 11034/17285 [98:51:23<64:27:36, 37.12s/it] 64%|██████▍ | 11035/17285 [98:51:58<63:02:48, 36.31s/it] 64%|██████▍ | 11036/17285 [98:52:37<64:24:32, 37.11s/it] 
64%|██████▍ | 11037/17285 [98:53:04<59:17:23, 34.16s/it] 64%|██████▍ | 11038/17285 [98:53:34<57:20:38, 33.05s/it] 64%|██████▍ | 11039/17285 [98:54:05<56:05:16, 32.33s/it] 64%|██████▍ | 11040/17285 [98:54:39<57:00:31, 32.86s/it] {'loss': 1.3709, 'learning_rate': 6.544663897482568e-05, 'epoch': 1.92} + 64%|██████▍ | 11040/17285 [98:54:39<57:00:31, 32.86s/it] 64%|██████▍ | 11041/17285 [98:55:10<55:56:41, 32.26s/it] 64%|██████▍ | 11042/17285 [98:55:41<55:00:49, 31.72s/it] 64%|██████▍ | 11043/17285 [98:56:19<58:21:10, 33.65s/it] 64%|██████▍ | 11044/17285 [98:56:52<58:22:24, 33.67s/it] 64%|██████▍ | 11045/17285 [98:57:27<59:06:14, 34.10s/it] 64%|██████▍ | 11046/17285 [98:57:53<54:51:45, 31.66s/it][2023-08-27 02:52:55,942] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 64%|██████▍ | 11047/17285 [98:58:18<51:18:19, 29.61s/it] 64%|██████▍ | 11048/17285 [98:58:47<50:53:23, 29.37s/it] 64%|██████▍ | 11049/17285 [98:59:23<54:17:44, 31.34s/it] 64%|██████▍ | 11050/17285 [98:59:50<51:57:26, 30.00s/it] {'loss': 1.4232, 'learning_rate': 6.528510188239592e-05, 'epoch': 1.92} + 64%|██████▍ | 11050/17285 [98:59:50<51:57:26, 30.00s/it] 64%|██████▍ | 11051/17285 [99:00:21<52:20:26, 30.23s/it] 64%|██████▍ | 11052/17285 [99:00:53<53:32:15, 30.92s/it] 64%|██████▍ | 11053/17285 [99:01:25<53:54:40, 31.14s/it] 64%|██████▍ | 11054/17285 [99:02:00<56:08:06, 32.43s/it] 64%|██████▍ | 11055/17285 [99:02:35<57:08:29, 33.02s/it] 64%|██████▍ | 11056/17285 [99:03:08<57:09:55, 33.04s/it] 64%|██████▍ | 11057/17285 [99:03:45<59:33:50, 34.43s/it] 64%|██████▍ | 11058/17285 [99:04:12<55:34:39, 32.13s/it] 64%|██████▍ | 11059/17285 [99:04:41<53:53:12, 31.16s/it] 64%|██████▍ | 11060/17285 [99:05:14<55:00:16, 31.81s/it] {'loss': 1.3842, 'learning_rate': 6.510573696871829e-05, 'epoch': 1.92} + 64%|██████▍ | 11060/17285 [99:05:14<55:00:16, 31.81s/it] 64%|██████▍ | 11061/17285 [99:05:39<51:29:50, 
29.79s/it] 64%|██████▍ | 11062/17285 [99:06:21<57:20:46, 33.17s/it] 64%|██████▍ | 11063/17285 [99:06:52<56:33:48, 32.73s/it] 64%|██████▍ | 11064/17285 [99:07:17<52:35:39, 30.44s/it] 64%|██████▍ | 11065/17285 [99:07:46<51:44:45, 29.95s/it] 64%|██████▍ | 11066/17285 [99:08:31<59:22:35, 34.37s/it] 64%|██████▍ | 11067/17285 [99:09:10<61:39:33, 35.70s/it] 64%|██████▍ | 11068/17285 [99:09:38<58:03:05, 33.62s/it] 64%|██████▍ | 11069/17285 [99:10:09<56:17:51, 32.60s/it] 64%|██████▍ | 11070/17285 [99:10:38<54:22:06, 31.49s/it] {'loss': 1.3655, 'learning_rate': 6.492649978928341e-05, 'epoch': 1.92} + 64%|██████▍ | 11070/17285 [99:10:38<54:22:06, 31.49s/it] 64%|██████▍ | 11071/17285 [99:11:05<52:14:51, 30.27s/it] 64%|██████▍ | 11072/17285 [99:11:40<54:34:39, 31.62s/it] 64%|██████▍ | 11073/17285 [99:12:09<53:09:01, 30.80s/it] 64%|██████▍ | 11074/17285 [99:12:36<51:36:47, 29.92s/it] 64%|██████▍ | 11075/17285 [99:13:03<49:47:42, 28.87s/it] 64%|██████▍ | 11076/17285 [99:13:35<51:17:33, 29.74s/it] 64%|██████▍ | 11077/17285 [99:14:13<55:49:05, 32.37s/it] 64%|██████▍ | 11078/17285 [99:14:45<55:32:15, 32.21s/it] 64%|██████▍ | 11079/17285 [99:15:27<60:48:49, 35.28s/it] 64%|██████▍ | 11080/17285 [99:15:59<58:57:54, 34.21s/it] {'loss': 1.3651, 'learning_rate': 6.47473910002085e-05, 'epoch': 1.92} + 64%|██████▍ | 11080/17285 [99:15:59<58:57:54, 34.21s/it] 64%|██████▍ | 11081/17285 [99:16:37<60:59:24, 35.39s/it] 64%|██████▍ | 11082/17285 [99:17:08<58:36:36, 34.02s/it] 64%|██████▍ | 11083/17285 [99:17:40<57:24:08, 33.32s/it] 64%|██████▍ | 11084/17285 [99:18:19<60:24:27, 35.07s/it] 64%|██████▍ | 11085/17285 [99:18:50<58:22:41, 33.90s/it] 64%|██████▍ | 11086/17285 [99:19:23<57:53:29, 33.62s/it] 64%|██████▍ | 11087/17285 [99:19:55<57:06:16, 33.17s/it] 64%|██████▍ | 11088/17285 [99:20:34<60:02:44, 34.88s/it] 64%|██████▍ | 11089/17285 [99:21:09<60:15:10, 35.01s/it] 64%|██████▍ | 11090/17285 [99:21:36<55:53:26, 32.48s/it] {'loss': 1.3925, 'learning_rate': 6.456841125714071e-05, 'epoch': 1.92} + 
64%|██████▍ | 11090/17285 [99:21:36<55:53:26, 32.48s/it] 64%|██████▍ | 11091/17285 [99:22:05<54:16:16, 31.54s/it][2023-08-27 03:17:19,774] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 64%|██████▍ | 11092/17285 [99:22:42<56:57:16, 33.11s/it] 64%|██████▍ | 11093/17285 [99:23:06<52:26:31, 30.49s/it] 64%|██████▍ | 11094/17285 [99:23:36<51:44:27, 30.09s/it] 64%|██████▍ | 11095/17285 [99:24:03<50:32:04, 29.39s/it] 64%|██████▍ | 11096/17285 [99:24:37<52:56:12, 30.79s/it] 64%|██████▍ | 11097/17285 [99:25:10<53:47:34, 31.30s/it] 64%|██████▍ | 11098/17285 [99:25:42<54:10:46, 31.53s/it] 64%|██████▍ | 11099/17285 [99:26:12<53:28:42, 31.12s/it] 64%|██████▍ | 11100/17285 [99:26:51<57:26:13, 33.43s/it] {'loss': 1.3946, 'learning_rate': 6.440744036422758e-05, 'epoch': 1.93} + 64%|██████▍ | 11100/17285 [99:26:51<57:26:13, 33.43s/it] 64%|██████▍ | 11101/17285 [99:27:17<53:30:03, 31.15s/it] 64%|██████▍ | 11102/17285 [99:28:03<61:17:52, 35.69s/it] 64%|██████▍ | 11103/17285 [99:28:29<56:15:50, 32.76s/it] 64%|██████▍ | 11104/17285 [99:29:00<55:05:24, 32.09s/it] 64%|██████▍ | 11105/17285 [99:29:38<58:28:41, 34.06s/it] 64%|██████▍ | 11106/17285 [99:30:04<54:07:01, 31.53s/it] 64%|██████▍ | 11107/17285 [99:30:42<57:27:28, 33.48s/it] 64%|██████▍ | 11108/17285 [99:31:14<56:54:16, 33.16s/it] 64%|██████▍ | 11109/17285 [99:31:41<53:22:58, 31.12s/it] 64%|██████▍ | 11110/17285 [99:32:15<54:49:05, 31.96s/it] {'loss': 1.4216, 'learning_rate': 6.422870761318759e-05, 'epoch': 1.93} + 64%|██████▍ | 11110/17285 [99:32:15<54:49:05, 31.96s/it] 64%|██████▍ | 11111/17285 [99:32:52<57:46:46, 33.69s/it] 64%|██████▍ | 11112/17285 [99:33:23<56:25:25, 32.91s/it] 64%|██████▍ | 11113/17285 [99:33:59<57:37:51, 33.61s/it] 64%|██████▍ | 11114/17285 [99:34:30<56:37:55, 33.04s/it] 64%|██████▍ | 11115/17285 [99:34:58<53:44:28, 31.36s/it] 64%|██████▍ | 11116/17285 [99:35:30<54:13:10, 31.64s/it] 64%|██████▍ | 11117/17285 
[99:36:05<56:00:00, 32.68s/it] 64%|██████▍ | 11118/17285 [99:36:38<55:50:00, 32.59s/it] 64%|██████▍ | 11119/17285 [99:37:06<53:31:52, 31.25s/it] 64%|██████▍ | 11120/17285 [99:37:37<53:18:41, 31.13s/it] {'loss': 1.3654, 'learning_rate': 6.405010580685171e-05, 'epoch': 1.93} + 64%|██████▍ | 11120/17285 [99:37:37<53:18:41, 31.13s/it] 64%|██████▍ | 11121/17285 [99:38:08<53:15:40, 31.11s/it] 64%|██████▍ | 11122/17285 [99:38:34<50:56:41, 29.76s/it] 64%|██████▍ | 11123/17285 [99:39:10<54:11:21, 31.66s/it] 64%|██████▍ | 11124/17285 [99:39:49<57:38:47, 33.68s/it] 64%|██████▍ | 11125/17285 [99:40:16<54:15:11, 31.71s/it] 64%|██████▍ | 11126/17285 [99:40:42<51:33:26, 30.14s/it] 64%|██████▍ | 11127/17285 [99:41:18<54:34:20, 31.90s/it] 64%|██████▍ | 11128/17285 [99:41:50<54:14:16, 31.71s/it] 64%|██████▍ | 11129/17285 [99:42:32<59:33:41, 34.83s/it] 64%|██████▍ | 11130/17285 [99:43:03<57:39:16, 33.72s/it] {'loss': 1.3438, 'learning_rate': 6.387163559901117e-05, 'epoch': 1.93} + 64%|██████▍ | 11130/17285 [99:43:03<57:39:16, 33.72s/it] 64%|██████▍ | 11131/17285 [99:43:37<58:07:39, 34.00s/it] 64%|██████▍ | 11132/17285 [99:44:05<54:57:49, 32.16s/it] 64%|██████▍ | 11133/17285 [99:44:33<52:34:11, 30.76s/it] 64%|██████▍ | 11134/17285 [99:44:58<49:27:08, 28.94s/it] 64%|██████▍ | 11135/17285 [99:45:34<53:07:19, 31.10s/it] 64%|██████▍ | 11136/17285 [99:46:05<53:30:24, 31.33s/it] 64%|██████▍ | 11137/17285 [99:46:42<56:22:09, 33.01s/it] 64%|██████▍ | 11138/17285 [99:47:17<57:04:25, 33.43s/it] 64%|██████▍ | 11139/17285 [99:47:49<56:35:41, 33.15s/it] 64%|██████▍ | 11140/17285 [99:48:18<54:15:47, 31.79s/it] {'loss': 1.4255, 'learning_rate': 6.36932976429756e-05, 'epoch': 1.93} + 64%|██████▍ | 11140/17285 [99:48:18<54:15:47, 31.79s/it] 64%|██████▍ | 11141/17285 [99:48:50<54:09:51, 31.74s/it] 64%|██████▍ | 11142/17285 [99:49:22<54:27:26, 31.91s/it] 64%|██████▍ | 11143/17285 [99:49:52<53:44:46, 31.50s/it] 64%|██████▍ | 11144/17285 [99:50:29<56:12:29, 32.95s/it] 64%|██████▍ | 11145/17285 
[99:50:53<51:42:55, 30.32s/it] 64%|██████▍ | 11146/17285 [99:51:21<50:20:44, 29.52s/it] 64%|██████▍ | 11147/17285 [99:51:51<50:57:44, 29.89s/it] 64%|██████▍ | 11148/17285 [99:52:26<53:30:50, 31.39s/it] 65%|██████▍ | 11149/17285 [99:52:53<50:53:14, 29.86s/it] 65%|██████▍ | 11150/17285 [99:53:24<51:45:51, 30.38s/it] {'loss': 1.4268, 'learning_rate': 6.35150925915705e-05, 'epoch': 1.94} + 65%|██████▍ | 11150/17285 [99:53:24<51:45:51, 30.38s/it] 65%|██████▍ | 11151/17285 [99:53:53<50:58:47, 29.92s/it] 65%|██████▍ | 11152/17285 [99:54:23<51:12:49, 30.06s/it] 65%|██████▍ | 11153/17285 [99:54:52<50:29:11, 29.64s/it] 65%|██████▍ | 11154/17285 [99:55:30<54:42:04, 32.12s/it] 65%|██████▍ | 11155/17285 [99:56:00<53:52:21, 31.64s/it] 65%|██████▍ | 11156/17285 [99:56:25<50:17:52, 29.54s/it] 65%|██████▍ | 11157/17285 [99:57:00<53:08:48, 31.22s/it] 65%|██████▍ | 11158/17285 [99:57:29<52:07:46, 30.63s/it] 65%|██████▍ | 11159/17285 [99:58:10<56:57:03, 33.47s/it] 65%|██████▍ | 11160/17285 [99:58:43<56:52:18, 33.43s/it] {'loss': 1.3947, 'learning_rate': 6.333702109713477e-05, 'epoch': 1.94} + 65%|██████▍ | 11160/17285 [99:58:43<56:52:18, 33.43s/it] 65%|██████▍ | 11161/17285 [99:59:08<52:49:21, 31.05s/it] 65%|██████▍ | 11162/17285 [99:59:36<51:17:06, 30.15s/it] 65%|██████▍ | 11163/17285 [100:00:04<49:49:46, 29.30s/it] 65%|██████▍ | 11164/17285 [100:00:34<50:21:02, 29.61s/it] 65%|██████▍ | 11165/17285 [100:01:06<51:28:52, 30.28s/it] 65%|██████▍ | 11166/17285 [100:01:39<52:44:29, 31.03s/it] 65%|██████▍ | 11167/17285 [100:02:04<49:39:51, 29.22s/it] 65%|██████▍ | 11168/17285 [100:02:35<50:47:46, 29.89s/it] 65%|██████▍ | 11169/17285 [100:03:06<51:11:53, 30.14s/it] 65%|██████▍ | 11170/17285 [100:03:38<52:19:15, 30.80s/it] {'loss': 1.4452, 'learning_rate': 6.315908381151857e-05, 'epoch': 1.94} + 65%|██████▍ | 11170/17285 [100:03:38<52:19:15, 30.80s/it] 65%|██████▍ | 11171/17285 [100:04:09<52:03:46, 30.66s/it] 65%|██████▍ | 11172/17285 [100:04:45<54:52:47, 32.32s/it] 65%|██████▍ | 11173/17285 
[100:05:19<55:41:17, 32.80s/it] 65%|██████▍ | 11174/17285 [100:05:50<54:42:58, 32.23s/it] 65%|██████▍ | 11175/17285 [100:06:17<52:02:20, 30.66s/it] 65%|██████▍ | 11176/17285 [100:06:48<52:38:27, 31.02s/it] 65%|██████▍ | 11177/17285 [100:07:19<52:37:58, 31.02s/it] 65%|██████▍ | 11178/17285 [100:07:48<51:20:59, 30.27s/it] 65%|██████▍ | 11179/17285 [100:08:17<50:50:06, 29.97s/it] 65%|██████▍ | 11180/17285 [100:08:43<48:52:45, 28.82s/it] {'loss': 1.4187, 'learning_rate': 6.298128138608059e-05, 'epoch': 1.94} + 65%|██████▍ | 11180/17285 [100:08:43<48:52:45, 28.82s/it] 65%|██████▍ | 11181/17285 [100:09:14<49:30:49, 29.20s/it] 65%|██████▍ | 11182/17285 [100:09:47<51:38:59, 30.47s/it] 65%|██████▍ | 11183/17285 [100:10:17<51:36:50, 30.45s/it] 65%|██████▍ | 11184/17285 [100:10:51<53:08:33, 31.36s/it] 65%|██████▍ | 11185/17285 [100:11:18<50:57:21, 30.07s/it] 65%|██████▍ | 11186/17285 [100:11:47<50:17:47, 29.69s/it] 65%|██████▍ | 11187/17285 [100:12:15<49:50:18, 29.42s/it] 65%|██████▍ | 11188/17285 [100:12:50<52:21:36, 30.92s/it] 65%|██████▍ | 11189/17285 [100:13:22<52:47:59, 31.18s/it] 65%|██████▍ | 11190/17285 [100:13:57<54:54:05, 32.43s/it] {'loss': 1.3878, 'learning_rate': 6.280361447168603e-05, 'epoch': 1.94} + 65%|██████▍ | 11190/17285 [100:13:57<54:54:05, 32.43s/it] 65%|██████▍ | 11191/17285 [100:14:26<53:01:30, 31.32s/it] 65%|██████▍ | 11192/17285 [100:14:58<53:34:09, 31.65s/it] 65%|██████▍ | 11193/17285 [100:15:36<56:51:34, 33.60s/it] 65%|██████▍ | 11194/17285 [100:16:02<52:37:16, 31.10s/it] 65%|██████▍ | 11195/17285 [100:16:31<51:58:48, 30.73s/it] 65%|██████▍ | 11196/17285 [100:16:59<50:36:43, 29.92s/it] 65%|██████▍ | 11197/17285 [100:17:31<51:39:19, 30.55s/it] 65%|██████▍ | 11198/17285 [100:18:10<55:55:02, 33.07s/it][2023-08-27 04:13:14,757] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 65%|██████▍ | 11199/17285 [100:18:37<52:38:08, 31.14s/it] 65%|██████▍ | 11200/17285 [100:19:04<50:17:26, 29.75s/it] {'loss': 1.3753, 'learning_rate': 6.264383064821323e-05, 'epoch': 1.94} + 65%|██████▍ | 11200/17285 [100:19:04<50:17:26, 29.75s/it] 65%|██████▍ | 11201/17285 [100:19:38<52:44:28, 31.21s/it] 65%|██████▍ | 11202/17285 [100:20:16<56:14:04, 33.28s/it] 65%|██████▍ | 11203/17285 [100:20:51<56:53:43, 33.68s/it] 65%|██████▍ | 11204/17285 [100:21:28<58:33:29, 34.67s/it] 65%|██████▍ | 11205/17285 [100:22:10<62:21:51, 36.93s/it] 65%|██████▍ | 11206/17285 [100:22:46<61:39:58, 36.52s/it] 65%|██████▍ | 11207/17285 [100:23:17<58:50:45, 34.85s/it] 65%|██████▍ | 11208/17285 [100:23:50<58:03:16, 34.39s/it] 65%|██████▍ | 11209/17285 [100:24:24<58:03:55, 34.40s/it] 65%|██████▍ | 11210/17285 [100:25:02<59:45:59, 35.42s/it] {'loss': 1.372, 'learning_rate': 6.246642299615586e-05, 'epoch': 1.95} + 65%|██████▍ | 11210/17285 [100:25:02<59:45:59, 35.42s/it] 65%|██████▍ | 11211/17285 [100:25:32<57:03:36, 33.82s/it] 65%|██████▍ | 11212/17285 [100:26:15<61:37:14, 36.53s/it] 65%|██████▍ | 11213/17285 [100:26:49<60:06:47, 35.64s/it][2023-08-27 04:21:59,942] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 65%|██████▍ | 11214/17285 [100:27:22<59:03:56, 35.02s/it] 65%|██████▍ | 11215/17285 [100:27:55<58:03:32, 34.43s/it] 65%|██████▍ | 11216/17285 [100:28:22<54:06:59, 32.10s/it] 65%|██████▍ | 11217/17285 [100:28:49<51:46:45, 30.72s/it] 65%|██████▍ | 11218/17285 [100:29:32<57:57:49, 34.39s/it] 65%|██████▍ | 11219/17285 [100:30:05<56:54:14, 33.77s/it] 65%|██████▍ | 11220/17285 [100:30:39<57:00:04, 33.83s/it] {'loss': 1.361, 'learning_rate': 6.230687356416249e-05, 'epoch': 1.95} + 65%|██████▍ | 11220/17285 [100:30:39<57:00:04, 33.83s/it] 65%|██████▍ | 11221/17285 [100:31:06<53:40:53, 31.87s/it] 65%|██████▍ | 11222/17285 [100:31:38<53:31:31, 31.78s/it] 65%|██████▍ | 11223/17285 [100:32:05<51:25:21, 30.54s/it] 65%|██████▍ | 11224/17285 [100:32:36<51:24:22, 30.53s/it] 65%|██████▍ | 11225/17285 [100:33:03<49:44:42, 29.55s/it] 65%|██████▍ | 11226/17285 [100:33:34<50:42:02, 30.12s/it] 65%|██████▍ | 11227/17285 [100:34:02<49:19:07, 29.31s/it] 65%|██████▍ | 11228/17285 [100:34:35<51:03:50, 30.35s/it] 65%|██████▍ | 11229/17285 [100:35:06<51:32:14, 30.64s/it] 65%|██████▍ | 11230/17285 [100:35:44<55:28:15, 32.98s/it] {'loss': 1.4421, 'learning_rate': 6.212972751884663e-05, 'epoch': 1.95} + 65%|██████▍ | 11230/17285 [100:35:44<55:28:15, 32.98s/it] 65%|██████▍ | 11231/17285 [100:36:11<52:14:02, 31.06s/it] 65%|██████▍ | 11232/17285 [100:36:39<50:29:04, 30.03s/it] 65%|██████▍ | 11233/17285 [100:37:06<49:12:05, 29.27s/it] 65%|██████▍ | 11234/17285 [100:37:36<49:23:49, 29.39s/it] 65%|██████▍ | 11235/17285 [100:38:08<51:03:36, 30.38s/it] 65%|██████▌ | 11236/17285 [100:38:37<50:06:22, 29.82s/it] 65%|██████▌ | 11237/17285 [100:39:07<50:14:30, 29.91s/it] 65%|██████▌ | 11238/17285 [100:39:39<51:25:43, 30.62s/it] 65%|██████▌ | 11239/17285 [100:40:15<53:43:01, 31.98s/it] 65%|██████▌ | 11240/17285 [100:40:48<54:13:18, 32.29s/it] {'loss': 1.4402, 'learning_rate': 6.195272010177959e-05, 'epoch': 1.95} + 65%|██████▌ | 11240/17285 [100:40:48<54:13:18, 
32.29s/it] 65%|██████▌ | 11241/17285 [100:41:21<54:51:06, 32.67s/it] 65%|██████▌ | 11242/17285 [100:41:57<56:40:36, 33.76s/it] 65%|██████▌ | 11243/17285 [100:42:41<61:39:06, 36.73s/it] 65%|██████▌ | 11244/17285 [100:43:15<60:02:33, 35.78s/it] 65%|██████▌ | 11245/17285 [100:43:44<56:53:56, 33.91s/it] 65%|██████▌ | 11246/17285 [100:44:13<54:11:55, 32.31s/it] 65%|██████▌ | 11247/17285 [100:44:42<52:50:02, 31.50s/it] 65%|██████▌ | 11248/17285 [100:45:14<53:07:06, 31.68s/it] 65%|██████▌ | 11249/17285 [100:45:47<53:39:42, 32.01s/it] 65%|██████▌ | 11250/17285 [100:46:14<51:03:33, 30.46s/it] {'loss': 1.3958, 'learning_rate': 6.177585196091631e-05, 'epoch': 1.95} + 65%|██████▌ | 11250/17285 [100:46:14<51:03:33, 30.46s/it] 65%|██████▌ | 11251/17285 [100:46:39<48:22:22, 28.86s/it] 65%|██████▌ | 11252/17285 [100:47:05<46:39:52, 27.85s/it] 65%|██████▌ | 11253/17285 [100:47:33<46:51:16, 27.96s/it] 65%|██████▌ | 11254/17285 [100:48:22<57:13:49, 34.16s/it] 65%|██████▌ | 11255/17285 [100:49:01<59:43:36, 35.66s/it] 65%|██████▌ | 11256/17285 [100:49:30<56:44:18, 33.88s/it] 65%|██████▌ | 11257/17285 [100:49:56<52:31:16, 31.37s/it] 65%|██████▌ | 11258/17285 [100:50:32<54:58:42, 32.84s/it] 65%|██████▌ | 11259/17285 [100:51:02<53:32:42, 31.99s/it] 65%|██████▌ | 11260/17285 [100:51:29<50:44:26, 30.32s/it] {'loss': 1.4365, 'learning_rate': 6.159912374370183e-05, 'epoch': 1.95} + 65%|██████▌ | 11260/17285 [100:51:29<50:44:26, 30.32s/it] 65%|██████▌ | 11261/17285 [100:51:57<49:52:54, 29.81s/it] 65%|██████▌ | 11262/17285 [100:52:31<52:01:39, 31.10s/it] 65%|██████▌ | 11263/17285 [100:53:08<54:33:33, 32.62s/it] 65%|██████▌ | 11264/17285 [100:53:33<51:09:32, 30.59s/it] 65%|██████▌ | 11265/17285 [100:54:04<51:18:01, 30.68s/it] 65%|██████▌ | 11266/17285 [100:54:38<52:42:08, 31.52s/it] 65%|██████▌ | 11267/17285 [100:55:10<53:07:56, 31.78s/it] 65%|██████▌ | 11268/17285 [100:55:38<51:07:59, 30.59s/it] 65%|██████▌ | 11269/17285 [100:56:07<50:15:56, 30.08s/it] 65%|██████▌ | 11270/17285 
[100:56:40<51:35:35, 30.88s/it] {'loss': 1.4041, 'learning_rate': 6.142253609706898e-05, 'epoch': 1.96} + 65%|██████▌ | 11270/17285 [100:56:40<51:35:35, 30.88s/it] 65%|██████▌ | 11271/17285 [100:57:15<53:55:06, 32.28s/it] 65%|██████▌ | 11272/17285 [100:57:44<52:01:27, 31.15s/it] 65%|██████▌ | 11273/17285 [100:58:22<55:25:02, 33.18s/it] 65%|██████▌ | 11274/17285 [100:58:51<53:23:47, 31.98s/it] 65%|██████▌ | 11275/17285 [100:59:28<55:51:41, 33.46s/it] 65%|██████▌ | 11276/17285 [101:00:01<55:56:48, 33.52s/it] 65%|██████▌ | 11277/17285 [101:00:36<56:20:05, 33.76s/it] 65%|██████▌ | 11278/17285 [101:01:11<56:57:40, 34.14s/it] 65%|██████▌ | 11279/17285 [101:01:46<57:43:45, 34.60s/it] 65%|██████▌ | 11280/17285 [101:02:15<54:44:42, 32.82s/it] {'loss': 1.4321, 'learning_rate': 6.124608966743606e-05, 'epoch': 1.96} + 65%|██████▌ | 11280/17285 [101:02:15<54:44:42, 32.82s/it] 65%|██████▌ | 11281/17285 [101:02:49<55:32:21, 33.30s/it] 65%|██████▌ | 11282/17285 [101:03:24<55:57:44, 33.56s/it] 65%|██████▌ | 11283/17285 [101:04:01<57:52:15, 34.71s/it] 65%|██████▌ | 11284/17285 [101:04:30<54:53:51, 32.93s/it] 65%|██████▌ | 11285/17285 [101:04:59<53:12:20, 31.92s/it] 65%|██████▌ | 11286/17285 [101:05:36<55:35:44, 33.36s/it] 65%|██████▌ | 11287/17285 [101:06:01<51:33:39, 30.95s/it] 65%|██████▌ | 11288/17285 [101:06:37<53:58:43, 32.40s/it] 65%|██████▌ | 11289/17285 [101:07:11<54:49:21, 32.92s/it] 65%|██████▌ | 11290/17285 [101:07:40<52:37:12, 31.60s/it] {'loss': 1.4154, 'learning_rate': 6.106978510070443e-05, 'epoch': 1.96} + 65%|██████▌ | 11290/17285 [101:07:40<52:37:12, 31.60s/it] 65%|██████▌ | 11291/17285 [101:08:21<57:30:43, 34.54s/it] 65%|██████▌ | 11292/17285 [101:08:58<58:46:17, 35.30s/it] 65%|██████▌ | 11293/17285 [101:09:42<62:54:47, 37.80s/it] 65%|██████▌ | 11294/17285 [101:10:18<62:03:14, 37.29s/it] 65%|██████▌ | 11295/17285 [101:10:53<60:57:53, 36.64s/it] 65%|██████▌ | 11296/17285 [101:11:29<60:24:01, 36.31s/it] 65%|██████▌ | 11297/17285 [101:11:56<55:41:13, 33.48s/it] 
65%|██████▌ | 11298/17285 [101:12:27<54:53:27, 33.01s/it] 65%|██████▌ | 11299/17285 [101:12:59<54:05:10, 32.53s/it] 65%|██████▌ | 11300/17285 [101:13:42<59:17:40, 35.67s/it] {'loss': 1.4208, 'learning_rate': 6.089362304225603e-05, 'epoch': 1.96} + 65%|██████▌ | 11300/17285 [101:13:42<59:17:40, 35.67s/it] 65%|██████▌ | 11301/17285 [101:14:07<54:16:24, 32.65s/it] 65%|██████▌ | 11302/17285 [101:14:43<55:53:19, 33.63s/it] 65%|██████▌ | 11303/17285 [101:15:13<53:39:13, 32.29s/it] 65%|██████▌ | 11304/17285 [101:15:40<51:16:04, 30.86s/it] 65%|██████▌ | 11305/17285 [101:16:19<55:12:44, 33.24s/it] 65%|██████▌ | 11306/17285 [101:16:54<56:11:59, 33.84s/it] 65%|██████▌ | 11307/17285 [101:17:23<53:30:23, 32.22s/it] 65%|██████▌ | 11308/17285 [101:17:52<52:19:38, 31.52s/it] 65%|██████▌ | 11309/17285 [101:18:18<49:08:48, 29.61s/it] 65%|██████▌ | 11310/17285 [101:18:54<52:28:54, 31.62s/it] {'loss': 1.3757, 'learning_rate': 6.071760413695131e-05, 'epoch': 1.96} + 65%|██████▌ | 11310/17285 [101:18:54<52:28:54, 31.62s/it] 65%|██████▌ | 11311/17285 [101:19:20<49:40:05, 29.93s/it] 65%|██████▌ | 11312/17285 [101:19:51<50:09:24, 30.23s/it] 65%|██████▌ | 11313/17285 [101:20:26<52:45:41, 31.81s/it] 65%|██████▌ | 11314/17285 [101:20:59<53:00:52, 31.96s/it] 65%|██████▌ | 11315/17285 [101:21:30<52:46:45, 31.83s/it] 65%|██████▌ | 11316/17285 [101:22:05<54:19:42, 32.77s/it] 65%|██████▌ | 11317/17285 [101:22:35<52:40:11, 31.77s/it] 65%|██████▌ | 11318/17285 [101:23:07<52:59:02, 31.97s/it] 65%|██████▌ | 11319/17285 [101:23:37<51:58:59, 31.37s/it] 65%|██████▌ | 11320/17285 [101:24:10<52:34:29, 31.73s/it] {'loss': 1.4266, 'learning_rate': 6.054172902912656e-05, 'epoch': 1.96} + 65%|██████▌ | 11320/17285 [101:24:10<52:34:29, 31.73s/it] 65%|██████▌ | 11321/17285 [101:24:41<52:28:46, 31.68s/it] 66%|██████▌ | 11322/17285 [101:25:15<53:28:13, 32.28s/it] 66%|██████▌ | 11323/17285 [101:25:42<50:46:31, 30.66s/it] 66%|██████▌ | 11324/17285 [101:26:16<52:31:02, 31.72s/it] 66%|██████▌ | 11325/17285 
[101:26:41<49:18:21, 29.78s/it] 66%|██████▌ | 11326/17285 [101:27:07<47:35:30, 28.75s/it] 66%|██████▌ | 11327/17285 [101:27:34<46:23:31, 28.03s/it] 66%|██████▌ | 11328/17285 [101:27:59<44:46:48, 27.06s/it] 66%|██████▌ | 11329/17285 [101:28:28<46:09:39, 27.90s/it] 66%|██████▌ | 11330/17285 [101:28:56<46:04:20, 27.85s/it] {'loss': 1.4081, 'learning_rate': 6.0365998362591744e-05, 'epoch': 1.97} + 66%|██████▌ | 11330/17285 [101:28:56<46:04:20, 27.85s/it] 66%|██████▌ | 11331/17285 [101:29:29<48:37:24, 29.40s/it] 66%|██████▌ | 11332/17285 [101:30:03<50:46:00, 30.70s/it] 66%|██████▌ | 11333/17285 [101:30:31<49:22:24, 29.86s/it] 66%|██████▌ | 11334/17285 [101:31:02<49:54:56, 30.20s/it] 66%|██████▌ | 11335/17285 [101:31:32<49:44:12, 30.09s/it] 66%|██████▌ | 11336/17285 [101:32:02<49:46:45, 30.12s/it] 66%|██████▌ | 11337/17285 [101:32:31<49:29:26, 29.95s/it] 66%|██████▌ | 11338/17285 [101:33:05<51:24:44, 31.12s/it] 66%|██████▌ | 11339/17285 [101:33:41<53:50:49, 32.60s/it] 66%|██████▌ | 11340/17285 [101:34:11<52:21:25, 31.70s/it] {'loss': 1.4108, 'learning_rate': 6.019041278062807e-05, 'epoch': 1.97} + 66%|██████▌ | 11340/17285 [101:34:11<52:21:25, 31.70s/it] 66%|██████▌ | 11341/17285 [101:34:51<56:21:48, 34.14s/it][2023-08-27 05:29:53,867] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 66%|██████▌ | 11342/17285 [101:35:16<52:03:09, 31.53s/it] 66%|██████▌ | 11343/17285 [101:35:43<49:33:23, 30.02s/it] 66%|██████▌ | 11344/17285 [101:36:13<49:52:17, 30.22s/it] 66%|██████▌ | 11345/17285 [101:36:44<49:57:06, 30.27s/it] 66%|██████▌ | 11346/17285 [101:37:10<47:46:04, 28.96s/it] 66%|██████▌ | 11347/17285 [101:37:36<46:38:51, 28.28s/it] 66%|██████▌ | 11348/17285 [101:38:09<48:56:36, 29.68s/it] 66%|██████▌ | 11349/17285 [101:38:39<48:53:18, 29.65s/it] 66%|██████▌ | 11350/17285 [101:39:04<46:31:24, 28.22s/it] {'loss': 1.3667, 'learning_rate': 6.0032510335413086e-05, 'epoch': 1.97} + 66%|██████▌ | 11350/17285 [101:39:04<46:31:24, 28.22s/it] 66%|██████▌ | 11351/17285 [101:39:35<48:03:16, 29.15s/it] 66%|██████▌ | 11352/17285 [101:40:14<52:57:30, 32.13s/it] 66%|██████▌ | 11353/17285 [101:40:41<50:32:59, 30.68s/it] 66%|██████▌ | 11354/17285 [101:41:06<47:44:42, 28.98s/it] 66%|██████▌ | 11355/17285 [101:41:36<48:11:28, 29.26s/it] 66%|██████▌ | 11356/17285 [101:42:16<53:32:45, 32.51s/it] 66%|██████▌ | 11357/17285 [101:42:48<53:03:17, 32.22s/it] 66%|██████▌ | 11358/17285 [101:43:14<49:43:02, 30.20s/it] 66%|██████▌ | 11359/17285 [101:43:42<48:40:23, 29.57s/it] 66%|██████▌ | 11360/17285 [101:44:07<46:25:34, 28.21s/it] {'loss': 1.3987, 'learning_rate': 5.985720218447026e-05, 'epoch': 1.97} + 66%|██████▌ | 11360/17285 [101:44:07<46:25:34, 28.21s/it] 66%|██████▌ | 11361/17285 [101:44:33<45:18:15, 27.53s/it] 66%|██████▌ | 11362/17285 [101:45:03<46:54:32, 28.51s/it] 66%|██████▌ | 11363/17285 [101:45:30<46:06:13, 28.03s/it] 66%|██████▌ | 11364/17285 [101:45:57<45:32:39, 27.69s/it] 66%|██████▌ | 11365/17285 [101:46:25<45:21:17, 27.58s/it] 66%|██████▌ | 11366/17285 [101:46:57<47:41:50, 29.01s/it] 66%|██████▌ | 11367/17285 [101:47:28<48:30:18, 29.51s/it] 66%|██████▌ | 11368/17285 [101:47:57<48:37:50, 29.59s/it] 66%|██████▌ | 11369/17285 [101:48:30<50:21:21, 30.64s/it] 66%|██████▌ | 11370/17285 [101:49:09<54:09:34, 32.96s/it] {'loss': 1.3553, 
'learning_rate': 5.9682040980602316e-05, 'epoch': 1.97} + 66%|██████▌ | 11370/17285 [101:49:09<54:09:34, 32.96s/it] 66%|██████▌ | 11371/17285 [101:49:37<51:42:04, 31.47s/it] 66%|██████▌ | 11372/17285 [101:50:06<50:21:34, 30.66s/it] 66%|██████▌ | 11373/17285 [101:50:46<55:06:23, 33.56s/it] 66%|██████▌ | 11374/17285 [101:51:16<53:22:48, 32.51s/it][2023-08-27 05:46:21,126] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 66%|██████▌ | 11375/17285 [101:51:43<50:54:41, 31.01s/it] 66%|██████▌ | 11376/17285 [101:52:14<50:41:17, 30.88s/it] 66%|██████▌ | 11377/17285 [101:52:44<50:21:19, 30.68s/it] 66%|██████▌ | 11378/17285 [101:53:16<50:51:45, 31.00s/it] 66%|██████▌ | 11379/17285 [101:53:55<54:43:57, 33.36s/it] 66%|██████▌ | 11380/17285 [101:54:40<60:35:11, 36.94s/it] {'loss': 1.3722, 'learning_rate': 5.9524522066830346e-05, 'epoch': 1.98} + 66%|██████▌ | 11380/17285 [101:54:40<60:35:11, 36.94s/it] 66%|██████▌ | 11381/17285 [101:55:10<57:11:26, 34.87s/it] 66%|██████▌ | 11382/17285 [101:55:40<54:38:17, 33.32s/it] 66%|██████▌ | 11383/17285 [101:56:07<51:24:23, 31.36s/it] 66%|██████▌ | 11384/17285 [101:56:38<51:19:36, 31.31s/it] 66%|██████▌ | 11385/17285 [101:57:13<52:59:29, 32.33s/it] 66%|██████▌ | 11386/17285 [101:57:46<53:16:39, 32.51s/it] 66%|██████▌ | 11387/17285 [101:58:14<51:30:01, 31.43s/it] 66%|██████▌ | 11388/17285 [101:58:42<49:24:23, 30.16s/it] 66%|██████▌ | 11389/17285 [101:59:16<51:21:40, 31.36s/it] 66%|██████▌ | 11390/17285 [101:59:45<50:16:21, 30.70s/it] {'loss': 1.4445, 'learning_rate': 5.934964182845485e-05, 'epoch': 1.98} + 66%|██████▌ | 11390/17285 [101:59:45<50:16:21, 30.70s/it] 66%|██████▌ | 11391/17285 [102:00:11<47:53:11, 29.25s/it] 66%|██████▌ | 11392/17285 [102:00:47<51:18:24, 31.34s/it] 66%|██████▌ | 11393/17285 [102:01:19<51:22:00, 31.39s/it] 66%|██████▌ | 11394/17285 [102:01:56<54:09:34, 33.10s/it] 66%|██████▌ | 11395/17285 [102:02:28<53:33:55, 32.74s/it] 
66%|██████▌ | 11396/17285 [102:03:00<53:12:05, 32.52s/it] 66%|██████▌ | 11397/17285 [102:03:38<56:08:09, 34.32s/it] 66%|██████▌ | 11398/17285 [102:04:10<54:48:19, 33.51s/it] 66%|██████▌ | 11399/17285 [102:04:43<54:47:30, 33.51s/it] 66%|██████▌ | 11400/17285 [102:05:22<57:31:22, 35.19s/it] {'loss': 1.3968, 'learning_rate': 5.917491039513411e-05, 'epoch': 1.98} + 66%|██████▌ | 11400/17285 [102:05:22<57:31:22, 35.19s/it] 66%|██████▌ | 11401/17285 [102:05:49<53:31:30, 32.75s/it] 66%|██████▌ | 11402/17285 [102:06:26<55:26:30, 33.93s/it] 66%|██████▌ | 11403/17285 [102:07:06<58:14:09, 35.64s/it] 66%|██████▌ | 11404/17285 [102:07:39<57:11:59, 35.01s/it] 66%|██████▌ | 11405/17285 [102:08:06<53:17:17, 32.63s/it] 66%|██████▌ | 11406/17285 [102:08:48<57:33:22, 35.24s/it] 66%|██████▌ | 11407/17285 [102:09:19<55:46:01, 34.15s/it] 66%|██████▌ | 11408/17285 [102:09:45<51:53:11, 31.78s/it] 66%|██████▌ | 11409/17285 [102:10:21<53:48:47, 32.97s/it] 66%|██████▌ | 11410/17285 [102:10:52<52:49:02, 32.36s/it] {'loss': 1.3855, 'learning_rate': 5.9000328406491425e-05, 'epoch': 1.98} + 66%|██████▌ | 11410/17285 [102:10:52<52:49:02, 32.36s/it] 66%|██████▌ | 11411/17285 [102:11:22<51:20:59, 31.47s/it] 66%|██████▌ | 11412/17285 [102:11:51<50:09:27, 30.75s/it] 66%|██████▌ | 11413/17285 [102:12:26<52:17:09, 32.06s/it] 66%|██████▌ | 11414/17285 [102:13:00<53:27:09, 32.78s/it] 66%|██████▌ | 11415/17285 [102:13:28<50:55:38, 31.23s/it] 66%|██████▌ | 11416/17285 [102:13:58<50:31:07, 30.99s/it] 66%|██████▌ | 11417/17285 [102:14:25<48:34:57, 29.81s/it] 66%|██████▌ | 11418/17285 [102:15:04<52:45:15, 32.37s/it] 66%|██████▌ | 11419/17285 [102:15:37<53:06:42, 32.60s/it] 66%|██████▌ | 11420/17285 [102:16:08<52:17:15, 32.09s/it] {'loss': 1.3988, 'learning_rate': 5.882589650160322e-05, 'epoch': 1.98} + 66%|██████▌ | 11420/17285 [102:16:08<52:17:15, 32.09s/it] 66%|██████▌ | 11421/17285 [102:16:41<53:00:57, 32.55s/it] 66%|██████▌ | 11422/17285 [102:17:11<51:40:49, 31.73s/it] 66%|██████▌ | 11423/17285 
[102:17:52<55:56:02, 34.35s/it] 66%|██████▌ | 11424/17285 [102:18:22<53:53:23, 33.10s/it] 66%|██████▌ | 11425/17285 [102:18:55<53:45:05, 33.02s/it] 66%|██████▌ | 11426/17285 [102:19:28<54:07:22, 33.26s/it] 66%|██████▌ | 11427/17285 [102:20:01<53:57:18, 33.16s/it] 66%|██████▌ | 11428/17285 [102:20:39<55:57:30, 34.39s/it] 66%|██████▌ | 11429/17285 [102:21:09<53:50:25, 33.10s/it] 66%|██████▌ | 11430/17285 [102:21:39<52:32:58, 32.31s/it] {'loss': 1.3642, 'learning_rate': 5.865161531899642e-05, 'epoch': 1.98} + 66%|██████▌ | 11430/17285 [102:21:39<52:32:58, 32.31s/it] 66%|██████▌ | 11431/17285 [102:22:11<52:17:13, 32.15s/it] 66%|██████▌ | 11432/17285 [102:22:36<49:01:08, 30.15s/it] 66%|██████▌ | 11433/17285 [102:23:02<46:53:14, 28.84s/it] 66%|██████▌ | 11434/17285 [102:23:36<49:09:38, 30.25s/it] 66%|██████▌ | 11435/17285 [102:24:10<51:17:26, 31.56s/it] 66%|██████▌ | 11436/17285 [102:24:45<52:59:05, 32.61s/it] 66%|██████▌ | 11437/17285 [102:25:12<50:05:37, 30.84s/it] 66%|██████▌ | 11438/17285 [102:25:49<52:50:15, 32.53s/it] 66%|██████▌ | 11439/17285 [102:26:15<49:57:30, 30.76s/it] 66%|██████▌ | 11440/17285 [102:26:48<50:56:15, 31.37s/it] {'loss': 1.4189, 'learning_rate': 5.8477485496646245e-05, 'epoch': 1.99} + 66%|██████▌ | 11440/17285 [102:26:48<50:56:15, 31.37s/it] 66%|██████▌ | 11441/17285 [102:27:13<47:51:26, 29.48s/it] 66%|██████▌ | 11442/17285 [102:27:40<46:34:13, 28.69s/it] 66%|██████▌ | 11443/17285 [102:28:18<51:19:44, 31.63s/it] 66%|██████▌ | 11444/17285 [102:28:45<48:44:33, 30.04s/it] 66%|██████▌ | 11445/17285 [102:29:26<54:21:10, 33.51s/it] 66%|██████▌ | 11446/17285 [102:30:07<57:47:01, 35.63s/it] 66%|██████▌ | 11447/17285 [102:30:48<60:38:26, 37.39s/it] 66%|██████▌ | 11448/17285 [102:31:16<55:47:08, 34.41s/it] 66%|██████▌ | 11449/17285 [102:31:50<55:29:19, 34.23s/it] 66%|██████▌ | 11450/17285 [102:32:22<54:36:36, 33.69s/it] {'loss': 1.4004, 'learning_rate': 5.8303507671973864e-05, 'epoch': 1.99} + 66%|██████▌ | 11450/17285 [102:32:22<54:36:36, 33.69s/it] 
66%|██████▌ | 11451/17285 [102:32:48<50:43:37, 31.30s/it] 66%|██████▋ | 11452/17285 [102:33:29<55:28:27, 34.24s/it] 66%|██████▋ | 11453/17285 [102:34:03<55:13:09, 34.09s/it] 66%|██████▋ | 11454/17285 [102:34:28<50:50:32, 31.39s/it] 66%|██████▋ | 11455/17285 [102:34:57<49:52:40, 30.80s/it] 66%|██████▋ | 11456/17285 [102:35:30<50:35:48, 31.25s/it] 66%|██████▋ | 11457/17285 [102:36:01<50:35:25, 31.25s/it] 66%|██████▋ | 11458/17285 [102:36:42<55:33:49, 34.33s/it] 66%|██████▋ | 11459/17285 [102:37:16<55:17:50, 34.17s/it] 66%|██████▋ | 11460/17285 [102:37:49<54:44:58, 33.84s/it] {'loss': 1.3928, 'learning_rate': 5.812968248184392e-05, 'epoch': 1.99} + 66%|██████▋ | 11460/17285 [102:37:49<54:44:58, 33.84s/it] 66%|██████▋ | 11461/17285 [102:38:17<51:42:35, 31.96s/it] 66%|██████▋ | 11462/17285 [102:38:43<48:44:24, 30.13s/it] 66%|██████▋ | 11463/17285 [102:39:15<50:01:56, 30.94s/it] 66%|██████▋ | 11464/17285 [102:39:42<47:53:58, 29.62s/it] 66%|██████▋ | 11465/17285 [102:40:13<48:47:07, 30.18s/it] 66%|██████▋ | 11466/17285 [102:40:44<49:08:56, 30.41s/it] 66%|██████▋ | 11467/17285 [102:41:11<47:04:26, 29.13s/it] 66%|██████▋ | 11468/17285 [102:41:37<45:48:42, 28.35s/it] 66%|██████▋ | 11469/17285 [102:42:23<54:08:02, 33.51s/it] 66%|██████▋ | 11470/17285 [102:43:02<56:56:29, 35.25s/it] {'loss': 1.4273, 'learning_rate': 5.795601056256257e-05, 'epoch': 1.99} + 66%|██████▋ | 11470/17285 [102:43:02<56:56:29, 35.25s/it] 66%|██████▋ | 11471/17285 [102:43:33<55:07:58, 34.14s/it] 66%|██████▋ | 11472/17285 [102:44:05<53:54:51, 33.39s/it] 66%|██████▋ | 11473/17285 [102:44:36<52:43:23, 32.66s/it] 66%|██████▋ | 11474/17285 [102:45:02<49:33:20, 30.70s/it] 66%|██████▋ | 11475/17285 [102:45:38<52:06:15, 32.29s/it] 66%|██████▋ | 11476/17285 [102:46:11<52:29:20, 32.53s/it] 66%|██████▋ | 11477/17285 [102:46:42<51:29:02, 31.91s/it] 66%|██████▋ | 11478/17285 [102:47:24<56:22:37, 34.95s/it] 66%|██████▋ | 11479/17285 [102:47:56<54:57:51, 34.08s/it] 66%|██████▋ | 11480/17285 [102:48:28<53:56:20, 
33.45s/it] {'loss': 1.3824, 'learning_rate': 5.778249254987461e-05, 'epoch': 1.99} + 66%|██████▋ | 11480/17285 [102:48:28<53:56:20, 33.45s/it] 66%|██████▋ | 11481/17285 [102:49:02<54:23:03, 33.73s/it] 66%|██████▋ | 11482/17285 [102:49:35<53:57:20, 33.47s/it] 66%|██████▋ | 11483/17285 [102:50:17<57:55:32, 35.94s/it] 66%|██████▋ | 11484/17285 [102:50:54<58:45:40, 36.47s/it] 66%|██████▋ | 11485/17285 [102:51:23<54:53:47, 34.07s/it] 66%|██████▋ | 11486/17285 [102:51:56<54:21:28, 33.75s/it] 66%|██████▋ | 11487/17285 [102:52:32<55:42:35, 34.59s/it] 66%|██████▋ | 11488/17285 [102:53:05<54:29:00, 33.83s/it] 66%|██████▋ | 11489/17285 [102:53:36<53:24:54, 33.18s/it] 66%|██████▋ | 11490/17285 [102:54:14<55:29:58, 34.48s/it] {'loss': 1.4004, 'learning_rate': 5.7609129078961655e-05, 'epoch': 1.99} + 66%|██████▋ | 11490/17285 [102:54:14<55:29:58, 34.48s/it] 66%|██████▋ | 11491/17285 [102:54:46<54:13:47, 33.69s/it] 66%|██████▋ | 11492/17285 [102:55:13<51:19:14, 31.89s/it] 66%|██████▋ | 11493/17285 [102:55:58<57:19:56, 35.63s/it] 66%|██████▋ | 11494/17285 [102:56:31<55:59:48, 34.81s/it] 67%|██████▋ | 11495/17285 [102:56:59<52:46:49, 32.82s/it] 67%|██████▋ | 11496/17285 [102:57:37<55:34:46, 34.56s/it] 67%|██████▋ | 11497/17285 [102:58:22<60:31:04, 37.64s/it] 67%|██████▋ | 11498/17285 [102:58:52<56:33:57, 35.19s/it] 67%|██████▋ | 11499/17285 [102:59:24<55:05:20, 34.28s/it] 67%|██████▋ | 11500/17285 [103:00:05<58:35:38, 36.46s/it] {'loss': 1.3801, 'learning_rate': 5.7435920784439514e-05, 'epoch': 2.0} + 67%|██████▋ | 11500/17285 [103:00:05<58:35:38, 36.46s/it] 67%|██████▋ | 11501/17285 [103:00:31<53:25:30, 33.25s/it] 67%|██████▋ | 11502/17285 [103:01:07<54:43:10, 34.06s/it] 67%|██████▋ | 11503/17285 [103:01:43<55:48:57, 34.75s/it] 67%|██████▋ | 11504/17285 [103:02:13<53:31:33, 33.33s/it] 67%|██████▋ | 11505/17285 [103:02:50<55:11:33, 34.38s/it] 67%|██████▋ | 11506/17285 [103:03:26<55:55:12, 34.84s/it] 67%|██████▋ | 11507/17285 [103:03:59<54:56:38, 34.23s/it] 67%|██████▋ | 11508/17285 
[103:04:33<54:47:06, 34.14s/it] 67%|██████▋ | 11509/17285 [103:05:10<56:19:17, 35.10s/it] 67%|██████▋ | 11510/17285 [103:05:44<55:43:18, 34.74s/it] {'loss': 1.373, 'learning_rate': 5.7262868300355975e-05, 'epoch': 2.0} + 67%|██████▋ | 11510/17285 [103:05:44<55:43:18, 34.74s/it] 67%|██████▋ | 11511/17285 [103:06:12<52:14:51, 32.58s/it] 67%|██████▋ | 11512/17285 [103:06:42<51:17:22, 31.98s/it] 67%|██████▋ | 11513/17285 [103:07:09<48:58:53, 30.55s/it] 67%|██████▋ | 11514/17285 [103:07:44<50:55:04, 31.76s/it] 67%|██████▋ | 11515/17285 [103:08:19<52:20:38, 32.66s/it] 67%|██████▋ | 11516/17285 [103:08:45<49:22:54, 30.82s/it] 67%|██████▋ | 11517/17285 [103:09:23<52:33:30, 32.80s/it] 67%|██████▋ | 11518/17285 [103:09:57<53:20:14, 33.30s/it] 67%|██████▋ | 11519/17285 [103:10:30<52:53:06, 33.02s/it] 67%|██████▋ | 11520/17285 [103:11:03<53:10:00, 33.20s/it] {'loss': 1.414, 'learning_rate': 5.7089972260188485e-05, 'epoch': 2.0} + 67%|██████▋ | 11520/17285 [103:11:03<53:10:00, 33.20s/it] 67%|██████▋ | 11521/17285 [103:11:38<54:05:07, 33.78s/it] 67%|██████▋ | 11522/17285 [103:12:06<51:07:40, 31.94s/it] 67%|██████▋ | 11523/17285 [103:12:39<51:48:18, 32.37s/it] 67%|██████▋ | 11524/17285 [103:13:17<54:24:43, 34.00s/it] 67%|██████▋ | 11525/17285 [103:13:49<53:11:55, 33.25s/it] 67%|██████▋ | 11526/17285 [103:14:19<51:58:48, 32.49s/it] 67%|██████▋ | 11527/17285 [103:14:56<53:46:16, 33.62s/it] 67%|██████▋ | 11528/17285 [103:15:27<52:50:04, 33.04s/it] 67%|██████▋ | 11529/17285 [103:15:58<51:43:00, 32.35s/it] 67%|██████▋ | 11530/17285 [103:16:35<53:44:07, 33.61s/it] {'loss': 1.3001, 'learning_rate': 5.6917233296841776e-05, 'epoch': 2.0} + 67%|██████▋ | 11530/17285 [103:16:35<53:44:07, 33.61s/it] 67%|██████▋ | 11531/17285 [103:17:06<52:47:18, 33.03s/it] 67%|██████▋ | 11532/17285 [103:17:41<53:26:02, 33.44s/it] 67%|██████▋ | 11533/17285 [103:18:07<49:53:50, 31.23s/it] 67%|██████▋ | 11534/17285 [103:18:39<50:10:00, 31.40s/it] 67%|██████▋ | 11535/17285 [103:19:17<53:40:20, 33.60s/it] 
67%|██████▋ | 11536/17285 [103:19:50<53:22:13, 33.42s/it] 67%|██████▋ | 11537/17285 [103:20:22<52:22:18, 32.80s/it] 67%|██████▋ | 11538/17285 [103:20:52<51:19:42, 32.15s/it] 67%|██████▋ | 11539/17285 [103:21:23<50:25:00, 31.59s/it] 67%|██████▋ | 11540/17285 [103:21:57<51:47:56, 32.46s/it] {'loss': 1.3162, 'learning_rate': 5.6744652042645616e-05, 'epoch': 2.0} + 67%|██████▋ | 11540/17285 [103:21:57<51:47:56, 32.46s/it] 67%|██████▋ | 11541/17285 [103:22:23<48:46:41, 30.57s/it] 67%|██████▋ | 11542/17285 [103:22:52<47:42:11, 29.90s/it] 67%|██████▋ | 11543/17285 [103:23:34<53:34:30, 33.59s/it] 67%|██████▋ | 11544/17285 [103:24:08<53:50:03, 33.76s/it] 67%|██████▋ | 11545/17285 [103:24:40<53:08:38, 33.33s/it] 67%|██████▋ | 11546/17285 [103:25:08<50:23:01, 31.61s/it] 67%|██████▋ | 11547/17285 [103:25:51<55:49:11, 35.02s/it] 67%|██████▋ | 11548/17285 [103:26:25<55:14:43, 34.67s/it] 67%|██████▋ | 11549/17285 [103:26:56<53:25:39, 33.53s/it] 67%|██████▋ | 11550/17285 [103:27:25<51:41:54, 32.45s/it] {'loss': 1.2765, 'learning_rate': 5.6572229129352474e-05, 'epoch': 2.0} + 67%|██████▋ | 11550/17285 [103:27:25<51:41:54, 32.45s/it] 67%|██████▋ | 11551/17285 [103:28:04<54:35:27, 34.27s/it] 67%|██████▋ | 11552/17285 [103:28:31<50:57:56, 32.00s/it] 67%|██████▋ | 11553/17285 [103:29:07<53:06:07, 33.35s/it] 67%|██████▋ | 11554/17285 [103:29:36<51:01:08, 32.05s/it] 67%|██████▋ | 11555/17285 [103:30:15<54:13:06, 34.06s/it] 67%|██████▋ | 11556/17285 [103:30:56<57:23:09, 36.06s/it] 67%|██████▋ | 11557/17285 [103:31:35<59:02:34, 37.11s/it] 67%|██████▋ | 11558/17285 [103:32:06<56:04:29, 35.25s/it] 67%|██████▋ | 11559/17285 [103:32:31<51:05:39, 32.12s/it] 67%|██████▋ | 11560/17285 [103:32:59<49:01:54, 30.83s/it] {'loss': 1.2453, 'learning_rate': 5.6399965188135084e-05, 'epoch': 2.01} + 67%|██████▋ | 11560/17285 [103:32:59<49:01:54, 30.83s/it] 67%|██████▋ | 11561/17285 [103:33:35<51:24:51, 32.34s/it] 67%|██████▋ | 11562/17285 [103:34:05<50:32:18, 31.79s/it] 67%|██████▋ | 11563/17285 
[103:34:32<48:13:39, 30.34s/it] 67%|██████▋ | 11564/17285 [103:35:02<47:59:14, 30.20s/it] 67%|██████▋ | 11565/17285 [103:35:32<48:02:06, 30.23s/it] 67%|██████▋ | 11566/17285 [103:35:59<46:33:48, 29.31s/it] 67%|██████▋ | 11567/17285 [103:36:31<47:50:49, 30.12s/it] 67%|██████▋ | 11568/17285 [103:37:07<50:31:50, 31.82s/it] 67%|██████▋ | 11569/17285 [103:37:37<49:34:11, 31.22s/it] 67%|██████▋ | 11570/17285 [103:38:09<49:51:03, 31.40s/it] {'loss': 1.2787, 'learning_rate': 5.622786084958437e-05, 'epoch': 2.01} + 67%|██████▋ | 11570/17285 [103:38:09<49:51:03, 31.40s/it] 67%|██████▋ | 11571/17285 [103:38:42<50:47:50, 32.00s/it] 67%|██████▋ | 11572/17285 [103:39:14<50:32:53, 31.85s/it] 67%|██████▋ | 11573/17285 [103:39:44<49:35:29, 31.26s/it] 67%|██████▋ | 11574/17285 [103:40:19<51:36:54, 32.54s/it] 67%|██████▋ | 11575/17285 [103:40:50<50:57:15, 32.13s/it] 67%|██████▋ | 11576/17285 [103:41:20<49:52:40, 31.45s/it] 67%|██████▋ | 11577/17285 [103:41:48<48:16:20, 30.45s/it] 67%|██████▋ | 11578/17285 [103:42:20<48:48:48, 30.79s/it][2023-08-27 07:37:37,721] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, but hysteresis is 2. Reducing hysteresis to 1 + 67%|██████▋ | 11579/17285 [103:43:00<53:14:01, 33.59s/it][2023-08-27 07:38:12,481] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 262144, reducing to 131072 + 67%|██████▋ | 11580/17285 [103:43:35<53:46:57, 33.94s/it] {'loss': 1.2726, 'learning_rate': 5.6090292716047934e-05, 'epoch': 2.01} + 67%|██████▋ | 11580/17285 [103:43:35<53:46:57, 33.94s/it] 67%|██████▋ | 11581/17285 [103:44:04<51:33:05, 32.54s/it] 67%|██████▋ | 11582/17285 [103:44:40<53:03:19, 33.49s/it] 67%|██████▋ | 11583/17285 [103:45:05<49:02:45, 30.97s/it] 67%|██████▋ | 11584/17285 [103:45:39<50:25:46, 31.84s/it] 67%|██████▋ | 11585/17285 [103:46:20<54:50:15, 34.63s/it] 67%|██████▋ | 11586/17285 [103:46:51<52:57:28, 33.45s/it] 67%|██████▋ | 11587/17285 [103:47:29<55:11:53, 34.87s/it] 67%|██████▋ | 11588/17285 [103:48:03<54:53:47, 34.69s/it] 67%|██████▋ | 11589/17285 [103:48:37<54:19:41, 34.34s/it] 67%|██████▋ | 11590/17285 [103:49:02<49:58:25, 31.59s/it] {'loss': 1.2786, 'learning_rate': 5.591847724951989e-05, 'epoch': 2.01} + 67%|██████▋ | 11590/17285 [103:49:02<49:58:25, 31.59s/it] 67%|██████▋ | 11591/17285 [103:49:32<49:33:06, 31.33s/it] 67%|██████▋ | 11592/17285 [103:50:03<48:59:52, 30.98s/it] 67%|██████▋ | 11593/17285 [103:50:36<50:07:21, 31.70s/it] 67%|██████▋ | 11594/17285 [103:51:11<51:34:55, 32.63s/it] 67%|██████▋ | 11595/17285 [103:51:42<50:56:19, 32.23s/it] 67%|██████▋ | 11596/17285 [103:52:17<52:03:41, 32.94s/it] 67%|██████▋ | 11597/17285 [103:52:49<51:50:18, 32.81s/it] 67%|██████▋ | 11598/17285 [103:53:18<49:43:58, 31.48s/it] 67%|██████▋ | 11599/17285 [103:53:43<47:02:02, 29.78s/it] 67%|██████▋ | 11600/17285 [103:54:16<48:24:12, 30.65s/it] {'loss': 1.2788, 'learning_rate': 5.574682314819745e-05, 'epoch': 2.01} + 67%|██████▋ | 11600/17285 [103:54:16<48:24:12, 30.65s/it] 67%|██████▋ | 11601/17285 [103:54:53<51:28:18, 32.60s/it] 67%|██████▋ | 11602/17285 [103:55:18<47:45:25, 30.25s/it] 67%|██████▋ | 11603/17285 [103:55:52<49:23:01, 31.29s/it] 67%|██████▋ | 11604/17285 [103:56:17<46:24:52, 29.41s/it] 67%|██████▋ | 11605/17285 [103:56:46<46:24:03, 29.41s/it] 67%|██████▋ | 11606/17285 
[103:57:14<45:52:11, 29.08s/it] 67%|██████▋ | 11607/17285 [103:57:45<46:29:15, 29.47s/it] 67%|██████▋ | 11608/17285 [103:58:22<50:01:04, 31.72s/it] 67%|██████▋ | 11609/17285 [103:58:51<48:46:47, 30.94s/it] 67%|██████▋ | 11610/17285 [103:59:24<49:41:09, 31.52s/it] {'loss': 1.2654, 'learning_rate': 5.557533104043913e-05, 'epoch': 2.02} + 67%|██████▋ | 11610/17285 [103:59:24<49:41:09, 31.52s/it] 67%|██████▋ | 11611/17285 [104:00:00<51:55:10, 32.94s/it] 67%|██████▋ | 11612/17285 [104:00:35<52:40:26, 33.43s/it] 67%|██████▋ | 11613/17285 [104:01:07<52:10:45, 33.12s/it] 67%|██████▋ | 11614/17285 [104:01:32<48:06:24, 30.54s/it] 67%|██████▋ | 11615/17285 [104:01:58<46:20:32, 29.42s/it] 67%|██████▋ | 11616/17285 [104:02:35<49:36:39, 31.50s/it] 67%|██████▋ | 11617/17285 [104:03:06<49:25:06, 31.39s/it] 67%|██████▋ | 11618/17285 [104:03:35<48:29:22, 30.80s/it] 67%|██████▋ | 11619/17285 [104:04:10<50:26:59, 32.05s/it] 67%|██████▋ | 11620/17285 [104:04:41<49:42:00, 31.58s/it] {'loss': 1.2507, 'learning_rate': 5.54040015540104e-05, 'epoch': 2.02} + 67%|██████▋ | 11620/17285 [104:04:41<49:42:00, 31.58s/it] 67%|██████▋ | 11621/17285 [104:05:12<49:22:05, 31.38s/it] 67%|██████▋ | 11622/17285 [104:05:42<48:50:30, 31.05s/it] 67%|██████▋ | 11623/17285 [104:06:10<47:27:36, 30.18s/it] 67%|██████▋ | 11624/17285 [104:06:34<44:19:17, 28.19s/it] 67%|██████▋ | 11625/17285 [104:07:02<44:11:33, 28.11s/it] 67%|██████▋ | 11626/17285 [104:07:33<45:44:27, 29.10s/it] 67%|██████▋ | 11627/17285 [104:08:01<45:21:33, 28.86s/it] 67%|██████▋ | 11628/17285 [104:08:38<48:59:16, 31.17s/it] 67%|██████▋ | 11629/17285 [104:09:04<46:51:17, 29.82s/it][2023-08-27 08:04:14,465] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 67%|██████▋ | 11630/17285 [104:09:37<48:01:02, 30.57s/it] {'loss': 1.3315, 'learning_rate': 5.5249944575829906e-05, 'epoch': 2.02} + 67%|██████▋ | 11630/17285 [104:09:37<48:01:02, 30.57s/it] 67%|██████▋ | 11631/17285 [104:10:02<45:17:08, 28.83s/it] 67%|██████▋ | 11632/17285 [104:10:42<50:55:21, 32.43s/it] 67%|██████▋ | 11633/17285 [104:11:15<50:52:06, 32.40s/it] 67%|██████▋ | 11634/17285 [104:11:44<49:17:13, 31.40s/it] 67%|██████▋ | 11635/17285 [104:12:14<48:38:28, 30.99s/it] 67%|██████▋ | 11636/17285 [104:12:50<50:58:19, 32.48s/it] 67%|██████▋ | 11637/17285 [104:13:15<47:35:26, 30.33s/it] 67%|██████▋ | 11638/17285 [104:13:50<49:53:42, 31.81s/it] 67%|██████▋ | 11639/17285 [104:14:15<46:25:39, 29.60s/it] 67%|██████▋ | 11640/17285 [104:14:53<50:41:13, 32.32s/it] {'loss': 1.2968, 'learning_rate': 5.507892579728751e-05, 'epoch': 2.02} + 67%|██████▋ | 11640/17285 [104:14:53<50:41:13, 32.32s/it] 67%|██████▋ | 11641/17285 [104:15:18<47:04:20, 30.02s/it] 67%|██████▋ | 11642/17285 [104:15:52<49:05:54, 31.32s/it] 67%|██████▋ | 11643/17285 [104:16:18<46:30:12, 29.67s/it] 67%|██████▋ | 11644/17285 [104:16:50<47:38:07, 30.40s/it] 67%|██████▋ | 11645/17285 [104:17:18<46:04:46, 29.41s/it] 67%|██████▋ | 11646/17285 [104:17:49<46:59:18, 30.00s/it] 67%|██████▋ | 11647/17285 [104:18:14<44:54:17, 28.67s/it] 67%|██████▋ | 11648/17285 [104:18:47<46:31:11, 29.71s/it] 67%|██████▋ | 11649/17285 [104:19:15<45:47:56, 29.25s/it] 67%|██████▋ | 11650/17285 [104:19:44<45:38:46, 29.16s/it] {'loss': 1.3051, 'learning_rate': 5.490807145722008e-05, 'epoch': 2.02} + 67%|██████▋ | 11650/17285 [104:19:44<45:38:46, 29.16s/it] 67%|██████▋ | 11651/17285 [104:20:17<47:26:56, 30.32s/it] 67%|██████▋ | 11652/17285 [104:20:46<47:07:47, 30.12s/it] 67%|██████▋ | 11653/17285 [104:21:21<49:15:16, 31.48s/it] 67%|██████▋ | 11654/17285 [104:21:53<49:22:28, 31.57s/it] 67%|██████▋ | 11655/17285 [104:22:26<50:14:01, 32.12s/it] 67%|██████▋ | 11656/17285 
[104:22:58<49:55:48, 31.93s/it] 67%|██████▋ | 11657/17285 [104:23:25<47:40:02, 30.49s/it] 67%|██████▋ | 11658/17285 [104:23:55<47:42:57, 30.53s/it] 67%|██████▋ | 11659/17285 [104:24:25<47:02:11, 30.10s/it] 67%|██████▋ | 11660/17285 [104:25:01<50:05:17, 32.06s/it] {'loss': 1.306, 'learning_rate': 5.47373821810585e-05, 'epoch': 2.02} + 67%|██████▋ | 11660/17285 [104:25:01<50:05:17, 32.06s/it] 67%|██████▋ | 11661/17285 [104:25:38<52:16:38, 33.46s/it] 67%|██████▋ | 11662/17285 [104:26:18<55:08:37, 35.30s/it] 67%|██████▋ | 11663/17285 [104:26:44<50:58:56, 32.65s/it] 67%|██████▋ | 11664/17285 [104:27:09<47:14:58, 30.26s/it] 67%|██████▋ | 11665/17285 [104:27:44<49:36:53, 31.78s/it] 67%|██████▋ | 11666/17285 [104:28:12<47:39:59, 30.54s/it] 67%|██████▋ | 11667/17285 [104:28:39<46:20:25, 29.69s/it] 68%|██████▊ | 11668/17285 [104:29:08<45:36:19, 29.23s/it] 68%|██████▊ | 11669/17285 [104:29:37<45:55:08, 29.44s/it] 68%|██████▊ | 11670/17285 [104:30:07<45:50:43, 29.39s/it] {'loss': 1.3139, 'learning_rate': 5.4566858593629454e-05, 'epoch': 2.03} + 68%|██████▊ | 11670/17285 [104:30:07<45:50:43, 29.39s/it] 68%|██████▊ | 11671/17285 [104:30:38<46:56:29, 30.10s/it] 68%|██████▊ | 11672/17285 [104:31:12<48:40:19, 31.22s/it] 68%|██████▊ | 11673/17285 [104:31:47<50:20:01, 32.29s/it] 68%|██████▊ | 11674/17285 [104:32:21<50:59:22, 32.71s/it] 68%|██████▊ | 11675/17285 [104:32:54<51:00:43, 32.74s/it] 68%|██████▊ | 11676/17285 [104:33:23<49:17:11, 31.63s/it] 68%|██████▊ | 11677/17285 [104:34:00<51:45:31, 33.23s/it] 68%|██████▊ | 11678/17285 [104:34:28<49:28:35, 31.77s/it] 68%|██████▊ | 11679/17285 [104:35:02<50:31:13, 32.44s/it] 68%|██████▊ | 11680/17285 [104:35:28<47:36:21, 30.58s/it] {'loss': 1.277, 'learning_rate': 5.439650131915299e-05, 'epoch': 2.03} + 68%|██████▊ | 11680/17285 [104:35:28<47:36:21, 30.58s/it] 68%|██████▊ | 11681/17285 [104:35:58<47:24:31, 30.46s/it] 68%|██████▊ | 11682/17285 [104:36:23<44:51:36, 28.82s/it] 68%|██████▊ | 11683/17285 [104:36:54<45:54:46, 29.50s/it] 
68%|██████▊ | 11684/17285 [104:37:27<47:30:28, 30.54s/it] 68%|██████▊ | 11685/17285 [104:37:59<48:12:38, 30.99s/it] 68%|██████▊ | 11686/17285 [104:38:32<48:49:25, 31.39s/it] 68%|██████▊ | 11687/17285 [104:39:02<48:05:24, 30.93s/it] 68%|██████▊ | 11688/17285 [104:39:38<50:22:44, 32.40s/it] 68%|██████▊ | 11689/17285 [104:40:23<56:20:04, 36.24s/it] 68%|██████▊ | 11690/17285 [104:40:51<52:50:15, 34.00s/it] {'loss': 1.2737, 'learning_rate': 5.4226310981240466e-05, 'epoch': 2.03} + 68%|██████▊ | 11690/17285 [104:40:51<52:50:15, 34.00s/it] 68%|██████▊ | 11691/17285 [104:41:25<52:28:56, 33.77s/it] 68%|██████▊ | 11692/17285 [104:42:00<53:16:08, 34.29s/it] 68%|██████▊ | 11693/17285 [104:42:31<51:44:57, 33.31s/it] 68%|██████▊ | 11694/17285 [104:43:10<54:30:02, 35.09s/it] 68%|██████▊ | 11695/17285 [104:43:45<54:13:40, 34.92s/it] 68%|██████▊ | 11696/17285 [104:44:10<49:35:57, 31.95s/it] 68%|██████▊ | 11697/17285 [104:44:36<46:57:53, 30.26s/it] 68%|██████▊ | 11698/17285 [104:45:07<46:58:16, 30.27s/it] 68%|██████▊ | 11699/17285 [104:45:44<50:06:45, 32.30s/it] 68%|██████▊ | 11700/17285 [104:46:12<48:27:01, 31.23s/it] {'loss': 1.311, 'learning_rate': 5.4056288202892126e-05, 'epoch': 2.03} + 68%|██████▊ | 11700/17285 [104:46:12<48:27:01, 31.23s/it] 68%|██████▊ | 11701/17285 [104:46:37<45:29:55, 29.33s/it] 68%|██████▊ | 11702/17285 [104:47:09<46:43:51, 30.13s/it] 68%|██████▊ | 11703/17285 [104:47:42<47:46:13, 30.81s/it] 68%|██████▊ | 11704/17285 [104:48:17<49:57:22, 32.22s/it] 68%|██████▊ | 11705/17285 [104:48:47<48:46:43, 31.47s/it] 68%|██████▊ | 11706/17285 [104:49:17<48:18:54, 31.18s/it] 68%|██████▊ | 11707/17285 [104:49:45<46:52:00, 30.25s/it] 68%|██████▊ | 11708/17285 [104:50:22<49:45:29, 32.12s/it] 68%|██████▊ | 11709/17285 [104:50:58<51:38:42, 33.34s/it] 68%|██████▊ | 11710/17285 [104:51:24<47:57:43, 30.97s/it] {'loss': 1.2775, 'learning_rate': 5.3886433606494804e-05, 'epoch': 2.03} + 68%|██████▊ | 11710/17285 [104:51:24<47:57:43, 30.97s/it] 68%|██████▊ | 11711/17285 
[104:51:58<49:28:05, 31.95s/it] 68%|██████▊ | 11712/17285 [104:52:27<48:17:54, 31.20s/it] 68%|██████▊ | 11713/17285 [104:53:05<51:16:02, 33.12s/it] 68%|██████▊ | 11714/17285 [104:53:43<53:45:38, 34.74s/it] 68%|██████▊ | 11715/17285 [104:54:18<53:32:21, 34.60s/it] 68%|██████▊ | 11716/17285 [104:54:52<53:17:01, 34.44s/it] 68%|██████▊ | 11717/17285 [104:55:28<53:54:59, 34.86s/it] 68%|██████▊ | 11718/17285 [104:56:02<53:39:40, 34.70s/it] 68%|██████▊ | 11719/17285 [104:56:36<53:11:32, 34.40s/it] 68%|██████▊ | 11720/17285 [104:57:08<52:23:15, 33.89s/it] {'loss': 1.2633, 'learning_rate': 5.37167478138197e-05, 'epoch': 2.03} + 68%|██████▊ | 11720/17285 [104:57:08<52:23:15, 33.89s/it] 68%|██████▊ | 11721/17285 [104:57:41<52:01:32, 33.66s/it] 68%|██████▊ | 11722/17285 [104:58:14<51:35:19, 33.38s/it] 68%|██████▊ | 11723/17285 [104:58:43<49:16:00, 31.89s/it] 68%|██████▊ | 11724/17285 [104:59:11<47:32:38, 30.78s/it] 68%|██████▊ | 11725/17285 [104:59:43<48:20:04, 31.30s/it] 68%|██████▊ | 11726/17285 [105:00:12<47:04:03, 30.48s/it] 68%|██████▊ | 11727/17285 [105:00:49<49:54:55, 32.33s/it] 68%|██████▊ | 11728/17285 [105:01:17<48:05:56, 31.16s/it] 68%|██████▊ | 11729/17285 [105:01:50<49:02:16, 31.77s/it] 68%|██████▊ | 11730/17285 [105:02:21<48:27:52, 31.41s/it] {'loss': 1.3022, 'learning_rate': 5.354723144602016e-05, 'epoch': 2.04} + 68%|██████▊ | 11730/17285 [105:02:21<48:27:52, 31.41s/it] 68%|██████▊ | 11731/17285 [105:02:47<46:11:56, 29.95s/it] 68%|██████▊ | 11732/17285 [105:03:18<46:23:18, 30.07s/it] 68%|██████▊ | 11733/17285 [105:03:55<49:47:45, 32.29s/it] 68%|██████▊ | 11734/17285 [105:04:22<47:19:59, 30.70s/it] 68%|██████▊ | 11735/17285 [105:04:49<45:28:39, 29.50s/it] 68%|██████▊ | 11736/17285 [105:05:21<46:42:17, 30.30s/it] 68%|██████▊ | 11737/17285 [105:05:58<49:48:30, 32.32s/it] 68%|██████▊ | 11738/17285 [105:06:24<46:47:53, 30.37s/it] 68%|██████▊ | 11739/17285 [105:06:55<47:01:10, 30.52s/it] 68%|██████▊ | 11740/17285 [105:07:27<48:03:18, 31.20s/it] {'loss': 1.2979, 
'learning_rate': 5.337788512362931e-05, 'epoch': 2.04} + 68%|██████▊ | 11740/17285 [105:07:27<48:03:18, 31.20s/it] 68%|██████▊ | 11741/17285 [105:07:52<45:07:11, 29.30s/it] 68%|██████▊ | 11742/17285 [105:08:22<45:18:27, 29.43s/it] 68%|██████▊ | 11743/17285 [105:08:51<45:06:52, 29.31s/it] 68%|██████▊ | 11744/17285 [105:09:25<47:19:32, 30.75s/it] 68%|██████▊ | 11745/17285 [105:10:02<50:17:25, 32.68s/it] 68%|██████▊ | 11746/17285 [105:10:29<47:30:45, 30.88s/it] 68%|██████▊ | 11747/17285 [105:11:04<49:31:09, 32.19s/it] 68%|██████▊ | 11748/17285 [105:11:34<48:15:05, 31.37s/it] 68%|██████▊ | 11749/17285 [105:12:08<49:34:58, 32.24s/it] 68%|██████▊ | 11750/17285 [105:12:43<50:53:40, 33.10s/it] {'loss': 1.2726, 'learning_rate': 5.320870946655765e-05, 'epoch': 2.04} + 68%|██████▊ | 11750/17285 [105:12:43<50:53:40, 33.10s/it] 68%|██████▊ | 11751/17285 [105:13:26<55:32:24, 36.13s/it] 68%|██████▊ | 11752/17285 [105:13:56<52:30:27, 34.16s/it] 68%|██████▊ | 11753/17285 [105:14:32<53:13:49, 34.64s/it] 68%|██████▊ | 11754/17285 [105:15:12<55:52:02, 36.36s/it] 68%|██████▊ | 11755/17285 [105:15:48<55:37:44, 36.21s/it] 68%|██████▊ | 11756/17285 [105:16:13<50:22:45, 32.80s/it] 68%|██████▊ | 11757/17285 [105:16:49<52:09:50, 33.97s/it] 68%|██████▊ | 11758/17285 [105:17:28<54:08:19, 35.26s/it] 68%|██████▊ | 11759/17285 [105:17:59<52:09:59, 33.98s/it] 68%|██████▊ | 11760/17285 [105:18:36<53:30:32, 34.87s/it] {'loss': 1.2303, 'learning_rate': 5.303970509409113e-05, 'epoch': 2.04} + 68%|██████▊ | 11760/17285 [105:18:36<53:30:32, 34.87s/it] 68%|██████▊ | 11761/17285 [105:19:15<55:23:48, 36.10s/it] 68%|██████▊ | 11762/17285 [105:19:52<55:50:21, 36.40s/it] 68%|██████▊ | 11763/17285 [105:20:21<52:43:07, 34.37s/it] 68%|██████▊ | 11764/17285 [105:20:52<50:55:50, 33.21s/it] 68%|██████▊ | 11765/17285 [105:21:27<51:54:50, 33.86s/it] 68%|██████▊ | 11766/17285 [105:21:56<49:45:59, 32.46s/it] 68%|██████▊ | 11767/17285 [105:22:29<49:36:51, 32.37s/it] 68%|██████▊ | 11768/17285 [105:22:57<47:53:39, 
31.25s/it] 68%|██████▊ | 11769/17285 [105:23:25<46:30:21, 30.35s/it] 68%|██████▊ | 11770/17285 [105:24:02<49:09:49, 32.09s/it] {'loss': 1.2648, 'learning_rate': 5.2870872624888615e-05, 'epoch': 2.04} + 68%|██████▊ | 11770/17285 [105:24:02<49:09:49, 32.09s/it] 68%|██████▊ | 11771/17285 [105:24:30<47:38:11, 31.10s/it] 68%|██████▊ | 11772/17285 [105:25:08<50:32:10, 33.00s/it] 68%|██████▊ | 11773/17285 [105:25:41<50:30:13, 32.99s/it] 68%|██████▊ | 11774/17285 [105:26:12<49:49:26, 32.55s/it] 68%|██████▊ | 11775/17285 [105:26:40<47:42:50, 31.17s/it] 68%|██████▊ | 11776/17285 [105:27:11<47:34:54, 31.09s/it] 68%|██████▊ | 11777/17285 [105:27:45<48:59:03, 32.02s/it] 68%|██████▊ | 11778/17285 [105:28:15<47:56:07, 31.34s/it] 68%|██████▊ | 11779/17285 [105:28:49<49:04:32, 32.09s/it] 68%|██████▊ | 11780/17285 [105:29:28<52:13:21, 34.15s/it] {'loss': 1.2865, 'learning_rate': 5.2702212676979704e-05, 'epoch': 2.04} + 68%|██████▊ | 11780/17285 [105:29:28<52:13:21, 34.15s/it] 68%|██████▊ | 11781/17285 [105:30:03<52:27:17, 34.31s/it] 68%|██████▊ | 11782/17285 [105:30:29<48:56:51, 32.02s/it] 68%|██████▊ | 11783/17285 [105:31:02<49:24:00, 32.32s/it] 68%|██████▊ | 11784/17285 [105:31:28<46:09:59, 30.21s/it] 68%|██████▊ | 11785/17285 [105:31:53<44:09:16, 28.90s/it] 68%|██████▊ | 11786/17285 [105:32:22<43:59:59, 28.81s/it] 68%|██████▊ | 11787/17285 [105:32:47<42:07:15, 27.58s/it] 68%|██████▊ | 11788/17285 [105:33:15<42:37:54, 27.92s/it] 68%|██████▊ | 11789/17285 [105:33:45<43:25:42, 28.45s/it] 68%|██████▊ | 11790/17285 [105:34:11<42:25:15, 27.79s/it] {'loss': 1.2944, 'learning_rate': 5.253372586776248e-05, 'epoch': 2.05} + 68%|██████▊ | 11790/17285 [105:34:11<42:25:15, 27.79s/it] 68%|██████▊ | 11791/17285 [105:34:40<42:35:43, 27.91s/it] 68%|██████▊ | 11792/17285 [105:35:11<44:17:52, 29.03s/it] 68%|██████▊ | 11793/17285 [105:35:41<44:49:34, 29.38s/it] 68%|██████▊ | 11794/17285 [105:36:13<46:02:36, 30.19s/it] 68%|██████▊ | 11795/17285 [105:36:44<46:02:00, 30.19s/it] 68%|██████▊ | 
11796/17285 [105:37:19<48:09:49, 31.59s/it] 68%|██████▊ | 11797/17285 [105:37:56<50:38:59, 33.23s/it] 68%|██████▊ | 11798/17285 [105:38:29<50:41:01, 33.25s/it] 68%|██████▊ | 11799/17285 [105:39:01<50:20:06, 33.03s/it] 68%|██████▊ | 11800/17285 [105:39:39<52:19:01, 34.34s/it] {'loss': 1.3188, 'learning_rate': 5.236541281400122e-05, 'epoch': 2.05} + 68%|██████▊ | 11800/17285 [105:39:39<52:19:01, 34.34s/it] 68%|██████▊ | 11801/17285 [105:40:17<54:14:53, 35.61s/it] 68%|██████▊ | 11802/17285 [105:40:45<50:34:23, 33.21s/it] 68%|██████▊ | 11803/17285 [105:41:10<46:48:07, 30.73s/it] 68%|██████▊ | 11804/17285 [105:41:51<51:36:03, 33.89s/it] 68%|██████▊ | 11805/17285 [105:42:26<51:56:22, 34.12s/it] 68%|██████▊ | 11806/17285 [105:42:58<51:01:48, 33.53s/it] 68%|██████▊ | 11807/17285 [105:43:28<49:12:02, 32.33s/it] 68%|██████▊ | 11808/17285 [105:44:02<49:56:32, 32.83s/it] 68%|██████▊ | 11809/17285 [105:44:28<47:03:02, 30.93s/it] 68%|██████▊ | 11810/17285 [105:44:59<47:02:59, 30.94s/it] {'loss': 1.3009, 'learning_rate': 5.219727413182419e-05, 'epoch': 2.05} + 68%|██████▊ | 11810/17285 [105:44:59<47:02:59, 30.94s/it] 68%|██████▊ | 11811/17285 [105:45:28<46:16:36, 30.43s/it] 68%|██████▊ | 11812/17285 [105:46:01<47:12:42, 31.05s/it] 68%|██████▊ | 11813/17285 [105:46:35<48:31:09, 31.92s/it] 68%|██████▊ | 11814/17285 [105:47:01<46:09:55, 30.38s/it] 68%|██████▊ | 11815/17285 [105:47:31<45:57:27, 30.25s/it] 68%|██████▊ | 11816/17285 [105:48:01<45:52:25, 30.20s/it] 68%|██████▊ | 11817/17285 [105:48:32<46:00:56, 30.30s/it] 68%|██████▊ | 11818/17285 [105:49:08<48:46:49, 32.12s/it] 68%|██████▊ | 11819/17285 [105:49:47<51:53:48, 34.18s/it] 68%|██████▊ | 11820/17285 [105:50:19<50:50:25, 33.49s/it] {'loss': 1.2727, 'learning_rate': 5.202931043672124e-05, 'epoch': 2.05} + 68%|██████▊ | 11820/17285 [105:50:19<50:50:25, 33.49s/it] 68%|██████▊ | 11821/17285 [105:50:52<50:32:06, 33.30s/it] 68%|██████▊ | 11822/17285 [105:51:37<55:55:20, 36.85s/it] 68%|██████▊ | 11823/17285 [105:52:07<52:40:00, 
34.71s/it] 68%|██████▊ | 11824/17285 [105:52:38<50:57:09, 33.59s/it] 68%|██████▊ | 11825/17285 [105:53:02<46:48:36, 30.86s/it] 68%|██████▊ | 11826/17285 [105:53:37<48:24:52, 31.93s/it] 68%|██████▊ | 11827/17285 [105:54:10<49:08:35, 32.41s/it] 68%|██████▊ | 11828/17285 [105:54:37<46:37:20, 30.76s/it] 68%|██████▊ | 11829/17285 [105:55:15<49:58:39, 32.98s/it] 68%|██████▊ | 11830/17285 [105:55:45<48:24:54, 31.95s/it] {'loss': 1.2515, 'learning_rate': 5.186152234354172e-05, 'epoch': 2.05} + 68%|██████▊ | 11830/17285 [105:55:45<48:24:54, 31.95s/it] 68%|██████▊ | 11831/17285 [105:56:14<47:09:01, 31.12s/it][2023-08-27 09:51:21,667] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, but hysteresis is 2. Reducing hysteresis to 1 + 68%|██████▊ | 11832/17285 [105:56:44<46:31:51, 30.72s/it] 68%|██████▊ | 11833/17285 [105:57:26<51:48:52, 34.21s/it] 68%|██████▊ | 11834/17285 [105:58:01<52:10:24, 34.46s/it][2023-08-27 09:53:09,052] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 262144, reducing to 131072 + 68%|██████▊ | 11835/17285 [105:58:31<50:08:12, 33.12s/it] 68%|██████▊ | 11836/17285 [105:58:56<46:19:55, 30.61s/it] 68%|██████▊ | 11837/17285 [105:59:24<45:05:37, 29.80s/it] 68%|██████▊ | 11838/17285 [106:00:02<48:54:27, 32.32s/it] 68%|██████▊ | 11839/17285 [106:00:38<50:22:41, 33.30s/it] 68%|██████▊ | 11840/17285 [106:01:14<51:43:52, 34.20s/it] {'loss': 1.3243, 'learning_rate': 5.172741871515152e-05, 'epoch': 2.05} + 68%|██████▊ | 11840/17285 [106:01:14<51:43:52, 34.20s/it] 69%|██████▊ | 11841/17285 [106:01:55<54:34:58, 36.09s/it] 69%|██████▊ | 11842/17285 [106:02:20<49:48:15, 32.94s/it] 69%|██████▊ | 11843/17285 [106:02:49<47:56:43, 31.72s/it] 69%|██████▊ | 11844/17285 [106:03:22<48:30:30, 32.10s/it] 69%|██████▊ | 11845/17285 [106:03:56<49:30:26, 32.76s/it] 69%|██████▊ | 11846/17285 [106:04:28<48:54:39, 32.37s/it] 69%|██████▊ | 11847/17285 [106:04:58<48:00:37, 31.78s/it] 69%|██████▊ | 11848/17285 [106:05:29<47:33:54, 31.49s/it] 69%|██████▊ | 11849/17285 [106:06:05<49:24:41, 32.72s/it] 69%|██████▊ | 11850/17285 [106:06:33<47:14:23, 31.29s/it] {'loss': 1.3009, 'learning_rate': 5.1559948252801414e-05, 'epoch': 2.06} + 69%|██████▊ | 11850/17285 [106:06:33<47:14:23, 31.29s/it] 69%|██████▊ | 11851/17285 [106:07:02<46:32:14, 30.83s/it] 69%|██████▊ | 11852/17285 [106:07:34<46:40:48, 30.93s/it] 69%|██████▊ | 11853/17285 [106:08:01<45:05:36, 29.89s/it] 69%|██████▊ | 11854/17285 [106:08:28<43:51:09, 29.07s/it] 69%|██████▊ | 11855/17285 [106:08:58<44:21:36, 29.41s/it] 69%|██████▊ | 11856/17285 [106:09:36<47:51:46, 31.74s/it] 69%|██████▊ | 11857/17285 [106:10:06<47:24:34, 31.44s/it] 69%|██████▊ | 11858/17285 [106:10:35<45:56:49, 30.48s/it] 69%|██████▊ | 11859/17285 [106:11:16<50:45:47, 33.68s/it] 69%|██████▊ | 11860/17285 [106:11:48<49:58:44, 33.17s/it] {'loss': 1.3033, 'learning_rate': 5.139265511052607e-05, 'epoch': 2.06} + 69%|██████▊ | 11860/17285 [106:11:48<49:58:44, 33.17s/it] 69%|██████▊ | 11861/17285 
[106:12:17<48:21:53, 32.10s/it] 69%|██████▊ | 11862/17285 [106:12:44<46:07:06, 30.62s/it] 69%|██████▊ | 11863/17285 [106:13:14<45:35:08, 30.27s/it] 69%|██████▊ | 11864/17285 [106:13:42<44:48:08, 29.75s/it] 69%|██████▊ | 11865/17285 [106:14:23<49:43:09, 33.02s/it] 69%|██████▊ | 11866/17285 [106:14:59<50:55:50, 33.83s/it] 69%|██████▊ | 11867/17285 [106:15:27<48:15:09, 32.06s/it] 69%|██████▊ | 11868/17285 [106:15:59<48:28:18, 32.21s/it] 69%|██████▊ | 11869/17285 [106:16:28<46:57:55, 31.22s/it] 69%|██████▊ | 11870/17285 [106:17:02<48:07:45, 32.00s/it] {'loss': 1.2961, 'learning_rate': 5.122553990072023e-05, 'epoch': 2.06} + 69%|██████▊ | 11870/17285 [106:17:02<48:07:45, 32.00s/it] 69%|██████▊ | 11871/17285 [106:17:37<49:27:08, 32.88s/it] 69%|██████▊ | 11872/17285 [106:18:19<53:23:17, 35.51s/it] 69%|██████▊ | 11873/17285 [106:18:55<53:44:52, 35.75s/it] 69%|██████▊ | 11874/17285 [106:19:22<49:57:14, 33.23s/it] 69%|██████▊ | 11875/17285 [106:19:54<49:25:29, 32.89s/it] 69%|██████▊ | 11876/17285 [106:20:35<53:08:07, 35.36s/it] 69%|██████▊ | 11877/17285 [106:21:12<53:26:12, 35.57s/it] 69%|██████▊ | 11878/17285 [106:21:50<54:50:36, 36.52s/it] 69%|██████▊ | 11879/17285 [106:22:16<49:55:12, 33.24s/it] 69%|██████▊ | 11880/17285 [106:22:52<51:19:55, 34.19s/it] {'loss': 1.2361, 'learning_rate': 5.10586032351273e-05, 'epoch': 2.06} + 69%|██████▊ | 11880/17285 [106:22:52<51:19:55, 34.19s/it] 69%|██████▊ | 11881/17285 [106:23:29<52:37:05, 35.05s/it][2023-08-27 10:18:40,627] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 69%|██████▊ | 11882/17285 [106:24:03<51:58:04, 34.63s/it] 69%|██████▊ | 11883/17285 [106:24:34<50:16:57, 33.51s/it] 69%|██████▉ | 11884/17285 [106:25:06<49:41:22, 33.12s/it] 69%|██████▉ | 11885/17285 [106:25:33<47:04:38, 31.38s/it] 69%|██████▉ | 11886/17285 [106:26:04<46:42:09, 31.14s/it] 69%|██████▉ | 11887/17285 [106:26:29<43:56:47, 29.31s/it] 69%|██████▉ | 11888/17285 [106:26:58<43:37:01, 29.09s/it] 69%|██████▉ | 11889/17285 [106:27:34<46:43:28, 31.17s/it] 69%|██████▉ | 11890/17285 [106:28:10<48:51:37, 32.60s/it] {'loss': 1.2856, 'learning_rate': 5.090851339647496e-05, 'epoch': 2.06} + 69%|██████▉ | 11890/17285 [106:28:10<48:51:37, 32.60s/it] 69%|██████▉ | 11891/17285 [106:28:38<46:56:47, 31.33s/it] 69%|██████▉ | 11892/17285 [106:29:03<44:09:34, 29.48s/it] 69%|██████▉ | 11893/17285 [106:29:29<42:22:14, 28.29s/it] 69%|██████▉ | 11894/17285 [106:29:56<41:48:22, 27.92s/it] 69%|██████▉ | 11895/17285 [106:30:37<47:40:08, 31.84s/it] 69%|██████▉ | 11896/17285 [106:31:04<45:30:31, 30.40s/it] 69%|██████▉ | 11897/17285 [106:31:30<43:29:23, 29.06s/it] 69%|██████▉ | 11898/17285 [106:32:02<44:48:36, 29.95s/it] 69%|██████▉ | 11899/17285 [106:32:34<45:58:30, 30.73s/it] 69%|██████▉ | 11900/17285 [106:33:04<45:41:45, 30.55s/it] {'loss': 1.2688, 'learning_rate': 5.074191764789694e-05, 'epoch': 2.07} + 69%|██████▉ | 11900/17285 [106:33:04<45:41:45, 30.55s/it] 69%|██████▉ | 11901/17285 [106:33:35<45:47:23, 30.62s/it] 69%|██████▉ | 11902/17285 [106:34:20<52:10:25, 34.89s/it] 69%|██████▉ | 11903/17285 [106:34:50<49:58:47, 33.43s/it] 69%|██████▉ | 11904/17285 [106:35:27<51:23:57, 34.39s/it] 69%|██████▉ | 11905/17285 [106:36:03<52:10:45, 34.92s/it] 69%|██████▉ | 11906/17285 [106:36:32<49:43:07, 33.28s/it] 69%|██████▉ | 11907/17285 [106:37:04<48:53:35, 32.73s/it] 69%|██████▉ | 11908/17285 [106:37:33<47:13:25, 31.62s/it] 69%|██████▉ | 11909/17285 [106:38:08<48:53:47, 32.74s/it] 69%|██████▉ | 11910/17285 [106:38:57<56:07:10, 37.59s/it] 
{'loss': 1.2338, 'learning_rate': 5.0575502213883655e-05, 'epoch': 2.07} + 69%|██████▉ | 11910/17285 [106:38:57<56:07:10, 37.59s/it] 69%|██████▉ | 11911/17285 [106:39:28<53:01:17, 35.52s/it] 69%|██████▉ | 11912/17285 [106:39:56<49:58:31, 33.48s/it] 69%|██████▉ | 11913/17285 [106:40:26<48:04:59, 32.22s/it] 69%|██████▉ | 11914/17285 [106:40:56<47:04:11, 31.55s/it] 69%|██████▉ | 11915/17285 [106:41:32<49:17:08, 33.04s/it] 69%|██████▉ | 11916/17285 [106:42:06<49:44:17, 33.35s/it] 69%|██████▉ | 11917/17285 [106:42:34<47:03:00, 31.55s/it] 69%|██████▉ | 11918/17285 [106:43:08<48:29:53, 32.53s/it] 69%|██████▉ | 11919/17285 [106:43:39<47:36:53, 31.94s/it] 69%|██████▉ | 11920/17285 [106:44:03<44:13:00, 29.67s/it] {'loss': 1.3065, 'learning_rate': 5.040926770361687e-05, 'epoch': 2.07} + 69%|██████▉ | 11920/17285 [106:44:03<44:13:00, 29.67s/it] 69%|██████▉ | 11921/17285 [106:44:39<46:46:11, 31.39s/it] 69%|██████▉ | 11922/17285 [106:45:12<47:42:10, 32.02s/it] 69%|██████▉ | 11923/17285 [106:45:46<48:28:14, 32.54s/it] 69%|██████▉ | 11924/17285 [106:46:14<46:28:44, 31.21s/it] 69%|██████▉ | 11925/17285 [106:46:44<45:41:29, 30.69s/it] 69%|██████▉ | 11926/17285 [106:47:15<46:06:02, 30.97s/it] 69%|██████▉ | 11927/17285 [106:47:45<45:31:07, 30.58s/it] 69%|██████▉ | 11928/17285 [106:48:16<45:46:41, 30.76s/it] 69%|██████▉ | 11929/17285 [106:48:54<48:59:12, 32.93s/it] 69%|██████▉ | 11930/17285 [106:49:30<50:18:32, 33.82s/it] {'loss': 1.2683, 'learning_rate': 5.0243214725616126e-05, 'epoch': 2.07} + 69%|██████▉ | 11930/17285 [106:49:30<50:18:32, 33.82s/it] 69%|██████▉ | 11931/17285 [106:49:55<46:32:36, 31.30s/it] 69%|██████▉ | 11932/17285 [106:50:26<46:26:47, 31.24s/it] 69%|██████▉ | 11933/17285 [106:50:58<46:45:30, 31.45s/it] 69%|██████▉ | 11934/17285 [106:51:23<43:43:03, 29.41s/it] 69%|██████▉ | 11935/17285 [106:51:53<43:56:12, 29.56s/it] 69%|██████▉ | 11936/17285 [106:52:35<49:20:15, 33.21s/it] 69%|██████▉ | 11937/17285 [106:53:03<46:57:24, 31.61s/it] 69%|██████▉ | 11938/17285 
[106:53:40<49:23:51, 33.26s/it] 69%|██████▉ | 11939/17285 [106:54:18<51:52:50, 34.94s/it] 69%|██████▉ | 11940/17285 [106:54:47<49:12:29, 33.14s/it] {'loss': 1.3036, 'learning_rate': 5.00773438877363e-05, 'epoch': 2.07} + 69%|██████▉ | 11940/17285 [106:54:47<49:12:29, 33.14s/it] 69%|██████▉ | 11941/17285 [106:55:25<51:08:52, 34.46s/it] 69%|██████▉ | 11942/17285 [106:55:57<50:06:40, 33.76s/it] 69%|██████▉ | 11943/17285 [106:56:27<48:25:17, 32.63s/it] 69%|██████▉ | 11944/17285 [106:57:00<48:41:43, 32.82s/it] 69%|██████▉ | 11945/17285 [106:57:34<49:13:36, 33.19s/it] 69%|██████▉ | 11946/17285 [106:58:18<53:55:58, 36.37s/it] 69%|██████▉ | 11947/17285 [106:58:45<49:39:58, 33.50s/it] 69%|██████▉ | 11948/17285 [106:59:14<47:45:13, 32.21s/it] 69%|██████▉ | 11949/17285 [106:59:44<46:47:54, 31.57s/it] 69%|██████▉ | 11950/17285 [107:00:15<46:17:31, 31.24s/it] {'loss': 1.25, 'learning_rate': 4.99116557971657e-05, 'epoch': 2.07} + 69%|██████▉ | 11950/17285 [107:00:15<46:17:31, 31.24s/it] 69%|██████▉ | 11951/17285 [107:00:50<48:12:53, 32.54s/it] 69%|██████▉ | 11952/17285 [107:01:21<47:11:12, 31.85s/it] 69%|██████▉ | 11953/17285 [107:01:51<46:24:46, 31.34s/it] 69%|██████▉ | 11954/17285 [107:02:26<48:03:32, 32.45s/it] 69%|██████▉ | 11955/17285 [107:02:56<47:08:04, 31.84s/it] 69%|██████▉ | 11956/17285 [107:03:31<48:15:57, 32.61s/it] 69%|██████▉ | 11957/17285 [107:04:05<48:59:15, 33.10s/it] 69%|██████▉ | 11958/17285 [107:04:33<47:00:54, 31.77s/it] 69%|██████▉ | 11959/17285 [107:05:03<46:05:32, 31.16s/it] 69%|██████▉ | 11960/17285 [107:05:33<45:33:47, 30.80s/it] {'loss': 1.2719, 'learning_rate': 4.9746151060423564e-05, 'epoch': 2.08} + 69%|██████▉ | 11960/17285 [107:05:33<45:33:47, 30.80s/it] 69%|██████▉ | 11961/17285 [107:06:02<44:30:13, 30.09s/it] 69%|██████▉ | 11962/17285 [107:06:34<45:39:44, 30.88s/it] 69%|██████▉ | 11963/17285 [107:07:11<48:21:46, 32.71s/it] 69%|██████▉ | 11964/17285 [107:07:40<46:35:30, 31.52s/it] 69%|██████▉ | 11965/17285 [107:08:11<46:29:53, 31.47s/it] 
69%|██████▉ | 11966/17285 [107:08:39<44:46:17, 30.30s/it] 69%|██████▉ | 11967/17285 [107:09:14<46:50:48, 31.71s/it] 69%|██████▉ | 11968/17285 [107:09:48<47:56:07, 32.46s/it] 69%|██████▉ | 11969/17285 [107:10:19<47:17:32, 32.03s/it] 69%|██████▉ | 11970/17285 [107:10:52<47:40:42, 32.29s/it] {'loss': 1.2411, 'learning_rate': 4.958083028335794e-05, 'epoch': 2.08} + 69%|██████▉ | 11970/17285 [107:10:52<47:40:42, 32.29s/it] 69%|██████▉ | 11971/17285 [107:11:25<48:02:32, 32.55s/it] 69%|██████▉ | 11972/17285 [107:11:55<46:35:29, 31.57s/it] 69%|██████▉ | 11973/17285 [107:12:28<47:12:22, 31.99s/it] 69%|██████▉ | 11974/17285 [107:13:07<50:42:20, 34.37s/it] 69%|██████▉ | 11975/17285 [107:13:40<49:57:15, 33.87s/it] 69%|██████▉ | 11976/17285 [107:14:12<49:10:44, 33.35s/it] 69%|██████▉ | 11977/17285 [107:14:43<48:08:43, 32.65s/it] 69%|██████▉ | 11978/17285 [107:15:19<49:39:03, 33.68s/it] 69%|██████▉ | 11979/17285 [107:15:50<48:18:22, 32.77s/it] 69%|██████▉ | 11980/17285 [107:16:28<50:45:31, 34.45s/it] {'loss': 1.286, 'learning_rate': 4.9415694071143584e-05, 'epoch': 2.08} + 69%|██████▉ | 11980/17285 [107:16:28<50:45:31, 34.45s/it] 69%|██████▉ | 11981/17285 [107:16:54<46:58:25, 31.88s/it] 69%|██████▉ | 11982/17285 [107:17:29<48:18:01, 32.79s/it] 69%|██████▉ | 11983/17285 [107:17:54<44:56:36, 30.52s/it] 69%|██████▉ | 11984/17285 [107:18:32<48:08:21, 32.69s/it] 69%|██████▉ | 11985/17285 [107:18:59<45:34:59, 30.96s/it] 69%|██████▉ | 11986/17285 [107:19:40<49:51:11, 33.87s/it] 69%|██████▉ | 11987/17285 [107:20:09<47:40:34, 32.40s/it] 69%|██████▉ | 11988/17285 [107:20:37<45:54:00, 31.20s/it] 69%|██████▉ | 11989/17285 [107:21:06<44:47:41, 30.45s/it] 69%|██████▉ | 11990/17285 [107:21:45<48:30:50, 32.98s/it] {'loss': 1.2786, 'learning_rate': 4.9250743028279486e-05, 'epoch': 2.08} + 69%|██████▉ | 11990/17285 [107:21:45<48:30:50, 32.98s/it] 69%|██████▉ | 11991/17285 [107:22:14<46:41:29, 31.75s/it] 69%|██████▉ | 11992/17285 [107:22:51<48:59:59, 33.33s/it] 69%|██████▉ | 11993/17285 
[107:23:21<47:32:00, 32.34s/it] 69%|██████▉ | 11994/17285 [107:23:54<47:51:27, 32.56s/it] 69%|██████▉ | 11995/17285 [107:24:21<45:31:57, 30.99s/it] 69%|██████▉ | 11996/17285 [107:24:47<43:28:02, 29.59s/it] 69%|██████▉ | 11997/17285 [107:25:22<45:30:02, 30.98s/it] 69%|██████▉ | 11998/17285 [107:25:51<44:52:49, 30.56s/it] 69%|██████▉ | 11999/17285 [107:26:19<43:46:10, 29.81s/it] 69%|██████▉ | 12000/17285 [107:26:55<46:28:18, 31.66s/it] {'loss': 1.2634, 'learning_rate': 4.9085977758586906e-05, 'epoch': 2.08} + 69%|██████▉ | 12000/17285 [107:26:55<46:28:18, 31.66s/it][INFO|trainer.py:3081] 2023-08-27 11:21:32,889 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-27 11:21:32,891 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-27 11:21:32,891 >> Batch size = 2 + + 0%| | 0/33 [00:00> Deleting older checkpoint [20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-9000] due to args.save_total_limit +Save Peft Config at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-12000 +[INFO|tokenization_utils_base.py:2210] 2023-08-27 11:22:58,162 >> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-12000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-27 11:22:58,166 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-12000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-12000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-12000 + 69%|██████▉ | 12001/17285 [107:29:01<87:57:08, 59.92s/it] 69%|██████▉ | 12002/17285 [107:29:34<76:07:09, 51.87s/it] 69%|██████▉ | 12003/17285 [107:30:12<69:48:39, 47.58s/it] 69%|██████▉ | 12004/17285 [107:30:38<60:27:09, 
41.21s/it] 69%|██████▉ | 12005/17285 [107:31:05<54:22:34, 37.07s/it] 69%|██████▉ | 12006/17285 [107:31:37<51:57:34, 35.43s/it] 69%|██████▉ | 12007/17285 [107:32:16<53:31:43, 36.51s/it] 69%|██████▉ | 12008/17285 [107:32:42<48:41:25, 33.22s/it] 69%|██████▉ | 12009/17285 [107:33:10<46:33:29, 31.77s/it] 69%|██████▉ | 12010/17285 [107:33:42<46:41:38, 31.87s/it] {'loss': 1.3052, 'learning_rate': 4.8921398865207045e-05, 'epoch': 2.08} + 69%|██████▉ | 12010/17285 [107:33:42<46:41:38, 31.87s/it] 69%|██████▉ | 12011/17285 [107:34:14<46:45:53, 31.92s/it] 69%|██████▉ | 12012/17285 [107:34:44<45:57:49, 31.38s/it] 69%|██████▉ | 12013/17285 [107:35:23<49:09:24, 33.57s/it] 70%|██████▉ | 12014/17285 [107:35:51<46:54:48, 32.04s/it] 70%|██████▉ | 12015/17285 [107:36:28<48:41:24, 33.26s/it] 70%|██████▉ | 12016/17285 [107:36:54<45:32:27, 31.12s/it] 70%|██████▉ | 12017/17285 [107:37:21<43:46:25, 29.91s/it] 70%|██████▉ | 12018/17285 [107:37:50<43:26:24, 29.69s/it] 70%|██████▉ | 12019/17285 [107:38:22<44:19:58, 30.31s/it] 70%|██████▉ | 12020/17285 [107:39:01<48:18:57, 33.04s/it] {'loss': 1.3028, 'learning_rate': 4.875700695059875e-05, 'epoch': 2.09} + 70%|██████▉ | 12020/17285 [107:39:01<48:18:57, 33.04s/it] 70%|██████▉ | 12021/17285 [107:39:36<49:09:36, 33.62s/it] 70%|██████▉ | 12022/17285 [107:40:11<49:42:39, 34.00s/it] 70%|██████▉ | 12023/17285 [107:40:47<50:43:57, 34.71s/it] 70%|██████▉ | 12024/17285 [107:41:17<48:41:43, 33.32s/it] 70%|██████▉ | 12025/17285 [107:41:48<47:34:04, 32.56s/it] 70%|██████▉ | 12026/17285 [107:42:21<47:44:04, 32.68s/it] 70%|██████▉ | 12027/17285 [107:42:46<44:06:06, 30.20s/it] 70%|██████▉ | 12028/17285 [107:43:15<43:41:26, 29.92s/it] 70%|██████▉ | 12029/17285 [107:43:42<42:17:16, 28.96s/it] 70%|██████▉ | 12030/17285 [107:44:13<43:32:19, 29.83s/it] {'loss': 1.3132, 'learning_rate': 4.859280261653654e-05, 'epoch': 2.09} + 70%|██████▉ | 12030/17285 [107:44:13<43:32:19, 29.83s/it] 70%|██████▉ | 12031/17285 [107:44:43<43:15:23, 29.64s/it] 70%|██████▉ | 12032/17285 
[107:45:13<43:44:20, 29.98s/it] 70%|██████▉ | 12033/17285 [107:45:46<44:52:56, 30.76s/it] 70%|██████▉ | 12034/17285 [107:46:20<46:21:14, 31.78s/it] 70%|██████▉ | 12035/17285 [107:46:51<45:53:21, 31.47s/it] 70%|██████▉ | 12036/17285 [107:47:18<44:09:46, 30.29s/it] 70%|██████▉ | 12037/17285 [107:47:48<43:54:47, 30.12s/it] 70%|██████▉ | 12038/17285 [107:48:15<42:20:18, 29.05s/it] 70%|██████▉ | 12039/17285 [107:48:48<44:03:16, 30.23s/it] 70%|██████▉ | 12040/17285 [107:49:22<45:57:31, 31.54s/it] {'loss': 1.3153, 'learning_rate': 4.8428786464108225e-05, 'epoch': 2.09} + 70%|██████▉ | 12040/17285 [107:49:22<45:57:31, 31.54s/it] 70%|██████▉ | 12041/17285 [107:49:49<44:03:39, 30.25s/it] 70%|██████▉ | 12042/17285 [107:50:16<42:22:31, 29.10s/it] 70%|██████▉ | 12043/17285 [107:50:48<43:31:44, 29.89s/it] 70%|██████▉ | 12044/17285 [107:51:20<44:38:03, 30.66s/it] 70%|██████▉ | 12045/17285 [107:51:49<44:02:39, 30.26s/it] 70%|██████▉ | 12046/17285 [107:52:20<44:19:55, 30.46s/it] 70%|██████▉ | 12047/17285 [107:52:50<44:06:01, 30.31s/it] 70%|██████▉ | 12048/17285 [107:53:25<46:11:36, 31.75s/it] 70%|██████▉ | 12049/17285 [107:53:52<44:03:44, 30.30s/it] 70%|██████▉ | 12050/17285 [107:54:24<44:40:17, 30.72s/it] {'loss': 1.3391, 'learning_rate': 4.826495909371276e-05, 'epoch': 2.09} + 70%|██████▉ | 12050/17285 [107:54:24<44:40:17, 30.72s/it] 70%|██████▉ | 12051/17285 [107:54:53<43:46:25, 30.11s/it] 70%|██████▉ | 12052/17285 [107:55:23<43:55:59, 30.22s/it] 70%|██████▉ | 12053/17285 [107:56:01<47:04:20, 32.39s/it] 70%|██████▉ | 12054/17285 [107:56:32<46:43:48, 32.16s/it] 70%|██████▉ | 12055/17285 [107:57:01<45:19:21, 31.20s/it] 70%|██████▉ | 12056/17285 [107:57:27<43:07:57, 29.70s/it] 70%|██████▉ | 12057/17285 [107:57:55<42:16:06, 29.11s/it] 70%|██████▉ | 12058/17285 [107:58:28<43:50:01, 30.19s/it] 70%|██████▉ | 12059/17285 [107:59:01<45:06:48, 31.08s/it] 70%|██████▉ | 12060/17285 [107:59:37<47:27:22, 32.70s/it] {'loss': 1.2821, 'learning_rate': 4.810132110505804e-05, 'epoch': 2.09} + 
70%|██████▉ | 12060/17285 [107:59:38<47:27:22, 32.70s/it] 70%|██████▉ | 12061/17285 [108:00:09<46:55:10, 32.33s/it] 70%|██████▉ | 12062/17285 [108:00:44<48:14:16, 33.25s/it] 70%|██████▉ | 12063/17285 [108:01:19<48:52:48, 33.70s/it] 70%|██████▉ | 12064/17285 [108:02:09<55:50:18, 38.50s/it] 70%|██████▉ | 12065/17285 [108:02:43<53:44:40, 37.07s/it] 70%|██████▉ | 12066/17285 [108:03:18<53:03:45, 36.60s/it] 70%|██████▉ | 12067/17285 [108:03:49<50:30:04, 34.84s/it] 70%|██████▉ | 12068/17285 [108:04:18<48:07:43, 33.21s/it] 70%|██████▉ | 12069/17285 [108:04:50<47:36:34, 32.86s/it] 70%|██████▉ | 12070/17285 [108:05:20<46:05:51, 31.82s/it] {'loss': 1.2542, 'learning_rate': 4.793787309715871e-05, 'epoch': 2.09} + 70%|██████▉ | 12070/17285 [108:05:20<46:05:51, 31.82s/it] 70%|██████▉ | 12071/17285 [108:05:53<46:45:44, 32.29s/it] 70%|██████▉ | 12072/17285 [108:06:26<46:58:01, 32.43s/it] 70%|██████▉ | 12073/17285 [108:07:01<47:57:24, 33.12s/it] 70%|██████▉ | 12074/17285 [108:07:39<50:11:15, 34.67s/it] 70%|██████▉ | 12075/17285 [108:08:08<47:51:19, 33.07s/it][2023-08-27 12:03:19,911] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 70%|██████▉ | 12076/17285 [108:08:42<48:17:49, 33.38s/it] 70%|██████▉ | 12077/17285 [108:09:12<46:33:06, 32.18s/it] 70%|██████▉ | 12078/17285 [108:09:42<45:33:46, 31.50s/it] 70%|██████▉ | 12079/17285 [108:10:12<45:12:35, 31.26s/it] 70%|██████▉ | 12080/17285 [108:10:39<43:26:07, 30.04s/it] {'loss': 1.2344, 'learning_rate': 4.779093281812042e-05, 'epoch': 2.1} + 70%|██████▉ | 12080/17285 [108:10:39<43:26:07, 30.04s/it] 70%|██████▉ | 12081/17285 [108:11:19<47:29:32, 32.85s/it] 70%|██████▉ | 12082/17285 [108:11:47<45:19:03, 31.36s/it] 70%|██████▉ | 12083/17285 [108:12:28<49:40:13, 34.37s/it] 70%|██████▉ | 12084/17285 [108:12:56<46:45:00, 32.36s/it] 70%|██████▉ | 12085/17285 [108:13:28<46:35:47, 32.26s/it] 70%|██████▉ | 12086/17285 [108:13:58<45:52:55, 31.77s/it] 70%|██████▉ | 12087/17285 [108:14:28<44:54:29, 31.10s/it] 70%|██████▉ | 12088/17285 [108:14:57<43:52:29, 30.39s/it] 70%|██████▉ | 12089/17285 [108:15:25<43:06:51, 29.87s/it] 70%|██████▉ | 12090/17285 [108:15:58<44:10:56, 30.62s/it] {'loss': 1.2916, 'learning_rate': 4.7627847421449165e-05, 'epoch': 2.1} + 70%|██████▉ | 12090/17285 [108:15:58<44:10:56, 30.62s/it] 70%|██████▉ | 12091/17285 [108:16:25<42:34:33, 29.51s/it] 70%|██████▉ | 12092/17285 [108:16:53<41:53:50, 29.04s/it] 70%|██████▉ | 12093/17285 [108:17:27<44:11:14, 30.64s/it] 70%|██████▉ | 12094/17285 [108:17:58<44:24:12, 30.79s/it] 70%|██████▉ | 12095/17285 [108:18:26<43:10:47, 29.95s/it] 70%|██████▉ | 12096/17285 [108:18:53<41:41:18, 28.92s/it] 70%|██████▉ | 12097/17285 [108:19:26<43:48:15, 30.40s/it] 70%|██████▉ | 12098/17285 [108:20:00<45:04:26, 31.28s/it] 70%|██████▉ | 12099/17285 [108:20:27<43:04:37, 29.90s/it] 70%|███████ | 12100/17285 [108:20:57<43:10:58, 29.98s/it] {'loss': 1.2703, 'learning_rate': 4.746495373873521e-05, 'epoch': 2.1} + 70%|███████ | 12100/17285 [108:20:57<43:10:58, 29.98s/it] 70%|███████ | 12101/17285 [108:21:22<41:20:43, 28.71s/it] 70%|███████ | 12102/17285 [108:21:55<43:06:32, 29.94s/it] 70%|███████ 
| 12103/17285 [108:22:27<43:41:40, 30.36s/it] 70%|███████ | 12104/17285 [108:22:57<43:38:47, 30.33s/it] 70%|███████ | 12105/17285 [108:23:29<44:37:46, 31.02s/it] 70%|███████ | 12106/17285 [108:23:57<43:19:41, 30.12s/it] 70%|███████ | 12107/17285 [108:24:32<45:21:21, 31.53s/it] 70%|███████ | 12108/17285 [108:25:05<45:42:24, 31.78s/it] 70%|███████ | 12109/17285 [108:25:39<46:53:07, 32.61s/it] 70%|███████ | 12110/17285 [108:26:10<46:10:40, 32.12s/it] {'loss': 1.3033, 'learning_rate': 4.730225236626855e-05, 'epoch': 2.1} + 70%|███████ | 12110/17285 [108:26:10<46:10:40, 32.12s/it] 70%|███████ | 12111/17285 [108:26:51<49:46:42, 34.64s/it] 70%|███████ | 12112/17285 [108:27:26<49:55:02, 34.74s/it] 70%|███████ | 12113/17285 [108:27:55<47:21:49, 32.97s/it] 70%|███████ | 12114/17285 [108:28:30<48:38:36, 33.87s/it] 70%|███████ | 12115/17285 [108:29:00<46:52:31, 32.64s/it] 70%|███████ | 12116/17285 [108:29:28<44:56:10, 31.30s/it] 70%|███████ | 12117/17285 [108:30:03<46:14:55, 32.22s/it] 70%|███████ | 12118/17285 [108:30:45<50:24:09, 35.12s/it] 70%|███████ | 12119/17285 [108:31:17<49:23:40, 34.42s/it] 70%|███████ | 12120/17285 [108:31:55<50:51:19, 35.45s/it] {'loss': 1.2804, 'learning_rate': 4.713974389963527e-05, 'epoch': 2.1} + 70%|███████ | 12120/17285 [108:31:55<50:51:19, 35.45s/it] 70%|███████ | 12121/17285 [108:32:25<48:24:54, 33.75s/it] 70%|███████ | 12122/17285 [108:32:55<46:57:01, 32.74s/it] 70%|███████ | 12123/17285 [108:33:27<46:17:39, 32.29s/it] 70%|███████ | 12124/17285 [108:33:56<45:04:19, 31.44s/it] 70%|███████ | 12125/17285 [108:34:32<46:56:28, 32.75s/it] 70%|███████ | 12126/17285 [108:35:01<45:11:34, 31.54s/it] 70%|███████ | 12127/17285 [108:35:27<43:00:25, 30.02s/it] 70%|███████ | 12128/17285 [108:35:57<42:55:25, 29.96s/it] 70%|███████ | 12129/17285 [108:36:24<41:34:47, 29.03s/it] 70%|███████ | 12130/17285 [108:36:54<42:08:34, 29.43s/it] {'loss': 1.317, 'learning_rate': 4.697742893371525e-05, 'epoch': 2.11} + 70%|███████ | 12130/17285 [108:36:54<42:08:34, 
29.43s/it] 70%|███████ | 12131/17285 [108:37:28<44:03:14, 30.77s/it] 70%|███████ | 12132/17285 [108:38:00<44:21:31, 30.99s/it] 70%|███████ | 12133/17285 [108:38:32<45:06:51, 31.52s/it] 70%|███████ | 12134/17285 [108:39:03<44:39:15, 31.21s/it] 70%|███████ | 12135/17285 [108:39:42<47:55:15, 33.50s/it] 70%|███████ | 12136/17285 [108:40:11<46:15:08, 32.34s/it] 70%|███████ | 12137/17285 [108:40:41<45:10:15, 31.59s/it] 70%|███████ | 12138/17285 [108:41:15<46:08:26, 32.27s/it] 70%|███████ | 12139/17285 [108:41:41<43:29:46, 30.43s/it] 70%|███████ | 12140/17285 [108:42:19<46:47:11, 32.74s/it] {'loss': 1.3142, 'learning_rate': 4.6815308062680086e-05, 'epoch': 2.11} + 70%|███████ | 12140/17285 [108:42:19<46:47:11, 32.74s/it] 70%|███████ | 12141/17285 [108:42:49<45:27:20, 31.81s/it] 70%|███████ | 12142/17285 [108:43:20<45:12:21, 31.64s/it] 70%|███████ | 12143/17285 [108:43:55<46:32:43, 32.59s/it] 70%|███████ | 12144/17285 [108:44:25<45:22:40, 31.78s/it] 70%|███████ | 12145/17285 [108:44:59<46:16:56, 32.42s/it] 70%|███████ | 12146/17285 [108:45:30<45:43:17, 32.03s/it] 70%|███████ | 12147/17285 [108:46:01<45:31:48, 31.90s/it] 70%|███████ | 12148/17285 [108:46:35<46:04:10, 32.29s/it] 70%|███████ | 12149/17285 [108:47:10<47:26:36, 33.25s/it] 70%|███████ | 12150/17285 [108:47:44<47:52:07, 33.56s/it] {'loss': 1.2592, 'learning_rate': 4.665338187999084e-05, 'epoch': 2.11} + 70%|███████ | 12150/17285 [108:47:44<47:52:07, 33.56s/it] 70%|███████ | 12151/17285 [108:48:19<48:14:53, 33.83s/it] 70%|███████ | 12152/17285 [108:48:50<47:14:01, 33.13s/it] 70%|███████ | 12153/17285 [108:49:31<50:23:52, 35.35s/it] 70%|███████ | 12154/17285 [108:50:08<51:09:26, 35.89s/it] 70%|███████ | 12155/17285 [108:50:37<48:21:51, 33.94s/it] 70%|███████ | 12156/17285 [108:51:15<49:48:24, 34.96s/it] 70%|███████ | 12157/17285 [108:51:59<53:37:15, 37.64s/it] 70%|███████ | 12158/17285 [108:52:34<52:25:43, 36.81s/it] 70%|███████ | 12159/17285 [108:52:58<47:06:16, 33.08s/it] 70%|███████ | 12160/17285 
[108:53:25<44:22:53, 31.18s/it] {'loss': 1.2689, 'learning_rate': 4.649165097839591e-05, 'epoch': 2.11} + 70%|███████ | 12160/17285 [108:53:25<44:22:53, 31.18s/it] 70%|███████ | 12161/17285 [108:53:54<43:41:25, 30.70s/it] 70%|███████ | 12162/17285 [108:54:24<43:23:37, 30.49s/it] 70%|███████ | 12163/17285 [108:54:49<41:06:02, 28.89s/it] 70%|███████ | 12164/17285 [108:55:21<42:20:51, 29.77s/it] 70%|███████ | 12165/17285 [108:55:51<42:23:27, 29.81s/it] 70%|███████ | 12166/17285 [108:56:18<41:08:59, 28.94s/it] 70%|███████ | 12167/17285 [108:56:50<42:25:25, 29.84s/it] 70%|███████ | 12168/17285 [108:57:24<43:59:21, 30.95s/it] 70%|███████ | 12169/17285 [108:58:04<47:51:52, 33.68s/it] 70%|███████ | 12170/17285 [108:58:40<48:58:39, 34.47s/it] {'loss': 1.2734, 'learning_rate': 4.6330115949928876e-05, 'epoch': 2.11} + 70%|███████ | 12170/17285 [108:58:40<48:58:39, 34.47s/it] 70%|███████ | 12171/17285 [108:59:07<45:56:05, 32.34s/it] 70%|███████ | 12172/17285 [108:59:38<45:07:56, 31.78s/it] 70%|███████ | 12173/17285 [109:00:12<46:13:00, 32.55s/it][2023-08-27 12:55:24,376] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 70%|███████ | 12174/17285 [109:00:47<47:04:45, 33.16s/it] 70%|███████ | 12175/17285 [109:01:17<45:41:48, 32.19s/it] 70%|███████ | 12176/17285 [109:01:51<46:41:48, 32.90s/it] 70%|███████ | 12177/17285 [109:02:22<45:57:13, 32.39s/it] 70%|███████ | 12178/17285 [109:03:00<48:12:19, 33.98s/it] 70%|███████ | 12179/17285 [109:03:32<47:11:54, 33.28s/it] 70%|███████ | 12180/17285 [109:04:06<47:33:31, 33.54s/it] {'loss': 1.2713, 'learning_rate': 4.618490238457079e-05, 'epoch': 2.11} + 70%|███████ | 12180/17285 [109:04:06<47:33:31, 33.54s/it] 70%|███████ | 12181/17285 [109:04:42<48:34:59, 34.27s/it] 70%|███████ | 12182/17285 [109:05:15<47:56:42, 33.82s/it] 70%|███████ | 12183/17285 [109:05:46<47:05:31, 33.23s/it] 70%|███████ | 12184/17285 [109:06:17<45:46:50, 32.31s/it] 70%|███████ | 12185/17285 [109:06:45<44:01:18, 31.07s/it] 71%|███████ | 12186/17285 [109:07:16<43:52:49, 30.98s/it] 71%|███████ | 12187/17285 [109:07:47<43:57:35, 31.04s/it] 71%|███████ | 12188/17285 [109:08:18<43:51:47, 30.98s/it] 71%|███████ | 12189/17285 [109:08:45<42:16:09, 29.86s/it] 71%|███████ | 12190/17285 [109:09:15<42:26:19, 29.99s/it] {'loss': 1.3216, 'learning_rate': 4.602374114352934e-05, 'epoch': 2.12} + 71%|███████ | 12190/17285 [109:09:15<42:26:19, 29.99s/it] 71%|███████ | 12191/17285 [109:09:43<41:29:01, 29.32s/it] 71%|███████ | 12192/17285 [109:10:10<40:32:45, 28.66s/it] 71%|███████ | 12193/17285 [109:10:45<43:03:38, 30.44s/it] 71%|███████ | 12194/17285 [109:11:16<43:19:38, 30.64s/it] 71%|███████ | 12195/17285 [109:11:52<45:44:10, 32.35s/it] 71%|███████ | 12196/17285 [109:12:23<45:04:07, 31.88s/it] 71%|███████ | 12197/17285 [109:12:50<43:15:04, 30.60s/it] 71%|███████ | 12198/17285 [109:13:27<45:41:33, 32.34s/it] 71%|███████ | 12199/17285 [109:14:02<46:53:30, 33.19s/it] 71%|███████ | 12200/17285 [109:14:41<49:15:36, 34.87s/it] {'loss': 1.2775, 'learning_rate': 4.586277748845055e-05, 'epoch': 2.12} + 71%|███████ | 12200/17285 
[109:14:41<49:15:36, 34.87s/it] 71%|███████ | 12201/17285 [109:15:24<52:36:09, 37.25s/it] 71%|███████ | 12202/17285 [109:15:55<50:08:07, 35.51s/it] 71%|███████ | 12203/17285 [109:16:21<46:14:23, 32.76s/it] 71%|███████ | 12204/17285 [109:16:50<44:37:47, 31.62s/it] 71%|███████ | 12205/17285 [109:17:20<43:59:03, 31.17s/it] 71%|███████ | 12206/17285 [109:17:51<43:44:57, 31.01s/it] 71%|███████ | 12207/17285 [109:18:22<43:38:05, 30.93s/it] 71%|███████ | 12208/17285 [109:18:56<45:09:07, 32.02s/it] 71%|███████ | 12209/17285 [109:19:25<43:49:38, 31.08s/it] 71%|███████ | 12210/17285 [109:19:59<45:07:37, 32.01s/it] {'loss': 1.2749, 'learning_rate': 4.570201200855939e-05, 'epoch': 2.12} + 71%|███████ | 12210/17285 [109:20:00<45:07:37, 32.01s/it] 71%|███████ | 12211/17285 [109:20:36<46:54:53, 33.29s/it] 71%|███████ | 12212/17285 [109:21:03<44:26:19, 31.54s/it] 71%|███████ | 12213/17285 [109:21:39<46:20:48, 32.90s/it] 71%|███████ | 12214/17285 [109:22:10<45:33:38, 32.34s/it] 71%|███████ | 12215/17285 [109:22:46<46:52:15, 33.28s/it] 71%|███████ | 12216/17285 [109:23:16<45:31:47, 32.34s/it] 71%|███████ | 12217/17285 [109:23:45<44:16:46, 31.45s/it] 71%|███████ | 12218/17285 [109:24:16<43:47:11, 31.11s/it] 71%|███████ | 12219/17285 [109:24:55<47:06:08, 33.47s/it] 71%|███████ | 12220/17285 [109:25:23<44:53:22, 31.91s/it] {'loss': 1.2809, 'learning_rate': 4.554144529235537e-05, 'epoch': 2.12} + 71%|███████ | 12220/17285 [109:25:23<44:53:22, 31.91s/it] 71%|███████ | 12221/17285 [109:26:00<47:00:21, 33.42s/it] 71%|███████ | 12222/17285 [109:26:25<43:23:39, 30.86s/it] 71%|███████ | 12223/17285 [109:27:08<48:40:15, 34.61s/it] 71%|███████ | 12224/17285 [109:27:47<50:32:55, 35.96s/it] 71%|███████ | 12225/17285 [109:28:18<48:33:16, 34.54s/it] 71%|███████ | 12226/17285 [109:28:55<49:20:05, 35.11s/it] 71%|███████ | 12227/17285 [109:29:22<45:53:56, 32.67s/it] 71%|███████ | 12228/17285 [109:30:00<48:01:31, 34.19s/it] 71%|███████ | 12229/17285 [109:30:27<45:06:38, 32.12s/it] 71%|███████ | 
12230/17285 [109:31:10<49:51:32, 35.51s/it] {'loss': 1.2817, 'learning_rate': 4.538107792761041e-05, 'epoch': 2.12} + 71%|███████ | 12230/17285 [109:31:10<49:51:32, 35.51s/it] 71%|███████ | 12231/17285 [109:31:35<45:13:02, 32.21s/it] 71%|███████ | 12232/17285 [109:32:08<45:41:40, 32.56s/it] 71%|███████ | 12233/17285 [109:32:36<43:32:09, 31.02s/it] 71%|███████ | 12234/17285 [109:33:19<48:43:17, 34.73s/it] 71%|███████ | 12235/17285 [109:33:48<46:26:35, 33.11s/it] 71%|███████ | 12236/17285 [109:34:22<46:48:13, 33.37s/it] 71%|███████ | 12237/17285 [109:34:51<44:38:11, 31.83s/it] 71%|███████ | 12238/17285 [109:35:32<48:40:30, 34.72s/it] 71%|███████ | 12239/17285 [109:36:03<47:08:41, 33.63s/it] 71%|███████ | 12240/17285 [109:36:33<45:37:14, 32.55s/it] {'loss': 1.2324, 'learning_rate': 4.522091050136663e-05, 'epoch': 2.12} + 71%|███████ | 12240/17285 [109:36:33<45:37:14, 32.55s/it] 71%|███████ | 12241/17285 [109:37:00<43:14:24, 30.86s/it] 71%|███████ | 12242/17285 [109:37:32<43:50:02, 31.29s/it] 71%|███████ | 12243/17285 [109:38:11<47:00:22, 33.56s/it] 71%|███████ | 12244/17285 [109:38:50<49:07:04, 35.08s/it] 71%|███████ | 12245/17285 [109:39:20<47:15:11, 33.75s/it] 71%|███████ | 12246/17285 [109:39:55<47:38:59, 34.04s/it] 71%|███████ | 12247/17285 [109:40:22<44:43:10, 31.96s/it] 71%|███████ | 12248/17285 [109:41:02<48:11:36, 34.44s/it] 71%|███████ | 12249/17285 [109:41:37<48:15:21, 34.50s/it] 71%|███████ | 12250/17285 [109:42:14<49:19:04, 35.26s/it] {'loss': 1.2649, 'learning_rate': 4.50609435999344e-05, 'epoch': 2.13} + 71%|███████ | 12250/17285 [109:42:14<49:19:04, 35.26s/it] 71%|███████ | 12251/17285 [109:42:42<46:19:19, 33.13s/it] 71%|███████ | 12252/17285 [109:43:13<45:08:59, 32.29s/it] 71%|███████ | 12253/17285 [109:43:38<42:01:34, 30.07s/it] 71%|███████ | 12254/17285 [109:44:17<45:47:09, 32.76s/it] 71%|███████ | 12255/17285 [109:44:51<46:24:47, 33.22s/it] 71%|███████ | 12256/17285 [109:45:21<45:17:10, 32.42s/it] 71%|███████ | 12257/17285 [109:45:58<46:53:56, 
33.58s/it] 71%|███████ | 12258/17285 [109:46:34<47:55:38, 34.32s/it] 71%|███████ | 12259/17285 [109:47:08<48:00:40, 34.39s/it] 71%|███████ | 12260/17285 [109:47:37<45:43:41, 32.76s/it] {'loss': 1.2493, 'learning_rate': 4.4901177808889936e-05, 'epoch': 2.13} + 71%|███████ | 12260/17285 [109:47:37<45:43:41, 32.76s/it] 71%|███████ | 12261/17285 [109:48:10<45:31:47, 32.62s/it] 71%|███████ | 12262/17285 [109:48:39<44:18:40, 31.76s/it] 71%|███████ | 12263/17285 [109:49:15<45:49:57, 32.85s/it] 71%|███████ | 12264/17285 [109:49:43<44:00:41, 31.56s/it] 71%|███████ | 12265/17285 [109:50:15<44:11:12, 31.69s/it] 71%|███████ | 12266/17285 [109:50:43<42:27:59, 30.46s/it] 71%|███████ | 12267/17285 [109:51:20<45:21:21, 32.54s/it] 71%|███████ | 12268/17285 [109:51:57<47:14:29, 33.90s/it] 71%|███████ | 12269/17285 [109:52:31<47:09:32, 33.85s/it] 71%|███████ | 12270/17285 [109:53:00<45:09:39, 32.42s/it] {'loss': 1.2946, 'learning_rate': 4.474161371307322e-05, 'epoch': 2.13} + 71%|███████ | 12270/17285 [109:53:00<45:09:39, 32.42s/it] 71%|███████ | 12271/17285 [109:53:31<44:42:20, 32.10s/it] 71%|███████ | 12272/17285 [109:54:05<45:22:57, 32.59s/it] 71%|███████ | 12273/17285 [109:54:36<44:47:51, 32.18s/it] 71%|███████ | 12274/17285 [109:55:06<43:40:32, 31.38s/it] 71%|███████ | 12275/17285 [109:55:30<40:36:34, 29.18s/it] 71%|███████ | 12276/17285 [109:56:04<42:49:27, 30.78s/it] 71%|███████ | 12277/17285 [109:56:37<43:22:27, 31.18s/it] 71%|███████ | 12278/17285 [109:57:04<41:42:55, 29.99s/it] 71%|███████ | 12279/17285 [109:57:30<39:58:54, 28.75s/it] 71%|███████ | 12280/17285 [109:58:12<45:50:07, 32.97s/it] {'loss': 1.2523, 'learning_rate': 4.458225189658598e-05, 'epoch': 2.13} + 71%|███████ | 12280/17285 [109:58:12<45:50:07, 32.97s/it] 71%|███████ | 12281/17285 [109:58:41<44:06:15, 31.73s/it] 71%|███████ | 12282/17285 [109:59:17<45:54:47, 33.04s/it] 71%|███████ | 12283/17285 [109:59:44<43:08:18, 31.05s/it] 71%|███████ | 12284/17285 [110:00:17<43:52:28, 31.58s/it] 71%|███████ | 
12285/17285 [110:00:41<40:43:05, 29.32s/it] 71%|███████ | 12286/17285 [110:01:08<39:53:56, 28.73s/it] 71%|███████ | 12287/17285 [110:01:39<40:41:06, 29.30s/it] 71%|███████ | 12288/17285 [110:02:13<42:45:11, 30.80s/it] 71%|███████ | 12289/17285 [110:02:47<44:04:52, 31.76s/it] 71%|███████ | 12290/17285 [110:03:22<45:15:24, 32.62s/it] {'loss': 1.2769, 'learning_rate': 4.44230929427895e-05, 'epoch': 2.13} + 71%|███████ | 12290/17285 [110:03:22<45:15:24, 32.62s/it] 71%|███████ | 12291/17285 [110:03:55<45:26:25, 32.76s/it] 71%|███████ | 12292/17285 [110:04:26<44:40:41, 32.21s/it] 71%|███████ | 12293/17285 [110:04:56<43:50:25, 31.62s/it] 71%|███████ | 12294/17285 [110:05:24<42:18:44, 30.52s/it] 71%|███████ | 12295/17285 [110:05:52<41:11:01, 29.71s/it] 71%|███████ | 12296/17285 [110:06:34<46:23:25, 33.47s/it] 71%|███████ | 12297/17285 [110:06:58<42:31:22, 30.69s/it] 71%|███████ | 12298/17285 [110:07:27<41:52:06, 30.22s/it] 71%|███████ | 12299/17285 [110:07:58<42:01:41, 30.35s/it] 71%|███████ | 12300/17285 [110:08:22<39:31:41, 28.55s/it] {'loss': 1.2823, 'learning_rate': 4.426413743430241e-05, 'epoch': 2.13} + 71%|███████ | 12300/17285 [110:08:22<39:31:41, 28.55s/it] 71%|███████ | 12301/17285 [110:08:51<39:38:03, 28.63s/it] 71%|███████ | 12302/17285 [110:09:29<43:31:27, 31.44s/it] 71%|███████ | 12303/17285 [110:10:03<44:36:49, 32.24s/it] 71%|███████ | 12304/17285 [110:10:29<41:53:43, 30.28s/it] 71%|███████ | 12305/17285 [110:11:01<42:50:36, 30.97s/it] 71%|███████ | 12306/17285 [110:11:37<44:48:17, 32.40s/it] 71%|███████ | 12307/17285 [110:12:07<43:46:37, 31.66s/it] 71%|███████ | 12308/17285 [110:12:37<42:51:13, 31.00s/it] 71%|███████ | 12309/17285 [110:13:06<42:06:46, 30.47s/it] 71%|███████ | 12310/17285 [110:13:40<43:32:31, 31.51s/it] {'loss': 1.2536, 'learning_rate': 4.410538595299864e-05, 'epoch': 2.14} + 71%|███████ | 12310/17285 [110:13:40<43:32:31, 31.51s/it] 71%|███████ | 12311/17285 [110:14:08<42:16:54, 30.60s/it] 71%|███████ | 12312/17285 [110:14:46<45:17:52, 
32.79s/it][2023-08-27 14:09:54,852] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 71%|███████ | 12313/17285 [110:15:17<44:34:50, 32.28s/it] 71%|███████ | 12314/17285 [110:16:02<49:48:52, 36.08s/it] 71%|███████ | 12315/17285 [110:16:30<46:34:32, 33.74s/it] 71%|███████▏ | 12316/17285 [110:17:02<45:47:33, 33.18s/it] 71%|███████▏ | 12317/17285 [110:17:38<47:02:00, 34.08s/it] 71%|███████▏ | 12318/17285 [110:18:07<44:49:27, 32.49s/it] 71%|███████▏ | 12319/17285 [110:18:38<43:56:48, 31.86s/it] 71%|███████▏ | 12320/17285 [110:19:17<47:00:33, 34.09s/it] {'loss': 1.2686, 'learning_rate': 4.3962684543383956e-05, 'epoch': 2.14} + 71%|███████▏ | 12320/17285 [110:19:17<47:00:33, 34.09s/it] 71%|███████▏ | 12321/17285 [110:19:55<48:46:26, 35.37s/it] 71%|███████▏ | 12322/17285 [110:20:24<46:09:40, 33.48s/it] 71%|███████▏ | 12323/17285 [110:20:52<43:44:33, 31.74s/it] 71%|███████▏ | 12324/17285 [110:21:26<44:46:15, 32.49s/it] 71%|███████▏ | 12325/17285 [110:22:00<45:10:51, 32.79s/it] 71%|███████▏ | 12326/17285 [110:22:32<44:45:52, 32.50s/it] 71%|███████▏ | 12327/17285 [110:23:05<45:03:52, 32.72s/it] 71%|███████▏ | 12328/17285 [110:23:31<42:11:28, 30.64s/it] 71%|███████▏ | 12329/17285 [110:24:06<44:11:15, 32.10s/it] 71%|███████▏ | 12330/17285 [110:24:41<45:16:13, 32.89s/it] {'loss': 1.2826, 'learning_rate': 4.380432231411452e-05, 'epoch': 2.14} + 71%|███████▏ | 12330/17285 [110:24:41<45:16:13, 32.89s/it] 71%|███████▏ | 12331/17285 [110:25:21<48:23:14, 35.16s/it] 71%|███████▏ | 12332/17285 [110:25:59<49:29:39, 35.97s/it] 71%|███████▏ | 12333/17285 [110:26:36<50:01:26, 36.37s/it] 71%|███████▏ | 12334/17285 [110:27:05<46:49:03, 34.04s/it] 71%|███████▏ | 12335/17285 [110:27:39<46:35:44, 33.89s/it] 71%|███████▏ | 12336/17285 [110:28:06<43:53:36, 31.93s/it] 71%|███████▏ | 12337/17285 [110:28:33<41:44:16, 30.37s/it] 71%|███████▏ | 12338/17285 [110:29:09<44:19:37, 32.26s/it] 
71%|███████▏ | 12339/17285 [110:29:41<44:12:21, 32.18s/it] 71%|███████▏ | 12340/17285 [110:30:13<44:08:36, 32.14s/it] {'loss': 1.2906, 'learning_rate': 4.364616579523162e-05, 'epoch': 2.14} + 71%|███████▏ | 12340/17285 [110:30:13<44:08:36, 32.14s/it] 71%|███████▏ | 12341/17285 [110:30:43<43:03:31, 31.35s/it] 71%|███████▏ | 12342/17285 [110:31:18<44:25:09, 32.35s/it] 71%|███████▏ | 12343/17285 [110:31:52<45:25:31, 33.09s/it] 71%|███████▏ | 12344/17285 [110:32:18<42:28:10, 30.94s/it] 71%|███████▏ | 12345/17285 [110:32:50<42:36:16, 31.05s/it] 71%|███████▏ | 12346/17285 [110:33:23<43:37:51, 31.80s/it] 71%|███████▏ | 12347/17285 [110:33:54<43:09:15, 31.46s/it] 71%|███████▏ | 12348/17285 [110:34:22<41:37:00, 30.35s/it] 71%|███████▏ | 12349/17285 [110:34:47<39:33:29, 28.85s/it] 71%|███████▏ | 12350/17285 [110:35:14<38:56:24, 28.41s/it] {'loss': 1.2777, 'learning_rate': 4.348821556568439e-05, 'epoch': 2.14} + 71%|███████▏ | 12350/17285 [110:35:14<38:56:24, 28.41s/it] 71%|███████▏ | 12351/17285 [110:35:45<39:42:20, 28.97s/it] 71%|███████▏ | 12352/17285 [110:36:26<44:58:17, 32.82s/it] 71%|███████▏ | 12353/17285 [110:36:59<44:53:28, 32.77s/it] 71%|███████▏ | 12354/17285 [110:37:29<43:31:33, 31.78s/it] 71%|███████▏ | 12355/17285 [110:37:54<40:59:28, 29.93s/it][2023-08-27 14:33:02,273] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 71%|███████▏ | 12356/17285 [110:38:25<41:11:47, 30.09s/it] 71%|███████▏ | 12357/17285 [110:39:00<43:32:59, 31.81s/it] 71%|███████▏ | 12358/17285 [110:39:38<46:06:10, 33.69s/it] 72%|███████▏ | 12359/17285 [110:40:10<45:10:53, 33.02s/it] 72%|███████▏ | 12360/17285 [110:40:35<41:59:22, 30.69s/it] {'loss': 1.2865, 'learning_rate': 4.3346237214366844e-05, 'epoch': 2.15} + 72%|███████▏ | 12360/17285 [110:40:35<41:59:22, 30.69s/it] 72%|███████▏ | 12361/17285 [110:41:07<42:15:08, 30.89s/it] 72%|███████▏ | 12362/17285 [110:41:38<42:39:18, 31.19s/it] 72%|███████▏ | 12363/17285 [110:42:10<42:41:34, 31.23s/it] 72%|███████▏ | 12364/17285 [110:42:37<40:57:05, 29.96s/it] 72%|███████▏ | 12365/17285 [110:43:13<43:28:17, 31.81s/it] 72%|███████▏ | 12366/17285 [110:43:41<41:45:42, 30.56s/it] 72%|███████▏ | 12367/17285 [110:44:16<43:52:31, 32.12s/it] 72%|███████▏ | 12368/17285 [110:44:54<46:08:15, 33.78s/it] 72%|███████▏ | 12369/17285 [110:45:30<47:12:06, 34.57s/it] 72%|███████▏ | 12370/17285 [110:46:08<48:15:43, 35.35s/it] {'loss': 1.2784, 'learning_rate': 4.3188680526855985e-05, 'epoch': 2.15} + 72%|███████▏ | 12370/17285 [110:46:08<48:15:43, 35.35s/it] 72%|███████▏ | 12371/17285 [110:46:34<44:27:48, 32.57s/it] 72%|███████▏ | 12372/17285 [110:47:01<42:15:27, 30.96s/it] 72%|███████▏ | 12373/17285 [110:47:37<44:18:32, 32.47s/it] 72%|███████▏ | 12374/17285 [110:48:09<44:16:25, 32.45s/it] 72%|███████▏ | 12375/17285 [110:48:37<42:17:13, 31.00s/it] 72%|███████▏ | 12376/17285 [110:49:12<43:46:42, 32.10s/it] 72%|███████▏ | 12377/17285 [110:49:45<44:20:26, 32.52s/it] 72%|███████▏ | 12378/17285 [110:50:22<46:04:32, 33.80s/it] 72%|███████▏ | 12379/17285 [110:51:00<48:00:41, 35.23s/it] 72%|███████▏ | 12380/17285 [110:51:38<49:10:17, 36.09s/it] {'loss': 1.2866, 'learning_rate': 4.303133180335535e-05, 'epoch': 2.15} + 72%|███████▏ | 12380/17285 [110:51:38<49:10:17, 36.09s/it] 72%|███████▏ | 12381/17285 [110:52:07<45:57:12, 33.73s/it] 72%|███████▏ | 
12382/17285 [110:52:37<44:38:30, 32.78s/it] 72%|███████▏ | 12383/17285 [110:53:03<41:35:45, 30.55s/it] 72%|███████▏ | 12384/17285 [110:53:27<38:57:13, 28.61s/it] 72%|███████▏ | 12385/17285 [110:53:54<38:32:38, 28.32s/it] 72%|███████▏ | 12386/17285 [110:54:30<41:30:09, 30.50s/it] 72%|███████▏ | 12387/17285 [110:55:02<42:01:53, 30.89s/it] 72%|███████▏ | 12388/17285 [110:55:26<39:29:22, 29.03s/it] 72%|███████▏ | 12389/17285 [110:55:56<39:35:02, 29.11s/it] 72%|███████▏ | 12390/17285 [110:56:26<40:08:18, 29.52s/it] {'loss': 1.2954, 'learning_rate': 4.287419161985704e-05, 'epoch': 2.15} + 72%|███████▏ | 12390/17285 [110:56:26<40:08:18, 29.52s/it] 72%|███████▏ | 12391/17285 [110:57:03<43:09:15, 31.74s/it] 72%|███████▏ | 12392/17285 [110:57:41<45:35:29, 33.54s/it] 72%|███████▏ | 12393/17285 [110:58:15<45:37:57, 33.58s/it] 72%|███████▏ | 12394/17285 [110:58:48<45:25:32, 33.44s/it] 72%|███████▏ | 12395/17285 [110:59:16<43:26:28, 31.98s/it] 72%|███████▏ | 12396/17285 [110:59:46<42:43:42, 31.46s/it] 72%|███████▏ | 12397/17285 [111:00:20<43:40:28, 32.17s/it] 72%|███████▏ | 12398/17285 [111:00:59<46:27:51, 34.23s/it] 72%|███████▏ | 12399/17285 [111:01:30<44:57:42, 33.13s/it] 72%|███████▏ | 12400/17285 [111:02:08<47:00:41, 34.65s/it] {'loss': 1.2677, 'learning_rate': 4.2717260551589775e-05, 'epoch': 2.15} + 72%|███████▏ | 12400/17285 [111:02:08<47:00:41, 34.65s/it] 72%|███████▏ | 12401/17285 [111:02:39<45:24:44, 33.47s/it] 72%|███████▏ | 12402/17285 [111:03:10<44:25:47, 32.76s/it] 72%|███████▏ | 12403/17285 [111:03:38<42:32:18, 31.37s/it] 72%|███████▏ | 12404/17285 [111:04:06<41:07:55, 30.34s/it] 72%|███████▏ | 12405/17285 [111:04:38<41:53:40, 30.91s/it] 72%|███████▏ | 12406/17285 [111:05:14<43:54:35, 32.40s/it] 72%|███████▏ | 12407/17285 [111:05:46<43:42:10, 32.25s/it] 72%|███████▏ | 12408/17285 [111:06:17<43:13:42, 31.91s/it] 72%|███████▏ | 12409/17285 [111:06:59<47:21:53, 34.97s/it] 72%|███████▏ | 12410/17285 [111:07:36<48:01:47, 35.47s/it] {'loss': 1.2825, 'learning_rate': 
4.2560539173016813e-05, 'epoch': 2.15} + 72%|███████▏ | 12410/17285 [111:07:36<48:01:47, 35.47s/it] 72%|███████▏ | 12411/17285 [111:08:13<48:41:57, 35.97s/it] 72%|███████▏ | 12412/17285 [111:08:47<47:43:34, 35.26s/it] 72%|███████▏ | 12413/17285 [111:09:15<44:55:56, 33.20s/it] 72%|███████▏ | 12414/17285 [111:09:45<43:42:10, 32.30s/it] 72%|███████▏ | 12415/17285 [111:10:26<47:15:15, 34.93s/it] 72%|███████▏ | 12416/17285 [111:10:58<45:54:58, 33.95s/it] 72%|███████▏ | 12417/17285 [111:11:30<45:01:01, 33.29s/it] 72%|███████▏ | 12418/17285 [111:12:01<44:09:40, 32.67s/it] 72%|███████▏ | 12419/17285 [111:12:39<46:15:56, 34.23s/it] 72%|███████▏ | 12420/17285 [111:13:13<46:20:38, 34.29s/it] {'loss': 1.2749, 'learning_rate': 4.240402805783377e-05, 'epoch': 2.16} + 72%|███████▏ | 12420/17285 [111:13:13<46:20:38, 34.29s/it] 72%|███████▏ | 12421/17285 [111:13:39<42:50:12, 31.70s/it] 72%|███████▏ | 12422/17285 [111:14:14<44:19:37, 32.81s/it] 72%|███████▏ | 12423/17285 [111:14:41<42:03:00, 31.14s/it] 72%|███████▏ | 12424/17285 [111:15:07<39:42:24, 29.41s/it] 72%|███████▏ | 12425/17285 [111:15:35<39:18:44, 29.12s/it] 72%|███████▏ | 12426/17285 [111:16:17<44:12:57, 32.76s/it] 72%|███████▏ | 12427/17285 [111:16:51<44:59:53, 33.35s/it] 72%|███████▏ | 12428/17285 [111:17:27<46:03:18, 34.14s/it] 72%|███████▏ | 12429/17285 [111:17:57<44:16:45, 32.83s/it] 72%|███████▏ | 12430/17285 [111:18:35<46:29:43, 34.48s/it] {'loss': 1.2797, 'learning_rate': 4.224772777896659e-05, 'epoch': 2.16} + 72%|███████▏ | 12430/17285 [111:18:35<46:29:43, 34.48s/it] 72%|███████▏ | 12431/17285 [111:19:09<46:13:50, 34.29s/it] 72%|███████▏ | 12432/17285 [111:19:37<43:45:59, 32.47s/it] 72%|███████▏ | 12433/17285 [111:20:07<42:30:30, 31.54s/it] 72%|███████▏ | 12434/17285 [111:20:41<43:37:13, 32.37s/it] 72%|███████▏ | 12435/17285 [111:21:10<42:15:51, 31.37s/it] 72%|███████▏ | 12436/17285 [111:21:41<41:57:34, 31.15s/it] 72%|███████▏ | 12437/17285 [111:22:14<42:59:46, 31.93s/it] 72%|███████▏ | 12438/17285 
[111:22:42<41:04:03, 30.50s/it] 72%|███████▏ | 12439/17285 [111:23:13<41:13:46, 30.63s/it] 72%|███████▏ | 12440/17285 [111:23:45<41:57:42, 31.18s/it] {'loss': 1.2819, 'learning_rate': 4.209163890856951e-05, 'epoch': 2.16} + 72%|███████▏ | 12440/17285 [111:23:45<41:57:42, 31.18s/it] 72%|███████▏ | 12441/17285 [111:24:12<40:22:08, 30.00s/it] 72%|███████▏ | 12442/17285 [111:24:43<40:36:13, 30.18s/it] 72%|███████▏ | 12443/17285 [111:25:17<42:16:44, 31.43s/it] 72%|███████▏ | 12444/17285 [111:25:46<41:06:56, 30.58s/it] 72%|███████▏ | 12445/17285 [111:26:22<43:14:26, 32.16s/it] 72%|███████▏ | 12446/17285 [111:26:53<42:51:13, 31.88s/it] 72%|███████▏ | 12447/17285 [111:27:26<43:22:11, 32.27s/it] 72%|███████▏ | 12448/17285 [111:27:57<42:51:19, 31.90s/it] 72%|███████▏ | 12449/17285 [111:28:27<42:02:03, 31.29s/it] 72%|███████▏ | 12450/17285 [111:29:00<42:34:20, 31.70s/it] {'loss': 1.2607, 'learning_rate': 4.193576201802268e-05, 'epoch': 2.16} + 72%|███████▏ | 12450/17285 [111:29:00<42:34:20, 31.70s/it] 72%|███████▏ | 12451/17285 [111:29:40<45:57:14, 34.22s/it] 72%|███████▏ | 12452/17285 [111:30:17<47:05:12, 35.07s/it] 72%|███████▏ | 12453/17285 [111:30:45<44:26:10, 33.11s/it] 72%|███████▏ | 12454/17285 [111:31:14<42:28:33, 31.65s/it] 72%|███████▏ | 12455/17285 [111:31:42<41:19:01, 30.80s/it] 72%|███████▏ | 12456/17285 [111:32:12<40:38:07, 30.29s/it] 72%|███████▏ | 12457/17285 [111:32:45<41:58:01, 31.29s/it] 72%|███████▏ | 12458/17285 [111:33:18<42:26:53, 31.66s/it] 72%|███████▏ | 12459/17285 [111:33:51<42:56:43, 32.04s/it] 72%|███████▏ | 12460/17285 [111:34:20<41:52:09, 31.24s/it] {'loss': 1.2767, 'learning_rate': 4.1780097677930485e-05, 'epoch': 2.16} + 72%|███████▏ | 12460/17285 [111:34:20<41:52:09, 31.24s/it] 72%|███████▏ | 12461/17285 [111:34:51<41:43:50, 31.14s/it] 72%|███████▏ | 12462/17285 [111:35:24<42:29:35, 31.72s/it] 72%|███████▏ | 12463/17285 [111:35:53<41:33:33, 31.03s/it] 72%|███████▏ | 12464/17285 [111:36:24<41:15:47, 30.81s/it] 72%|███████▏ | 12465/17285 
[111:37:08<46:46:42, 34.94s/it] 72%|███████▏ | 12466/17285 [111:37:44<47:04:42, 35.17s/it] 72%|███████▏ | 12467/17285 [111:38:18<46:43:19, 34.91s/it] 72%|███████▏ | 12468/17285 [111:38:50<45:27:25, 33.97s/it] 72%|███████▏ | 12469/17285 [111:39:17<42:38:51, 31.88s/it] 72%|███████▏ | 12470/17285 [111:39:44<40:41:28, 30.42s/it] {'loss': 1.2333, 'learning_rate': 4.162464645811913e-05, 'epoch': 2.16} + 72%|███████▏ | 12470/17285 [111:39:44<40:41:28, 30.42s/it] 72%|███████▏ | 12471/17285 [111:40:14<40:30:35, 30.29s/it] 72%|███████▏ | 12472/17285 [111:40:50<42:55:13, 32.10s/it] 72%|███████▏ | 12473/17285 [111:41:23<42:59:54, 32.17s/it] 72%|███████▏ | 12474/17285 [111:41:57<43:56:48, 32.88s/it] 72%|███████▏ | 12475/17285 [111:42:28<43:09:40, 32.30s/it] 72%|███████▏ | 12476/17285 [111:42:58<42:00:02, 31.44s/it] 72%|███████▏ | 12477/17285 [111:43:27<41:17:07, 30.91s/it] 72%|███████▏ | 12478/17285 [111:43:59<41:32:08, 31.11s/it] 72%|███████▏ | 12479/17285 [111:44:33<42:54:17, 32.14s/it] 72%|███████▏ | 12480/17285 [111:45:02<41:25:23, 31.04s/it] {'loss': 1.2667, 'learning_rate': 4.146940892763472e-05, 'epoch': 2.17} + 72%|███████▏ | 12480/17285 [111:45:02<41:25:23, 31.04s/it] 72%|███████▏ | 12481/17285 [111:45:37<43:03:59, 32.27s/it] 72%|███████▏ | 12482/17285 [111:46:08<42:22:02, 31.76s/it] 72%|███████▏ | 12483/17285 [111:46:41<43:12:18, 32.39s/it] 72%|███████▏ | 12484/17285 [111:47:21<46:12:45, 34.65s/it] 72%|███████▏ | 12485/17285 [111:47:50<43:57:30, 32.97s/it] 72%|███████▏ | 12486/17285 [111:48:17<41:22:51, 31.04s/it] 72%|███████▏ | 12487/17285 [111:48:51<42:34:35, 31.95s/it] 72%|███████▏ | 12488/17285 [111:49:18<40:42:42, 30.55s/it] 72%|███████▏ | 12489/17285 [111:49:48<40:24:52, 30.34s/it] 72%|███████▏ | 12490/17285 [111:50:17<39:39:05, 29.77s/it] {'loss': 1.3182, 'learning_rate': 4.131438565474112e-05, 'epoch': 2.17} + 72%|███████▏ | 12490/17285 [111:50:17<39:39:05, 29.77s/it] 72%|███████▏ | 12491/17285 [111:50:47<40:03:36, 30.08s/it] 72%|███████▏ | 12492/17285 
[111:51:26<43:22:17, 32.58s/it] 72%|███████▏ | 12493/17285 [111:51:53<41:24:19, 31.11s/it] 72%|███████▏ | 12494/17285 [111:52:20<39:38:36, 29.79s/it] 72%|███████▏ | 12495/17285 [111:52:58<42:45:45, 32.14s/it] 72%|███████▏ | 12496/17285 [111:53:27<41:31:50, 31.22s/it] 72%|███████▏ | 12497/17285 [111:53:59<41:55:17, 31.52s/it] 72%|███████▏ | 12498/17285 [111:54:31<42:14:55, 31.77s/it] 72%|███████▏ | 12499/17285 [111:55:03<42:12:03, 31.74s/it] 72%|███████▏ | 12500/17285 [111:55:36<42:41:28, 32.12s/it] {'loss': 1.2889, 'learning_rate': 4.11595772069178e-05, 'epoch': 2.17} + 72%|███████▏ | 12500/17285 [111:55:36<42:41:28, 32.12s/it] 72%|███████▏ | 12501/17285 [111:56:14<44:55:51, 33.81s/it] 72%|███████▏ | 12502/17285 [111:56:44<43:37:24, 32.83s/it] 72%|███████▏ | 12503/17285 [111:57:22<45:23:29, 34.17s/it] 72%|███████▏ | 12504/17285 [111:57:53<44:12:20, 33.29s/it] 72%|███████▏ | 12505/17285 [111:58:30<45:36:12, 34.35s/it] 72%|███████▏ | 12506/17285 [111:59:01<44:19:14, 33.39s/it] 72%|███████▏ | 12507/17285 [111:59:28<41:44:39, 31.45s/it] 72%|███████▏ | 12508/17285 [112:00:04<43:40:08, 32.91s/it] 72%|███████▏ | 12509/17285 [112:00:45<46:57:09, 35.39s/it] 72%|███████▏ | 12510/17285 [112:01:11<43:06:32, 32.50s/it] {'loss': 1.3046, 'learning_rate': 4.100498415085804e-05, 'epoch': 2.17} + 72%|███████▏ | 12510/17285 [112:01:11<43:06:32, 32.50s/it] 72%|███████▏ | 12511/17285 [112:01:40<41:49:20, 31.54s/it] 72%|███████▏ | 12512/17285 [112:02:15<42:53:40, 32.35s/it] 72%|███████▏ | 12513/17285 [112:02:52<44:48:37, 33.81s/it] 72%|███████▏ | 12514/17285 [112:03:24<44:04:04, 33.25s/it] 72%|███████▏ | 12515/17285 [112:03:58<44:21:10, 33.47s/it] 72%|███████▏ | 12516/17285 [112:04:38<46:53:08, 35.39s/it] 72%|███████▏ | 12517/17285 [112:05:10<45:31:56, 34.38s/it] 72%|███████▏ | 12518/17285 [112:05:38<43:10:45, 32.61s/it] 72%|███████▏ | 12519/17285 [112:06:04<40:22:11, 30.49s/it] 72%|███████▏ | 12520/17285 [112:06:40<42:37:30, 32.20s/it] {'loss': 1.2576, 'learning_rate': 
4.085060705246642e-05, 'epoch': 2.17} + 72%|███████▏ | 12520/17285 [112:06:40<42:37:30, 32.20s/it] 72%|███████▏ | 12521/17285 [112:07:09<41:22:42, 31.27s/it] 72%|███████▏ | 12522/17285 [112:07:40<41:12:51, 31.15s/it] 72%|███████▏ | 12523/17285 [112:08:08<40:05:20, 30.31s/it] 72%|███████▏ | 12524/17285 [112:08:44<42:08:43, 31.87s/it] 72%|███████▏ | 12525/17285 [112:09:22<44:40:32, 33.79s/it] 72%|███████▏ | 12526/17285 [112:09:56<44:52:13, 33.94s/it] 72%|███████▏ | 12527/17285 [112:10:31<45:21:01, 34.31s/it] 72%|███████▏ | 12528/17285 [112:11:11<47:23:14, 35.86s/it] 72%|███████▏ | 12529/17285 [112:11:45<46:30:10, 35.20s/it] 72%|███████▏ | 12530/17285 [112:12:24<48:02:18, 36.37s/it] {'loss': 1.2588, 'learning_rate': 4.069644647685712e-05, 'epoch': 2.17} + 72%|███████▏ | 12530/17285 [112:12:24<48:02:18, 36.37s/it] 72%|███████▏ | 12531/17285 [112:12:50<44:12:08, 33.47s/it] 73%|███████▎ | 12532/17285 [112:13:24<44:21:05, 33.59s/it] 73%|███████▎ | 12533/17285 [112:13:54<42:58:49, 32.56s/it] 73%|███████▎ | 12534/17285 [112:14:25<42:11:46, 31.97s/it] 73%|███████▎ | 12535/17285 [112:14:54<41:03:53, 31.12s/it] 73%|███████▎ | 12536/17285 [112:15:22<39:35:45, 30.02s/it] 73%|███████▎ | 12537/17285 [112:15:48<38:03:11, 28.85s/it] 73%|███████▎ | 12538/17285 [112:16:22<40:13:10, 30.50s/it] 73%|███████▎ | 12539/17285 [112:16:53<40:17:35, 30.56s/it] 73%|███████▎ | 12540/17285 [112:17:26<41:17:41, 31.33s/it] {'loss': 1.2901, 'learning_rate': 4.0542502988351686e-05, 'epoch': 2.18} + 73%|███████▎ | 12540/17285 [112:17:26<41:17:41, 31.33s/it] 73%|███████▎ | 12541/17285 [112:18:00<42:18:17, 32.10s/it] 73%|███████▎ | 12542/17285 [112:18:32<42:23:06, 32.17s/it] 73%|███████▎ | 12543/17285 [112:19:06<42:53:31, 32.56s/it] 73%|███████▎ | 12544/17285 [112:19:36<42:02:41, 31.93s/it] 73%|███████▎ | 12545/17285 [112:20:04<40:37:22, 30.85s/it] 73%|███████▎ | 12546/17285 [112:20:30<38:23:39, 29.17s/it] 73%|███████▎ | 12547/17285 [112:21:03<39:52:57, 30.30s/it] 73%|███████▎ | 12548/17285 
[112:21:29<38:13:26, 29.05s/it] 73%|███████▎ | 12549/17285 [112:21:58<38:07:55, 28.99s/it] 73%|███████▎ | 12550/17285 [112:22:25<37:37:16, 28.60s/it] {'loss': 1.3228, 'learning_rate': 4.038877715047699e-05, 'epoch': 2.18} + 73%|███████▎ | 12550/17285 [112:22:25<37:37:16, 28.60s/it] 73%|███████▎ | 12551/17285 [112:22:59<39:39:47, 30.16s/it] 73%|███████▎ | 12552/17285 [112:23:28<39:00:30, 29.67s/it] 73%|███████▎ | 12553/17285 [112:24:03<41:14:52, 31.38s/it] 73%|███████▎ | 12554/17285 [112:24:30<39:37:57, 30.16s/it] 73%|███████▎ | 12555/17285 [112:24:57<38:15:47, 29.12s/it] 73%|███████▎ | 12556/17285 [112:25:28<39:03:07, 29.73s/it][2023-08-27 16:20:29,998] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, but hysteresis is 2. Reducing hysteresis to 1 + 73%|███████▎ | 12557/17285 [112:25:52<36:50:49, 28.06s/it] 73%|███████▎ | 12558/17285 [112:26:31<40:51:51, 31.12s/it] 73%|███████▎ | 12559/17285 [112:27:00<40:00:36, 30.48s/it] 73%|███████▎ | 12560/17285 [112:27:32<40:41:29, 31.00s/it] {'loss': 1.2813, 'learning_rate': 4.0250610452792004e-05, 'epoch': 2.18} + 73%|███████▎ | 12560/17285 [112:27:32<40:41:29, 31.00s/it] 73%|███████▎ | 12561/17285 [112:28:07<42:17:21, 32.23s/it] 73%|███████▎ | 12562/17285 [112:28:38<41:46:59, 31.85s/it][2023-08-27 16:23:56,342] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 262144, reducing to 131072 + 73%|███████▎ | 12563/17285 [112:29:19<45:18:14, 34.54s/it] 73%|███████▎ | 12564/17285 [112:29:49<43:40:59, 33.31s/it] 73%|███████▎ | 12565/17285 [112:30:18<42:01:19, 32.05s/it] 73%|███████▎ | 12566/17285 [112:30:50<41:57:52, 32.01s/it] 73%|███████▎ | 12567/17285 [112:31:27<43:58:12, 33.55s/it] 73%|███████▎ | 12568/17285 [112:31:55<41:40:54, 31.81s/it] 73%|███████▎ | 12569/17285 [112:32:25<40:47:58, 31.14s/it] 73%|███████▎ | 12570/17285 [112:32:59<41:53:39, 31.99s/it] {'loss': 1.3074, 'learning_rate': 4.011262091761672e-05, 'epoch': 2.18} + 73%|███████▎ | 12570/17285 [112:32:59<41:53:39, 31.99s/it] 73%|███████▎ | 12571/17285 [112:33:29<41:21:39, 31.59s/it] 73%|███████▎ | 12572/17285 [112:34:09<44:29:15, 33.98s/it] 73%|███████▎ | 12573/17285 [112:34:44<44:52:33, 34.29s/it] 73%|███████▎ | 12574/17285 [112:35:18<44:38:33, 34.11s/it] 73%|███████▎ | 12575/17285 [112:35:43<41:08:38, 31.45s/it] 73%|███████▎ | 12576/17285 [112:36:16<42:01:18, 32.13s/it][2023-08-27 16:31:18,940] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 73%|███████▎ | 12577/17285 [112:36:41<39:08:47, 29.93s/it] 73%|███████▎ | 12578/17285 [112:37:21<43:07:18, 32.98s/it] 73%|███████▎ | 12579/17285 [112:37:55<43:16:12, 33.10s/it] 73%|███████▎ | 12580/17285 [112:38:26<42:31:50, 32.54s/it] {'loss': 1.2753, 'learning_rate': 3.997480895410295e-05, 'epoch': 2.18} + 73%|███████▎ | 12580/17285 [112:38:26<42:31:50, 32.54s/it] 73%|███████▎ | 12581/17285 [112:38:56<41:21:23, 31.65s/it] 73%|███████▎ | 12582/17285 [112:39:24<40:01:32, 30.64s/it] 73%|███████▎ | 12583/17285 [112:39:55<40:24:27, 30.94s/it] 73%|███████▎ | 12584/17285 [112:40:34<43:33:35, 33.36s/it] 73%|███████▎ | 12585/17285 [112:41:08<43:48:43, 33.56s/it] 73%|███████▎ | 12586/17285 [112:41:42<43:43:44, 33.50s/it] 73%|███████▎ | 12587/17285 [112:42:12<42:32:53, 32.60s/it] 73%|███████▎ | 12588/17285 [112:42:44<42:20:03, 32.45s/it] 73%|███████▎ | 12589/17285 [112:43:18<42:41:21, 32.73s/it] 73%|███████▎ | 12590/17285 [112:43:48<41:37:32, 31.92s/it] {'loss': 1.2519, 'learning_rate': 3.9821893310242744e-05, 'epoch': 2.19} + 73%|███████▎ | 12590/17285 [112:43:48<41:37:32, 31.92s/it] 73%|███████▎ | 12591/17285 [112:44:17<40:41:02, 31.20s/it] 73%|███████▎ | 12592/17285 [112:44:42<38:13:59, 29.33s/it] 73%|███████▎ | 12593/17285 [112:45:14<39:12:38, 30.08s/it] 73%|███████▎ | 12594/17285 [112:45:43<38:44:13, 29.73s/it] 73%|███████▎ | 12595/17285 [112:46:13<38:50:25, 29.81s/it] 73%|███████▎ | 12596/17285 [112:46:41<38:03:02, 29.21s/it] 73%|███████▎ | 12597/17285 [112:47:19<41:28:38, 31.85s/it] 73%|███████▎ | 12598/17285 [112:47:51<41:22:31, 31.78s/it] 73%|███████▎ | 12599/17285 [112:48:18<39:41:27, 30.49s/it] 73%|███████▎ | 12600/17285 [112:48:54<41:39:14, 32.01s/it] {'loss': 1.331, 'learning_rate': 3.966919795488333e-05, 'epoch': 2.19} + 73%|███████▎ | 12600/17285 [112:48:54<41:39:14, 32.01s/it] 73%|███████▎ | 12601/17285 [112:49:20<39:26:18, 30.31s/it] 73%|███████▎ | 12602/17285 [112:49:45<37:15:36, 28.64s/it] 73%|███████▎ | 
12603/17285 [112:50:19<39:26:26, 30.33s/it] 73%|███████▎ | 12604/17285 [112:50:48<39:05:27, 30.06s/it] 73%|███████▎ | 12605/17285 [112:51:24<41:12:00, 31.69s/it] 73%|███████▎ | 12606/17285 [112:51:58<42:14:37, 32.50s/it] 73%|███████▎ | 12607/17285 [112:52:33<42:59:09, 33.08s/it] 73%|███████▎ | 12608/17285 [112:53:02<41:29:22, 31.94s/it] 73%|███████▎ | 12609/17285 [112:53:26<38:34:42, 29.70s/it] 73%|███████▎ | 12610/17285 [112:54:06<42:32:47, 32.76s/it] {'loss': 1.3126, 'learning_rate': 3.9516723446982664e-05, 'epoch': 2.19} + 73%|███████▎ | 12610/17285 [112:54:06<42:32:47, 32.76s/it] 73%|███████▎ | 12611/17285 [112:54:36<41:26:41, 31.92s/it] 73%|███████▎ | 12612/17285 [112:55:07<40:47:07, 31.42s/it] 73%|███████▎ | 12613/17285 [112:55:41<42:07:59, 32.47s/it] 73%|███████▎ | 12614/17285 [112:56:12<41:30:32, 31.99s/it] 73%|███████▎ | 12615/17285 [112:56:48<42:47:21, 32.99s/it] 73%|███████▎ | 12616/17285 [112:57:16<40:48:41, 31.47s/it] 73%|███████▎ | 12617/17285 [112:57:47<40:41:28, 31.38s/it] 73%|███████▎ | 12618/17285 [112:58:22<42:12:51, 32.56s/it] 73%|███████▎ | 12619/17285 [112:59:03<45:33:28, 35.15s/it] 73%|███████▎ | 12620/17285 [112:59:30<42:21:37, 32.69s/it] {'loss': 1.2616, 'learning_rate': 3.936447034469024e-05, 'epoch': 2.19} + 73%|███████▎ | 12620/17285 [112:59:30<42:21:37, 32.69s/it] 73%|███████▎ | 12621/17285 [113:00:02<41:59:39, 32.41s/it] 73%|███████▎ | 12622/17285 [113:00:34<41:45:38, 32.24s/it] 73%|███████▎ | 12623/17285 [113:00:59<38:52:27, 30.02s/it] 73%|███████▎ | 12624/17285 [113:01:31<39:57:25, 30.86s/it] 73%|███████▎ | 12625/17285 [113:02:08<42:11:06, 32.59s/it] 73%|███████▎ | 12626/17285 [113:02:47<44:26:08, 34.34s/it] 73%|███████▎ | 12627/17285 [113:03:12<41:03:47, 31.74s/it] 73%|███████▎ | 12628/17285 [113:03:44<41:16:40, 31.91s/it] 73%|███████▎ | 12629/17285 [113:04:21<43:01:31, 33.27s/it] 73%|███████▎ | 12630/17285 [113:04:47<40:20:54, 31.20s/it] {'loss': 1.2957, 'learning_rate': 3.92124392053451e-05, 'epoch': 2.19} + 73%|███████▎ | 
12630/17285 [113:04:47<40:20:54, 31.20s/it] 73%|███████▎ | 12631/17285 [113:05:16<39:11:16, 30.31s/it] 73%|███████▎ | 12632/17285 [113:05:44<38:38:19, 29.89s/it] 73%|███████▎ | 12633/17285 [113:06:18<39:57:55, 30.93s/it] 73%|███████▎ | 12634/17285 [113:06:49<40:00:58, 30.97s/it] 73%|███████▎ | 12635/17285 [113:07:15<38:03:17, 29.46s/it] 73%|███████▎ | 12636/17285 [113:07:48<39:36:52, 30.68s/it] 73%|███████▎ | 12637/17285 [113:08:20<39:58:58, 30.97s/it] 73%|███████▎ | 12638/17285 [113:08:52<40:31:27, 31.39s/it] 73%|███████▎ | 12639/17285 [113:09:23<40:11:26, 31.14s/it] 73%|███████▎ | 12640/17285 [113:10:03<43:40:23, 33.85s/it] {'loss': 1.309, 'learning_rate': 3.9060630585473746e-05, 'epoch': 2.19} + 73%|███████▎ | 12640/17285 [113:10:03<43:40:23, 33.85s/it] 73%|███████▎ | 12641/17285 [113:10:43<46:08:45, 35.77s/it] 73%|███████▎ | 12642/17285 [113:11:16<44:52:00, 34.79s/it] 73%|███████▎ | 12643/17285 [113:11:49<44:19:21, 34.37s/it] 73%|███████▎ | 12644/17285 [113:12:18<42:17:16, 32.80s/it] 73%|███████▎ | 12645/17285 [113:12:49<41:21:23, 32.09s/it] 73%|███████▎ | 12646/17285 [113:13:20<40:59:05, 31.81s/it] 73%|███████▎ | 12647/17285 [113:13:53<41:31:26, 32.23s/it] 73%|███████▎ | 12648/17285 [113:14:23<40:45:09, 31.64s/it] 73%|███████▎ | 12649/17285 [113:14:57<41:26:46, 32.18s/it] 73%|███████▎ | 12650/17285 [113:15:24<39:20:21, 30.55s/it] {'loss': 1.2873, 'learning_rate': 3.890904504078814e-05, 'epoch': 2.2} + 73%|███████▎ | 12650/17285 [113:15:24<39:20:21, 30.55s/it] 73%|███████▎ | 12651/17285 [113:15:53<38:59:39, 30.29s/it] 73%|███████▎ | 12652/17285 [113:16:19<37:19:15, 29.00s/it] 73%|███████▎ | 12653/17285 [113:16:53<38:57:43, 30.28s/it] 73%|███████▎ | 12654/17285 [113:17:23<38:50:41, 30.20s/it] 73%|███████▎ | 12655/17285 [113:17:54<39:22:00, 30.61s/it] 73%|███████▎ | 12656/17285 [113:18:22<38:14:45, 29.74s/it] 73%|███████▎ | 12657/17285 [113:18:47<36:36:46, 28.48s/it] 73%|███████▎ | 12658/17285 [113:19:13<35:36:57, 27.71s/it] 73%|███████▎ | 12659/17285 
[113:19:52<39:59:33, 31.12s/it] 73%|███████▎ | 12660/17285 [113:20:28<41:45:53, 32.51s/it] {'loss': 1.283, 'learning_rate': 3.8757683126183654e-05, 'epoch': 2.2} + 73%|███████▎ | 12660/17285 [113:20:28<41:45:53, 32.51s/it] 73%|███████▎ | 12661/17285 [113:20:54<39:13:31, 30.54s/it] 73%|███████▎ | 12662/17285 [113:21:21<37:59:31, 29.58s/it] 73%|███████▎ | 12663/17285 [113:21:52<38:11:02, 29.74s/it] 73%|███████▎ | 12664/17285 [113:22:28<40:41:21, 31.70s/it] 73%|███████▎ | 12665/17285 [113:23:01<41:04:12, 32.00s/it] 73%|███████▎ | 12666/17285 [113:23:32<40:52:31, 31.86s/it] 73%|███████▎ | 12667/17285 [113:24:06<41:46:51, 32.57s/it] 73%|███████▎ | 12668/17285 [113:24:42<42:54:29, 33.46s/it] 73%|███████▎ | 12669/17285 [113:25:10<41:03:06, 32.02s/it] 73%|███████▎ | 12670/17285 [113:25:47<42:44:37, 33.34s/it] {'loss': 1.3069, 'learning_rate': 3.8606545395737005e-05, 'epoch': 2.2} + 73%|███████▎ | 12670/17285 [113:25:47<42:44:37, 33.34s/it] 73%|███████▎ | 12671/17285 [113:26:16<41:10:33, 32.13s/it] 73%|███████▎ | 12672/17285 [113:26:50<41:58:55, 32.76s/it] 73%|███████▎ | 12673/17285 [113:27:25<42:40:08, 33.31s/it] 73%|███████▎ | 12674/17285 [113:28:05<45:11:28, 35.28s/it] 73%|███████▎ | 12675/17285 [113:28:44<46:40:04, 36.44s/it] 73%|███████▎ | 12676/17285 [113:29:14<44:02:16, 34.40s/it] 73%|███████▎ | 12677/17285 [113:29:40<40:48:55, 31.89s/it] 73%|███████▎ | 12678/17285 [113:30:09<39:56:45, 31.21s/it] 73%|███████▎ | 12679/17285 [113:30:49<43:07:46, 33.71s/it] 73%|███████▎ | 12680/17285 [113:31:25<44:00:13, 34.40s/it] {'loss': 1.2527, 'learning_rate': 3.84556324027043e-05, 'epoch': 2.2} + 73%|███████▎ | 12680/17285 [113:31:25<44:00:13, 34.40s/it] 73%|███████▎ | 12681/17285 [113:31:57<43:03:26, 33.67s/it] 73%|███████▎ | 12682/17285 [113:32:26<41:10:53, 32.21s/it] 73%|███████▎ | 12683/17285 [113:32:51<38:32:43, 30.15s/it] 73%|███████▎ | 12684/17285 [113:33:25<40:10:14, 31.43s/it] 73%|███████▎ | 12685/17285 [113:34:03<42:38:48, 33.38s/it] 73%|███████▎ | 12686/17285 
[113:34:36<42:20:03, 33.14s/it] 73%|███████▎ | 12687/17285 [113:35:10<42:40:11, 33.41s/it] 73%|███████▎ | 12688/17285 [113:35:46<43:44:38, 34.26s/it] 73%|███████▎ | 12689/17285 [113:36:10<39:50:35, 31.21s/it] 73%|███████▎ | 12690/17285 [113:36:43<40:31:33, 31.75s/it] {'loss': 1.2922, 'learning_rate': 3.8304944699518954e-05, 'epoch': 2.2} + 73%|███████▎ | 12690/17285 [113:36:43<40:31:33, 31.75s/it] 73%|███████▎ | 12691/17285 [113:37:16<40:53:29, 32.04s/it][2023-08-27 17:32:34,273] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 73%|███████▎ | 12692/17285 [113:37:57<44:08:15, 34.60s/it] 73%|███████▎ | 12693/17285 [113:38:27<42:38:02, 33.42s/it] 73%|███████▎ | 12694/17285 [113:38:56<40:59:12, 32.14s/it] 73%|███████▎ | 12695/17285 [113:39:25<39:36:38, 31.07s/it] 73%|███████▎ | 12696/17285 [113:39:58<40:15:41, 31.58s/it] 73%|███████▎ | 12697/17285 [113:40:30<40:33:29, 31.82s/it] 73%|███████▎ | 12698/17285 [113:41:06<42:15:18, 33.16s/it] 73%|███████▎ | 12699/17285 [113:41:33<39:34:41, 31.07s/it] 73%|███████▎ | 12700/17285 [113:42:00<38:00:13, 29.84s/it] {'loss': 1.2795, 'learning_rate': 3.816951884539331e-05, 'epoch': 2.2} + 73%|███████▎ | 12700/17285 [113:42:00<38:00:13, 29.84s/it] 73%|███████▎ | 12701/17285 [113:42:32<38:57:28, 30.60s/it] 73%|███████▎ | 12702/17285 [113:43:02<38:50:05, 30.51s/it] 73%|███████▎ | 12703/17285 [113:43:33<38:44:47, 30.44s/it] 73%|███████▎ | 12704/17285 [113:44:01<38:00:24, 29.87s/it] 74%|███████▎ | 12705/17285 [113:44:39<41:14:04, 32.41s/it] 74%|███████▎ | 12706/17285 [113:45:04<38:21:47, 30.16s/it] 74%|███████▎ | 12707/17285 [113:45:41<40:43:16, 32.02s/it] 74%|███████▎ | 12708/17285 [113:46:10<39:29:00, 31.06s/it] 74%|███████▎ | 12709/17285 [113:46:37<38:12:34, 30.06s/it] 74%|███████▎ | 12710/17285 [113:47:13<40:14:06, 31.66s/it] {'loss': 1.285, 'learning_rate': 3.801926071191671e-05, 'epoch': 2.21} + 74%|███████▎ | 
12710/17285 [113:47:13<40:14:06, 31.66s/it] 74%|███████▎ | 12711/17285 [113:47:45<40:33:23, 31.92s/it] 74%|███████▎ | 12712/17285 [113:48:15<39:34:58, 31.16s/it] 74%|███████▎ | 12713/17285 [113:48:48<40:36:44, 31.98s/it] 74%|███████▎ | 12714/17285 [113:49:19<40:07:11, 31.60s/it] 74%|███████▎ | 12715/17285 [113:49:54<41:14:46, 32.49s/it] 74%|███████▎ | 12716/17285 [113:50:23<39:58:42, 31.50s/it] 74%|███████▎ | 12717/17285 [113:51:05<43:53:17, 34.59s/it] 74%|███████▎ | 12718/17285 [113:51:32<41:12:25, 32.48s/it] 74%|███████▎ | 12719/17285 [113:52:07<42:09:27, 33.24s/it] 74%|███████▎ | 12720/17285 [113:52:38<41:16:12, 32.55s/it] {'loss': 1.2804, 'learning_rate': 3.786922946567352e-05, 'epoch': 2.21} + 74%|███████▎ | 12720/17285 [113:52:38<41:16:12, 32.55s/it] 74%|███████▎ | 12721/17285 [113:53:09<40:36:20, 32.03s/it] 74%|███████▎ | 12722/17285 [113:53:34<37:54:28, 29.91s/it] 74%|███████▎ | 12723/17285 [113:54:09<39:48:22, 31.41s/it] 74%|███████▎ | 12724/17285 [113:54:35<37:44:06, 29.78s/it] 74%|███████▎ | 12725/17285 [113:55:11<40:17:25, 31.81s/it] 74%|███████▎ | 12726/17285 [113:55:39<38:33:26, 30.45s/it] 74%|███████▎ | 12727/17285 [113:56:24<44:21:15, 35.03s/it] 74%|███████▎ | 12728/17285 [113:56:48<39:57:53, 31.57s/it] 74%|███████▎ | 12729/17285 [113:57:19<39:39:07, 31.33s/it] 74%|███████▎ | 12730/17285 [113:57:49<39:05:20, 30.89s/it] {'loss': 1.318, 'learning_rate': 3.771942565586933e-05, 'epoch': 2.21} + 74%|███████▎ | 12730/17285 [113:57:49<39:05:20, 30.89s/it] 74%|███████▎ | 12731/17285 [113:58:17<37:58:27, 30.02s/it] 74%|███████▎ | 12732/17285 [113:58:51<39:47:57, 31.47s/it] 74%|███████▎ | 12733/17285 [113:59:36<44:38:33, 35.31s/it] 74%|███████▎ | 12734/17285 [114:00:10<44:19:39, 35.06s/it] 74%|███████▎ | 12735/17285 [114:00:40<42:18:59, 33.48s/it] 74%|███████▎ | 12736/17285 [114:01:14<42:24:32, 33.56s/it] 74%|███████▎ | 12737/17285 [114:01:54<44:46:23, 35.44s/it] 74%|███████▎ | 12738/17285 [114:02:25<43:09:18, 34.17s/it] 74%|███████▎ | 12739/17285 
[114:02:54<41:19:55, 32.73s/it] 74%|███████▎ | 12740/17285 [114:03:28<41:35:58, 32.95s/it] {'loss': 1.3102, 'learning_rate': 3.7569849830877333e-05, 'epoch': 2.21} + 74%|███████▎ | 12740/17285 [114:03:28<41:35:58, 32.95s/it] 74%|███████▎ | 12741/17285 [114:03:59<40:52:31, 32.38s/it] 74%|███████▎ | 12742/17285 [114:04:27<39:15:28, 31.11s/it] 74%|███████▎ | 12743/17285 [114:05:11<44:23:00, 35.18s/it] 74%|███████▎ | 12744/17285 [114:05:42<42:47:56, 33.93s/it] 74%|███████▎ | 12745/17285 [114:06:15<42:22:36, 33.60s/it] 74%|███████▎ | 12746/17285 [114:06:44<40:36:11, 32.20s/it] 74%|███████▎ | 12747/17285 [114:07:14<39:30:37, 31.34s/it] 74%|███████▍ | 12748/17285 [114:07:44<39:13:56, 31.13s/it] 74%|███████▍ | 12749/17285 [114:08:18<40:23:27, 32.06s/it] 74%|███████▍ | 12750/17285 [114:08:55<41:55:52, 33.29s/it] {'loss': 1.3083, 'learning_rate': 3.742050253823604e-05, 'epoch': 2.21} + 74%|███████▍ | 12750/17285 [114:08:55<41:55:52, 33.29s/it] 74%|███████▍ | 12751/17285 [114:09:25<41:01:47, 32.58s/it] 74%|███████▍ | 12752/17285 [114:09:59<41:33:30, 33.00s/it] 74%|███████▍ | 12753/17285 [114:10:29<40:06:59, 31.87s/it][2023-08-27 18:05:36,137] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 74%|███████▍ | 12754/17285 [114:10:58<39:18:38, 31.23s/it] 74%|███████▍ | 12755/17285 [114:11:32<40:03:32, 31.83s/it] 74%|███████▍ | 12756/17285 [114:12:08<41:36:49, 33.08s/it] 74%|███████▍ | 12757/17285 [114:12:38<40:41:14, 32.35s/it] 74%|███████▍ | 12758/17285 [114:13:14<41:56:35, 33.35s/it] 74%|███████▍ | 12759/17285 [114:13:48<42:18:29, 33.65s/it] 74%|███████▍ | 12760/17285 [114:14:13<38:47:54, 30.87s/it] {'loss': 1.2854, 'learning_rate': 3.7286285821885306e-05, 'epoch': 2.21} + 74%|███████▍ | 12760/17285 [114:14:13<38:47:54, 30.87s/it] 74%|███████▍ | 12761/17285 [114:14:50<41:03:09, 32.67s/it] 74%|███████▍ | 12762/17285 [114:15:21<40:28:02, 32.21s/it] 74%|███████▍ | 12763/17285 [114:15:53<40:21:43, 32.13s/it] 74%|███████▍ | 12764/17285 [114:16:21<39:02:05, 31.08s/it] 74%|███████▍ | 12765/17285 [114:16:49<37:43:27, 30.05s/it] 74%|███████▍ | 12766/17285 [114:17:24<39:27:27, 31.43s/it] 74%|███████▍ | 12767/17285 [114:17:48<36:58:08, 29.46s/it] 74%|███████▍ | 12768/17285 [114:18:14<35:25:49, 28.24s/it] 74%|███████▍ | 12769/17285 [114:18:49<37:53:24, 30.20s/it] 74%|███████▍ | 12770/17285 [114:19:24<39:56:47, 31.85s/it] {'loss': 1.305, 'learning_rate': 3.713737424618142e-05, 'epoch': 2.22} + 74%|███████▍ | 12770/17285 [114:19:24<39:56:47, 31.85s/it] 74%|███████▍ | 12771/17285 [114:19:55<39:20:17, 31.37s/it] 74%|███████▍ | 12772/17285 [114:20:28<39:56:32, 31.86s/it] 74%|███████▍ | 12773/17285 [114:20:58<39:23:33, 31.43s/it] 74%|███████▍ | 12774/17285 [114:21:32<40:14:21, 32.11s/it] 74%|███████▍ | 12775/17285 [114:21:59<38:32:53, 30.77s/it] 74%|███████▍ | 12776/17285 [114:22:34<40:09:56, 32.07s/it] 74%|███████▍ | 12777/17285 [114:23:08<40:32:44, 32.38s/it] 74%|███████▍ | 12778/17285 [114:23:43<41:43:23, 33.33s/it] 74%|███████▍ | 12779/17285 [114:24:22<43:40:32, 34.89s/it] 74%|███████▍ | 12780/17285 [114:24:46<39:48:18, 31.81s/it] {'loss': 1.2948, 'learning_rate': 3.6988692785952173e-05, 'epoch': 2.22} + 74%|███████▍ | 
12780/17285 [114:24:46<39:48:18, 31.81s/it] 74%|███████▍ | 12781/17285 [114:25:19<40:09:28, 32.10s/it] 74%|███████▍ | 12782/17285 [114:25:51<40:03:38, 32.03s/it] 74%|███████▍ | 12783/17285 [114:26:25<40:42:10, 32.55s/it] 74%|███████▍ | 12784/17285 [114:26:52<38:50:24, 31.07s/it] 74%|███████▍ | 12785/17285 [114:27:21<37:46:24, 30.22s/it] 74%|███████▍ | 12786/17285 [114:27:52<38:15:49, 30.62s/it] 74%|███████▍ | 12787/17285 [114:28:18<36:37:41, 29.32s/it] 74%|███████▍ | 12788/17285 [114:28:51<37:55:33, 30.36s/it] 74%|███████▍ | 12789/17285 [114:29:28<40:09:48, 32.16s/it] 74%|███████▍ | 12790/17285 [114:29:58<39:27:17, 31.60s/it] {'loss': 1.2779, 'learning_rate': 3.68402419854622e-05, 'epoch': 2.22} + 74%|███████▍ | 12790/17285 [114:29:58<39:27:17, 31.60s/it] 74%|███████▍ | 12791/17285 [114:30:23<36:58:02, 29.61s/it] 74%|███████▍ | 12792/17285 [114:30:54<37:32:35, 30.08s/it] 74%|███████▍ | 12793/17285 [114:31:26<38:16:08, 30.67s/it] 74%|███████▍ | 12794/17285 [114:31:54<37:12:25, 29.83s/it] 74%|███████▍ | 12795/17285 [114:32:29<39:03:05, 31.31s/it] 74%|███████▍ | 12796/17285 [114:32:57<37:53:58, 30.39s/it] 74%|███████▍ | 12797/17285 [114:33:28<38:17:08, 30.71s/it] 74%|███████▍ | 12798/17285 [114:33:54<36:15:43, 29.09s/it] 74%|███████▍ | 12799/17285 [114:34:34<40:20:46, 32.38s/it] 74%|███████▍ | 12800/17285 [114:35:05<39:45:11, 31.91s/it] {'loss': 1.3145, 'learning_rate': 3.6692022388131795e-05, 'epoch': 2.22} + 74%|███████▍ | 12800/17285 [114:35:05<39:45:11, 31.91s/it] 74%|███████▍ | 12801/17285 [114:35:38<40:23:01, 32.42s/it] 74%|███████▍ | 12802/17285 [114:36:04<38:03:51, 30.57s/it] 74%|███████▍ | 12803/17285 [114:36:34<37:45:02, 30.32s/it] 74%|███████▍ | 12804/17285 [114:37:15<41:39:55, 33.47s/it] 74%|███████▍ | 12805/17285 [114:37:47<41:12:45, 33.12s/it] 74%|███████▍ | 12806/17285 [114:38:16<39:37:09, 31.84s/it] 74%|███████▍ | 12807/17285 [114:38:46<38:56:06, 31.30s/it] 74%|███████▍ | 12808/17285 [114:39:24<41:21:11, 33.25s/it] 74%|███████▍ | 12809/17285 
[114:39:57<41:11:59, 33.14s/it] 74%|███████▍ | 12810/17285 [114:40:24<38:53:18, 31.28s/it] {'loss': 1.2673, 'learning_rate': 3.654403453653494e-05, 'epoch': 2.22} + 74%|███████▍ | 12810/17285 [114:40:24<38:53:18, 31.28s/it] 74%|███████▍ | 12811/17285 [114:40:58<39:48:57, 32.04s/it] 74%|███████▍ | 12812/17285 [114:41:29<39:35:10, 31.86s/it] 74%|███████▍ | 12813/17285 [114:41:58<38:31:11, 31.01s/it] 74%|███████▍ | 12814/17285 [114:42:27<37:46:11, 30.41s/it] 74%|███████▍ | 12815/17285 [114:42:54<36:39:07, 29.52s/it] 74%|███████▍ | 12816/17285 [114:43:23<36:09:20, 29.13s/it] 74%|███████▍ | 12817/17285 [114:43:50<35:23:23, 28.51s/it] 74%|███████▍ | 12818/17285 [114:44:29<39:14:31, 31.63s/it] 74%|███████▍ | 12819/17285 [114:45:00<39:12:48, 31.61s/it] 74%|███████▍ | 12820/17285 [114:45:34<40:01:01, 32.26s/it] {'loss': 1.2883, 'learning_rate': 3.639627897239718e-05, 'epoch': 2.23} + 74%|███████▍ | 12820/17285 [114:45:34<40:01:01, 32.26s/it] 74%|███████▍ | 12821/17285 [114:46:05<39:21:50, 31.75s/it] 74%|███████▍ | 12822/17285 [114:46:34<38:37:46, 31.16s/it] 74%|███████▍ | 12823/17285 [114:46:59<36:16:39, 29.27s/it] 74%|███████▍ | 12824/17285 [114:47:27<35:35:49, 28.73s/it] 74%|███████▍ | 12825/17285 [114:48:03<38:29:19, 31.07s/it] 74%|███████▍ | 12826/17285 [114:48:45<42:16:55, 34.14s/it] 74%|███████▍ | 12827/17285 [114:49:15<41:01:49, 33.13s/it] 74%|███████▍ | 12828/17285 [114:49:50<41:29:16, 33.51s/it] 74%|███████▍ | 12829/17285 [114:50:17<39:06:33, 31.60s/it] 74%|███████▍ | 12830/17285 [114:50:50<39:44:14, 32.11s/it] {'loss': 1.2678, 'learning_rate': 3.6248756236593863e-05, 'epoch': 2.23} + 74%|███████▍ | 12830/17285 [114:50:50<39:44:14, 32.11s/it] 74%|███████▍ | 12831/17285 [114:51:33<43:34:04, 35.21s/it] 74%|███████▍ | 12832/17285 [114:52:14<45:47:11, 37.02s/it] 74%|███████▍ | 12833/17285 [114:52:47<44:21:23, 35.87s/it] 74%|███████▍ | 12834/17285 [114:53:17<42:20:57, 34.25s/it] 74%|███████▍ | 12835/17285 [114:53:52<42:24:07, 34.30s/it] 74%|███████▍ | 12836/17285 
[114:54:36<45:52:04, 37.11s/it] 74%|███████▍ | 12837/17285 [114:55:08<44:05:26, 35.68s/it] 74%|███████▍ | 12838/17285 [114:55:42<43:28:26, 35.19s/it] 74%|███████▍ | 12839/17285 [114:56:16<43:06:09, 34.90s/it] 74%|███████▍ | 12840/17285 [114:56:58<45:50:07, 37.12s/it] {'loss': 1.2771, 'learning_rate': 3.6101466869147995e-05, 'epoch': 2.23} + 74%|███████▍ | 12840/17285 [114:56:58<45:50:07, 37.12s/it] 74%|███████▍ | 12841/17285 [114:57:38<46:49:23, 37.93s/it] 74%|███████▍ | 12842/17285 [114:58:04<42:18:28, 34.28s/it] 74%|███████▍ | 12843/17285 [114:58:30<39:08:09, 31.72s/it] 74%|███████▍ | 12844/17285 [114:59:08<41:42:03, 33.80s/it] 74%|███████▍ | 12845/17285 [114:59:45<42:42:54, 34.63s/it] 74%|███████▍ | 12846/17285 [115:00:16<41:13:53, 33.44s/it] 74%|███████▍ | 12847/17285 [115:00:41<38:13:21, 31.01s/it] 74%|███████▍ | 12848/17285 [115:01:07<36:20:11, 29.48s/it] 74%|███████▍ | 12849/17285 [115:01:38<37:02:51, 30.07s/it] 74%|███████▍ | 12850/17285 [115:02:05<35:46:04, 29.03s/it] {'loss': 1.3139, 'learning_rate': 3.5954411409228294e-05, 'epoch': 2.23} + 74%|███████▍ | 12850/17285 [115:02:05<35:46:04, 29.03s/it] 74%|███████▍ | 12851/17285 [115:02:38<37:21:10, 30.33s/it] 74%|███████▍ | 12852/17285 [115:03:07<36:33:38, 29.69s/it] 74%|███████▍ | 12853/17285 [115:03:33<35:23:25, 28.75s/it] 74%|███████▍ | 12854/17285 [115:04:01<35:09:16, 28.56s/it] 74%|███████▍ | 12855/17285 [115:04:31<35:27:26, 28.81s/it] 74%|███████▍ | 12856/17285 [115:05:03<36:48:32, 29.92s/it] 74%|███████▍ | 12857/17285 [115:05:29<35:27:10, 28.82s/it] 74%|███████▍ | 12858/17285 [115:06:05<37:51:17, 30.78s/it] 74%|███████▍ | 12859/17285 [115:06:40<39:26:32, 32.08s/it] 74%|███████▍ | 12860/17285 [115:07:12<39:20:02, 32.00s/it] {'loss': 1.2914, 'learning_rate': 3.580759039514729e-05, 'epoch': 2.23} + 74%|███████▍ | 12860/17285 [115:07:12<39:20:02, 32.00s/it] 74%|███████▍ | 12861/17285 [115:07:38<37:17:34, 30.35s/it] 74%|███████▍ | 12862/17285 [115:08:07<36:39:20, 29.84s/it] 74%|███████▍ | 12863/17285 
[115:08:38<37:12:57, 30.30s/it] 74%|███████▍ | 12864/17285 [115:09:09<37:18:20, 30.38s/it] 74%|███████▍ | 12865/17285 [115:09:40<37:34:53, 30.61s/it] 74%|███████▍ | 12866/17285 [115:10:11<37:44:37, 30.75s/it] 74%|███████▍ | 12867/17285 [115:10:50<40:42:30, 33.17s/it] 74%|███████▍ | 12868/17285 [115:11:22<40:15:01, 32.81s/it] 74%|███████▍ | 12869/17285 [115:11:49<38:12:59, 31.15s/it] 74%|███████▍ | 12870/17285 [115:12:15<36:22:59, 29.67s/it] {'loss': 1.2685, 'learning_rate': 3.566100436435924e-05, 'epoch': 2.23} + 74%|███████▍ | 12870/17285 [115:12:15<36:22:59, 29.67s/it] 74%|███████▍ | 12871/17285 [115:12:39<34:22:05, 28.03s/it] 74%|███████▍ | 12872/17285 [115:13:11<35:27:59, 28.93s/it] 74%|███████▍ | 12873/17285 [115:13:41<36:11:30, 29.53s/it] 74%|███████▍ | 12874/17285 [115:14:13<37:02:44, 30.23s/it] 74%|███████▍ | 12875/17285 [115:14:47<38:14:56, 31.22s/it] 74%|███████▍ | 12876/17285 [115:15:15<37:16:40, 30.44s/it] 74%|███████▍ | 12877/17285 [115:15:46<37:19:55, 30.49s/it] 75%|███████▍ | 12878/17285 [115:16:22<39:25:36, 32.21s/it] 75%|███████▍ | 12879/17285 [115:16:59<41:15:16, 33.71s/it] 75%|███████▍ | 12880/17285 [115:17:33<41:11:17, 33.66s/it] {'loss': 1.2932, 'learning_rate': 3.551465385345826e-05, 'epoch': 2.24} + 75%|███████▍ | 12880/17285 [115:17:33<41:11:17, 33.66s/it] 75%|███████▍ | 12881/17285 [115:18:10<42:28:09, 34.72s/it][2023-08-27 19:13:24,697] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 75%|███████▍ | 12882/17285 [115:18:47<43:13:36, 35.34s/it] 75%|███████▍ | 12883/17285 [115:19:32<46:50:42, 38.31s/it] 75%|███████▍ | 12884/17285 [115:20:02<43:41:29, 35.74s/it] 75%|███████▍ | 12885/17285 [115:20:31<41:08:10, 33.66s/it] 75%|███████▍ | 12886/17285 [115:21:00<39:40:24, 32.47s/it] 75%|███████▍ | 12887/17285 [115:21:30<38:37:22, 31.61s/it] 75%|███████▍ | 12888/17285 [115:22:02<38:49:24, 31.79s/it] 75%|███████▍ | 12889/17285 [115:22:30<37:28:05, 30.68s/it] 75%|███████▍ | 12890/17285 [115:23:07<39:36:11, 32.44s/it] {'loss': 1.2751, 'learning_rate': 3.5383140205951094e-05, 'epoch': 2.24} + 75%|███████▍ | 12890/17285 [115:23:07<39:36:11, 32.44s/it] 75%|███████▍ | 12891/17285 [115:23:41<40:03:52, 32.82s/it] 75%|███████▍ | 12892/17285 [115:24:09<38:17:11, 31.38s/it] 75%|███████▍ | 12893/17285 [115:24:49<41:24:07, 33.94s/it] 75%|███████▍ | 12894/17285 [115:25:22<41:02:37, 33.65s/it] 75%|███████▍ | 12895/17285 [115:25:55<40:48:09, 33.46s/it] 75%|███████▍ | 12896/17285 [115:26:23<39:02:45, 32.03s/it] 75%|███████▍ | 12897/17285 [115:27:05<42:44:26, 35.07s/it] 75%|███████▍ | 12898/17285 [115:27:40<42:31:06, 34.89s/it] 75%|███████▍ | 12899/17285 [115:28:20<44:23:30, 36.44s/it] 75%|███████▍ | 12900/17285 [115:28:54<43:29:51, 35.71s/it] {'loss': 1.2775, 'learning_rate': 3.5237238658062945e-05, 'epoch': 2.24} + 75%|███████▍ | 12900/17285 [115:28:54<43:29:51, 35.71s/it] 75%|███████▍ | 12901/17285 [115:29:28<42:44:24, 35.10s/it] 75%|███████▍ | 12902/17285 [115:29:57<40:38:21, 33.38s/it] 75%|███████▍ | 12903/17285 [115:30:21<37:05:11, 30.47s/it] 75%|███████▍ | 12904/17285 [115:30:49<36:14:18, 29.78s/it] 75%|███████▍ | 12905/17285 [115:31:21<37:05:44, 30.49s/it] 75%|███████▍ | 12906/17285 [115:32:01<40:28:03, 33.27s/it] 75%|███████▍ | 12907/17285 [115:32:30<38:50:20, 31.94s/it] 75%|███████▍ | 12908/17285 [115:33:05<39:58:28, 32.88s/it] 75%|███████▍ | 12909/17285 [115:33:35<38:52:17, 31.98s/it] 75%|███████▍ | 12910/17285 [115:34:09<39:41:28, 
32.66s/it] {'loss': 1.2826, 'learning_rate': 3.5091574181302256e-05, 'epoch': 2.24} + 75%|███████▍ | 12910/17285 [115:34:09<39:41:28, 32.66s/it] 75%|███████▍ | 12911/17285 [115:34:44<40:27:13, 33.30s/it] 75%|███████▍ | 12912/17285 [115:35:20<41:26:55, 34.12s/it] 75%|███████▍ | 12913/17285 [115:36:00<43:38:49, 35.94s/it] 75%|███████▍ | 12914/17285 [115:36:32<42:09:22, 34.72s/it] 75%|███████▍ | 12915/17285 [115:37:12<44:06:21, 36.33s/it] 75%|███████▍ | 12916/17285 [115:37:48<43:56:12, 36.20s/it] 75%|███████▍ | 12917/17285 [115:38:19<42:18:14, 34.87s/it] 75%|███████▍ | 12918/17285 [115:38:51<40:58:31, 33.78s/it] 75%|███████▍ | 12919/17285 [115:39:20<39:13:31, 32.34s/it] 75%|███████▍ | 12920/17285 [115:39:46<37:04:47, 30.58s/it] {'loss': 1.2661, 'learning_rate': 3.494614730888971e-05, 'epoch': 2.24} + 75%|███████▍ | 12920/17285 [115:39:46<37:04:47, 30.58s/it] 75%|███████▍ | 12921/17285 [115:40:27<40:50:36, 33.69s/it] 75%|███████▍ | 12922/17285 [115:40:58<39:42:02, 32.76s/it] 75%|███████▍ | 12923/17285 [115:41:23<37:10:14, 30.68s/it] 75%|███████▍ | 12924/17285 [115:41:59<39:01:37, 32.22s/it] 75%|███████▍ | 12925/17285 [115:42:29<38:09:22, 31.51s/it] 75%|███████▍ | 12926/17285 [115:43:07<40:36:06, 33.53s/it] 75%|███████▍ | 12927/17285 [115:43:38<39:35:55, 32.71s/it] 75%|███████▍ | 12928/17285 [115:44:12<40:02:39, 33.09s/it] 75%|███████▍ | 12929/17285 [115:44:38<37:28:28, 30.97s/it] 75%|███████▍ | 12930/17285 [115:45:08<36:58:07, 30.56s/it] {'loss': 1.27, 'learning_rate': 3.480095857317618e-05, 'epoch': 2.24} + 75%|███████▍ | 12930/17285 [115:45:08<36:58:07, 30.56s/it] 75%|███████▍ | 12931/17285 [115:45:46<39:39:10, 32.79s/it] 75%|███████▍ | 12932/17285 [115:46:22<40:50:46, 33.78s/it] 75%|███████▍ | 12933/17285 [115:46:52<39:25:56, 32.62s/it] 75%|███████▍ | 12934/17285 [115:47:23<38:56:53, 32.23s/it] 75%|███████▍ | 12935/17285 [115:47:53<38:06:04, 31.53s/it] 75%|███████▍ | 12936/17285 [115:48:21<36:59:05, 30.62s/it] 75%|███████▍ | 12937/17285 [115:48:57<38:36:29, 
31.97s/it] 75%|███████▍ | 12938/17285 [115:49:21<35:50:18, 29.68s/it] 75%|███████▍ | 12939/17285 [115:49:56<37:36:33, 31.15s/it] 75%|███████▍ | 12940/17285 [115:50:22<35:56:10, 29.77s/it] {'loss': 1.3088, 'learning_rate': 3.4656008505640814e-05, 'epoch': 2.25} + 75%|███████▍ | 12940/17285 [115:50:22<35:56:10, 29.77s/it] 75%|███████▍ | 12941/17285 [115:51:05<40:37:18, 33.66s/it] 75%|███████▍ | 12942/17285 [115:51:34<38:50:33, 32.20s/it] 75%|███████▍ | 12943/17285 [115:52:05<38:32:44, 31.96s/it] 75%|███████▍ | 12944/17285 [115:52:43<40:44:45, 33.79s/it] 75%|███████▍ | 12945/17285 [115:53:18<41:16:48, 34.24s/it] 75%|███████▍ | 12946/17285 [115:54:00<43:46:51, 36.32s/it] 75%|███████▍ | 12947/17285 [115:54:34<42:59:48, 35.68s/it] 75%|███████▍ | 12948/17285 [115:55:10<43:08:31, 35.81s/it] 75%|███████▍ | 12949/17285 [115:55:36<39:33:40, 32.85s/it] 75%|███████▍ | 12950/17285 [115:56:02<37:12:49, 30.90s/it] {'loss': 1.263, 'learning_rate': 3.4511297636889095e-05, 'epoch': 2.25} + 75%|███████▍ | 12950/17285 [115:56:02<37:12:49, 30.90s/it] 75%|███████▍ | 12951/17285 [115:56:32<36:57:35, 30.70s/it] 75%|███████▍ | 12952/17285 [115:57:08<38:55:21, 32.34s/it] 75%|███████▍ | 12953/17285 [115:57:45<40:27:18, 33.62s/it] 75%|███████▍ | 12954/17285 [115:58:15<38:58:37, 32.40s/it] 75%|███████▍ | 12955/17285 [115:58:45<38:13:29, 31.78s/it] 75%|███████▍ | 12956/17285 [115:59:12<36:21:03, 30.23s/it] 75%|███████▍ | 12957/17285 [115:59:50<39:16:50, 32.67s/it] 75%|███████▍ | 12958/17285 [116:00:27<40:59:57, 34.11s/it] 75%|███████▍ | 12959/17285 [116:01:03<41:39:11, 34.66s/it] 75%|███████▍ | 12960/17285 [116:01:30<38:47:40, 32.29s/it] {'loss': 1.2896, 'learning_rate': 3.4366826496650886e-05, 'epoch': 2.25} + 75%|███████▍ | 12960/17285 [116:01:30<38:47:40, 32.29s/it] 75%|███████▍ | 12961/17285 [116:02:02<38:46:49, 32.29s/it] 75%|███████▍ | 12962/17285 [116:02:33<38:09:54, 31.78s/it] 75%|███████▍ | 12963/17285 [116:03:03<37:34:31, 31.30s/it] 75%|███████▌ | 12964/17285 [116:03:32<36:35:41, 
30.49s/it] 75%|███████▌ | 12965/17285 [116:04:03<36:47:26, 30.66s/it] 75%|███████▌ | 12966/17285 [116:04:42<39:47:13, 33.16s/it] 75%|███████▌ | 12967/17285 [116:05:08<37:12:49, 31.03s/it] 75%|███████▌ | 12968/17285 [116:05:36<36:14:49, 30.23s/it] 75%|███████▌ | 12969/17285 [116:06:04<35:19:41, 29.47s/it] 75%|███████▌ | 12970/17285 [116:06:30<33:54:40, 28.29s/it] {'loss': 1.2919, 'learning_rate': 3.422259561377853e-05, 'epoch': 2.25} + 75%|███████▌ | 12970/17285 [116:06:30<33:54:40, 28.29s/it] 75%|███████▌ | 12971/17285 [116:06:58<34:02:39, 28.41s/it] 75%|███████▌ | 12972/17285 [116:07:30<35:21:36, 29.51s/it] 75%|███████▌ | 12973/17285 [116:08:08<38:10:09, 31.87s/it] 75%|███████▌ | 12974/17285 [116:08:44<39:42:57, 33.17s/it] 75%|███████▌ | 12975/17285 [116:09:23<41:52:14, 34.97s/it] 75%|███████▌ | 12976/17285 [116:09:55<40:47:32, 34.08s/it] 75%|███████▌ | 12977/17285 [116:10:36<43:25:06, 36.28s/it] 75%|███████▌ | 12978/17285 [116:11:12<43:01:26, 35.96s/it] 75%|███████▌ | 12979/17285 [116:11:49<43:28:44, 36.35s/it] 75%|███████▌ | 12980/17285 [116:12:22<42:12:18, 35.29s/it] {'loss': 1.2451, 'learning_rate': 3.4078605516244785e-05, 'epoch': 2.25} + 75%|███████▌ | 12980/17285 [116:12:22<42:12:18, 35.29s/it] 75%|███████▌ | 12981/17285 [116:12:48<38:48:32, 32.46s/it] 75%|███████▌ | 12982/17285 [116:13:17<37:34:16, 31.43s/it] 75%|███████▌ | 12983/17285 [116:13:46<36:45:54, 30.77s/it] 75%|███████▌ | 12984/17285 [116:14:11<34:46:37, 29.11s/it][2023-08-27 20:09:15,972] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, but hysteresis is 2. Reducing hysteresis to 1 + 75%|███████▌ | 12985/17285 [116:14:38<34:05:06, 28.54s/it][2023-08-27 20:09:42,666] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 262144, reducing to 131072 + 75%|███████▌ | 12986/17285 [116:15:05<33:25:02, 27.98s/it] 75%|███████▌ | 12987/17285 [116:15:34<33:52:32, 28.37s/it] 75%|███████▌ | 12988/17285 [116:16:02<33:42:39, 28.24s/it] 75%|███████▌ | 12989/17285 [116:16:33<34:47:06, 29.15s/it] 75%|███████▌ | 12990/17285 [116:17:07<36:28:49, 30.58s/it] {'loss': 1.3136, 'learning_rate': 3.396358715789669e-05, 'epoch': 2.25} + 75%|███████▌ | 12990/17285 [116:17:07<36:28:49, 30.58s/it] 75%|███████▌ | 12991/17285 [116:17:49<40:26:38, 33.91s/it] 75%|███████▌ | 12992/17285 [116:18:21<39:39:02, 33.25s/it] 75%|███████▌ | 12993/17285 [116:18:54<39:48:26, 33.39s/it] 75%|███████▌ | 12994/17285 [116:19:30<40:37:55, 34.09s/it] 75%|███████▌ | 12995/17285 [116:20:01<39:24:57, 33.08s/it] 75%|███████▌ | 12996/17285 [116:20:29<37:44:00, 31.67s/it] 75%|███████▌ | 12997/17285 [116:21:02<38:09:57, 32.04s/it][2023-08-27 20:16:08,602] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 75%|███████▌ | 12998/17285 [116:21:31<36:57:42, 31.04s/it] 75%|███████▌ | 12999/17285 [116:21:57<35:11:57, 29.57s/it] 75%|███████▌ | 13000/17285 [116:22:24<34:14:52, 28.77s/it] {'loss': 1.2834, 'learning_rate': 3.383437644428432e-05, 'epoch': 2.26} + 75%|███████▌ | 13000/17285 [116:22:24<34:14:52, 28.77s/it][INFO|trainer.py:3081] 2023-08-27 20:17:01,683 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-27 20:17:01,684 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-27 20:17:01,684 >> Batch size = 2 + + 0%| | 0/33 [00:00> Deleting older checkpoint [20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-10000] due to args.save_total_limit +Save Peft Config at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-13000 +[INFO|tokenization_utils_base.py:2210] 2023-08-27 20:18:26,915 >> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-13000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-27 20:18:26,918 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-13000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-13000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-13000 + 75%|███████▌ | 13001/17285 [116:24:32<69:48:38, 58.66s/it] 75%|███████▌ | 13002/17285 [116:25:07<61:02:33, 51.31s/it] 75%|███████▌ | 13003/17285 [116:25:39<54:13:15, 45.59s/it] 75%|███████▌ | 13004/17285 [116:26:10<49:00:17, 41.21s/it] 75%|███████▌ | 13005/17285 [116:26:45<46:48:29, 39.37s/it] 75%|███████▌ | 13006/17285 [116:27:17<44:02:56, 37.06s/it] 75%|███████▌ | 13007/17285 [116:27:46<41:16:09, 34.73s/it] 75%|███████▌ | 
13008/17285 [116:28:21<41:28:59, 34.92s/it] 75%|███████▌ | 13009/17285 [116:28:57<41:52:48, 35.26s/it] 75%|███████▌ | 13010/17285 [116:29:32<41:39:37, 35.08s/it] {'loss': 1.2585, 'learning_rate': 3.369103909748521e-05, 'epoch': 2.26} + 75%|███████▌ | 13010/17285 [116:29:32<41:39:37, 35.08s/it] 75%|███████▌ | 13011/17285 [116:29:58<38:38:05, 32.54s/it] 75%|███████▌ | 13012/17285 [116:30:34<39:47:14, 33.52s/it] 75%|███████▌ | 13013/17285 [116:31:10<40:38:31, 34.25s/it] 75%|███████▌ | 13014/17285 [116:31:41<39:15:00, 33.08s/it] 75%|███████▌ | 13015/17285 [116:32:12<38:39:57, 32.60s/it] 75%|███████▌ | 13016/17285 [116:32:46<38:57:55, 32.86s/it] 75%|███████▌ | 13017/17285 [116:33:17<38:37:01, 32.57s/it] 75%|███████▌ | 13018/17285 [116:33:48<37:59:30, 32.05s/it] 75%|███████▌ | 13019/17285 [116:34:14<35:51:40, 30.26s/it] 75%|███████▌ | 13020/17285 [116:34:48<36:53:19, 31.14s/it] {'loss': 1.27, 'learning_rate': 3.354794448184514e-05, 'epoch': 2.26} + 75%|███████▌ | 13020/17285 [116:34:48<36:53:19, 31.14s/it] 75%|███████▌ | 13021/17285 [116:35:20<37:10:51, 31.39s/it] 75%|███████▌ | 13022/17285 [116:35:50<36:59:29, 31.24s/it] 75%|███████▌ | 13023/17285 [116:36:21<36:54:23, 31.17s/it] 75%|███████▌ | 13024/17285 [116:36:50<36:06:04, 30.50s/it] 75%|███████▌ | 13025/17285 [116:37:17<34:37:11, 29.26s/it] 75%|███████▌ | 13026/17285 [116:37:49<35:50:05, 30.29s/it] 75%|███████▌ | 13027/17285 [116:38:17<34:55:54, 29.53s/it] 75%|███████▌ | 13028/17285 [116:38:48<35:25:10, 29.95s/it] 75%|███████▌ | 13029/17285 [116:39:17<35:07:08, 29.71s/it] 75%|███████▌ | 13030/17285 [116:39:44<34:04:03, 28.82s/it] {'loss': 1.2923, 'learning_rate': 3.340509312117752e-05, 'epoch': 2.26} + 75%|███████▌ | 13030/17285 [116:39:44<34:04:03, 28.82s/it] 75%|███████▌ | 13031/17285 [116:40:13<34:12:22, 28.95s/it] 75%|███████▌ | 13032/17285 [116:40:46<35:29:15, 30.04s/it] 75%|███████▌ | 13033/17285 [116:41:17<35:49:34, 30.33s/it] 75%|███████▌ | 13034/17285 [116:41:44<34:35:19, 29.29s/it] 75%|███████▌ | 
13035/17285 [116:42:12<34:09:10, 28.93s/it] 75%|███████▌ | 13036/17285 [116:42:51<37:48:53, 32.04s/it] 75%|███████▌ | 13037/17285 [116:43:23<37:39:07, 31.91s/it] 75%|███████▌ | 13038/17285 [116:43:59<39:01:53, 33.09s/it] 75%|███████▌ | 13039/17285 [116:44:22<35:45:11, 30.31s/it] 75%|███████▌ | 13040/17285 [116:44:52<35:38:16, 30.22s/it] {'loss': 1.2711, 'learning_rate': 3.32624855384053e-05, 'epoch': 2.26} + 75%|███████▌ | 13040/17285 [116:44:52<35:38:16, 30.22s/it] 75%|███████▌ | 13041/17285 [116:45:22<35:19:18, 29.96s/it] 75%|███████▌ | 13042/17285 [116:45:54<36:01:16, 30.56s/it] 75%|███████▌ | 13043/17285 [116:46:26<36:34:30, 31.04s/it] 75%|███████▌ | 13044/17285 [116:46:59<37:15:31, 31.63s/it] 75%|███████▌ | 13045/17285 [116:47:30<37:12:18, 31.59s/it] 75%|███████▌ | 13046/17285 [116:48:06<38:31:57, 32.72s/it] 75%|███████▌ | 13047/17285 [116:48:41<39:21:11, 33.43s/it] 75%|███████▌ | 13048/17285 [116:49:16<39:53:37, 33.90s/it] 75%|███████▌ | 13049/17285 [116:49:48<39:24:46, 33.50s/it] 75%|███████▌ | 13050/17285 [116:50:17<37:49:29, 32.15s/it] {'loss': 1.2891, 'learning_rate': 3.3120122255559e-05, 'epoch': 2.26} + 75%|███████▌ | 13050/17285 [116:50:17<37:49:29, 32.15s/it] 76%|███████▌ | 13051/17285 [116:50:47<36:50:07, 31.32s/it] 76%|███████▌ | 13052/17285 [116:51:21<37:43:54, 32.09s/it] 76%|███████▌ | 13053/17285 [116:51:56<38:49:50, 33.03s/it] 76%|███████▌ | 13054/17285 [116:52:25<37:35:44, 31.99s/it] 76%|███████▌ | 13055/17285 [116:52:54<36:19:14, 30.91s/it] 76%|███████▌ | 13056/17285 [116:53:26<36:52:40, 31.39s/it] 76%|███████▌ | 13057/17285 [116:53:53<35:13:43, 30.00s/it] 76%|███████▌ | 13058/17285 [116:54:25<36:02:21, 30.69s/it] 76%|███████▌ | 13059/17285 [116:55:01<37:49:09, 32.22s/it] 76%|███████▌ | 13060/17285 [116:55:35<38:21:14, 32.68s/it] {'loss': 1.2635, 'learning_rate': 3.2978003793774914e-05, 'epoch': 2.27} + 76%|███████▌ | 13060/17285 [116:55:35<38:21:14, 32.68s/it] 76%|███████▌ | 13061/17285 [116:56:11<39:40:33, 33.81s/it] 76%|███████▌ | 
13062/17285 [116:56:40<37:54:45, 32.32s/it] 76%|███████▌ | 13063/17285 [116:57:16<39:00:31, 33.26s/it] 76%|███████▌ | 13064/17285 [116:57:46<37:56:08, 32.35s/it] 76%|███████▌ | 13065/17285 [116:58:24<40:03:40, 34.18s/it] 76%|███████▌ | 13066/17285 [116:59:02<41:14:32, 35.19s/it] 76%|███████▌ | 13067/17285 [116:59:34<40:07:20, 34.24s/it] 76%|███████▌ | 13068/17285 [117:00:07<39:44:20, 33.92s/it] 76%|███████▌ | 13069/17285 [117:00:42<40:04:07, 34.21s/it] 76%|███████▌ | 13070/17285 [117:01:14<39:15:14, 33.53s/it] {'loss': 1.2926, 'learning_rate': 3.283613067329311e-05, 'epoch': 2.27} + 76%|███████▌ | 13070/17285 [117:01:14<39:15:14, 33.53s/it] 76%|███████▌ | 13071/17285 [117:01:39<36:26:02, 31.13s/it] 76%|███████▌ | 13072/17285 [117:02:10<36:03:24, 30.81s/it] 76%|███████▌ | 13073/17285 [117:02:41<36:15:12, 30.99s/it] 76%|███████▌ | 13074/17285 [117:03:17<38:11:27, 32.65s/it] 76%|███████▌ | 13075/17285 [117:03:43<35:43:43, 30.55s/it] 76%|███████▌ | 13076/17285 [117:04:15<36:21:12, 31.09s/it] 76%|███████▌ | 13077/17285 [117:04:46<36:12:08, 30.97s/it] 76%|███████▌ | 13078/17285 [117:05:27<39:32:29, 33.84s/it] 76%|███████▌ | 13079/17285 [117:05:52<36:31:01, 31.26s/it] 76%|███████▌ | 13080/17285 [117:06:27<37:49:55, 32.39s/it] {'loss': 1.2621, 'learning_rate': 3.269450341345558e-05, 'epoch': 2.27} + 76%|███████▌ | 13080/17285 [117:06:27<37:49:55, 32.39s/it] 76%|███████▌ | 13081/17285 [117:06:53<35:32:18, 30.43s/it] 76%|███████▌ | 13082/17285 [117:07:32<38:27:15, 32.94s/it] 76%|███████▌ | 13083/17285 [117:08:03<37:57:26, 32.52s/it] 76%|███████▌ | 13084/17285 [117:08:33<37:10:19, 31.85s/it] 76%|███████▌ | 13085/17285 [117:09:05<37:13:47, 31.91s/it] 76%|███████▌ | 13086/17285 [117:09:35<36:20:24, 31.16s/it] 76%|███████▌ | 13087/17285 [117:10:05<35:49:44, 30.73s/it] 76%|███████▌ | 13088/17285 [117:10:34<35:30:49, 30.46s/it] 76%|███████▌ | 13089/17285 [117:11:07<36:22:46, 31.21s/it] 76%|███████▌ | 13090/17285 [117:11:40<36:55:58, 31.69s/it] {'loss': 1.291, 'learning_rate': 
3.2553122532704325e-05, 'epoch': 2.27} + 76%|███████▌ | 13090/17285 [117:11:40<36:55:58, 31.69s/it] 76%|███████▌ | 13091/17285 [117:12:20<39:37:04, 34.01s/it] 76%|███████▌ | 13092/17285 [117:12:54<39:53:42, 34.25s/it] 76%|███████▌ | 13093/17285 [117:13:26<39:00:54, 33.51s/it] 76%|███████▌ | 13094/17285 [117:14:01<39:35:36, 34.01s/it] 76%|███████▌ | 13095/17285 [117:14:28<36:59:20, 31.78s/it] 76%|███████▌ | 13096/17285 [117:15:01<37:31:38, 32.25s/it] 76%|███████▌ | 13097/17285 [117:15:31<36:32:34, 31.41s/it] 76%|███████▌ | 13098/17285 [117:16:02<36:34:13, 31.44s/it] 76%|███████▌ | 13099/17285 [117:16:30<35:25:36, 30.47s/it] 76%|███████▌ | 13100/17285 [117:17:06<37:20:25, 32.12s/it] {'loss': 1.2658, 'learning_rate': 3.241198854857938e-05, 'epoch': 2.27} + 76%|███████▌ | 13100/17285 [117:17:06<37:20:25, 32.12s/it] 76%|███████▌ | 13101/17285 [117:17:36<36:26:13, 31.35s/it] 76%|███████▌ | 13102/17285 [117:18:03<34:57:03, 30.08s/it] 76%|███████▌ | 13103/17285 [117:18:36<35:45:16, 30.78s/it] 76%|███████▌ | 13104/17285 [117:19:07<35:54:27, 30.92s/it] 76%|███████▌ | 13105/17285 [117:19:42<37:17:00, 32.11s/it] 76%|███████▌ | 13106/17285 [117:20:18<38:42:42, 33.35s/it] 76%|███████▌ | 13107/17285 [117:20:48<37:41:01, 32.47s/it] 76%|███████▌ | 13108/17285 [117:21:24<38:48:42, 33.45s/it] 76%|███████▌ | 13109/17285 [117:21:53<37:09:22, 32.03s/it] 76%|███████▌ | 13110/17285 [117:22:32<39:31:09, 34.08s/it] {'loss': 1.2671, 'learning_rate': 3.227110197771703e-05, 'epoch': 2.28} + 76%|███████▌ | 13110/17285 [117:22:32<39:31:09, 34.08s/it] 76%|███████▌ | 13111/17285 [117:23:03<38:39:35, 33.34s/it] 76%|███████▌ | 13112/17285 [117:23:36<38:22:31, 33.11s/it] 76%|███████▌ | 13113/17285 [117:24:07<37:46:19, 32.59s/it] 76%|███████▌ | 13114/17285 [117:24:38<37:03:47, 31.99s/it] 76%|███████▌ | 13115/17285 [117:25:05<35:22:14, 30.54s/it] 76%|███████▌ | 13116/17285 [117:25:36<35:23:49, 30.57s/it] 76%|███████▌ | 13117/17285 [117:26:08<36:03:19, 31.14s/it] 76%|███████▌ | 13118/17285 
[117:26:34<34:10:27, 29.52s/it] 76%|███████▌ | 13119/17285 [117:27:02<33:43:54, 29.15s/it] 76%|███████▌ | 13120/17285 [117:27:40<36:45:32, 31.77s/it] {'loss': 1.2686, 'learning_rate': 3.213046333584792e-05, 'epoch': 2.28} + 76%|███████▌ | 13120/17285 [117:27:40<36:45:32, 31.77s/it] 76%|███████▌ | 13121/17285 [117:28:11<36:19:11, 31.40s/it] 76%|███████▌ | 13122/17285 [117:28:46<37:35:20, 32.51s/it] 76%|███████▌ | 13123/17285 [117:29:14<36:17:24, 31.39s/it] 76%|███████▌ | 13124/17285 [117:29:41<34:32:10, 29.88s/it] 76%|███████▌ | 13125/17285 [117:30:29<40:50:54, 35.35s/it] 76%|███████▌ | 13126/17285 [117:31:04<40:56:06, 35.43s/it] 76%|███████▌ | 13127/17285 [117:31:35<39:19:52, 34.05s/it] 76%|███████▌ | 13128/17285 [117:32:12<40:09:51, 34.78s/it] 76%|███████▌ | 13129/17285 [117:32:54<42:43:13, 37.01s/it] 76%|███████▌ | 13130/17285 [117:33:22<39:25:13, 34.15s/it] {'loss': 1.2723, 'learning_rate': 3.1990073137795066e-05, 'epoch': 2.28} + 76%|███████▌ | 13130/17285 [117:33:22<39:25:13, 34.15s/it] 76%|███████▌ | 13131/17285 [117:34:02<41:26:51, 35.92s/it] 76%|███████▌ | 13132/17285 [117:34:34<40:04:44, 34.74s/it] 76%|███████▌ | 13133/17285 [117:35:03<38:16:06, 33.18s/it] 76%|███████▌ | 13134/17285 [117:35:51<43:16:31, 37.53s/it] 76%|███████▌ | 13135/17285 [117:36:27<42:57:50, 37.27s/it][2023-08-27 21:31:33,470] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 76%|███████▌ | 13136/17285 [117:36:56<39:52:25, 34.60s/it] 76%|███████▌ | 13137/17285 [117:37:28<39:05:23, 33.93s/it] 76%|███████▌ | 13138/17285 [117:38:03<39:16:08, 34.09s/it] 76%|███████▌ | 13139/17285 [117:38:37<39:15:24, 34.09s/it] 76%|███████▌ | 13140/17285 [117:39:13<39:53:17, 34.64s/it] {'loss': 1.3037, 'learning_rate': 3.186393480377876e-05, 'epoch': 2.28} + 76%|███████▌ | 13140/17285 [117:39:13<39:53:17, 34.64s/it] 76%|███████▌ | 13141/17285 [117:39:39<36:55:20, 32.08s/it] 76%|███████▌ | 13142/17285 [117:40:14<37:59:56, 33.02s/it] 76%|███████▌ | 13143/17285 [117:40:48<38:24:51, 33.39s/it] 76%|███████▌ | 13144/17285 [117:41:14<35:37:06, 30.97s/it] 76%|███████▌ | 13145/17285 [117:41:39<33:39:28, 29.27s/it] 76%|███████▌ | 13146/17285 [117:42:05<32:32:58, 28.31s/it] 76%|███████▌ | 13147/17285 [117:42:33<32:19:46, 28.13s/it] 76%|███████▌ | 13148/17285 [117:43:02<32:48:24, 28.55s/it] 76%|███████▌ | 13149/17285 [117:43:38<35:17:37, 30.72s/it] 76%|███████▌ | 13150/17285 [117:44:11<35:57:11, 31.30s/it] {'loss': 1.284, 'learning_rate': 3.172401806405554e-05, 'epoch': 2.28} + 76%|███████▌ | 13150/17285 [117:44:11<35:57:11, 31.30s/it] 76%|███████▌ | 13151/17285 [117:44:39<34:55:54, 30.42s/it] 76%|███████▌ | 13152/17285 [117:45:10<35:09:54, 30.63s/it] 76%|███████▌ | 13153/17285 [117:45:42<35:32:22, 30.96s/it] 76%|███████▌ | 13154/17285 [117:46:14<35:59:35, 31.37s/it] 76%|███████▌ | 13155/17285 [117:46:45<35:41:45, 31.12s/it] 76%|███████▌ | 13156/17285 [117:47:11<33:56:47, 29.60s/it] 76%|███████▌ | 13157/17285 [117:47:46<36:01:04, 31.41s/it] 76%|███████▌ | 13158/17285 [117:48:14<34:52:11, 30.42s/it] 76%|███████▌ | 13159/17285 [117:48:49<36:20:58, 31.72s/it] 76%|███████▌ | 13160/17285 [117:49:17<35:01:53, 30.57s/it] {'loss': 1.2582, 'learning_rate': 3.1584351255985664e-05, 'epoch': 2.28} + 76%|███████▌ | 13160/17285 [117:49:17<35:01:53, 30.57s/it] 76%|███████▌ | 13161/17285 [117:49:51<36:20:28, 31.72s/it] 76%|███████▌ | 13162/17285 
[117:50:25<37:02:45, 32.35s/it] 76%|███████▌ | 13163/17285 [117:50:59<37:35:47, 32.84s/it] 76%|███████▌ | 13164/17285 [117:51:27<35:51:22, 31.32s/it] 76%|███████▌ | 13165/17285 [117:51:55<34:49:54, 30.44s/it] 76%|███████▌ | 13166/17285 [117:52:25<34:31:34, 30.18s/it] 76%|███████▌ | 13167/17285 [117:52:49<32:28:25, 28.39s/it] 76%|███████▌ | 13168/17285 [117:53:20<33:09:33, 29.00s/it] 76%|███████▌ | 13169/17285 [117:53:58<36:22:34, 31.82s/it] 76%|███████▌ | 13170/17285 [117:54:29<36:04:13, 31.56s/it] {'loss': 1.3076, 'learning_rate': 3.144493489083469e-05, 'epoch': 2.29} + 76%|███████▌ | 13170/17285 [117:54:29<36:04:13, 31.56s/it] 76%|███████▌ | 13171/17285 [117:54:54<33:58:04, 29.72s/it] 76%|███████▌ | 13172/17285 [117:55:23<33:27:01, 29.28s/it] 76%|███████▌ | 13173/17285 [117:55:59<35:55:37, 31.45s/it] 76%|███████▌ | 13174/17285 [117:56:27<34:49:13, 30.49s/it] 76%|███████▌ | 13175/17285 [117:56:59<35:02:46, 30.70s/it] 76%|███████▌ | 13176/17285 [117:57:31<35:29:51, 31.10s/it] 76%|███████▌ | 13177/17285 [117:58:06<36:51:21, 32.30s/it] 76%|███████▌ | 13178/17285 [117:58:36<36:03:01, 31.60s/it] 76%|███████▌ | 13179/17285 [117:59:02<34:07:32, 29.92s/it] 76%|███████▋ | 13180/17285 [117:59:28<32:47:31, 28.76s/it] {'loss': 1.3193, 'learning_rate': 3.130576947895139e-05, 'epoch': 2.29} + 76%|███████▋ | 13180/17285 [117:59:28<32:47:31, 28.76s/it] 76%|███████▋ | 13181/17285 [117:59:57<33:03:32, 29.00s/it] 76%|███████▋ | 13182/17285 [118:00:28<33:41:41, 29.56s/it] 76%|███████▋ | 13183/17285 [118:00:57<33:15:23, 29.19s/it] 76%|███████▋ | 13184/17285 [118:01:30<34:44:24, 30.50s/it] 76%|███████▋ | 13185/17285 [118:02:05<36:20:52, 31.92s/it] 76%|███████▋ | 13186/17285 [118:02:39<36:48:18, 32.32s/it] 76%|███████▋ | 13187/17285 [118:03:05<34:52:58, 30.64s/it] 76%|███████▋ | 13188/17285 [118:03:35<34:38:43, 30.44s/it] 76%|███████▋ | 13189/17285 [118:04:02<33:31:24, 29.46s/it] 76%|███████▋ | 13190/17285 [118:04:36<34:56:58, 30.73s/it] {'loss': 1.2829, 'learning_rate': 
3.1166855529765825e-05, 'epoch': 2.29} + 76%|███████▋ | 13190/17285 [118:04:36<34:56:58, 30.73s/it] 76%|███████▋ | 13191/17285 [118:05:07<34:57:00, 30.73s/it] 76%|███████▋ | 13192/17285 [118:05:42<36:32:23, 32.14s/it] 76%|███████▋ | 13193/17285 [118:06:16<37:02:00, 32.58s/it] 76%|███████▋ | 13194/17285 [118:06:49<37:16:47, 32.81s/it] 76%|███████▋ | 13195/17285 [118:07:31<40:28:50, 35.63s/it] 76%|███████▋ | 13196/17285 [118:07:57<37:04:56, 32.65s/it] 76%|███████▋ | 13197/17285 [118:08:33<38:16:06, 33.70s/it] 76%|███████▋ | 13198/17285 [118:09:01<36:20:47, 32.02s/it] 76%|███████▋ | 13199/17285 [118:09:35<36:57:46, 32.57s/it] 76%|███████▋ | 13200/17285 [118:10:15<39:17:43, 34.63s/it] {'loss': 1.3018, 'learning_rate': 3.102819355178763e-05, 'epoch': 2.29} + 76%|███████▋ | 13200/17285 [118:10:15<39:17:43, 34.63s/it] 76%|███████▋ | 13201/17285 [118:10:55<41:09:47, 36.28s/it] 76%|███████▋ | 13202/17285 [118:11:37<43:04:06, 37.97s/it] 76%|███████▋ | 13203/17285 [118:12:06<39:56:25, 35.22s/it] 76%|███████▋ | 13204/17285 [118:12:41<39:55:06, 35.21s/it] 76%|███████▋ | 13205/17285 [118:13:18<40:37:31, 35.85s/it] 76%|███████▋ | 13206/17285 [118:13:53<40:10:42, 35.46s/it] 76%|███████▋ | 13207/17285 [118:14:23<38:23:30, 33.89s/it] 76%|███████▋ | 13208/17285 [118:14:55<37:54:02, 33.47s/it] 76%|███████▋ | 13209/17285 [118:15:26<36:48:41, 32.51s/it] 76%|███████▋ | 13210/17285 [118:15:59<37:15:08, 32.91s/it] {'loss': 1.2842, 'learning_rate': 3.0889784052604066e-05, 'epoch': 2.29} + 76%|███████▋ | 13210/17285 [118:15:59<37:15:08, 32.91s/it] 76%|███████▋ | 13211/17285 [118:16:30<36:24:20, 32.17s/it] 76%|███████▋ | 13212/17285 [118:17:10<39:09:42, 34.61s/it] 76%|███████▋ | 13213/17285 [118:17:47<39:53:58, 35.27s/it] 76%|███████▋ | 13214/17285 [118:18:17<38:02:13, 33.64s/it] 76%|███████▋ | 13215/17285 [118:18:47<37:01:09, 32.74s/it] 76%|███████▋ | 13216/17285 [118:19:19<36:29:04, 32.28s/it] 76%|███████▋ | 13217/17285 [118:19:56<38:16:08, 33.87s/it] 76%|███████▋ | 13218/17285 
[118:20:32<38:57:40, 34.49s/it] 76%|███████▋ | 13219/17285 [118:21:00<36:38:11, 32.44s/it] 76%|███████▋ | 13220/17285 [118:21:36<37:49:40, 33.50s/it] {'loss': 1.2816, 'learning_rate': 3.075162753887814e-05, 'epoch': 2.29} + 76%|███████▋ | 13220/17285 [118:21:36<37:49:40, 33.50s/it] 76%|███████▋ | 13221/17285 [118:22:01<35:08:45, 31.13s/it] 76%|███████▋ | 13222/17285 [118:22:32<34:58:13, 30.99s/it] 76%|███████▋ | 13223/17285 [118:23:02<34:31:04, 30.59s/it] 77%|███████▋ | 13224/17285 [118:23:37<36:10:13, 32.06s/it] 77%|███████▋ | 13225/17285 [118:24:10<36:15:31, 32.15s/it] 77%|███████▋ | 13226/17285 [118:24:34<33:46:38, 29.96s/it] 77%|███████▋ | 13227/17285 [118:25:15<37:28:22, 33.24s/it] 77%|███████▋ | 13228/17285 [118:25:47<36:57:55, 32.80s/it] 77%|███████▋ | 13229/17285 [118:26:13<34:29:32, 30.61s/it] 77%|███████▋ | 13230/17285 [118:26:46<35:33:00, 31.56s/it] {'loss': 1.2915, 'learning_rate': 3.061372451634678e-05, 'epoch': 2.3} + 77%|███████▋ | 13230/17285 [118:26:46<35:33:00, 31.56s/it] 77%|███████▋ | 13231/17285 [118:27:23<37:17:53, 33.12s/it] 77%|███████▋ | 13232/17285 [118:27:54<36:31:37, 32.44s/it] 77%|███████▋ | 13233/17285 [118:28:28<37:08:28, 33.00s/it] 77%|███████▋ | 13234/17285 [118:29:08<39:30:50, 35.11s/it] 77%|███████▋ | 13235/17285 [118:29:37<37:26:06, 33.28s/it] 77%|███████▋ | 13236/17285 [118:30:10<37:07:45, 33.01s/it][2023-08-27 22:25:31,920] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, but hysteresis is 2. Reducing hysteresis to 1 + 77%|███████▋ | 13237/17285 [118:30:54<40:59:19, 36.45s/it] 77%|███████▋ | 13238/17285 [118:31:29<40:32:14, 36.06s/it][2023-08-27 22:26:42,806] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 262144, reducing to 131072 + 77%|███████▋ | 13239/17285 [118:32:05<40:25:12, 35.96s/it] 77%|███████▋ | 13240/17285 [118:32:31<37:03:50, 32.99s/it] {'loss': 1.2235, 'learning_rate': 3.0503584951244668e-05, 'epoch': 2.3} + 77%|███████▋ | 13240/17285 [118:32:31<37:03:50, 32.99s/it] 77%|███████▋ | 13241/17285 [118:33:06<37:39:16, 33.52s/it] 77%|███████▋ | 13242/17285 [118:33:39<37:30:34, 33.40s/it] 77%|███████▋ | 13243/17285 [118:34:09<36:30:28, 32.52s/it] 77%|███████▋ | 13244/17285 [118:34:40<35:55:44, 32.01s/it] 77%|███████▋ | 13245/17285 [118:35:08<34:24:22, 30.66s/it] 77%|███████▋ | 13246/17285 [118:35:39<34:33:01, 30.80s/it] 77%|███████▋ | 13247/17285 [118:36:13<35:33:10, 31.70s/it] 77%|███████▋ | 13248/17285 [118:36:52<38:10:59, 34.05s/it] 77%|███████▋ | 13249/17285 [118:37:28<38:34:08, 34.40s/it] 77%|███████▋ | 13250/17285 [118:37:59<37:38:21, 33.58s/it] {'loss': 1.2766, 'learning_rate': 3.0366139484357482e-05, 'epoch': 2.3} + 77%|███████▋ | 13250/17285 [118:37:59<37:38:21, 33.58s/it] 77%|███████▋ | 13251/17285 [118:38:27<35:42:13, 31.86s/it] 77%|███████▋ | 13252/17285 [118:38:54<34:05:51, 30.44s/it] 77%|███████▋ | 13253/17285 [118:39:25<34:08:22, 30.48s/it] 77%|███████▋ | 13254/17285 [118:39:56<34:28:49, 30.79s/it] 77%|███████▋ | 13255/17285 [118:40:21<32:31:43, 29.06s/it] 77%|███████▋ | 13256/17285 [118:40:56<34:34:08, 30.89s/it] 77%|███████▋ | 13257/17285 [118:41:28<34:57:41, 31.25s/it] 77%|███████▋ | 13258/17285 [118:42:02<35:41:26, 31.91s/it] 77%|███████▋ | 13259/17285 [118:42:35<36:11:04, 32.36s/it] 77%|███████▋ | 13260/17285 [118:43:15<38:34:01, 34.49s/it] {'loss': 1.2729, 'learning_rate': 3.0228948919785782e-05, 'epoch': 2.3} + 77%|███████▋ | 13260/17285 [118:43:15<38:34:01, 34.49s/it] 77%|███████▋ | 13261/17285 [118:43:47<37:40:08, 33.70s/it] 77%|███████▋ | 13262/17285 [118:44:26<39:32:19, 35.38s/it] 77%|███████▋ | 13263/17285 [118:44:53<36:47:13, 32.93s/it] 77%|███████▋ | 13264/17285 [118:45:23<35:50:58, 32.10s/it] 77%|███████▋ | 
13265/17285 [118:45:54<35:18:16, 31.62s/it] 77%|███████▋ | 13266/17285 [118:46:21<33:55:10, 30.38s/it] 77%|███████▋ | 13267/17285 [118:46:54<34:33:25, 30.96s/it] 77%|███████▋ | 13268/17285 [118:47:25<34:38:17, 31.04s/it] 77%|███████▋ | 13269/17285 [118:47:55<34:24:19, 30.84s/it] 77%|███████▋ | 13270/17285 [118:48:28<35:04:30, 31.45s/it] {'loss': 1.2784, 'learning_rate': 3.0092013759730564e-05, 'epoch': 2.3} + 77%|███████▋ | 13270/17285 [118:48:28<35:04:30, 31.45s/it] 77%|███████▋ | 13271/17285 [118:49:07<37:38:07, 33.75s/it] 77%|███████▋ | 13272/17285 [118:49:33<34:47:04, 31.20s/it] 77%|███████▋ | 13273/17285 [118:50:08<36:15:45, 32.54s/it] 77%|███████▋ | 13274/17285 [118:50:41<36:12:40, 32.50s/it] 77%|███████▋ | 13275/17285 [118:51:11<35:21:54, 31.75s/it] 77%|███████▋ | 13276/17285 [118:51:46<36:31:20, 32.80s/it] 77%|███████▋ | 13277/17285 [118:52:25<38:48:47, 34.86s/it] 77%|███████▋ | 13278/17285 [118:52:52<35:59:03, 32.33s/it] 77%|███████▋ | 13279/17285 [118:53:24<35:45:57, 32.14s/it] 77%|███████▋ | 13280/17285 [118:53:55<35:38:46, 32.04s/it] {'loss': 1.2827, 'learning_rate': 2.9955334505457845e-05, 'epoch': 2.3} + 77%|███████▋ | 13280/17285 [118:53:55<35:38:46, 32.04s/it] 77%|███████▋ | 13281/17285 [118:54:40<39:43:38, 35.72s/it] 77%|███████▋ | 13282/17285 [118:55:09<37:27:27, 33.69s/it] 77%|███████▋ | 13283/17285 [118:55:37<35:31:46, 31.96s/it] 77%|███████▋ | 13284/17285 [118:56:13<36:52:45, 33.18s/it] 77%|███████▋ | 13285/17285 [118:56:39<34:30:25, 31.06s/it] 77%|███████▋ | 13286/17285 [118:57:11<35:03:23, 31.56s/it] 77%|███████▋ | 13287/17285 [118:57:38<33:27:51, 30.13s/it] 77%|███████▋ | 13288/17285 [118:58:09<33:49:14, 30.46s/it] 77%|███████▋ | 13289/17285 [118:58:40<33:58:00, 30.60s/it] 77%|███████▋ | 13290/17285 [118:59:14<34:58:54, 31.52s/it] {'loss': 1.279, 'learning_rate': 2.981891165729691e-05, 'epoch': 2.31} + 77%|███████▋ | 13290/17285 [118:59:14<34:58:54, 31.52s/it] 77%|███████▋ | 13291/17285 [118:59:45<34:45:03, 31.32s/it] 77%|███████▋ | 
13292/17285 [119:00:16<34:41:09, 31.27s/it] 77%|███████▋ | 13293/17285 [119:00:53<36:34:04, 32.98s/it] 77%|███████▋ | 13294/17285 [119:01:31<38:14:36, 34.50s/it] 77%|███████▋ | 13295/17285 [119:02:04<37:34:27, 33.90s/it] 77%|███████▋ | 13296/17285 [119:02:37<37:16:35, 33.64s/it] 77%|███████▋ | 13297/17285 [119:03:07<36:03:42, 32.55s/it] 77%|███████▋ | 13298/17285 [119:03:36<35:02:04, 31.63s/it] 77%|███████▋ | 13299/17285 [119:04:01<32:55:07, 29.73s/it] 77%|███████▋ | 13300/17285 [119:04:33<33:29:28, 30.26s/it] {'loss': 1.2917, 'learning_rate': 2.9682745714638417e-05, 'epoch': 2.31} + 77%|███████▋ | 13300/17285 [119:04:33<33:29:28, 30.26s/it] 77%|███████▋ | 13301/17285 [119:05:06<34:29:21, 31.17s/it] 77%|███████▋ | 13302/17285 [119:05:56<40:32:19, 36.64s/it] 77%|███████▋ | 13303/17285 [119:06:28<39:10:47, 35.42s/it] 77%|███████▋ | 13304/17285 [119:07:06<39:49:14, 36.01s/it] 77%|███████▋ | 13305/17285 [119:07:35<37:27:55, 33.89s/it] 77%|███████▋ | 13306/17285 [119:08:06<36:37:08, 33.13s/it] 77%|███████▋ | 13307/17285 [119:08:45<38:44:06, 35.05s/it] 77%|███████▋ | 13308/17285 [119:09:20<38:29:55, 34.85s/it] 77%|███████▋ | 13309/17285 [119:09:50<36:48:28, 33.33s/it] 77%|███████▋ | 13310/17285 [119:10:21<36:04:06, 32.67s/it] {'loss': 1.2764, 'learning_rate': 2.9546837175932596e-05, 'epoch': 2.31} + 77%|███████▋ | 13310/17285 [119:10:21<36:04:06, 32.67s/it] 77%|███████▋ | 13311/17285 [119:11:01<38:30:27, 34.88s/it] 77%|███████▋ | 13312/17285 [119:11:41<40:09:43, 36.39s/it] 77%|███████▋ | 13313/17285 [119:12:17<40:00:53, 36.27s/it] 77%|███████▋ | 13314/17285 [119:12:50<39:10:12, 35.51s/it] 77%|███████▋ | 13315/17285 [119:13:23<38:04:07, 34.52s/it] 77%|███████▋ | 13316/17285 [119:14:00<39:00:24, 35.38s/it] 77%|███████▋ | 13317/17285 [119:14:35<38:53:38, 35.29s/it] 77%|███████▋ | 13318/17285 [119:15:08<38:14:55, 34.71s/it] 77%|███████▋ | 13319/17285 [119:15:42<37:52:14, 34.38s/it] 77%|███████▋ | 13320/17285 [119:16:14<37:03:58, 33.65s/it] {'loss': 1.3066, 'learning_rate': 
2.941118653868744e-05, 'epoch': 2.31} + 77%|███████▋ | 13320/17285 [119:16:14<37:03:58, 33.65s/it] 77%|███████▋ | 13321/17285 [119:16:48<37:17:43, 33.87s/it] 77%|███████▋ | 13322/17285 [119:17:18<35:46:11, 32.49s/it][2023-08-27 23:12:27,496] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 77%|███████▋ | 13323/17285 [119:17:50<35:39:10, 32.40s/it] 77%|███████▋ | 13324/17285 [119:18:19<34:28:46, 31.34s/it] 77%|███████▋ | 13325/17285 [119:18:56<36:17:18, 32.99s/it] 77%|███████▋ | 13326/17285 [119:19:39<39:41:51, 36.10s/it] 77%|███████▋ | 13327/17285 [119:20:04<36:02:20, 32.78s/it] 77%|███████▋ | 13328/17285 [119:20:30<33:48:18, 30.76s/it] 77%|███████▋ | 13329/17285 [119:20:58<33:01:51, 30.06s/it] 77%|███████▋ | 13330/17285 [119:21:30<33:35:06, 30.57s/it] {'loss': 1.2945, 'learning_rate': 2.9289321881345254e-05, 'epoch': 2.31} + 77%|███████▋ | 13330/17285 [119:21:30<33:35:06, 30.57s/it] 77%|███████▋ | 13331/17285 [119:21:59<32:56:15, 29.99s/it] 77%|███████▋ | 13332/17285 [119:22:35<35:01:19, 31.89s/it] 77%|███████▋ | 13333/17285 [119:23:10<36:06:59, 32.90s/it] 77%|███████▋ | 13334/17285 [119:23:41<35:25:50, 32.28s/it] 77%|███████▋ | 13335/17285 [119:24:14<35:34:47, 32.43s/it] 77%|███████▋ | 13336/17285 [119:24:49<36:28:07, 33.25s/it] 77%|███████▋ | 13337/17285 [119:25:16<34:15:55, 31.24s/it] 77%|███████▋ | 13338/17285 [119:25:51<35:28:25, 32.36s/it] 77%|███████▋ | 13339/17285 [119:26:21<34:40:52, 31.64s/it] 77%|███████▋ | 13340/17285 [119:26:49<33:36:49, 30.67s/it] {'loss': 1.2654, 'learning_rate': 2.9154162624127146e-05, 'epoch': 2.32} + 77%|███████▋ | 13340/17285 [119:26:49<33:36:49, 30.67s/it] 77%|███████▋ | 13341/17285 [119:27:16<32:30:42, 29.68s/it] 77%|███████▋ | 13342/17285 [119:27:54<35:11:23, 32.13s/it] 77%|███████▋ | 13343/17285 [119:28:23<34:05:20, 31.13s/it] 77%|███████▋ | 13344/17285 [119:28:57<35:08:12, 32.10s/it] 77%|███████▋ | 13345/17285 [119:29:25<33:30:22, 
30.61s/it] 77%|███████▋ | 13346/17285 [119:30:03<35:56:39, 32.85s/it] 77%|███████▋ | 13347/17285 [119:30:43<38:19:03, 35.03s/it] 77%|███████▋ | 13348/17285 [119:31:14<37:06:05, 33.93s/it] 77%|███████▋ | 13349/17285 [119:31:48<37:09:30, 33.99s/it] 77%|███████▋ | 13350/17285 [119:32:26<38:28:40, 35.20s/it] {'loss': 1.2715, 'learning_rate': 2.9019262705797567e-05, 'epoch': 2.32} + 77%|███████▋ | 13350/17285 [119:32:26<38:28:40, 35.20s/it] 77%|███████▋ | 13351/17285 [119:32:55<36:16:53, 33.20s/it] 77%|███████▋ | 13352/17285 [119:33:39<39:53:01, 36.51s/it] 77%|███████▋ | 13353/17285 [119:34:10<38:12:48, 34.99s/it] 77%|███████▋ | 13354/17285 [119:34:46<38:23:13, 35.15s/it] 77%|███████▋ | 13355/17285 [119:35:12<35:16:25, 32.31s/it] 77%|███████▋ | 13356/17285 [119:35:51<37:37:35, 34.48s/it] 77%|███████▋ | 13357/17285 [119:36:22<36:23:30, 33.35s/it] 77%|███████▋ | 13358/17285 [119:36:47<33:41:48, 30.89s/it] 77%|███████▋ | 13359/17285 [119:37:22<34:59:04, 32.08s/it] 77%|███████▋ | 13360/17285 [119:37:48<33:04:53, 30.34s/it] {'loss': 1.3311, 'learning_rate': 2.888462262017233e-05, 'epoch': 2.32} + 77%|███████▋ | 13360/17285 [119:37:48<33:04:53, 30.34s/it] 77%|███████▋ | 13361/17285 [119:38:24<34:53:43, 32.01s/it] 77%|███████▋ | 13362/17285 [119:38:56<34:54:29, 32.03s/it] 77%|███████▋ | 13363/17285 [119:39:26<34:18:11, 31.49s/it] 77%|███████▋ | 13364/17285 [119:39:56<33:39:39, 30.91s/it] 77%|███████▋ | 13365/17285 [119:40:27<33:37:49, 30.89s/it] 77%|███████▋ | 13366/17285 [119:40:57<33:27:26, 30.73s/it] 77%|███████▋ | 13367/17285 [119:41:27<33:09:27, 30.47s/it] 77%|███████▋ | 13368/17285 [119:41:58<33:24:28, 30.70s/it] 77%|███████▋ | 13369/17285 [119:42:25<32:07:10, 29.53s/it] 77%|███████▋ | 13370/17285 [119:42:56<32:42:24, 30.08s/it] {'loss': 1.306, 'learning_rate': 2.875024286011615e-05, 'epoch': 2.32} + 77%|███████▋ | 13370/17285 [119:42:56<32:42:24, 30.08s/it] 77%|███████▋ | 13371/17285 [119:43:26<32:23:34, 29.79s/it] 77%|███████▋ | 13372/17285 [119:43:57<32:59:38, 
30.35s/it] 77%|███████▋ | 13373/17285 [119:44:30<33:43:54, 31.04s/it] 77%|███████▋ | 13374/17285 [119:45:03<34:26:27, 31.70s/it] 77%|███████▋ | 13375/17285 [119:45:33<33:54:06, 31.21s/it] 77%|███████▋ | 13376/17285 [119:46:08<35:05:37, 32.32s/it] 77%|███████▋ | 13377/17285 [119:46:34<32:50:22, 30.25s/it] 77%|███████▋ | 13378/17285 [119:47:04<32:58:10, 30.38s/it] 77%|███████▋ | 13379/17285 [119:47:40<34:35:03, 31.87s/it] 77%|███████▋ | 13380/17285 [119:48:10<34:06:39, 31.45s/it] {'loss': 1.2865, 'learning_rate': 2.8616123917540673e-05, 'epoch': 2.32} + 77%|███████▋ | 13380/17285 [119:48:10<34:06:39, 31.45s/it] 77%|███████▋ | 13381/17285 [119:48:39<33:14:51, 30.66s/it] 77%|███████▋ | 13382/17285 [119:49:11<33:50:36, 31.22s/it] 77%|███████▋ | 13383/17285 [119:49:42<33:35:27, 30.99s/it] 77%|███████▋ | 13384/17285 [119:50:20<35:55:46, 33.16s/it] 77%|███████▋ | 13385/17285 [119:50:52<35:36:11, 32.86s/it] 77%|███████▋ | 13386/17285 [119:51:31<37:30:04, 34.63s/it] 77%|███████▋ | 13387/17285 [119:52:04<36:53:46, 34.08s/it] 77%|███████▋ | 13388/17285 [119:52:32<35:08:26, 32.46s/it] 77%|███████▋ | 13389/17285 [119:53:02<34:09:20, 31.56s/it] 77%|███████▋ | 13390/17285 [119:53:42<37:01:32, 34.22s/it] {'loss': 1.2676, 'learning_rate': 2.848226628340287e-05, 'epoch': 2.32} + 77%|███████▋ | 13390/17285 [119:53:42<37:01:32, 34.22s/it] 77%|███████▋ | 13391/17285 [119:54:12<35:28:59, 32.80s/it] 77%|███████▋ | 13392/17285 [119:54:47<36:14:40, 33.52s/it] 77%|███████▋ | 13393/17285 [119:55:18<35:15:50, 32.62s/it] 77%|███████▋ | 13394/17285 [119:55:53<36:06:37, 33.41s/it] 77%|███████▋ | 13395/17285 [119:56:32<37:55:51, 35.10s/it] 78%|███████▊ | 13396/17285 [119:57:10<38:52:09, 35.98s/it] 78%|███████▊ | 13397/17285 [119:57:44<38:25:29, 35.58s/it] 78%|███████▊ | 13398/17285 [119:58:14<36:29:05, 33.79s/it] 78%|███████▊ | 13399/17285 [119:58:50<37:06:06, 34.37s/it] 78%|███████▊ | 13400/17285 [119:59:18<35:01:06, 32.45s/it] {'loss': 1.2997, 'learning_rate': 2.8348670447703218e-05, 'epoch': 
2.33} + 78%|███████▊ | 13400/17285 [119:59:18<35:01:06, 32.45s/it] 78%|███████▊ | 13401/17285 [119:59:46<33:47:41, 31.32s/it] 78%|███████▊ | 13402/17285 [120:00:17<33:23:47, 30.96s/it] 78%|███████▊ | 13403/17285 [120:00:52<34:53:41, 32.36s/it] 78%|███████▊ | 13404/17285 [120:01:31<37:06:30, 34.42s/it] 78%|███████▊ | 13405/17285 [120:02:02<35:57:12, 33.36s/it] 78%|███████▊ | 13406/17285 [120:02:32<34:46:46, 32.28s/it] 78%|███████▊ | 13407/17285 [120:03:04<34:48:30, 32.31s/it] 78%|███████▊ | 13408/17285 [120:03:37<34:44:26, 32.26s/it] 78%|███████▊ | 13409/17285 [120:04:05<33:35:00, 31.19s/it] 78%|███████▊ | 13410/17285 [120:04:35<33:02:54, 30.70s/it] {'loss': 1.2863, 'learning_rate': 2.8215336899483768e-05, 'epoch': 2.33} + 78%|███████▊ | 13410/17285 [120:04:35<33:02:54, 30.70s/it] 78%|███████▊ | 13411/17285 [120:05:02<31:45:44, 29.52s/it] 78%|███████▊ | 13412/17285 [120:05:32<32:01:53, 29.77s/it] 78%|███████▊ | 13413/17285 [120:06:05<33:02:19, 30.72s/it] 78%|███████▊ | 13414/17285 [120:06:36<33:01:01, 30.71s/it] 78%|███████▊ | 13415/17285 [120:07:06<32:49:59, 30.54s/it] 78%|███████▊ | 13416/17285 [120:07:31<31:13:42, 29.06s/it] 78%|███████▊ | 13417/17285 [120:08:04<32:31:08, 30.27s/it] 78%|███████▊ | 13418/17285 [120:08:33<31:47:58, 29.60s/it] 78%|███████▊ | 13419/17285 [120:09:01<31:21:02, 29.19s/it] 78%|███████▊ | 13420/17285 [120:09:44<35:46:30, 33.32s/it] {'loss': 1.2532, 'learning_rate': 2.808226612682646e-05, 'epoch': 2.33} + 78%|███████▊ | 13420/17285 [120:09:44<35:46:30, 33.32s/it] 78%|███████▊ | 13421/17285 [120:10:18<36:10:12, 33.70s/it] 78%|███████▊ | 13422/17285 [120:10:45<34:03:37, 31.74s/it] 78%|███████▊ | 13423/17285 [120:11:17<33:53:43, 31.60s/it] 78%|███████▊ | 13424/17285 [120:11:51<34:46:02, 32.42s/it] 78%|███████▊ | 13425/17285 [120:12:24<35:02:48, 32.69s/it] 78%|███████▊ | 13426/17285 [120:12:54<34:02:43, 31.76s/it] 78%|███████▊ | 13427/17285 [120:13:20<32:05:43, 29.95s/it] 78%|███████▊ | 13428/17285 [120:14:00<35:31:59, 33.17s/it] 78%|███████▊ 
| 13429/17285 [120:14:41<37:46:45, 35.27s/it] 78%|███████▊ | 13430/17285 [120:15:11<36:12:23, 33.81s/it] {'loss': 1.2524, 'learning_rate': 2.7949458616851343e-05, 'epoch': 2.33} + 78%|███████▊ | 13430/17285 [120:15:11<36:12:23, 33.81s/it] 78%|███████▊ | 13431/17285 [120:15:42<35:25:48, 33.09s/it] 78%|███████▊ | 13432/17285 [120:16:14<35:01:04, 32.72s/it] 78%|███████▊ | 13433/17285 [120:16:40<32:53:38, 30.74s/it] 78%|███████▊ | 13434/17285 [120:17:13<33:32:23, 31.35s/it] 78%|███████▊ | 13435/17285 [120:17:49<35:01:25, 32.75s/it] 78%|███████▊ | 13436/17285 [120:18:24<35:39:17, 33.35s/it] 78%|███████▊ | 13437/17285 [120:18:55<35:03:13, 32.79s/it] 78%|███████▊ | 13438/17285 [120:19:41<39:17:49, 36.77s/it] 78%|███████▊ | 13439/17285 [120:20:24<41:08:09, 38.50s/it] 78%|███████▊ | 13440/17285 [120:20:58<39:41:39, 37.16s/it] {'loss': 1.263, 'learning_rate': 2.781691485571475e-05, 'epoch': 2.33} + 78%|███████▊ | 13440/17285 [120:20:58<39:41:39, 37.16s/it] 78%|███████▊ | 13441/17285 [120:21:37<40:25:12, 37.85s/it] 78%|███████▊ | 13442/17285 [120:22:15<40:17:38, 37.75s/it] 78%|███████▊ | 13443/17285 [120:22:41<36:27:13, 34.16s/it] 78%|███████▊ | 13444/17285 [120:23:17<37:06:41, 34.78s/it] 78%|███████▊ | 13445/17285 [120:23:53<37:36:23, 35.26s/it] 78%|███████▊ | 13446/17285 [120:24:21<35:13:35, 33.03s/it] 78%|███████▊ | 13447/17285 [120:24:52<34:29:03, 32.35s/it] 78%|███████▊ | 13448/17285 [120:25:22<33:50:57, 31.76s/it] 78%|███████▊ | 13449/17285 [120:26:01<36:07:44, 33.91s/it] 78%|███████▊ | 13450/17285 [120:26:29<34:00:19, 31.92s/it] {'loss': 1.2607, 'learning_rate': 2.7684635328607477e-05, 'epoch': 2.33} + 78%|███████▊ | 13450/17285 [120:26:29<34:00:19, 31.92s/it] 78%|███████▊ | 13451/17285 [120:27:02<34:33:43, 32.45s/it] 78%|███████▊ | 13452/17285 [120:27:41<36:31:46, 34.31s/it] 78%|███████▊ | 13453/17285 [120:28:13<35:47:38, 33.63s/it] 78%|███████▊ | 13454/17285 [120:28:45<35:18:56, 33.19s/it] 78%|███████▊ | 13455/17285 [120:29:22<36:22:11, 34.19s/it] 78%|███████▊ | 
13456/17285 [120:29:54<35:48:05, 33.66s/it] 78%|███████▊ | 13457/17285 [120:30:40<39:36:28, 37.25s/it] 78%|███████▊ | 13458/17285 [120:31:12<37:55:09, 35.67s/it] 78%|███████▊ | 13459/17285 [120:31:43<36:38:59, 34.48s/it] 78%|███████▊ | 13460/17285 [120:32:15<35:35:25, 33.50s/it] {'loss': 1.2505, 'learning_rate': 2.7552620519753137e-05, 'epoch': 2.34} + 78%|███████▊ | 13460/17285 [120:32:15<35:35:25, 33.50s/it] 78%|███████▊ | 13461/17285 [120:32:41<33:13:35, 31.28s/it] 78%|███████▊ | 13462/17285 [120:33:13<33:30:40, 31.56s/it] 78%|███████▊ | 13463/17285 [120:33:43<32:57:51, 31.05s/it] 78%|███████▊ | 13464/17285 [120:34:10<31:37:52, 29.80s/it] 78%|███████▊ | 13465/17285 [120:34:47<34:10:49, 32.21s/it] 78%|███████▊ | 13466/17285 [120:35:15<32:33:15, 30.69s/it] 78%|███████▊ | 13467/17285 [120:35:40<30:54:46, 29.15s/it] 78%|███████▊ | 13468/17285 [120:36:12<31:38:30, 29.84s/it] 78%|███████▊ | 13469/17285 [120:36:53<35:16:03, 33.27s/it] 78%|███████▊ | 13470/17285 [120:37:28<35:50:12, 33.82s/it] {'loss': 1.2687, 'learning_rate': 2.742087091240628e-05, 'epoch': 2.34} + 78%|███████▊ | 13470/17285 [120:37:28<35:50:12, 33.82s/it] 78%|███████▊ | 13471/17285 [120:38:00<35:24:44, 33.43s/it] 78%|███████▊ | 13472/17285 [120:38:43<38:10:42, 36.05s/it] 78%|███████▊ | 13473/17285 [120:39:16<37:28:08, 35.39s/it] 78%|███████▊ | 13474/17285 [120:39:52<37:32:47, 35.47s/it] 78%|███████▊ | 13475/17285 [120:40:19<34:54:50, 32.99s/it] 78%|███████▊ | 13476/17285 [120:40:50<34:02:14, 32.17s/it] 78%|███████▊ | 13477/17285 [120:41:22<34:08:52, 32.28s/it] 78%|███████▊ | 13478/17285 [120:41:48<32:00:14, 30.26s/it][2023-08-28 00:36:54,585] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 78%|███████▊ | 13479/17285 [120:42:17<31:39:44, 29.95s/it] 78%|███████▊ | 13480/17285 [120:42:44<30:44:49, 29.09s/it] {'loss': 1.2855, 'learning_rate': 2.7302523411710645e-05, 'epoch': 2.34} + 78%|███████▊ | 13480/17285 [120:42:44<30:44:49, 29.09s/it] 78%|███████▊ | 13481/17285 [120:43:23<33:53:38, 32.08s/it] 78%|███████▊ | 13482/17285 [120:43:54<33:29:59, 31.71s/it] 78%|███████▊ | 13483/17285 [120:44:24<32:54:19, 31.16s/it] 78%|███████▊ | 13484/17285 [120:44:57<33:37:15, 31.84s/it] 78%|███████▊ | 13485/17285 [120:45:28<33:18:35, 31.56s/it] 78%|███████▊ | 13486/17285 [120:46:01<33:44:10, 31.97s/it] 78%|███████▊ | 13487/17285 [120:46:26<31:28:12, 29.83s/it] 78%|███████▊ | 13488/17285 [120:47:00<32:42:30, 31.01s/it] 78%|███████▊ | 13489/17285 [120:47:28<31:54:29, 30.26s/it] 78%|███████▊ | 13490/17285 [120:48:05<33:59:07, 32.24s/it] {'loss': 1.2412, 'learning_rate': 2.7171279015116002e-05, 'epoch': 2.34} + 78%|███████▊ | 13490/17285 [120:48:05<33:59:07, 32.24s/it] 78%|███████▊ | 13491/17285 [120:48:37<33:46:22, 32.05s/it] 78%|███████▊ | 13492/17285 [120:49:11<34:35:08, 32.83s/it] 78%|███████▊ | 13493/17285 [120:49:44<34:39:08, 32.90s/it] 78%|███████▊ | 13494/17285 [120:50:18<34:52:17, 33.11s/it] 78%|███████▊ | 13495/17285 [120:50:51<34:57:22, 33.20s/it] 78%|███████▊ | 13496/17285 [120:51:16<32:17:33, 30.68s/it] 78%|███████▊ | 13497/17285 [120:51:43<31:02:27, 29.50s/it] 78%|███████▊ | 13498/17285 [120:52:16<32:09:58, 30.58s/it] 78%|███████▊ | 13499/17285 [120:52:46<32:08:05, 30.56s/it] 78%|███████▊ | 13500/17285 [120:53:20<33:13:16, 31.60s/it] {'loss': 1.2575, 'learning_rate': 2.7040301215970876e-05, 'epoch': 2.34} + 78%|███████▊ | 13500/17285 [120:53:20<33:13:16, 31.60s/it] 78%|███████▊ | 13501/17285 [120:53:59<35:28:22, 33.75s/it] 78%|███████▊ | 13502/17285 [120:54:30<34:38:41, 32.97s/it][2023-08-28 00:49:36,948] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 78%|███████▊ | 13503/17285 [120:54:59<33:20:19, 31.73s/it] 78%|███████▊ | 13504/17285 [120:55:36<34:55:04, 33.25s/it] 78%|███████▊ | 13505/17285 [120:56:05<33:32:06, 31.94s/it] 78%|███████▊ | 13506/17285 [120:56:38<33:53:12, 32.28s/it] 78%|███████▊ | 13507/17285 [120:57:08<33:06:11, 31.54s/it] 78%|███████▊ | 13508/17285 [120:57:36<32:03:53, 30.56s/it] 78%|███████▊ | 13509/17285 [120:58:05<31:32:17, 30.07s/it] 78%|███████▊ | 13510/17285 [120:58:40<33:09:51, 31.63s/it] {'loss': 1.3001, 'learning_rate': 2.6922649533852228e-05, 'epoch': 2.34} + 78%|███████▊ | 13510/17285 [120:58:40<33:09:51, 31.63s/it] 78%|███████▊ | 13511/17285 [120:59:10<32:31:00, 31.02s/it] 78%|███████▊ | 13512/17285 [120:59:38<31:35:03, 30.14s/it] 78%|███████▊ | 13513/17285 [121:00:19<34:51:32, 33.27s/it] 78%|███████▊ | 13514/17285 [121:00:52<34:52:06, 33.29s/it] 78%|███████▊ | 13515/17285 [121:01:18<32:32:59, 31.08s/it] 78%|███████▊ | 13516/17285 [121:01:51<33:21:36, 31.86s/it] 78%|███████▊ | 13517/17285 [121:02:27<34:32:27, 33.00s/it] 78%|███████▊ | 13518/17285 [121:03:04<35:50:12, 34.25s/it] 78%|███████▊ | 13519/17285 [121:03:38<35:32:59, 33.98s/it] 78%|███████▊ | 13520/17285 [121:04:03<32:40:10, 31.24s/it] {'loss': 1.2726, 'learning_rate': 2.6792179589961273e-05, 'epoch': 2.35} + 78%|███████▊ | 13520/17285 [121:04:03<32:40:10, 31.24s/it] 78%|███████▊ | 13521/17285 [121:04:35<33:00:12, 31.57s/it] 78%|███████▊ | 13522/17285 [121:05:06<32:47:03, 31.36s/it] 78%|███████▊ | 13523/17285 [121:05:41<34:06:43, 32.64s/it] 78%|███████▊ | 13524/17285 [121:06:16<34:37:27, 33.14s/it] 78%|███████▊ | 13525/17285 [121:06:47<34:07:03, 32.67s/it] 78%|███████▊ | 13526/17285 [121:07:17<33:15:08, 31.85s/it] 78%|███████▊ | 13527/17285 [121:07:51<33:43:37, 32.31s/it] 78%|███████▊ | 13528/17285 [121:08:31<36:15:10, 34.74s/it] 78%|███████▊ | 13529/17285 [121:09:10<37:35:21, 36.03s/it] 78%|███████▊ | 13530/17285 [121:09:38<35:04:39, 33.63s/it] {'loss': 1.2727, 
'learning_rate': 2.66619776312545e-05, 'epoch': 2.35} + 78%|███████▊ | 13530/17285 [121:09:38<35:04:39, 33.63s/it] 78%|███████▊ | 13531/17285 [121:10:08<33:49:27, 32.44s/it] 78%|███████▊ | 13532/17285 [121:10:36<32:41:11, 31.35s/it] 78%|███████▊ | 13533/17285 [121:11:05<31:42:28, 30.42s/it] 78%|███████▊ | 13534/17285 [121:11:48<35:41:54, 34.26s/it] 78%|███████▊ | 13535/17285 [121:12:21<35:18:05, 33.89s/it] 78%|███████▊ | 13536/17285 [121:12:52<34:24:01, 33.03s/it] 78%|███████▊ | 13537/17285 [121:13:27<34:52:26, 33.50s/it] 78%|███████▊ | 13538/17285 [121:13:57<33:57:03, 32.62s/it] 78%|███████▊ | 13539/17285 [121:14:36<35:52:06, 34.47s/it] 78%|███████▊ | 13540/17285 [121:15:11<36:07:48, 34.73s/it] {'loss': 1.2466, 'learning_rate': 2.6532044134350288e-05, 'epoch': 2.35} + 78%|███████▊ | 13540/17285 [121:15:11<36:07:48, 34.73s/it] 78%|███████▊ | 13541/17285 [121:15:43<35:09:53, 33.81s/it] 78%|███████▊ | 13542/17285 [121:16:13<33:55:02, 32.62s/it] 78%|███████▊ | 13543/17285 [121:16:45<33:47:02, 32.50s/it] 78%|███████▊ | 13544/17285 [121:17:17<33:29:17, 32.23s/it] 78%|███████▊ | 13545/17285 [121:17:42<31:19:00, 30.14s/it] 78%|███████▊ | 13546/17285 [121:18:15<32:08:24, 30.95s/it] 78%|███████▊ | 13547/17285 [121:18:55<34:54:45, 33.62s/it] 78%|███████▊ | 13548/17285 [121:19:23<33:12:53, 32.00s/it] 78%|███████▊ | 13549/17285 [121:19:57<33:59:18, 32.75s/it] 78%|███████▊ | 13550/17285 [121:20:30<33:55:59, 32.71s/it] {'loss': 1.2975, 'learning_rate': 2.6402379574884418e-05, 'epoch': 2.35} + 78%|███████▊ | 13550/17285 [121:20:30<33:55:59, 32.71s/it] 78%|███████▊ | 13551/17285 [121:21:02<33:35:22, 32.38s/it] 78%|███████▊ | 13552/17285 [121:21:27<31:32:40, 30.42s/it] 78%|███████▊ | 13553/17285 [121:21:53<30:01:37, 28.96s/it] 78%|███████▊ | 13554/17285 [121:22:31<32:52:54, 31.73s/it] 78%|███████▊ | 13555/17285 [121:23:02<32:37:29, 31.49s/it] 78%|███████▊ | 13556/17285 [121:23:34<32:53:00, 31.75s/it] 78%|███████▊ | 13557/17285 [121:24:11<34:28:20, 33.29s/it] 78%|███████▊ | 
13558/17285 [121:24:40<33:01:06, 31.89s/it] 78%|███████▊ | 13559/17285 [121:25:10<32:35:02, 31.48s/it] 78%|███████▊ | 13560/17285 [121:25:46<33:43:00, 32.59s/it] {'loss': 1.2745, 'learning_rate': 2.627298442750803e-05, 'epoch': 2.35} + 78%|███████▊ | 13560/17285 [121:25:46<33:43:00, 32.59s/it] 78%|███████▊ | 13561/17285 [121:26:22<34:51:35, 33.70s/it] 78%|███████▊ | 13562/17285 [121:26:59<36:01:02, 34.83s/it] 78%|███████▊ | 13563/17285 [121:27:35<36:08:37, 34.96s/it] 78%|███████▊ | 13564/17285 [121:28:13<37:12:14, 35.99s/it] 78%|███████▊ | 13565/17285 [121:28:55<38:54:42, 37.66s/it] 78%|███████▊ | 13566/17285 [121:29:22<35:47:48, 34.65s/it] 78%|███████▊ | 13567/17285 [121:29:53<34:35:49, 33.50s/it] 78%|███████▊ | 13568/17285 [121:30:30<35:31:35, 34.41s/it] 79%|███████▊ | 13569/17285 [121:30:55<32:38:11, 31.62s/it] 79%|███████▊ | 13570/17285 [121:31:26<32:36:01, 31.59s/it] {'loss': 1.2988, 'learning_rate': 2.614385916588613e-05, 'epoch': 2.36} + 79%|███████▊ | 13570/17285 [121:31:26<32:36:01, 31.59s/it] 79%|███████▊ | 13571/17285 [121:31:53<31:04:32, 30.12s/it] 79%|███████▊ | 13572/17285 [121:32:18<29:30:00, 28.60s/it] 79%|███████▊ | 13573/17285 [121:32:49<30:13:10, 29.31s/it] 79%|███████▊ | 13574/17285 [121:33:23<31:42:23, 30.76s/it] 79%|███████▊ | 13575/17285 [121:33:53<31:35:54, 30.66s/it] 79%|███████▊ | 13576/17285 [121:34:24<31:38:43, 30.72s/it] 79%|███████▊ | 13577/17285 [121:35:00<33:02:03, 32.07s/it] 79%|███████▊ | 13578/17285 [121:35:27<31:33:42, 30.65s/it] 79%|███████▊ | 13579/17285 [121:36:01<32:29:58, 31.57s/it] 79%|███████▊ | 13580/17285 [121:36:30<31:49:09, 30.92s/it] {'loss': 1.2541, 'learning_rate': 2.6015004262695798e-05, 'epoch': 2.36} + 79%|███████▊ | 13580/17285 [121:36:30<31:49:09, 30.92s/it] 79%|███████▊ | 13581/17285 [121:36:57<30:44:52, 29.88s/it] 79%|███████▊ | 13582/17285 [121:37:28<30:47:33, 29.94s/it] 79%|███████▊ | 13583/17285 [121:38:04<32:45:48, 31.86s/it] 79%|███████▊ | 13584/17285 [121:38:39<33:39:16, 32.74s/it] 79%|███████▊ | 
13585/17285 [121:39:10<33:21:22, 32.45s/it] 79%|███████▊ | 13586/17285 [121:39:44<33:36:25, 32.71s/it] 79%|███████▊ | 13587/17285 [121:40:15<33:08:05, 32.26s/it] 79%|███████▊ | 13588/17285 [121:40:39<30:41:48, 29.89s/it] 79%|███████▊ | 13589/17285 [121:41:04<29:12:31, 28.45s/it] 79%|███████▊ | 13590/17285 [121:41:36<30:14:07, 29.46s/it] {'loss': 1.2596, 'learning_rate': 2.5886420189624407e-05, 'epoch': 2.36} + 79%|███████▊ | 13590/17285 [121:41:36<30:14:07, 29.46s/it] 79%|███████▊ | 13591/17285 [121:42:07<30:46:19, 29.99s/it] 79%|███████▊ | 13592/17285 [121:42:35<30:03:32, 29.30s/it] 79%|███████▊ | 13593/17285 [121:43:08<31:01:38, 30.25s/it] 79%|███████▊ | 13594/17285 [121:43:47<33:42:34, 32.88s/it] 79%|███████▊ | 13595/17285 [121:44:21<34:02:55, 33.22s/it] 79%|███████▊ | 13596/17285 [121:44:54<34:07:53, 33.31s/it] 79%|███████▊ | 13597/17285 [121:45:38<37:22:32, 36.48s/it] 79%|███████▊ | 13598/17285 [121:46:15<37:29:41, 36.61s/it] 79%|███████▊ | 13599/17285 [121:46:41<34:07:56, 33.34s/it] 79%|███████▊ | 13600/17285 [121:47:17<35:03:06, 34.24s/it] {'loss': 1.267, 'learning_rate': 2.5758107417367915e-05, 'epoch': 2.36} + 79%|███████▊ | 13600/17285 [121:47:17<35:03:06, 34.24s/it] 79%|███████▊ | 13601/17285 [121:47:49<34:27:01, 33.66s/it] 79%|███████▊ | 13602/17285 [121:48:27<35:36:42, 34.81s/it] 79%|███████▊ | 13603/17285 [121:49:02<35:42:23, 34.91s/it] 79%|███████▊ | 13604/17285 [121:49:36<35:28:09, 34.69s/it] 79%|███████▊ | 13605/17285 [121:50:12<35:50:40, 35.07s/it] 79%|███████▊ | 13606/17285 [121:50:47<35:54:20, 35.13s/it] 79%|███████▊ | 13607/17285 [121:51:17<34:14:12, 33.51s/it] 79%|███████▊ | 13608/17285 [121:51:43<31:57:42, 31.29s/it] 79%|███████▊ | 13609/17285 [121:52:11<30:46:39, 30.14s/it] 79%|███████▊ | 13610/17285 [121:52:42<31:14:58, 30.61s/it] {'loss': 1.2716, 'learning_rate': 2.5630066415629195e-05, 'epoch': 2.36} + 79%|███████▊ | 13610/17285 [121:52:42<31:14:58, 30.61s/it] 79%|███████▊ | 13611/17285 [121:53:15<31:43:48, 31.09s/it] 79%|███████▉ | 
13612/17285 [121:53:59<35:42:46, 35.00s/it] 79%|███████▉ | 13613/17285 [121:54:27<33:45:28, 33.10s/it] 79%|███████▉ | 13614/17285 [121:54:59<33:22:31, 32.73s/it] 79%|███████▉ | 13615/17285 [121:55:32<33:18:09, 32.67s/it] 79%|███████▉ | 13616/17285 [121:56:06<33:40:39, 33.04s/it] 79%|███████▉ | 13617/17285 [121:56:34<32:10:33, 31.58s/it] 79%|███████▉ | 13618/17285 [121:57:14<34:41:03, 34.05s/it] 79%|███████▉ | 13619/17285 [121:57:40<32:14:56, 31.67s/it] 79%|███████▉ | 13620/17285 [121:58:10<31:39:02, 31.09s/it] {'loss': 1.2824, 'learning_rate': 2.550229765311628e-05, 'epoch': 2.36} + 79%|███████▉ | 13620/17285 [121:58:10<31:39:02, 31.09s/it] 79%|███████▉ | 13621/17285 [121:58:38<30:55:46, 30.39s/it] 79%|███████▉ | 13622/17285 [121:59:23<35:25:06, 34.81s/it] 79%|███████▉ | 13623/17285 [121:59:49<32:40:01, 32.11s/it] 79%|███████▉ | 13624/17285 [122:00:30<35:18:37, 34.72s/it][2023-08-28 01:55:36,359] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 79%|███████▉ | 13625/17285 [122:00:59<33:27:08, 32.90s/it] 79%|███████▉ | 13626/17285 [122:01:44<37:17:16, 36.69s/it] 79%|███████▉ | 13627/17285 [122:02:16<35:42:03, 35.13s/it] 79%|███████▉ | 13628/17285 [122:02:46<34:22:02, 33.83s/it] 79%|███████▉ | 13629/17285 [122:03:18<33:30:22, 32.99s/it][2023-08-28 01:58:20,025] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 79%|███████▉ | 13630/17285 [122:03:42<31:00:19, 30.54s/it] {'loss': 1.3018, 'learning_rate': 2.5400278969684065e-05, 'epoch': 2.37} + 79%|███████▉ | 13630/17285 [122:03:42<31:00:19, 30.54s/it] 79%|███████▉ | 13631/17285 [122:04:20<33:07:08, 32.63s/it] 79%|███████▉ | 13632/17285 [122:04:47<31:23:29, 30.94s/it] 79%|███████▉ | 13633/17285 [122:05:16<30:51:03, 30.41s/it] 79%|███████▉ | 13634/17285 [122:05:47<31:08:01, 30.70s/it] 79%|███████▉ | 13635/17285 [122:06:12<29:16:24, 28.87s/it] 79%|███████▉ | 13636/17285 [122:06:43<29:59:52, 29.60s/it] 79%|███████▉ | 13637/17285 [122:07:12<29:37:33, 29.24s/it] 79%|███████▉ | 13638/17285 [122:07:48<31:38:19, 31.23s/it] 79%|███████▉ | 13639/17285 [122:08:13<29:51:58, 29.49s/it] 79%|███████▉ | 13640/17285 [122:08:40<29:05:51, 28.74s/it] {'loss': 1.2786, 'learning_rate': 2.5273001415739562e-05, 'epoch': 2.37} + 79%|███████▉ | 13640/17285 [122:08:40<29:05:51, 28.74s/it] 79%|███████▉ | 13641/17285 [122:09:10<29:28:39, 29.12s/it] 79%|███████▉ | 13642/17285 [122:09:35<28:11:10, 27.85s/it] 79%|███████▉ | 13643/17285 [122:10:04<28:28:56, 28.15s/it] 79%|███████▉ | 13644/17285 [122:10:38<30:11:04, 29.84s/it] 79%|███████▉ | 13645/17285 [122:11:08<30:16:23, 29.94s/it] 79%|███████▉ | 13646/17285 [122:11:35<29:27:40, 29.15s/it] 79%|███████▉ | 13647/17285 [122:12:09<30:51:03, 30.53s/it] 79%|███████▉ | 13648/17285 [122:12:52<34:44:27, 34.39s/it] 79%|███████▉ | 13649/17285 [122:13:21<33:12:23, 32.88s/it] 79%|███████▉ | 13650/17285 [122:13:57<33:52:52, 33.56s/it] {'loss': 1.2634, 'learning_rate': 2.5145997408096057e-05, 'epoch': 2.37} + 79%|███████▉ | 13650/17285 [122:13:57<33:52:52, 33.56s/it][2023-08-28 02:09:06,890] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 + 79%|███████▉ | 13651/17285 [122:14:29<33:34:33, 33.26s/it] 79%|███████▉ | 13652/17285 [122:15:12<36:35:19, 36.26s/it] 79%|███████▉ | 13653/17285 [122:15:44<35:05:12, 34.78s/it] 79%|███████▉ | 13654/17285 [122:16:18<34:58:24, 34.67s/it] 79%|███████▉ | 13655/17285 [122:16:50<33:57:27, 33.68s/it] 79%|███████▉ | 13656/17285 [122:17:22<33:39:00, 33.38s/it] 79%|███████▉ | 13657/17285 [122:17:49<31:33:48, 31.32s/it] 79%|███████▉ | 13658/17285 [122:18:25<33:10:00, 32.92s/it] 79%|███████▉ | 13659/17285 [122:18:52<31:14:44, 31.02s/it] 79%|███████▉ | 13660/17285 [122:19:30<33:28:56, 33.25s/it] {'loss': 1.3144, 'learning_rate': 2.503192806757474e-05, 'epoch': 2.37} + 79%|███████▉ | 13660/17285 [122:19:30<33:28:56, 33.25s/it] 79%|███████▉ | 13661/17285 [122:20:00<32:17:31, 32.08s/it] 79%|███████▉ | 13662/17285 [122:20:35<33:21:27, 33.15s/it] 79%|███████▉ | 13663/17285 [122:21:06<32:41:06, 32.49s/it] 79%|███████▉ | 13664/17285 [122:21:37<31:58:38, 31.79s/it] 79%|███████▉ | 13665/17285 [122:22:09<32:07:53, 31.95s/it] 79%|███████▉ | 13666/17285 [122:22:42<32:36:55, 32.44s/it] 79%|███████▉ | 13667/17285 [122:23:16<32:51:00, 32.69s/it] 79%|███████▉ | 13668/17285 [122:23:45<31:50:02, 31.68s/it] 79%|███████▉ | 13669/17285 [122:24:14<31:06:06, 30.96s/it] 79%|███████▉ | 13670/17285 [122:24:47<31:27:21, 31.33s/it] {'loss': 1.2597, 'learning_rate': 2.4905445077906675e-05, 'epoch': 2.37} + 79%|███████▉ | 13670/17285 [122:24:47<31:27:21, 31.33s/it] 79%|███████▉ | 13671/17285 [122:25:22<32:38:47, 32.52s/it] 79%|███████▉ | 13672/17285 [122:25:55<32:43:19, 32.60s/it] 79%|███████▉ | 13673/17285 [122:26:35<35:04:26, 34.96s/it] 79%|███████▉ | 13674/17285 [122:27:10<35:05:28, 34.98s/it] 79%|███████▉ | 13675/17285 [122:27:37<32:32:25, 32.45s/it] 79%|███████▉ | 13676/17285 [122:28:19<35:26:19, 35.35s/it] 79%|███████▉ | 13677/17285 [122:28:48<33:27:51, 33.39s/it] 79%|███████▉ | 13678/17285 [122:29:23<34:09:57, 34.10s/it] 79%|███████▉ | 13679/17285 
[122:29:58<34:17:38, 34.24s/it] 79%|███████▉ | 13680/17285 [122:30:32<34:11:09, 34.14s/it] {'loss': 1.2646, 'learning_rate': 2.477923698001955e-05, 'epoch': 2.37} + 79%|███████▉ | 13680/17285 [122:30:32<34:11:09, 34.14s/it] 79%|███████▉ | 13681/17285 [122:31:12<35:53:47, 35.86s/it] 79%|███████▉ | 13682/17285 [122:31:42<34:12:45, 34.18s/it] 79%|███████▉ | 13683/17285 [122:32:07<31:24:35, 31.39s/it] 79%|███████▉ | 13684/17285 [122:32:40<31:53:10, 31.88s/it] 79%|███████▉ | 13685/17285 [122:33:06<30:06:26, 30.11s/it] 79%|███████▉ | 13686/17285 [122:33:33<29:16:57, 29.29s/it] 79%|███████▉ | 13687/17285 [122:34:06<30:11:05, 30.20s/it] 79%|███████▉ | 13688/17285 [122:34:35<30:00:58, 30.04s/it] 79%|███████▉ | 13689/17285 [122:35:06<30:20:31, 30.38s/it] 79%|███████▉ | 13690/17285 [122:35:41<31:27:34, 31.50s/it] {'loss': 1.2471, 'learning_rate': 2.4653304235911823e-05, 'epoch': 2.38} + 79%|███████▉ | 13690/17285 [122:35:41<31:27:34, 31.50s/it] 79%|███████▉ | 13691/17285 [122:36:14<31:57:10, 32.01s/it] 79%|███████▉ | 13692/17285 [122:36:45<31:43:49, 31.79s/it] 79%|███████▉ | 13693/17285 [122:37:26<34:22:43, 34.46s/it] 79%|███████▉ | 13694/17285 [122:37:58<33:35:23, 33.67s/it] 79%|███████▉ | 13695/17285 [122:38:24<31:28:26, 31.56s/it] 79%|███████▉ | 13696/17285 [122:38:58<32:11:38, 32.29s/it] 79%|███████▉ | 13697/17285 [122:39:25<30:28:12, 30.57s/it] 79%|███████▉ | 13698/17285 [122:39:57<30:54:02, 31.01s/it] 79%|███████▉ | 13699/17285 [122:40:25<30:10:04, 30.29s/it] 79%|███████▉ | 13700/17285 [122:40:57<30:29:12, 30.61s/it] {'loss': 1.2835, 'learning_rate': 2.4527647306573998e-05, 'epoch': 2.38} + 79%|███████▉ | 13700/17285 [122:40:57<30:29:12, 30.61s/it] 79%|███████▉ | 13701/17285 [122:41:31<31:43:28, 31.87s/it] 79%|███████▉ | 13702/17285 [122:42:05<32:18:33, 32.46s/it] 79%|███████▉ | 13703/17285 [122:42:42<33:27:33, 33.63s/it] 79%|███████▉ | 13704/17285 [122:43:13<32:38:47, 32.82s/it] 79%|███████▉ | 13705/17285 [122:43:41<31:24:40, 31.59s/it] 79%|███████▉ | 13706/17285 
[122:44:14<31:36:49, 31.80s/it] 79%|███████▉ | 13707/17285 [122:44:53<33:59:41, 34.20s/it] 79%|███████▉ | 13708/17285 [122:45:26<33:35:00, 33.80s/it] 79%|███████▉ | 13709/17285 [122:46:00<33:38:04, 33.86s/it] 79%|███████▉ | 13710/17285 [122:46:37<34:25:17, 34.66s/it] {'loss': 1.2674, 'learning_rate': 2.4402266651986927e-05, 'epoch': 2.38} + 79%|███████▉ | 13710/17285 [122:46:37<34:25:17, 34.66s/it] 79%|███████▉ | 13711/17285 [122:47:15<35:34:15, 35.83s/it] 79%|███████▉ | 13712/17285 [122:47:46<33:53:45, 34.15s/it] 79%|███████▉ | 13713/17285 [122:48:16<32:38:50, 32.90s/it] 79%|███████▉ | 13714/17285 [122:48:45<31:41:54, 31.96s/it] 79%|███████▉ | 13715/17285 [122:49:29<35:08:10, 35.43s/it] 79%|███████▉ | 13716/17285 [122:49:58<33:22:02, 33.66s/it] 79%|███████▉ | 13717/17285 [122:50:33<33:37:45, 33.93s/it] 79%|███████▉ | 13718/17285 [122:51:01<31:43:27, 32.02s/it] 79%|███████▉ | 13719/17285 [122:51:27<29:55:45, 30.21s/it] 79%|███████▉ | 13720/17285 [122:52:06<32:44:27, 33.06s/it] {'loss': 1.2984, 'learning_rate': 2.4277162731120108e-05, 'epoch': 2.38} + 79%|███████▉ | 13720/17285 [122:52:06<32:44:27, 33.06s/it] 79%|███████▉ | 13721/17285 [122:52:36<31:52:09, 32.19s/it] 79%|███████▉ | 13722/17285 [122:53:06<31:07:28, 31.45s/it] 79%|███████▉ | 13723/17285 [122:53:45<33:20:46, 33.70s/it] 79%|███████▉ | 13724/17285 [122:54:16<32:35:08, 32.94s/it] 79%|███████▉ | 13725/17285 [122:54:46<31:37:59, 31.99s/it] 79%|███████▉ | 13726/17285 [122:55:25<33:44:53, 34.14s/it] 79%|███████▉ | 13727/17285 [122:56:06<35:35:48, 36.02s/it] 79%|███████▉ | 13728/17285 [122:56:38<34:29:53, 34.92s/it] 79%|███████▉ | 13729/17285 [122:57:11<33:51:04, 34.27s/it] 79%|███████▉ | 13730/17285 [122:57:42<33:03:32, 33.48s/it] {'loss': 1.2879, 'learning_rate': 2.4152336001930054e-05, 'epoch': 2.38} + 79%|███████▉ | 13730/17285 [122:57:42<33:03:32, 33.48s/it] 79%|███████▉ | 13731/17285 [122:58:14<32:31:43, 32.95s/it] 79%|███████▉ | 13732/17285 [122:58:56<35:08:45, 35.61s/it] 79%|███████▉ | 13733/17285 
[122:59:24<33:03:00, 33.50s/it] 79%|███████▉ | 13734/17285 [122:59:55<32:19:17, 32.77s/it] 79%|███████▉ | 13735/17285 [123:00:31<33:12:41, 33.68s/it] 79%|███████▉ | 13736/17285 [123:01:07<33:52:52, 34.37s/it] 79%|███████▉ | 13737/17285 [123:01:45<34:44:52, 35.26s/it] 79%|███████▉ | 13738/17285 [123:02:22<35:24:08, 35.93s/it] 79%|███████▉ | 13739/17285 [123:02:56<34:52:49, 35.41s/it] 79%|███████▉ | 13740/17285 [123:03:22<32:03:04, 32.55s/it] {'loss': 1.2361, 'learning_rate': 2.4027786921358607e-05, 'epoch': 2.38} + 79%|███████▉ | 13740/17285 [123:03:22<32:03:04, 32.55s/it] 79%|███████▉ | 13741/17285 [123:03:59<33:25:21, 33.95s/it] 80%|███████▉ | 13742/17285 [123:04:30<32:18:45, 32.83s/it] 80%|███████▉ | 13743/17285 [123:05:06<33:25:24, 33.97s/it] 80%|███████▉ | 13744/17285 [123:05:44<34:27:14, 35.03s/it] 80%|███████▉ | 13745/17285 [123:06:18<34:16:41, 34.86s/it] 80%|███████▉ | 13746/17285 [123:06:50<33:16:15, 33.84s/it] 80%|███████▉ | 13747/17285 [123:07:18<31:32:03, 32.09s/it] 80%|███████▉ | 13748/17285 [123:07:45<30:04:11, 30.61s/it] 80%|███████▉ | 13749/17285 [123:08:20<31:22:01, 31.93s/it] 80%|███████▉ | 13750/17285 [123:09:00<33:48:16, 34.43s/it] {'loss': 1.3072, 'learning_rate': 2.3903515945331155e-05, 'epoch': 2.39} + 80%|███████▉ | 13750/17285 [123:09:00<33:48:16, 34.43s/it] 80%|███████▉ | 13751/17285 [123:09:27<31:42:07, 32.29s/it] 80%|███████▉ | 13752/17285 [123:09:56<30:39:18, 31.24s/it] 80%|███████▉ | 13753/17285 [123:10:28<30:45:03, 31.34s/it] 80%|███████▉ | 13754/17285 [123:10:59<30:34:10, 31.17s/it] 80%|███████▉ | 13755/17285 [123:11:34<31:47:06, 32.42s/it] 80%|███████▉ | 13756/17285 [123:12:11<33:01:59, 33.70s/it] 80%|███████▉ | 13757/17285 [123:12:43<32:33:43, 33.23s/it] 80%|███████▉ | 13758/17285 [123:13:09<30:24:34, 31.04s/it] 80%|███████▉ | 13759/17285 [123:13:38<29:54:04, 30.53s/it] 80%|███████▉ | 13760/17285 [123:14:03<28:16:11, 28.87s/it] {'loss': 1.2665, 'learning_rate': 2.3779523528755145e-05, 'epoch': 2.39} + 80%|███████▉ | 13760/17285 
[123:14:03<28:16:11, 28.87s/it] 80%|███████▉ | 13761/17285 [123:14:35<29:18:40, 29.94s/it] 80%|███████▉ | 13762/17285 [123:15:01<27:58:39, 28.59s/it] 80%|███████▉ | 13763/17285 [123:15:32<28:35:12, 29.22s/it] 80%|███████▉ | 13764/17285 [123:15:57<27:37:21, 28.24s/it] 80%|███████▉ | 13765/17285 [123:16:26<27:39:56, 28.29s/it] 80%|███████▉ | 13766/17285 [123:17:06<31:03:40, 31.78s/it] 80%|███████▉ | 13767/17285 [123:17:30<28:56:57, 29.62s/it] 80%|███████▉ | 13768/17285 [123:18:06<30:34:19, 31.29s/it] 80%|███████▉ | 13769/17285 [123:18:37<30:41:09, 31.42s/it] 80%|███████▉ | 13770/17285 [123:19:08<30:33:00, 31.29s/it] {'loss': 1.3312, 'learning_rate': 2.3655810125518284e-05, 'epoch': 2.39} + 80%|███████▉ | 13770/17285 [123:19:08<30:33:00, 31.29s/it] 80%|███████▉ | 13771/17285 [123:19:38<29:56:33, 30.68s/it] 80%|███████▉ | 13772/17285 [123:20:12<31:08:40, 31.92s/it] 80%|███████▉ | 13773/17285 [123:20:39<29:44:37, 30.49s/it] 80%|███████▉ | 13774/17285 [123:21:12<30:17:06, 31.05s/it] 80%|███████▉ | 13775/17285 [123:21:41<29:45:47, 30.53s/it] 80%|███████▉ | 13776/17285 [123:22:10<29:12:55, 29.97s/it] 80%|███████▉ | 13777/17285 [123:22:40<29:15:03, 30.02s/it] 80%|███████▉ | 13778/17285 [123:23:15<30:49:16, 31.64s/it] 80%|███████▉ | 13779/17285 [123:23:55<33:04:36, 33.96s/it] 80%|███████▉ | 13780/17285 [123:24:25<32:05:38, 32.96s/it] {'loss': 1.3107, 'learning_rate': 2.3532376188486948e-05, 'epoch': 2.39} + 80%|███████▉ | 13780/17285 [123:24:25<32:05:38, 32.96s/it] 80%|███████▉ | 13781/17285 [123:24:54<30:55:16, 31.77s/it] 80%|███████▉ | 13782/17285 [123:25:25<30:30:07, 31.35s/it] 80%|███████▉ | 13783/17285 [123:25:56<30:28:33, 31.33s/it] 80%|███████▉ | 13784/17285 [123:26:35<32:44:45, 33.67s/it] 80%|███████▉ | 13785/17285 [123:27:05<31:33:07, 32.45s/it] 80%|███████▉ | 13786/17285 [123:27:48<34:34:37, 35.58s/it] 80%|███████▉ | 13787/17285 [123:28:18<33:06:52, 34.08s/it] 80%|███████▉ | 13788/17285 [123:28:50<32:33:03, 33.51s/it] 80%|███████▉ | 13789/17285 [123:29:29<33:55:02, 
34.93s/it] 80%|███████▉ | 13790/17285 [123:30:02<33:32:40, 34.55s/it] {'loss': 1.2497, 'learning_rate': 2.340922216950443e-05, 'epoch': 2.39} + 80%|███████▉ | 13790/17285 [123:30:02<33:32:40, 34.55s/it] 80%|███████▉ | 13791/17285 [123:30:33<32:32:33, 33.53s/it] 80%|███████▉ | 13792/17285 [123:30:59<30:16:21, 31.20s/it] 80%|███████▉ | 13793/17285 [123:31:26<29:05:25, 29.99s/it] 80%|███████▉ | 13794/17285 [123:32:00<30:00:56, 30.95s/it] 80%|███████▉ | 13795/17285 [123:32:30<29:46:54, 30.72s/it] 80%|███████▉ | 13796/17285 [123:33:00<29:44:10, 30.68s/it] 80%|███████▉ | 13797/17285 [123:33:32<29:54:56, 30.88s/it] 80%|███████▉ | 13798/17285 [123:34:07<31:08:12, 32.15s/it] 80%|███████▉ | 13799/17285 [123:34:39<31:17:00, 32.31s/it] 80%|███████▉ | 13800/17285 [123:35:07<29:48:55, 30.80s/it] {'loss': 1.3204, 'learning_rate': 2.328634851938949e-05, 'epoch': 2.4} + 80%|███████▉ | 13800/17285 [123:35:07<29:48:55, 30.80s/it] 80%|███████▉ | 13801/17285 [123:35:41<30:46:12, 31.79s/it] 80%|███████▉ | 13802/17285 [123:36:15<31:24:11, 32.46s/it] 80%|███████▉ | 13803/17285 [123:36:49<31:49:09, 32.90s/it] 80%|███████▉ | 13804/17285 [123:37:18<30:44:47, 31.80s/it] 80%|███████▉ | 13805/17285 [123:37:50<30:51:04, 31.91s/it] 80%|███████▉ | 13806/17285 [123:38:33<33:56:32, 35.12s/it] 80%|███████▉ | 13807/17285 [123:39:05<32:57:32, 34.12s/it] 80%|███████▉ | 13808/17285 [123:39:34<31:43:00, 32.84s/it] 80%|███████▉ | 13809/17285 [123:40:05<30:57:59, 32.07s/it] 80%|███████▉ | 13810/17285 [123:40:36<30:48:11, 31.91s/it] {'loss': 1.2692, 'learning_rate': 2.316375568793443e-05, 'epoch': 2.4} + 80%|███████▉ | 13810/17285 [123:40:36<30:48:11, 31.91s/it] 80%|███████▉ | 13811/17285 [123:41:10<31:18:51, 32.45s/it] 80%|███████▉ | 13812/17285 [123:41:37<29:41:41, 30.78s/it] 80%|███████▉ | 13813/17285 [123:42:11<30:38:49, 31.78s/it] 80%|███████▉ | 13814/17285 [123:42:45<31:18:17, 32.47s/it] 80%|███████▉ | 13815/17285 [123:43:17<31:04:35, 32.24s/it] 80%|███████▉ | 13816/17285 [123:43:51<31:42:50, 
32.91s/it] 80%|███████▉ | 13817/17285 [123:44:35<34:47:49, 36.12s/it] 80%|███████▉ | 13818/17285 [123:45:10<34:36:31, 35.94s/it] 80%|███████▉ | 13819/17285 [123:45:42<33:18:07, 34.59s/it] 80%|███████▉ | 13820/17285 [123:46:19<33:56:17, 35.26s/it] {'loss': 1.26, 'learning_rate': 2.3041444123903668e-05, 'epoch': 2.4} + 80%|███████▉ | 13820/17285 [123:46:19<33:56:17, 35.26s/it] 80%|███████▉ | 13821/17285 [123:46:56<34:38:14, 36.00s/it] 80%|███████▉ | 13822/17285 [123:47:33<34:44:40, 36.12s/it] 80%|███████▉ | 13823/17285 [123:48:01<32:19:06, 33.61s/it] 80%|███████▉ | 13824/17285 [123:48:28<30:32:15, 31.76s/it] 80%|███████▉ | 13825/17285 [123:49:01<30:49:23, 32.07s/it] 80%|███████▉ | 13826/17285 [123:49:37<32:08:27, 33.45s/it] 80%|███████▉ | 13827/17285 [123:50:04<30:11:33, 31.43s/it] 80%|████████ | 13828/17285 [123:50:41<31:51:13, 33.17s/it] 80%|████████ | 13829/17285 [123:51:16<32:21:00, 33.70s/it] 80%|████████ | 13830/17285 [123:51:47<31:29:30, 32.81s/it] {'loss': 1.2608, 'learning_rate': 2.2919414275031914e-05, 'epoch': 2.4} + 80%|████████ | 13830/17285 [123:51:47<31:29:30, 32.81s/it] 80%|████████ | 13831/17285 [123:52:16<30:23:42, 31.68s/it] 80%|████████ | 13832/17285 [123:52:47<30:18:00, 31.59s/it] 80%|████████ | 13833/17285 [123:53:14<28:53:43, 30.13s/it] 80%|████████ | 13834/17285 [123:53:41<27:52:47, 29.08s/it] 80%|████████ | 13835/17285 [123:54:07<26:59:58, 28.17s/it] 80%|████████ | 13836/17285 [123:54:39<28:12:04, 29.44s/it] 80%|████████ | 13837/17285 [123:55:08<27:57:30, 29.19s/it] 80%|████████ | 13838/17285 [123:55:41<28:57:40, 30.25s/it] 80%|████████ | 13839/17285 [123:56:08<28:08:58, 29.41s/it] 80%|████████ | 13840/17285 [123:56:39<28:42:21, 30.00s/it] {'loss': 1.2862, 'learning_rate': 2.2797666588022748e-05, 'epoch': 2.4} + 80%|████████ | 13840/17285 [123:56:39<28:42:21, 30.00s/it] 80%|████████ | 13841/17285 [123:57:09<28:40:51, 29.98s/it] 80%|████████ | 13842/17285 [123:57:39<28:40:52, 29.99s/it] 80%|████████ | 13843/17285 [123:58:12<29:18:43, 
30.66s/it] 80%|████████ | 13844/17285 [123:58:41<29:03:48, 30.41s/it] 80%|████████ | 13845/17285 [123:59:13<29:27:09, 30.82s/it] 80%|████████ | 13846/17285 [123:59:46<30:00:09, 31.41s/it] 80%|████████ | 13847/17285 [124:00:16<29:35:49, 30.99s/it] 80%|████████ | 13848/17285 [124:00:47<29:33:50, 30.97s/it] 80%|████████ | 13849/17285 [124:01:15<28:36:14, 29.97s/it] 80%|████████ | 13850/17285 [124:01:47<29:14:08, 30.64s/it] {'loss': 1.2762, 'learning_rate': 2.2676201508546792e-05, 'epoch': 2.4} + 80%|████████ | 13850/17285 [124:01:47<29:14:08, 30.64s/it] 80%|████████ | 13851/17285 [124:02:13<28:05:59, 29.46s/it][2023-08-28 03:57:27,122] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 80%|████████ | 13852/17285 [124:02:49<29:57:33, 31.42s/it] 80%|████████ | 13853/17285 [124:03:20<29:46:28, 31.23s/it] 80%|████████ | 13854/17285 [124:03:47<28:25:49, 29.83s/it] 80%|████████ | 13855/17285 [124:04:14<27:48:08, 29.18s/it] 80%|████████ | 13856/17285 [124:04:49<29:25:32, 30.89s/it] 80%|████████ | 13857/17285 [124:05:20<29:26:09, 30.91s/it] 80%|████████ | 13858/17285 [124:05:52<29:42:00, 31.20s/it] 80%|████████ | 13859/17285 [124:06:22<29:18:16, 30.79s/it] 80%|████████ | 13860/17285 [124:07:00<31:12:14, 32.80s/it] {'loss': 1.3093, 'learning_rate': 2.2567124933972495e-05, 'epoch': 2.41} + 80%|████████ | 13860/17285 [124:07:00<31:12:14, 32.80s/it] 80%|████████ | 13861/17285 [124:07:28<30:05:53, 31.65s/it] 80%|████████ | 13862/17285 [124:07:56<28:56:40, 30.44s/it][2023-08-28 04:03:06,467] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 80%|████████ | 13863/17285 [124:08:29<29:34:42, 31.12s/it] 80%|████████ | 13864/17285 [124:09:04<30:49:16, 32.43s/it] 80%|████████ | 13865/17285 [124:09:41<31:55:00, 33.60s/it] 80%|████████ | 13866/17285 [124:10:14<31:49:57, 33.52s/it] 80%|████████ | 13867/17285 [124:10:46<31:16:11, 32.93s/it][2023-08-28 04:05:51,968] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 80%|████████ | 13868/17285 [124:11:14<30:04:35, 31.69s/it] 80%|████████ | 13869/17285 [124:11:40<28:27:36, 29.99s/it] 80%|████████ | 13870/17285 [124:12:10<28:22:27, 29.91s/it] {'loss': 1.2785, 'learning_rate': 2.2470360715755768e-05, 'epoch': 2.41} + 80%|████████ | 13870/17285 [124:12:10<28:22:27, 29.91s/it] 80%|████████ | 13871/17285 [124:12:42<28:52:26, 30.45s/it] 80%|████████ | 13872/17285 [124:13:24<32:13:22, 33.99s/it] 80%|████████ | 13873/17285 [124:13:53<30:52:27, 32.58s/it] 80%|████████ | 13874/17285 [124:14:25<30:45:10, 32.46s/it] 80%|████████ | 13875/17285 [124:15:08<33:34:11, 35.44s/it] 80%|████████ | 13876/17285 [124:15:38<32:07:28, 33.92s/it] 80%|████████ | 13877/17285 [124:16:09<31:04:23, 32.82s/it] 80%|████████ | 13878/17285 [124:16:38<30:01:14, 31.72s/it] 80%|████████ | 13879/17285 [124:17:07<29:21:11, 31.03s/it] 80%|████████ | 13880/17285 [124:17:39<29:42:58, 31.42s/it] {'loss': 1.2841, 'learning_rate': 2.2349660894643332e-05, 'epoch': 2.41} + 80%|████████ | 13880/17285 [124:17:39<29:42:58, 31.42s/it] 80%|████████ | 13881/17285 [124:18:12<29:57:11, 31.68s/it] 80%|████████ | 13882/17285 [124:18:46<30:33:55, 32.33s/it] 80%|████████ | 13883/17285 [124:19:21<31:25:35, 33.26s/it] 80%|████████ | 13884/17285 [124:19:46<29:10:37, 30.88s/it] 80%|████████ | 13885/17285 [124:20:21<30:07:30, 31.90s/it] 80%|████████ | 13886/17285 [124:20:49<29:00:43, 30.73s/it] 80%|████████ | 13887/17285 [124:21:21<29:37:24, 31.38s/it] 80%|████████ | 13888/17285 [124:21:51<28:57:40, 
30.69s/it] 80%|████████ | 13889/17285 [124:22:27<30:29:03, 32.32s/it] 80%|████████ | 13890/17285 [124:22:57<29:54:31, 31.71s/it] {'loss': 1.2676, 'learning_rate': 2.222924532103765e-05, 'epoch': 2.41} + 80%|████████ | 13890/17285 [124:22:57<29:54:31, 31.71s/it] 80%|████████ | 13891/17285 [124:23:31<30:38:41, 32.50s/it] 80%|████████ | 13892/17285 [124:24:00<29:31:42, 31.33s/it] 80%|████████ | 13893/17285 [124:24:35<30:42:06, 32.58s/it] 80%|████████ | 13894/17285 [124:25:11<31:30:45, 33.45s/it] 80%|████████ | 13895/17285 [124:25:44<31:27:15, 33.40s/it] 80%|████████ | 13896/17285 [124:26:18<31:34:06, 33.53s/it] 80%|████████ | 13897/17285 [124:26:53<32:01:17, 34.03s/it] 80%|████████ | 13898/17285 [124:27:31<33:00:17, 35.08s/it] 80%|████████ | 13899/17285 [124:27:58<30:55:28, 32.88s/it] 80%|████████ | 13900/17285 [124:28:36<32:13:33, 34.27s/it] {'loss': 1.2522, 'learning_rate': 2.2109114435733026e-05, 'epoch': 2.41} + 80%|████████ | 13900/17285 [124:28:36<32:13:33, 34.27s/it] 80%|████████ | 13901/17285 [124:29:07<31:21:52, 33.37s/it] 80%|████████ | 13902/17285 [124:29:37<30:22:53, 32.33s/it] 80%|████████ | 13903/17285 [124:30:07<29:45:43, 31.68s/it] 80%|████████ | 13904/17285 [124:30:38<29:22:01, 31.27s/it] 80%|████████ | 13905/17285 [124:31:08<29:02:19, 30.93s/it] 80%|████████ | 13906/17285 [124:31:49<31:47:45, 33.88s/it] 80%|████████ | 13907/17285 [124:32:21<31:19:37, 33.39s/it] 80%|████████ | 13908/17285 [124:33:00<32:50:10, 35.00s/it] 80%|████████ | 13909/17285 [124:33:32<32:10:58, 34.32s/it] 80%|████████ | 13910/17285 [124:34:04<31:29:10, 33.59s/it] {'loss': 1.2573, 'learning_rate': 2.19892686784816e-05, 'epoch': 2.41} + 80%|████████ | 13910/17285 [124:34:04<31:29:10, 33.59s/it] 80%|████████ | 13911/17285 [124:34:34<30:25:44, 32.47s/it] 80%|████████ | 13912/17285 [124:35:01<28:55:40, 30.87s/it] 80%|████████ | 13913/17285 [124:35:34<29:24:25, 31.40s/it] 80%|████████ | 13914/17285 [124:36:02<28:25:34, 30.36s/it] 81%|████████ | 13915/17285 [124:36:38<30:05:29, 
32.15s/it] 81%|████████ | 13916/17285 [124:37:15<31:32:31, 33.70s/it] 81%|████████ | 13917/17285 [124:37:51<32:06:04, 34.31s/it] 81%|████████ | 13918/17285 [124:38:32<33:53:37, 36.24s/it] 81%|████████ | 13919/17285 [124:39:07<33:35:50, 35.93s/it] 81%|████████ | 13920/17285 [124:39:45<34:04:12, 36.45s/it] {'loss': 1.2572, 'learning_rate': 2.1869708487991812e-05, 'epoch': 2.42} + 81%|████████ | 13920/17285 [124:39:45<34:04:12, 36.45s/it] 81%|████████ | 13921/17285 [124:40:14<32:00:39, 34.26s/it] 81%|████████ | 13922/17285 [124:40:47<31:49:15, 34.06s/it] 81%|████████ | 13923/17285 [124:41:13<29:33:52, 31.66s/it] 81%|████████ | 13924/17285 [124:41:58<33:04:36, 35.43s/it] 81%|████████ | 13925/17285 [124:42:36<33:52:02, 36.29s/it] 81%|████████ | 13926/17285 [124:43:12<33:50:29, 36.27s/it] 81%|████████ | 13927/17285 [124:43:54<35:21:20, 37.90s/it] 81%|████████ | 13928/17285 [124:44:20<31:58:22, 34.29s/it] 81%|████████ | 13929/17285 [124:44:52<31:21:15, 33.63s/it] 81%|████████ | 13930/17285 [124:45:28<31:55:15, 34.25s/it] {'loss': 1.23, 'learning_rate': 2.1750434301926704e-05, 'epoch': 2.42} + 81%|████████ | 13930/17285 [124:45:28<31:55:15, 34.25s/it] 81%|████████ | 13931/17285 [124:46:02<31:57:17, 34.30s/it] 81%|████████ | 13932/17285 [124:46:30<30:11:42, 32.42s/it] 81%|████████ | 13933/17285 [124:47:05<30:49:35, 33.11s/it] 81%|████████ | 13934/17285 [124:47:39<31:16:02, 33.59s/it] 81%|████████ | 13935/17285 [124:48:17<32:17:11, 34.70s/it] 81%|████████ | 13936/17285 [124:48:46<30:48:59, 33.13s/it] 81%|████████ | 13937/17285 [124:49:20<31:06:39, 33.45s/it] 81%|████████ | 13938/17285 [124:49:51<30:21:25, 32.65s/it] 81%|████████ | 13939/17285 [124:50:27<31:08:41, 33.51s/it] 81%|████████ | 13940/17285 [124:51:04<32:03:31, 34.50s/it] {'loss': 1.2547, 'learning_rate': 2.163144655690249e-05, 'epoch': 2.42} + 81%|████████ | 13940/17285 [124:51:04<32:03:31, 34.50s/it] 81%|████████ | 13941/17285 [124:51:32<30:19:23, 32.64s/it] 81%|████████ | 13942/17285 [124:52:00<28:59:41, 
31.22s/it] 81%|████████ | 13943/17285 [124:52:33<29:34:29, 31.86s/it] 81%|████████ | 13944/17285 [124:53:05<29:37:48, 31.93s/it] 81%|████████ | 13945/17285 [124:53:43<31:21:45, 33.80s/it] 81%|████████ | 13946/17285 [124:54:14<30:27:31, 32.84s/it] 81%|████████ | 13947/17285 [124:54:44<29:36:02, 31.92s/it] 81%|████████ | 13948/17285 [124:55:18<30:07:29, 32.50s/it] 81%|████████ | 13949/17285 [124:55:54<31:06:08, 33.56s/it] 81%|████████ | 13950/17285 [124:56:23<30:02:03, 32.42s/it] {'loss': 1.29, 'learning_rate': 2.1512745688486646e-05, 'epoch': 2.42} + 81%|████████ | 13950/17285 [124:56:23<30:02:03, 32.42s/it] 81%|████████ | 13951/17285 [124:56:54<29:38:42, 32.01s/it] 81%|████████ | 13952/17285 [124:57:35<32:08:41, 34.72s/it] 81%|████████ | 13953/17285 [124:58:05<30:37:19, 33.09s/it] 81%|████████ | 13954/17285 [124:58:35<29:52:21, 32.29s/it] 81%|████████ | 13955/17285 [124:59:11<30:54:33, 33.42s/it] 81%|████████ | 13956/17285 [124:59:49<32:13:03, 34.84s/it] 81%|████████ | 13957/17285 [125:00:22<31:39:01, 34.24s/it] 81%|████████ | 13958/17285 [125:00:50<29:51:09, 32.30s/it] 81%|████████ | 13959/17285 [125:01:24<30:12:51, 32.70s/it] 81%|████████ | 13960/17285 [125:01:55<29:47:47, 32.26s/it] {'loss': 1.2863, 'learning_rate': 2.139433213119664e-05, 'epoch': 2.42} + 81%|████████ | 13960/17285 [125:01:55<29:47:47, 32.26s/it] 81%|████████ | 13961/17285 [125:02:22<28:21:39, 30.72s/it] 81%|████████ | 13962/17285 [125:02:49<27:14:28, 29.51s/it] 81%|████████ | 13963/17285 [125:03:30<30:26:36, 32.99s/it] 81%|████████ | 13964/17285 [125:04:05<31:10:39, 33.80s/it] 81%|████████ | 13965/17285 [125:04:35<30:00:22, 32.54s/it] 81%|████████ | 13966/17285 [125:05:13<31:25:56, 34.09s/it] 81%|████████ | 13967/17285 [125:05:48<31:43:55, 34.43s/it] 81%|████████ | 13968/17285 [125:06:17<30:10:34, 32.75s/it] 81%|████████ | 13969/17285 [125:06:46<29:11:54, 31.70s/it] 81%|████████ | 13970/17285 [125:07:22<30:17:40, 32.90s/it] {'loss': 1.2471, 'learning_rate': 2.127620631849816e-05, 'epoch': 2.42} 
+ 81%|████████ | 13970/17285 [125:07:22<30:17:40, 32.90s/it] 81%|████████ | 13971/17285 [125:07:54<30:05:25, 32.69s/it] 81%|████████ | 13972/17285 [125:08:27<30:12:12, 32.82s/it] 81%|████████ | 13973/17285 [125:09:01<30:26:30, 33.09s/it] 81%|████████ | 13974/17285 [125:09:27<28:36:05, 31.10s/it] 81%|████████ | 13975/17285 [125:09:52<26:55:16, 29.28s/it] 81%|████████ | 13976/17285 [125:10:35<30:29:46, 33.18s/it] 81%|████████ | 13977/17285 [125:11:06<30:05:16, 32.74s/it] 81%|████████ | 13978/17285 [125:11:36<29:07:59, 31.71s/it] 81%|████████ | 13979/17285 [125:12:02<27:32:13, 29.99s/it] 81%|████████ | 13980/17285 [125:12:34<28:04:24, 30.58s/it] {'loss': 1.2756, 'learning_rate': 2.11583686828036e-05, 'epoch': 2.43} + 81%|████████ | 13980/17285 [125:12:34<28:04:24, 30.58s/it] 81%|████████ | 13981/17285 [125:13:07<28:52:41, 31.47s/it] 81%|████████ | 13982/17285 [125:13:33<27:14:48, 29.70s/it] 81%|████████ | 13983/17285 [125:14:03<27:31:36, 30.01s/it] 81%|████████ | 13984/17285 [125:14:47<31:19:36, 34.16s/it] 81%|████████ | 13985/17285 [125:15:20<31:00:18, 33.82s/it] 81%|████████ | 13986/17285 [125:15:49<29:41:06, 32.39s/it] 81%|████████ | 13987/17285 [125:16:18<28:46:05, 31.40s/it] 81%|████████ | 13988/17285 [125:16:54<29:47:41, 32.53s/it] 81%|████████ | 13989/17285 [125:17:20<27:59:13, 30.57s/it] 81%|████████ | 13990/17285 [125:17:55<29:19:09, 32.03s/it] {'loss': 1.272, 'learning_rate': 2.104081965547041e-05, 'epoch': 2.43} + 81%|████████ | 13990/17285 [125:17:55<29:19:09, 32.03s/it] 81%|████████ | 13991/17285 [125:18:22<27:55:56, 30.53s/it] 81%|████████ | 13992/17285 [125:19:06<31:29:24, 34.43s/it] 81%|████████ | 13993/17285 [125:19:38<31:03:34, 33.97s/it] 81%|████████ | 13994/17285 [125:20:09<30:06:56, 32.94s/it] 81%|████████ | 13995/17285 [125:20:39<29:19:58, 32.10s/it] 81%|████████ | 13996/17285 [125:21:11<29:11:23, 31.95s/it] 81%|████████ | 13997/17285 [125:21:43<29:21:41, 32.15s/it] 81%|████████ | 13998/17285 [125:22:14<29:03:48, 31.83s/it] 81%|████████ | 
13999/17285 [125:22:47<29:18:13, 32.10s/it] 81%|████████ | 14000/17285 [125:23:17<28:39:07, 31.40s/it] {'loss': 1.2714, 'learning_rate': 2.092355966679961e-05, 'epoch': 2.43} + 81%|████████ | 14000/17285 [125:23:17<28:39:07, 31.40s/it][INFO|trainer.py:3081] 2023-08-28 05:17:54,673 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-28 05:17:54,674 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-28 05:17:54,674 >> Batch size = 2 + + 0%| | 0/33 [00:00> Deleting older checkpoint [20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-11000] due to args.save_total_limit +Save Peft Config at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-14000 +[INFO|tokenization_utils_base.py:2210] 2023-08-28 05:19:19,317 >> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-14000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-28 05:19:19,325 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-14000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-14000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-14000 + 81%|████████ | 14001/17285 [125:25:24<54:41:07, 59.95s/it] 81%|████████ | 14002/17285 [125:26:00<48:08:32, 52.79s/it] 81%|████████ | 14003/17285 [125:26:31<42:08:53, 46.23s/it] 81%|████████ | 14004/17285 [125:26:56<36:22:17, 39.91s/it] 81%|████████ | 14005/17285 [125:27:21<32:28:47, 35.65s/it] 81%|████████ | 14006/17285 [125:28:04<34:14:06, 37.59s/it] 81%|████████ | 14007/17285 [125:28:37<33:01:54, 36.28s/it] 81%|████████ | 14008/17285 [125:29:07<31:30:39, 34.62s/it] 81%|████████ | 14009/17285 [125:29:36<29:49:41, 32.78s/it] 81%|████████ | 14010/17285 
[125:30:13<30:58:24, 34.05s/it] {'loss': 1.3074, 'learning_rate': 2.080658914603415e-05, 'epoch': 2.43} + 81%|████████ | 14010/17285 [125:30:13<30:58:24, 34.05s/it] 81%|████████ | 14011/17285 [125:30:42<29:29:18, 32.42s/it] 81%|████████ | 14012/17285 [125:31:21<31:17:38, 34.42s/it] 81%|████████ | 14013/17285 [125:31:57<31:40:30, 34.85s/it] 81%|████████ | 14014/17285 [125:32:22<29:08:04, 32.06s/it] 81%|████████ | 14015/17285 [125:32:54<29:07:56, 32.07s/it] 81%|████████ | 14016/17285 [125:33:22<28:05:36, 30.94s/it] 81%|████████ | 14017/17285 [125:33:52<27:36:22, 30.41s/it] 81%|████████ | 14018/17285 [125:34:22<27:37:05, 30.43s/it] 81%|████████ | 14019/17285 [125:34:47<26:05:11, 28.75s/it] 81%|████████ | 14020/17285 [125:35:18<26:39:07, 29.39s/it] {'loss': 1.2862, 'learning_rate': 2.068990852135728e-05, 'epoch': 2.43} + 81%|████████ | 14020/17285 [125:35:18<26:39:07, 29.39s/it] 81%|████████ | 14021/17285 [125:35:49<27:07:08, 29.91s/it] 81%|████████ | 14022/17285 [125:36:23<28:06:06, 31.00s/it] 81%|████████ | 14023/17285 [125:36:58<29:20:06, 32.37s/it] 81%|████████ | 14024/17285 [125:37:24<27:35:37, 30.46s/it] 81%|████████ | 14025/17285 [125:37:53<27:06:09, 29.93s/it] 81%|████████ | 14026/17285 [125:38:22<26:52:27, 29.69s/it] 81%|████████ | 14027/17285 [125:38:52<27:02:53, 29.89s/it] 81%|████████ | 14028/17285 [125:39:30<29:05:34, 32.16s/it] 81%|████████ | 14029/17285 [125:39:58<28:07:30, 31.10s/it] 81%|████████ | 14030/17285 [125:40:25<26:59:43, 29.86s/it] {'loss': 1.285, 'learning_rate': 2.057351821989113e-05, 'epoch': 2.44} + 81%|████████ | 14030/17285 [125:40:25<26:59:43, 29.86s/it] 81%|████████ | 14031/17285 [125:41:03<29:06:49, 32.21s/it] 81%|████████ | 14032/17285 [125:41:36<29:26:47, 32.59s/it] 81%|████████ | 14033/17285 [125:42:05<28:15:01, 31.27s/it] 81%|████████ | 14034/17285 [125:42:35<27:57:13, 30.95s/it] 81%|████████ | 14035/17285 [125:43:00<26:26:36, 29.29s/it] 81%|████████ | 14036/17285 [125:43:33<27:14:01, 30.18s/it] 81%|████████ | 14037/17285 
[125:44:08<28:33:40, 31.66s/it] 81%|████████ | 14038/17285 [125:44:39<28:34:36, 31.68s/it] 81%|████████ | 14039/17285 [125:45:11<28:29:50, 31.61s/it] 81%|████████ | 14040/17285 [125:45:50<30:29:50, 33.83s/it] {'loss': 1.2885, 'learning_rate': 2.045741866769507e-05, 'epoch': 2.44} + 81%|████████ | 14040/17285 [125:45:50<30:29:50, 33.83s/it] 81%|████████ | 14041/17285 [125:46:18<28:55:14, 32.09s/it] 81%|████████ | 14042/17285 [125:46:49<28:43:33, 31.89s/it] 81%|████████ | 14043/17285 [125:47:15<26:59:30, 29.97s/it] 81%|████████ | 14044/17285 [125:47:51<28:45:28, 31.94s/it] 81%|████████▏ | 14045/17285 [125:48:19<27:29:40, 30.55s/it] 81%|████████▏ | 14046/17285 [125:48:50<27:42:30, 30.80s/it] 81%|████████▏ | 14047/17285 [125:49:25<28:53:14, 32.12s/it] 81%|████████▏ | 14048/17285 [125:49:54<28:02:54, 31.19s/it] 81%|████████▏ | 14049/17285 [125:50:33<30:01:29, 33.40s/it] 81%|████████▏ | 14050/17285 [125:51:05<29:34:46, 32.92s/it] {'loss': 1.2729, 'learning_rate': 2.034161028976408e-05, 'epoch': 2.44} + 81%|████████▏ | 14050/17285 [125:51:05<29:34:46, 32.92s/it] 81%|████████▏ | 14051/17285 [125:51:35<28:56:48, 32.22s/it] 81%|████████▏ | 14052/17285 [125:52:12<30:11:14, 33.61s/it] 81%|████████▏ | 14053/17285 [125:52:46<30:16:05, 33.71s/it] 81%|████████▏ | 14054/17285 [125:53:29<32:45:19, 36.50s/it] 81%|████████▏ | 14055/17285 [125:53:58<30:51:26, 34.39s/it] 81%|████████▏ | 14056/17285 [125:54:30<30:00:02, 33.45s/it] 81%|████████▏ | 14057/17285 [125:55:03<30:01:25, 33.48s/it] 81%|████████▏ | 14058/17285 [125:55:33<29:06:45, 32.48s/it] 81%|████████▏ | 14059/17285 [125:56:07<29:22:02, 32.77s/it] 81%|████████▏ | 14060/17285 [125:56:39<29:08:21, 32.53s/it] {'loss': 1.2783, 'learning_rate': 2.0226093510027388e-05, 'epoch': 2.44} + 81%|████████▏ | 14060/17285 [125:56:39<29:08:21, 32.53s/it] 81%|████████▏ | 14061/17285 [125:57:11<29:02:28, 32.43s/it] 81%|████████▏ | 14062/17285 [125:57:43<28:47:14, 32.15s/it] 81%|████████▏ | 14063/17285 [125:58:12<28:03:10, 31.34s/it] 
81%|████████▏ | 14064/17285 [125:58:41<27:21:15, 30.57s/it] 81%|████████▏ | 14065/17285 [125:59:10<26:58:36, 30.16s/it] 81%|████████▏ | 14066/17285 [125:59:48<29:02:52, 32.49s/it] 81%|████████▏ | 14067/17285 [126:00:18<28:18:22, 31.67s/it] 81%|████████▏ | 14068/17285 [126:00:53<29:19:58, 32.83s/it] 81%|████████▏ | 14069/17285 [126:01:24<28:40:49, 32.10s/it] 81%|████████▏ | 14070/17285 [126:01:54<28:18:59, 31.71s/it] {'loss': 1.2502, 'learning_rate': 2.0110868751346678e-05, 'epoch': 2.44} + 81%|████████▏ | 14070/17285 [126:01:54<28:18:59, 31.71s/it] 81%|████████▏ | 14071/17285 [126:02:27<28:28:45, 31.90s/it] 81%|████████▏ | 14072/17285 [126:02:56<27:46:44, 31.12s/it] 81%|████████▏ | 14073/17285 [126:03:26<27:22:20, 30.68s/it] 81%|████████▏ | 14074/17285 [126:03:57<27:34:27, 30.91s/it] 81%|████████▏ | 14075/17285 [126:04:33<28:55:31, 32.44s/it] 81%|████████▏ | 14076/17285 [126:05:04<28:33:58, 32.05s/it] 81%|████████▏ | 14077/17285 [126:05:30<26:55:19, 30.21s/it] 81%|████████▏ | 14078/17285 [126:06:09<29:05:27, 32.66s/it] 81%|████████▏ | 14079/17285 [126:06:34<27:08:30, 30.48s/it] 81%|████████▏ | 14080/17285 [126:07:01<26:14:54, 29.48s/it] {'loss': 1.2787, 'learning_rate': 1.999593643551475e-05, 'epoch': 2.44} + 81%|████████▏ | 14080/17285 [126:07:01<26:14:54, 29.48s/it] 81%|████████▏ | 14081/17285 [126:07:34<27:01:44, 30.37s/it] 81%|████████▏ | 14082/17285 [126:08:01<26:14:20, 29.49s/it] 81%|████████▏ | 14083/17285 [126:08:33<26:53:21, 30.23s/it] 81%|████████▏ | 14084/17285 [126:09:05<27:18:38, 30.71s/it] 81%|████████▏ | 14085/17285 [126:09:47<30:25:26, 34.23s/it] 81%|████████▏ | 14086/17285 [126:10:12<27:59:38, 31.50s/it] 81%|████████▏ | 14087/17285 [126:10:49<29:16:34, 32.96s/it] 82%|████████▏ | 14088/17285 [126:11:25<30:15:30, 34.07s/it] 82%|████████▏ | 14089/17285 [126:11:58<29:51:47, 33.64s/it] 82%|████████▏ | 14090/17285 [126:12:30<29:17:11, 33.00s/it] {'loss': 1.2514, 'learning_rate': 1.9881296983253773e-05, 'epoch': 2.45} + 82%|████████▏ | 14090/17285 
[126:12:30<29:17:11, 33.00s/it] 82%|████████▏ | 14091/17285 [126:12:59<28:24:47, 32.02s/it] 82%|████████▏ | 14092/17285 [126:13:31<28:14:21, 31.84s/it] 82%|████████▏ | 14093/17285 [126:14:01<27:48:05, 31.35s/it] 82%|████████▏ | 14094/17285 [126:14:45<31:15:00, 35.26s/it] 82%|████████▏ | 14095/17285 [126:15:14<29:31:19, 33.32s/it] 82%|████████▏ | 14096/17285 [126:15:43<28:22:57, 32.04s/it] 82%|████████▏ | 14097/17285 [126:16:12<27:29:38, 31.05s/it] 82%|████████▏ | 14098/17285 [126:16:42<27:21:53, 30.91s/it] 82%|████████▏ | 14099/17285 [126:17:09<26:19:17, 29.74s/it] 82%|████████▏ | 14100/17285 [126:17:36<25:27:47, 28.78s/it] {'loss': 1.3199, 'learning_rate': 1.9766950814213946e-05, 'epoch': 2.45} + 82%|████████▏ | 14100/17285 [126:17:36<25:27:47, 28.78s/it] 82%|████████▏ | 14101/17285 [126:18:07<26:07:11, 29.53s/it] 82%|████████▏ | 14102/17285 [126:18:38<26:17:41, 29.74s/it][2023-08-28 06:13:52,091] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 82%|████████▏ | 14103/17285 [126:19:14<28:11:11, 31.89s/it] 82%|████████▏ | 14104/17285 [126:19:46<28:10:25, 31.88s/it] 82%|████████▏ | 14105/17285 [126:20:20<28:33:50, 32.34s/it] 82%|████████▏ | 14106/17285 [126:20:49<27:46:42, 31.46s/it] 82%|████████▏ | 14107/17285 [126:21:19<27:20:18, 30.97s/it] 82%|████████▏ | 14108/17285 [126:21:53<28:02:00, 31.77s/it] 82%|████████▏ | 14109/17285 [126:22:22<27:32:24, 31.22s/it] 82%|████████▏ | 14110/17285 [126:22:48<26:06:02, 29.59s/it] {'loss': 1.321, 'learning_rate': 1.966429036520796e-05, 'epoch': 2.45} + 82%|████████▏ | 14110/17285 [126:22:48<26:06:02, 29.59s/it] 82%|████████▏ | 14111/17285 [126:23:15<25:25:42, 28.84s/it] 82%|████████▏ | 14112/17285 [126:23:49<26:38:09, 30.22s/it] 82%|████████▏ | 14113/17285 [126:24:19<26:39:44, 30.26s/it] 82%|████████▏ | 14114/17285 [126:24:59<29:10:12, 33.12s/it] 82%|████████▏ | 14115/17285 [126:25:39<30:59:38, 35.20s/it] 82%|████████▏ | 14116/17285 [126:26:16<31:19:53, 35.59s/it] 82%|████████▏ | 14117/17285 [126:26:48<30:26:11, 34.59s/it] 82%|████████▏ | 14118/17285 [126:27:19<29:31:06, 33.55s/it] 82%|████████▏ | 14119/17285 [126:27:52<29:28:46, 33.52s/it] 82%|████████▏ | 14120/17285 [126:28:28<29:57:45, 34.08s/it] {'loss': 1.2912, 'learning_rate': 1.9550502586578255e-05, 'epoch': 2.45} + 82%|████████▏ | 14120/17285 [126:28:28<29:57:45, 34.08s/it] 82%|████████▏ | 14121/17285 [126:28:55<28:16:41, 32.17s/it] 82%|████████▏ | 14122/17285 [126:29:28<28:21:21, 32.27s/it] 82%|████████▏ | 14123/17285 [126:30:12<31:24:13, 35.75s/it] 82%|████████▏ | 14124/17285 [126:30:48<31:23:53, 35.76s/it] 82%|████████▏ | 14125/17285 [126:31:20<30:36:03, 34.86s/it] 82%|████████▏ | 14126/17285 [126:31:51<29:33:03, 33.68s/it] 82%|████████▏ | 14127/17285 [126:32:31<31:09:06, 35.51s/it] 82%|████████▏ | 14128/17285 [126:33:05<30:44:40, 35.06s/it] 82%|████████▏ | 14129/17285 [126:33:31<28:14:00, 32.21s/it] 82%|████████▏ | 14130/17285 [126:33:58<27:01:40, 30.84s/it] {'loss': 1.2402, 
'learning_rate': 1.9437009302078558e-05, 'epoch': 2.45} + 82%|████████▏ | 14130/17285 [126:33:58<27:01:40, 30.84s/it] 82%|████████▏ | 14131/17285 [126:34:31<27:28:54, 31.37s/it] 82%|████████▏ | 14132/17285 [126:34:55<25:42:02, 29.34s/it] 82%|████████▏ | 14133/17285 [126:35:35<28:17:45, 32.32s/it] 82%|████████▏ | 14134/17285 [126:36:08<28:31:26, 32.59s/it] 82%|████████▏ | 14135/17285 [126:36:41<28:41:11, 32.78s/it] 82%|████████▏ | 14136/17285 [126:37:30<32:52:10, 37.58s/it] 82%|████████▏ | 14137/17285 [126:38:14<34:31:32, 39.48s/it] 82%|████████▏ | 14138/17285 [126:39:03<37:03:18, 42.39s/it] 82%|████████▏ | 14139/17285 [126:39:28<32:28:54, 37.17s/it] 82%|████████▏ | 14140/17285 [126:39:58<30:33:40, 34.98s/it] {'loss': 1.2623, 'learning_rate': 1.9323810927163365e-05, 'epoch': 2.45} + 82%|████████▏ | 14140/17285 [126:39:58<30:33:40, 34.98s/it] 82%|████████▏ | 14141/17285 [126:40:27<29:03:29, 33.27s/it] 82%|████████▏ | 14142/17285 [126:41:00<29:02:03, 33.26s/it] 82%|████████▏ | 14143/17285 [126:41:31<28:17:35, 32.42s/it] 82%|████████▏ | 14144/17285 [126:41:56<26:22:50, 30.24s/it] 82%|████████▏ | 14145/17285 [126:42:36<29:01:47, 33.28s/it] 82%|████████▏ | 14146/17285 [126:43:09<28:51:03, 33.09s/it] 82%|████████▏ | 14147/17285 [126:43:46<29:43:36, 34.10s/it] 82%|████████▏ | 14148/17285 [126:44:19<29:27:37, 33.81s/it] 82%|████████▏ | 14149/17285 [126:44:54<29:48:54, 34.23s/it] 82%|████████▏ | 14150/17285 [126:45:21<27:50:47, 31.98s/it] {'loss': 1.2941, 'learning_rate': 1.921090787620764e-05, 'epoch': 2.46} + 82%|████████▏ | 14150/17285 [126:45:21<27:50:47, 31.98s/it] 82%|████████▏ | 14151/17285 [126:45:52<27:38:38, 31.75s/it] 82%|████████▏ | 14152/17285 [126:46:17<26:01:17, 29.90s/it] 82%|████████▏ | 14153/17285 [126:46:48<26:10:24, 30.08s/it] 82%|████████▏ | 14154/17285 [126:47:25<28:03:09, 32.25s/it] 82%|████████▏ | 14155/17285 [126:47:50<26:10:21, 30.10s/it] 82%|████████▏ | 14156/17285 [126:48:27<27:49:05, 32.01s/it] 82%|████████▏ | 14157/17285 [126:48:57<27:24:01, 
31.53s/it] 82%|████████▏ | 14158/17285 [126:49:29<27:29:35, 31.65s/it] 82%|████████▏ | 14159/17285 [126:49:59<27:00:34, 31.11s/it] 82%|████████▏ | 14160/17285 [126:50:27<26:16:54, 30.28s/it] {'loss': 1.2472, 'learning_rate': 1.9098300562505266e-05, 'epoch': 2.46} + 82%|████████▏ | 14160/17285 [126:50:27<26:16:54, 30.28s/it] 82%|████████▏ | 14161/17285 [126:51:05<28:10:57, 32.48s/it] 82%|████████▏ | 14162/17285 [126:51:42<29:20:43, 33.83s/it] 82%|████████▏ | 14163/17285 [126:52:15<29:05:50, 33.55s/it] 82%|████████▏ | 14164/17285 [126:52:47<28:48:01, 33.22s/it] 82%|████████▏ | 14165/17285 [126:53:28<30:47:05, 35.52s/it] 82%|████████▏ | 14166/17285 [126:54:03<30:29:10, 35.19s/it] 82%|████████▏ | 14167/17285 [126:54:32<28:52:33, 33.34s/it] 82%|████████▏ | 14168/17285 [126:54:59<27:13:42, 31.45s/it] 82%|████████▏ | 14169/17285 [126:55:24<25:39:10, 29.64s/it] 82%|████████▏ | 14170/17285 [126:55:57<26:29:35, 30.62s/it] {'loss': 1.2568, 'learning_rate': 1.8985989398267557e-05, 'epoch': 2.46} + 82%|████████▏ | 14170/17285 [126:55:57<26:29:35, 30.62s/it] 82%|████████▏ | 14171/17285 [126:56:28<26:37:41, 30.78s/it] 82%|████████▏ | 14172/17285 [126:56:58<26:28:56, 30.63s/it] 82%|████████▏ | 14173/17285 [126:57:32<27:20:14, 31.62s/it] 82%|████████▏ | 14174/17285 [126:58:02<26:45:11, 30.96s/it] 82%|████████▏ | 14175/17285 [126:58:38<28:05:34, 32.52s/it] 82%|████████▏ | 14176/17285 [126:59:12<28:30:21, 33.01s/it] 82%|████████▏ | 14177/17285 [126:59:44<28:18:08, 32.78s/it] 82%|████████▏ | 14178/17285 [127:00:18<28:34:23, 33.11s/it] 82%|████████▏ | 14179/17285 [127:00:44<26:47:02, 31.04s/it] 82%|████████▏ | 14180/17285 [127:01:23<28:38:32, 33.21s/it] {'loss': 1.2569, 'learning_rate': 1.887397479462174e-05, 'epoch': 2.46} + 82%|████████▏ | 14180/17285 [127:01:23<28:38:32, 33.21s/it] 82%|████████▏ | 14181/17285 [127:01:59<29:21:54, 34.06s/it] 82%|████████▏ | 14182/17285 [127:02:41<31:35:31, 36.65s/it] 82%|████████▏ | 14183/17285 [127:03:15<30:41:30, 35.62s/it] 82%|████████▏ | 
14184/17285 [127:03:47<29:50:50, 34.65s/it] 82%|████████▏ | 14185/17285 [127:04:21<29:45:40, 34.56s/it] 82%|████████▏ | 14186/17285 [127:04:50<28:09:08, 32.70s/it] 82%|████████▏ | 14187/17285 [127:05:16<26:35:55, 30.91s/it] 82%|████████▏ | 14188/17285 [127:05:52<27:53:32, 32.42s/it] 82%|████████▏ | 14189/17285 [127:06:19<26:22:45, 30.67s/it] 82%|████████▏ | 14190/17285 [127:06:51<26:36:06, 30.94s/it] {'loss': 1.2837, 'learning_rate': 1.8762257161609442e-05, 'epoch': 2.46} + 82%|████████▏ | 14190/17285 [127:06:51<26:36:06, 30.94s/it] 82%|████████▏ | 14191/17285 [127:07:22<26:44:45, 31.12s/it] 82%|████████▏ | 14192/17285 [127:07:54<27:00:22, 31.43s/it] 82%|████████▏ | 14193/17285 [127:08:19<25:13:48, 29.38s/it] 82%|████████▏ | 14194/17285 [127:08:47<24:55:34, 29.03s/it] 82%|████████▏ | 14195/17285 [127:09:23<26:48:01, 31.22s/it] 82%|████████▏ | 14196/17285 [127:09:57<27:27:04, 31.99s/it] 82%|████████▏ | 14197/17285 [127:10:30<27:36:08, 32.18s/it] 82%|████████▏ | 14198/17285 [127:11:02<27:35:18, 32.17s/it] 82%|████████▏ | 14199/17285 [127:11:30<26:30:36, 30.93s/it] 82%|████████▏ | 14200/17285 [127:11:59<25:58:31, 30.31s/it] {'loss': 1.255, 'learning_rate': 1.865083690818521e-05, 'epoch': 2.46} + 82%|████████▏ | 14200/17285 [127:11:59<25:58:31, 30.31s/it] 82%|████████▏ | 14201/17285 [127:12:27<25:29:18, 29.75s/it] 82%|████████▏ | 14202/17285 [127:12:53<24:29:44, 28.60s/it] 82%|████████▏ | 14203/17285 [127:13:24<24:58:12, 29.17s/it] 82%|████████▏ | 14204/17285 [127:13:53<25:05:06, 29.31s/it] 82%|████████▏ | 14205/17285 [127:14:30<26:57:24, 31.51s/it][2023-08-28 07:09:38,828] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, but hysteresis is 2. 
Reducing hysteresis to 1 + 82%|████████▏ | 14206/17285 [127:15:01<26:51:51, 31.41s/it] 82%|████████▏ | 14207/17285 [127:15:28<25:34:31, 29.91s/it] 82%|████████▏ | 14208/17285 [127:15:58<25:38:43, 30.00s/it] 82%|████████▏ | 14209/17285 [127:16:23<24:31:48, 28.71s/it] 82%|████████▏ | 14210/17285 [127:16:57<25:48:23, 30.21s/it] {'loss': 1.264, 'learning_rate': 1.8550813276774915e-05, 'epoch': 2.47} + 82%|████████▏ | 14210/17285 [127:16:57<25:48:23, 30.21s/it] 82%|████████▏ | 14211/17285 [127:17:24<25:00:21, 29.28s/it][2023-08-28 07:12:27,042] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, reducing to 131072 + 82%|████████▏ | 14212/17285 [127:17:49<23:54:47, 28.01s/it] 82%|████████▏ | 14213/17285 [127:18:15<23:14:26, 27.24s/it] 82%|████████▏ | 14214/17285 [127:18:39<22:22:11, 26.22s/it] 82%|████████▏ | 14215/17285 [127:19:08<23:07:12, 27.11s/it] 82%|████████▏ | 14216/17285 [127:19:40<24:27:05, 28.68s/it] 82%|████████▏ | 14217/17285 [127:20:10<24:51:50, 29.18s/it] 82%|████████▏ | 14218/17285 [127:20:49<27:19:43, 32.08s/it] 82%|████████▏ | 14219/17285 [127:21:23<27:46:12, 32.61s/it] 82%|████████▏ | 14220/17285 [127:21:49<25:57:13, 30.48s/it] {'loss': 1.2722, 'learning_rate': 1.845103114979575e-05, 'epoch': 2.47} + 82%|████████▏ | 14220/17285 [127:21:49<25:57:13, 30.48s/it] 82%|████████▏ | 14221/17285 [127:22:26<27:48:04, 32.66s/it] 82%|████████▏ | 14222/17285 [127:23:06<29:35:39, 34.78s/it] 82%|████████▏ | 14223/17285 [127:23:37<28:27:33, 33.46s/it] 82%|████████▏ | 14224/17285 [127:24:12<28:50:37, 33.92s/it] 82%|████████▏ | 14225/17285 [127:24:41<27:37:21, 32.50s/it] 82%|████████▏ | 14226/17285 [127:25:09<26:34:48, 31.28s/it] 82%|████████▏ | 14227/17285 [127:25:38<25:52:49, 30.47s/it] 82%|████████▏ | 14228/17285 [127:26:08<25:52:06, 30.46s/it] 82%|████████▏ | 14229/17285 [127:26:40<26:09:33, 30.82s/it] 82%|████████▏ | 14230/17285 [127:27:22<28:57:44, 34.13s/it] {'loss': 1.2828, 'learning_rate': 
1.8340445725584443e-05, 'epoch': 2.47} + 82%|████████▏ | 14230/17285 [127:27:22<28:57:44, 34.13s/it] 82%|████████▏ | 14231/17285 [127:27:52<27:55:56, 32.93s/it] 82%|████████▏ | 14232/17285 [127:28:18<26:15:37, 30.97s/it] 82%|████████▏ | 14233/17285 [127:28:49<26:08:28, 30.84s/it] 82%|████████▏ | 14234/17285 [127:29:23<26:56:54, 31.80s/it] 82%|████████▏ | 14235/17285 [127:29:57<27:31:11, 32.48s/it] 82%|████████▏ | 14236/17285 [127:30:22<25:32:14, 30.15s/it] 82%|████████▏ | 14237/17285 [127:30:57<26:58:13, 31.85s/it] 82%|████████▏ | 14238/17285 [127:31:31<27:24:37, 32.38s/it] 82%|████████▏ | 14239/17285 [127:32:02<27:04:05, 31.99s/it] 82%|████████▏ | 14240/17285 [127:32:31<26:18:05, 31.10s/it] {'loss': 1.2776, 'learning_rate': 1.8230159225047806e-05, 'epoch': 2.47} + 82%|████████▏ | 14240/17285 [127:32:31<26:18:05, 31.10s/it] 82%|████████▏ | 14241/17285 [127:33:07<27:24:31, 32.41s/it] 82%|████████▏ | 14242/17285 [127:33:44<28:43:54, 33.99s/it] 82%|████████▏ | 14243/17285 [127:34:19<28:54:49, 34.22s/it] 82%|████████▏ | 14244/17285 [127:34:53<28:52:42, 34.19s/it] 82%|████████▏ | 14245/17285 [127:35:20<26:59:10, 31.96s/it] 82%|████████▏ | 14246/17285 [127:35:52<27:05:00, 32.08s/it] 82%|████████▏ | 14247/17285 [127:36:22<26:21:29, 31.23s/it] 82%|████████▏ | 14248/17285 [127:36:49<25:17:50, 29.99s/it] 82%|████████▏ | 14249/17285 [127:37:23<26:26:36, 31.36s/it] 82%|████████▏ | 14250/17285 [127:37:49<25:03:36, 29.73s/it] {'loss': 1.2505, 'learning_rate': 1.8120172051901564e-05, 'epoch': 2.47} + 82%|████████▏ | 14250/17285 [127:37:49<25:03:36, 29.73s/it] 82%|████████▏ | 14251/17285 [127:38:19<25:09:10, 29.85s/it] 82%|████████▏ | 14252/17285 [127:38:55<26:42:30, 31.70s/it] 82%|████████▏ | 14253/17285 [127:39:22<25:33:59, 30.36s/it] 82%|████████▏ | 14254/17285 [127:39:57<26:40:05, 31.67s/it] 82%|████████▏ | 14255/17285 [127:40:32<27:34:20, 32.76s/it] 82%|████████▏ | 14256/17285 [127:41:05<27:31:47, 32.72s/it] 82%|████████▏ | 14257/17285 [127:41:42<28:34:34, 33.97s/it] 
82%|████████▏ | 14258/17285 [127:42:22<30:05:03, 35.78s/it] 82%|████████▏ | 14259/17285 [127:42:49<27:44:10, 33.00s/it] 82%|████████▏ | 14260/17285 [127:43:27<29:04:42, 34.61s/it] {'loss': 1.2663, 'learning_rate': 1.801048460876572e-05, 'epoch': 2.47} + 82%|████████▏ | 14260/17285 [127:43:27<29:04:42, 34.61s/it] 83%|████████▎ | 14261/17285 [127:44:00<28:38:46, 34.10s/it] 83%|████████▎ | 14262/17285 [127:44:32<28:15:18, 33.65s/it] 83%|████████▎ | 14263/17285 [127:45:00<26:50:11, 31.97s/it] 83%|████████▎ | 14264/17285 [127:45:35<27:22:50, 32.63s/it] 83%|████████▎ | 14265/17285 [127:46:03<26:11:50, 31.23s/it] 83%|████████▎ | 14266/17285 [127:46:37<27:06:13, 32.32s/it] 83%|████████▎ | 14267/17285 [127:47:09<26:49:32, 32.00s/it] 83%|████████▎ | 14268/17285 [127:47:45<27:57:02, 33.35s/it] 83%|████████▎ | 14269/17285 [127:48:23<28:59:57, 34.61s/it] 83%|████████▎ | 14270/17285 [127:49:04<30:34:26, 36.51s/it] {'loss': 1.2305, 'learning_rate': 1.7901097297163094e-05, 'epoch': 2.48} + 83%|████████▎ | 14270/17285 [127:49:04<30:34:26, 36.51s/it] 83%|████████▎ | 14271/17285 [127:49:40<30:27:24, 36.38s/it] 83%|████████▎ | 14272/17285 [127:50:05<27:35:48, 32.97s/it] 83%|████████▎ | 14273/17285 [127:50:39<27:51:52, 33.30s/it] 83%|████████▎ | 14274/17285 [127:51:11<27:32:51, 32.94s/it] 83%|████████▎ | 14275/17285 [127:51:42<27:04:23, 32.38s/it] 83%|████████▎ | 14276/17285 [127:52:11<26:06:18, 31.23s/it] 83%|████████▎ | 14277/17285 [127:52:41<25:50:11, 30.92s/it] 83%|████████▎ | 14278/17285 [127:53:12<25:57:31, 31.08s/it] 83%|████████▎ | 14279/17285 [127:53:42<25:31:07, 30.56s/it] 83%|████████▎ | 14280/17285 [127:54:17<26:40:38, 31.96s/it] {'loss': 1.2955, 'learning_rate': 1.779201051751783e-05, 'epoch': 2.48} + 83%|████████▎ | 14280/17285 [127:54:17<26:40:38, 31.96s/it] 83%|████████▎ | 14281/17285 [127:54:54<28:05:55, 33.67s/it] 83%|████████▎ | 14282/17285 [127:55:28<28:09:47, 33.76s/it] 83%|████████▎ | 14283/17285 [127:56:04<28:37:55, 34.34s/it] 83%|████████▎ | 14284/17285 
[127:56:33<27:12:23, 32.64s/it] 83%|████████▎ | 14285/17285 [127:57:05<27:02:33, 32.45s/it] 83%|████████▎ | 14286/17285 [127:57:36<26:46:21, 32.14s/it] 83%|████████▎ | 14287/17285 [127:58:04<25:42:31, 30.87s/it] 83%|████████▎ | 14288/17285 [127:58:37<26:05:37, 31.34s/it] 83%|████████▎ | 14289/17285 [127:59:11<26:49:51, 32.24s/it] 83%|████████▎ | 14290/17285 [127:59:37<25:13:53, 30.33s/it] {'loss': 1.2682, 'learning_rate': 1.768322466915392e-05, 'epoch': 2.48} + 83%|████████▎ | 14290/17285 [127:59:37<25:13:53, 30.33s/it] 83%|████████▎ | 14291/17285 [128:00:08<25:28:21, 30.63s/it] 83%|████████▎ | 14292/17285 [128:00:36<24:54:07, 29.95s/it] 83%|████████▎ | 14293/17285 [128:01:07<25:07:55, 30.24s/it] 83%|████████▎ | 14294/17285 [128:01:37<25:04:06, 30.17s/it] 83%|████████▎ | 14295/17285 [128:02:12<26:03:35, 31.38s/it] 83%|████████▎ | 14296/17285 [128:02:39<24:56:54, 30.05s/it] 83%|████████▎ | 14297/17285 [128:03:15<26:27:35, 31.88s/it] 83%|████████▎ | 14298/17285 [128:03:55<28:33:38, 34.42s/it] 83%|████████▎ | 14299/17285 [128:04:30<28:45:29, 34.67s/it] 83%|████████▎ | 14300/17285 [128:04:55<26:10:19, 31.56s/it] {'loss': 1.2796, 'learning_rate': 1.7574740150293778e-05, 'epoch': 2.48} + 83%|████████▎ | 14300/17285 [128:04:55<26:10:19, 31.56s/it] 83%|████████▎ | 14301/17285 [128:05:27<26:26:29, 31.90s/it] 83%|████████▎ | 14302/17285 [128:05:59<26:25:44, 31.90s/it] 83%|████████▎ | 14303/17285 [128:06:26<25:02:48, 30.24s/it] 83%|████████▎ | 14304/17285 [128:06:53<24:21:27, 29.42s/it] 83%|████████▎ | 14305/17285 [128:07:18<23:17:50, 28.14s/it] 83%|████████▎ | 14306/17285 [128:07:57<25:55:45, 31.33s/it] 83%|████████▎ | 14307/17285 [128:08:31<26:37:15, 32.18s/it] 83%|████████▎ | 14308/17285 [128:09:00<25:48:06, 31.20s/it] 83%|████████▎ | 14309/17285 [128:09:32<26:02:13, 31.50s/it] 83%|████████▎ | 14310/17285 [128:09:59<24:53:36, 30.12s/it] {'loss': 1.2872, 'learning_rate': 1.746655735805681e-05, 'epoch': 2.48} + 83%|████████▎ | 14310/17285 [128:09:59<24:53:36, 30.12s/it] 
83%|████████▎ | 14311/17285 [128:10:36<26:34:17, 32.16s/it] 83%|████████▎ | 14312/17285 [128:11:04<25:35:36, 30.99s/it][2023-08-28 08:06:13,346] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, but hysteresis is 2. Reducing hysteresis to 1 + 83%|████████▎ | 14313/17285 [128:11:36<25:39:51, 31.09s/it][2023-08-28 08:06:41,422] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, reducing to 131072 + 83%|████████▎ | 14314/17285 [128:12:04<24:54:37, 30.18s/it] 83%|████████▎ | 14315/17285 [128:12:36<25:17:42, 30.66s/it] 83%|████████▎ | 14316/17285 [128:13:18<28:15:00, 34.25s/it] 83%|████████▎ | 14317/17285 [128:13:50<27:45:06, 33.66s/it] 83%|████████▎ | 14318/17285 [128:14:19<26:29:01, 32.13s/it] 83%|████████▎ | 14319/17285 [128:14:51<26:24:02, 32.04s/it] 83%|████████▎ | 14320/17285 [128:15:28<27:40:21, 33.60s/it] {'loss': 1.2768, 'learning_rate': 1.7380228633595075e-05, 'epoch': 2.49} + 83%|████████▎ | 14320/17285 [128:15:28<27:40:21, 33.60s/it] 83%|████████▎ | 14321/17285 [128:15:57<26:26:06, 32.11s/it] 83%|████████▎ | 14322/17285 [128:16:21<24:35:55, 29.89s/it] 83%|████████▎ | 14323/17285 [128:16:54<25:08:24, 30.56s/it] 83%|████████▎ | 14324/17285 [128:17:31<26:55:10, 32.73s/it] 83%|████████▎ | 14325/17285 [128:18:01<26:04:37, 31.72s/it] 83%|████████▎ | 14326/17285 [128:18:39<27:39:12, 33.64s/it] 83%|████████▎ | 14327/17285 [128:19:12<27:33:17, 33.54s/it] 83%|████████▎ | 14328/17285 [128:19:43<26:55:49, 32.79s/it] 83%|████████▎ | 14329/17285 [128:20:16<26:50:57, 32.70s/it] 83%|████████▎ | 14330/17285 [128:20:52<27:42:36, 33.76s/it] {'loss': 1.2804, 'learning_rate': 1.7272589946494132e-05, 'epoch': 2.49} + 83%|████████▎ | 14330/17285 [128:20:52<27:42:36, 33.76s/it] 83%|████████▎ | 14331/17285 [128:21:31<29:00:08, 35.34s/it] 83%|████████▎ | 14332/17285 [128:22:12<30:19:09, 36.96s/it] 83%|████████▎ | 14333/17285 [128:22:44<29:04:13, 35.45s/it] 
83%|████████▎ | 14334/17285 [128:23:21<29:35:10, 36.09s/it] 83%|████████▎ | 14335/17285 [128:23:54<28:51:48, 35.22s/it] 83%|████████▎ | 14336/17285 [128:24:26<27:53:32, 34.05s/it] 83%|████████▎ | 14337/17285 [128:24:54<26:32:36, 32.41s/it] 83%|████████▎ | 14338/17285 [128:25:29<27:03:38, 33.06s/it] 83%|████████▎ | 14339/17285 [128:26:00<26:31:40, 32.42s/it] 83%|████████▎ | 14340/17285 [128:26:43<29:06:11, 35.58s/it] {'loss': 1.2582, 'learning_rate': 1.7165254092070015e-05, 'epoch': 2.49} + 83%|████████▎ | 14340/17285 [128:26:43<29:06:11, 35.58s/it] 83%|████████▎ | 14341/17285 [128:27:10<27:07:10, 33.16s/it] 83%|████████▎ | 14342/17285 [128:27:45<27:31:27, 33.67s/it] 83%|████████▎ | 14343/17285 [128:28:15<26:40:56, 32.65s/it] 83%|████████▎ | 14344/17285 [128:28:49<26:59:02, 33.03s/it] 83%|████████▎ | 14345/17285 [128:29:24<27:29:19, 33.66s/it] 83%|████████▎ | 14346/17285 [128:29:55<26:40:41, 32.68s/it] 83%|████████▎ | 14347/17285 [128:30:28<26:53:44, 32.96s/it] 83%|████████▎ | 14348/17285 [128:30:58<26:02:09, 31.91s/it] 83%|████████▎ | 14349/17285 [128:31:29<25:49:06, 31.66s/it] 83%|████████▎ | 14350/17285 [128:32:09<27:55:28, 34.25s/it] {'loss': 1.3296, 'learning_rate': 1.7058221463237277e-05, 'epoch': 2.49} + 83%|████████▎ | 14350/17285 [128:32:09<27:55:28, 34.25s/it] 83%|████████▎ | 14351/17285 [128:32:38<26:32:20, 32.56s/it] 83%|████████▎ | 14352/17285 [128:33:08<26:01:41, 31.95s/it] 83%|████████▎ | 14353/17285 [128:33:40<26:01:12, 31.95s/it] 83%|████████▎ | 14354/17285 [128:34:09<25:09:11, 30.89s/it] 83%|████████▎ | 14355/17285 [128:34:44<26:13:13, 32.22s/it] 83%|████████▎ | 14356/17285 [128:35:11<25:02:41, 30.78s/it] 83%|████████▎ | 14357/17285 [128:35:42<24:51:42, 30.57s/it] 83%|████████▎ | 14358/17285 [128:36:08<23:51:06, 29.34s/it] 83%|████████▎ | 14359/17285 [128:36:38<24:07:29, 29.68s/it] 83%|████████▎ | 14360/17285 [128:37:11<24:44:23, 30.45s/it] {'loss': 1.2726, 'learning_rate': 1.695149245180051e-05, 'epoch': 2.49} + 83%|████████▎ | 14360/17285 
[128:37:11<24:44:23, 30.45s/it] 83%|████████▎ | 14361/17285 [128:37:40<24:30:46, 30.18s/it] 83%|████████▎ | 14362/17285 [128:38:10<24:27:37, 30.13s/it][2023-08-28 08:33:28,133] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 83%|████████▎ | 14363/17285 [128:38:50<26:53:50, 33.14s/it] 83%|████████▎ | 14364/17285 [128:39:22<26:32:28, 32.71s/it] 83%|████████▎ | 14365/17285 [128:39:50<25:14:44, 31.12s/it] 83%|████████▎ | 14366/17285 [128:40:19<24:49:59, 30.63s/it] 83%|████████▎ | 14367/17285 [128:40:49<24:42:05, 30.47s/it] 83%|████████▎ | 14368/17285 [128:41:22<25:21:13, 31.29s/it] 83%|████████▎ | 14369/17285 [128:41:49<24:14:17, 29.92s/it] 83%|████████▎ | 14370/17285 [128:42:18<24:05:22, 29.75s/it] {'loss': 1.273, 'learning_rate': 1.685569625731185e-05, 'epoch': 2.49} + 83%|████████▎ | 14370/17285 [128:42:18<24:05:22, 29.75s/it] 83%|████████▎ | 14371/17285 [128:42:45<23:12:42, 28.68s/it] 83%|████████▎ | 14372/17285 [128:43:17<23:59:33, 29.65s/it] 83%|████████▎ | 14373/17285 [128:43:56<26:20:38, 32.57s/it] 83%|████████▎ | 14374/17285 [128:44:25<25:31:53, 31.57s/it] 83%|████████▎ | 14375/17285 [128:44:56<25:25:19, 31.45s/it] 83%|████████▎ | 14376/17285 [128:45:29<25:36:16, 31.69s/it] 83%|████████▎ | 14377/17285 [128:45:57<24:42:17, 30.58s/it] 83%|████████▎ | 14378/17285 [128:46:39<27:28:27, 34.02s/it] 83%|████████▎ | 14379/17285 [128:47:15<28:02:25, 34.74s/it] 83%|████████▎ | 14380/17285 [128:47:48<27:42:48, 34.34s/it] {'loss': 1.3058, 'learning_rate': 1.6749545194367288e-05, 'epoch': 2.5} + 83%|████████▎ | 14380/17285 [128:47:48<27:42:48, 34.34s/it] 83%|████████▎ | 14381/17285 [128:48:18<26:33:14, 32.92s/it] 83%|████████▎ | 14382/17285 [128:48:49<26:04:26, 32.33s/it] 83%|████████▎ | 14383/17285 [128:49:27<27:27:51, 34.07s/it] 83%|████████▎ | 14384/17285 [128:49:56<26:18:37, 32.65s/it] 83%|████████▎ | 14385/17285 [128:50:31<26:41:38, 33.14s/it] 83%|████████▎ | 14386/17285 
[128:51:07<27:28:38, 34.12s/it] 83%|████████▎ | 14387/17285 [128:51:38<26:37:36, 33.08s/it] 83%|████████▎ | 14388/17285 [128:52:14<27:16:21, 33.89s/it] 83%|████████▎ | 14389/17285 [128:52:44<26:18:59, 32.71s/it] 83%|████████▎ | 14390/17285 [128:53:12<25:22:28, 31.55s/it] {'loss': 1.295, 'learning_rate': 1.6643698878761716e-05, 'epoch': 2.5} + 83%|████████▎ | 14390/17285 [128:53:12<25:22:28, 31.55s/it] 83%|████████▎ | 14391/17285 [128:53:42<24:48:48, 30.87s/it] 83%|████████▎ | 14392/17285 [128:54:15<25:26:48, 31.67s/it] 83%|████████▎ | 14393/17285 [128:54:50<26:10:40, 32.59s/it] 83%|████████▎ | 14394/17285 [128:55:22<26:03:42, 32.45s/it] 83%|████████▎ | 14395/17285 [128:55:52<25:29:12, 31.75s/it] 83%|████████▎ | 14396/17285 [128:56:26<25:53:31, 32.26s/it] 83%|████████▎ | 14397/17285 [128:56:59<26:13:50, 32.70s/it] 83%|████████▎ | 14398/17285 [128:57:32<26:16:29, 32.76s/it] 83%|████████▎ | 14399/17285 [128:58:03<25:46:17, 32.15s/it] 83%|████████▎ | 14400/17285 [128:58:34<25:25:05, 31.72s/it] {'loss': 1.3008, 'learning_rate': 1.6538157697957113e-05, 'epoch': 2.5} + 83%|████████▎ | 14400/17285 [128:58:34<25:25:05, 31.72s/it] 83%|████████▎ | 14401/17285 [128:59:07<25:44:31, 32.13s/it] 83%|████████▎ | 14402/17285 [128:59:38<25:23:21, 31.70s/it] 83%|████████▎ | 14403/17285 [129:00:11<25:49:00, 32.25s/it] 83%|████████▎ | 14404/17285 [129:00:36<23:59:54, 29.99s/it] 83%|████████▎ | 14405/17285 [129:01:04<23:27:34, 29.32s/it] 83%|████████▎ | 14406/17285 [129:01:41<25:30:46, 31.90s/it] 83%|████████▎ | 14407/17285 [129:02:11<24:59:06, 31.25s/it] 83%|████████▎ | 14408/17285 [129:02:50<26:54:15, 33.67s/it] 83%|████████▎ | 14409/17285 [129:03:31<28:32:32, 35.73s/it] 83%|████████▎ | 14410/17285 [129:03:58<26:32:03, 33.23s/it] {'loss': 1.2531, 'learning_rate': 1.643292203829839e-05, 'epoch': 2.5} + 83%|████████▎ | 14410/17285 [129:03:58<26:32:03, 33.23s/it] 83%|████████▎ | 14411/17285 [129:04:33<26:51:56, 33.65s/it] 83%|████████▎ | 14412/17285 [129:05:08<27:03:36, 33.91s/it] 
83%|████████▎ | 14413/17285 [129:05:33<24:57:05, 31.28s/it] 83%|████████▎ | 14414/17285 [129:06:13<27:02:21, 33.91s/it] 83%|████████▎ | 14415/17285 [129:06:42<25:51:47, 32.44s/it] 83%|████████▎ | 14416/17285 [129:07:13<25:33:43, 32.08s/it] 83%|████████▎ | 14417/17285 [129:07:44<25:19:41, 31.79s/it] 83%|████████▎ | 14418/17285 [129:08:15<25:05:52, 31.51s/it] 83%|████████▎ | 14419/17285 [129:08:55<27:00:12, 33.92s/it] 83%|████████▎ | 14420/17285 [129:09:27<26:44:39, 33.61s/it] {'loss': 1.2844, 'learning_rate': 1.632799228501215e-05, 'epoch': 2.5} + 83%|████████▎ | 14420/17285 [129:09:27<26:44:39, 33.61s/it] 83%|████████▎ | 14421/17285 [129:09:59<26:14:57, 33.00s/it] 83%|████████▎ | 14422/17285 [129:10:27<25:06:04, 31.56s/it] 83%|████████▎ | 14423/17285 [129:11:05<26:31:33, 33.37s/it] 83%|████████▎ | 14424/17285 [129:11:29<24:27:39, 30.78s/it] 83%|████████▎ | 14425/17285 [129:12:08<26:22:28, 33.20s/it] 83%|████████▎ | 14426/17285 [129:12:35<24:43:17, 31.13s/it] 83%|████████▎ | 14427/17285 [129:13:08<25:10:38, 31.71s/it] 83%|████████▎ | 14428/17285 [129:13:39<25:03:22, 31.57s/it] 83%|████████▎ | 14429/17285 [129:14:06<23:58:12, 30.21s/it] 83%|████████▎ | 14430/17285 [129:14:39<24:35:43, 31.01s/it] {'loss': 1.2638, 'learning_rate': 1.622336882220514e-05, 'epoch': 2.5} + 83%|████████▎ | 14430/17285 [129:14:39<24:35:43, 31.01s/it] 83%|████████▎ | 14431/17285 [129:15:19<26:46:44, 33.78s/it] 83%|████████▎ | 14432/17285 [129:16:00<28:21:28, 35.78s/it] 84%|████████▎ | 14433/17285 [129:16:33<27:43:54, 35.00s/it] 84%|████████▎ | 14434/17285 [129:17:04<26:48:34, 33.85s/it] 84%|████████▎ | 14435/17285 [129:17:35<26:08:55, 33.03s/it] 84%|████████▎ | 14436/17285 [129:18:04<25:10:59, 31.82s/it] 84%|████████▎ | 14437/17285 [129:18:33<24:26:17, 30.89s/it] 84%|████████▎ | 14438/17285 [129:19:18<27:49:59, 35.19s/it] 84%|████████▎ | 14439/17285 [129:19:47<26:24:01, 33.39s/it] 84%|████████▎ | 14440/17285 [129:20:14<24:54:59, 31.53s/it] {'loss': 1.2261, 'learning_rate': 
1.6119052032862915e-05, 'epoch': 2.51} + 84%|████████▎ | 14440/17285 [129:20:14<24:54:59, 31.53s/it] 84%|████████▎ | 14441/17285 [129:20:52<26:27:25, 33.49s/it] 84%|████████▎ | 14442/17285 [129:21:23<25:41:05, 32.52s/it] 84%|████████▎ | 14443/17285 [129:21:51<24:41:33, 31.28s/it] 84%|████████▎ | 14444/17285 [129:22:27<25:53:41, 32.81s/it] 84%|████████▎ | 14445/17285 [129:22:57<25:08:43, 31.87s/it] 84%|████████▎ | 14446/17285 [129:23:28<24:55:12, 31.60s/it] 84%|████████▎ | 14447/17285 [129:23:55<23:54:16, 30.32s/it] 84%|████████▎ | 14448/17285 [129:24:22<22:55:44, 29.10s/it] 84%|████████▎ | 14449/17285 [129:25:02<25:37:44, 32.53s/it] 84%|████████▎ | 14450/17285 [129:25:28<23:59:21, 30.46s/it] {'loss': 1.2561, 'learning_rate': 1.601504229884846e-05, 'epoch': 2.51} + 84%|████████▎ | 14450/17285 [129:25:28<23:59:21, 30.46s/it] 84%|████████▎ | 14451/17285 [129:26:05<25:31:39, 32.43s/it] 84%|████████▎ | 14452/17285 [129:26:46<27:38:14, 35.12s/it] 84%|████████▎ | 14453/17285 [129:27:11<25:06:49, 31.92s/it] 84%|████████▎ | 14454/17285 [129:27:46<25:53:30, 32.93s/it] 84%|████████▎ | 14455/17285 [129:28:18<25:35:32, 32.56s/it] 84%|████████▎ | 14456/17285 [129:28:42<23:43:34, 30.19s/it] 84%|████████▎ | 14457/17285 [129:29:12<23:32:43, 29.97s/it] 84%|████████▎ | 14458/17285 [129:29:42<23:28:51, 29.90s/it] 84%|████████▎ | 14459/17285 [129:30:07<22:21:36, 28.48s/it] 84%|████████▎ | 14460/17285 [129:30:34<22:01:18, 28.06s/it] {'loss': 1.2693, 'learning_rate': 1.5911340000900688e-05, 'epoch': 2.51} + 84%|████████▎ | 14460/17285 [129:30:34<22:01:18, 28.06s/it] 84%|████████▎ | 14461/17285 [129:31:08<23:30:16, 29.96s/it] 84%|████████▎ | 14462/17285 [129:31:43<24:44:16, 31.55s/it] 84%|████████▎ | 14463/17285 [129:32:13<24:21:44, 31.08s/it] 84%|████████▎ | 14464/17285 [129:32:50<25:36:31, 32.68s/it] 84%|████████▎ | 14465/17285 [129:33:19<24:47:01, 31.64s/it] 84%|████████▎ | 14466/17285 [129:34:02<27:30:59, 35.14s/it] 84%|████████▎ | 14467/17285 [129:34:38<27:41:46, 35.38s/it] 
84%|████████▎ | 14468/17285 [129:35:11<27:04:47, 34.61s/it] 84%|████████▎ | 14469/17285 [129:35:47<27:21:36, 34.98s/it] 84%|████████▎ | 14470/17285 [129:36:18<26:28:09, 33.85s/it] {'loss': 1.267, 'learning_rate': 1.580794551863316e-05, 'epoch': 2.51} + 84%|████████▎ | 14470/17285 [129:36:18<26:28:09, 33.85s/it] 84%|████████▎ | 14471/17285 [129:36:47<25:21:17, 32.44s/it] 84%|████████▎ | 14472/17285 [129:37:27<27:02:25, 34.61s/it] 84%|████████▎ | 14473/17285 [129:38:07<28:15:04, 36.17s/it] 84%|████████▎ | 14474/17285 [129:38:31<25:22:17, 32.49s/it] 84%|████████▎ | 14475/17285 [129:39:04<25:33:12, 32.74s/it] 84%|████████▎ | 14476/17285 [129:39:34<24:48:53, 31.80s/it] 84%|████████▍ | 14477/17285 [129:40:01<23:51:19, 30.58s/it] 84%|████████▍ | 14478/17285 [129:40:32<23:57:05, 30.72s/it] 84%|████████▍ | 14479/17285 [129:41:03<23:52:17, 30.63s/it] 84%|████████▍ | 14480/17285 [129:41:35<24:12:10, 31.06s/it] {'loss': 1.3048, 'learning_rate': 1.5704859230532563e-05, 'epoch': 2.51} + 84%|████████▍ | 14480/17285 [129:41:35<24:12:10, 31.06s/it] 84%|████████▍ | 14481/17285 [129:42:00<22:47:22, 29.26s/it] 84%|████████▍ | 14482/17285 [129:42:31<23:10:28, 29.76s/it] 84%|████████▍ | 14483/17285 [129:43:08<24:45:32, 31.81s/it] 84%|████████▍ | 14484/17285 [129:43:34<23:36:10, 30.34s/it] 84%|████████▍ | 14485/17285 [129:44:09<24:30:49, 31.52s/it] 84%|████████▍ | 14486/17285 [129:44:38<24:02:05, 30.91s/it] 84%|████████▍ | 14487/17285 [129:45:08<23:50:08, 30.67s/it] 84%|████████▍ | 14488/17285 [129:45:41<24:20:54, 31.34s/it] 84%|████████▍ | 14489/17285 [129:46:14<24:40:06, 31.76s/it] 84%|████████▍ | 14490/17285 [129:46:47<24:58:40, 32.17s/it] {'loss': 1.2803, 'learning_rate': 1.560208151395749e-05, 'epoch': 2.51} + 84%|████████▍ | 14490/17285 [129:46:47<24:58:40, 32.17s/it] 84%|████████▍ | 14491/17285 [129:47:22<25:38:18, 33.03s/it] 84%|████████▍ | 14492/17285 [129:48:00<26:41:36, 34.41s/it] 84%|████████▍ | 14493/17285 [129:48:24<24:17:15, 31.32s/it] 84%|████████▍ | 14494/17285 
[129:48:53<23:51:39, 30.78s/it] 84%|████████▍ | 14495/17285 [129:49:22<23:19:49, 30.10s/it] 84%|████████▍ | 14496/17285 [129:49:55<24:07:09, 31.13s/it] 84%|████████▍ | 14497/17285 [129:50:27<24:16:13, 31.34s/it] 84%|████████▍ | 14498/17285 [129:51:01<24:51:43, 32.11s/it] 84%|████████▍ | 14499/17285 [129:51:33<24:47:12, 32.03s/it] 84%|████████▍ | 14500/17285 [129:52:05<24:47:27, 32.05s/it] {'loss': 1.2607, 'learning_rate': 1.549961274513695e-05, 'epoch': 2.52} + 84%|████████▍ | 14500/17285 [129:52:05<24:47:27, 32.05s/it] 84%|████████▍ | 14501/17285 [129:52:39<25:07:00, 32.48s/it] 84%|████████▍ | 14502/17285 [129:53:13<25:40:47, 33.22s/it] 84%|████████▍ | 14503/17285 [129:53:47<25:49:37, 33.42s/it] 84%|████████▍ | 14504/17285 [129:54:13<24:01:24, 31.10s/it] 84%|████████▍ | 14505/17285 [129:54:38<22:31:51, 29.18s/it][2023-08-28 09:49:45,587] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 84%|████████▍ | 14506/17285 [129:55:08<22:44:46, 29.47s/it] 84%|████████▍ | 14507/17285 [129:55:35<22:09:23, 28.71s/it] 84%|████████▍ | 14508/17285 [129:56:14<24:35:02, 31.87s/it] 84%|████████▍ | 14509/17285 [129:56:53<26:17:11, 34.09s/it] 84%|████████▍ | 14510/17285 [129:57:25<25:48:09, 33.47s/it] {'loss': 1.275, 'learning_rate': 1.5407655313570525e-05, 'epoch': 2.52} + 84%|████████▍ | 14510/17285 [129:57:25<25:48:09, 33.47s/it] 84%|████████▍ | 14511/17285 [129:57:58<25:34:02, 33.18s/it] 84%|████████▍ | 14512/17285 [129:58:24<23:53:00, 31.01s/it] 84%|████████▍ | 14513/17285 [129:59:03<25:48:06, 33.51s/it][2023-08-28 09:54:11,459] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 84%|████████▍ | 14514/17285 [129:59:34<25:07:13, 32.64s/it] 84%|████████▍ | 14515/17285 [130:00:07<25:17:07, 32.86s/it] 84%|████████▍ | 14516/17285 [130:00:35<24:08:32, 31.39s/it] 84%|████████▍ | 14517/17285 [130:01:09<24:40:49, 32.10s/it] 84%|████████▍ | 14518/17285 [130:01:39<24:17:47, 31.61s/it] 84%|████████▍ | 14519/17285 [130:02:10<24:06:29, 31.38s/it] 84%|████████▍ | 14520/17285 [130:02:44<24:44:01, 32.20s/it] {'loss': 1.2627, 'learning_rate': 1.5315948706191573e-05, 'epoch': 2.52} + 84%|████████▍ | 14520/17285 [130:02:44<24:44:01, 32.20s/it] 84%|████████▍ | 14521/17285 [130:03:19<25:19:44, 32.99s/it] 84%|████████▍ | 14522/17285 [130:03:54<25:48:15, 33.62s/it] 84%|████████▍ | 14523/17285 [130:04:29<26:08:59, 34.08s/it] 84%|████████▍ | 14524/17285 [130:05:03<25:59:38, 33.89s/it] 84%|████████▍ | 14525/17285 [130:05:28<24:03:25, 31.38s/it] 84%|████████▍ | 14526/17285 [130:05:56<23:10:57, 30.25s/it] 84%|████████▍ | 14527/17285 [130:06:31<24:14:17, 31.64s/it] 84%|████████▍ | 14528/17285 [130:06:57<23:02:57, 30.10s/it] 84%|████████▍ | 14529/17285 [130:07:30<23:32:33, 30.75s/it] 84%|████████▍ | 14530/17285 [130:08:03<24:08:03, 31.54s/it] {'loss': 1.2514, 'learning_rate': 1.5214346982990213e-05, 'epoch': 2.52} + 84%|████████▍ | 14530/17285 [130:08:03<24:08:03, 31.54s/it] 84%|████████▍ | 14531/17285 [130:08:31<23:15:57, 30.41s/it] 84%|████████▍ | 14532/17285 [130:09:06<24:18:46, 31.79s/it] 84%|████████▍ | 14533/17285 [130:09:42<25:14:59, 33.03s/it] 84%|████████▍ | 14534/17285 [130:10:07<23:33:42, 30.83s/it] 84%|████████▍ | 14535/17285 [130:10:35<22:47:05, 29.83s/it] 84%|████████▍ | 14536/17285 [130:11:11<24:14:00, 31.74s/it] 84%|████████▍ | 14537/17285 [130:11:38<23:00:50, 30.15s/it] 84%|████████▍ | 14538/17285 [130:12:10<23:38:24, 30.98s/it] 84%|████████▍ | 14539/17285 [130:12:38<22:48:43, 29.91s/it] 84%|████████▍ | 14540/17285 [130:13:14<24:08:57, 31.67s/it] {'loss': 1.2496, 'learning_rate': 1.5113055626887762e-05, 
'epoch': 2.52} + 84%|████████▍ | 14540/17285 [130:13:14<24:08:57, 31.67s/it] 84%|████████▍ | 14541/17285 [130:13:44<23:48:38, 31.24s/it] 84%|████████▍ | 14542/17285 [130:14:15<23:52:09, 31.33s/it] 84%|████████▍ | 14543/17285 [130:14:59<26:44:31, 35.11s/it] 84%|████████▍ | 14544/17285 [130:15:27<24:57:19, 32.78s/it] 84%|████████▍ | 14545/17285 [130:15:57<24:20:45, 31.99s/it] 84%|████████▍ | 14546/17285 [130:16:32<25:10:25, 33.09s/it] 84%|████████▍ | 14547/17285 [130:17:05<25:01:32, 32.90s/it] 84%|████████▍ | 14548/17285 [130:17:47<27:08:09, 35.69s/it] 84%|████████▍ | 14549/17285 [130:18:13<24:54:09, 32.77s/it] 84%|████████▍ | 14550/17285 [130:18:43<24:10:08, 31.81s/it] {'loss': 1.3028, 'learning_rate': 1.5012075008672267e-05, 'epoch': 2.53} + 84%|████████▍ | 14550/17285 [130:18:43<24:10:08, 31.81s/it] 84%|████████▍ | 14551/17285 [130:19:14<23:58:23, 31.57s/it] 84%|████████▍ | 14552/17285 [130:19:43<23:23:29, 30.81s/it] 84%|████████▍ | 14553/17285 [130:20:12<23:03:38, 30.39s/it] 84%|████████▍ | 14554/17285 [130:20:46<23:53:50, 31.50s/it] 84%|████████▍ | 14555/17285 [130:21:20<24:22:41, 32.15s/it] 84%|████████▍ | 14556/17285 [130:21:50<24:01:07, 31.68s/it] 84%|████████▍ | 14557/17285 [130:22:25<24:32:57, 32.40s/it] 84%|████████▍ | 14558/17285 [130:22:52<23:23:06, 30.87s/it] 84%|████████▍ | 14559/17285 [130:23:21<22:57:53, 30.33s/it] 84%|████████▍ | 14560/17285 [130:23:52<23:04:31, 30.49s/it] {'loss': 1.2599, 'learning_rate': 1.4911405497994235e-05, 'epoch': 2.53} + 84%|████████▍ | 14560/17285 [130:23:52<23:04:31, 30.49s/it] 84%|████████▍ | 14561/17285 [130:24:22<23:01:02, 30.42s/it] 84%|████████▍ | 14562/17285 [130:24:55<23:39:35, 31.28s/it] 84%|████████▍ | 14563/17285 [130:25:27<23:39:18, 31.29s/it] 84%|████████▍ | 14564/17285 [130:25:54<22:43:34, 30.07s/it] 84%|████████▍ | 14565/17285 [130:26:31<24:15:08, 32.10s/it] 84%|████████▍ | 14566/17285 [130:27:06<24:57:32, 33.05s/it] 84%|████████▍ | 14567/17285 [130:27:48<26:58:24, 35.73s/it] 84%|████████▍ | 14568/17285 
[130:28:32<28:48:54, 38.18s/it] 84%|████████▍ | 14569/17285 [130:29:02<26:55:23, 35.69s/it] 84%|████████▍ | 14570/17285 [130:29:43<28:05:43, 37.25s/it] {'loss': 1.2633, 'learning_rate': 1.4811047463365357e-05, 'epoch': 2.53} + 84%|████████▍ | 14570/17285 [130:29:43<28:05:43, 37.25s/it] 84%|████████▍ | 14571/17285 [130:30:14<26:49:01, 35.57s/it] 84%|████████▍ | 14572/17285 [130:30:44<25:31:22, 33.87s/it] 84%|████████▍ | 14573/17285 [130:31:11<23:53:08, 31.71s/it] 84%|████████▍ | 14574/17285 [130:31:51<25:42:40, 34.14s/it] 84%|████████▍ | 14575/17285 [130:32:17<23:53:14, 31.73s/it] 84%|████████▍ | 14576/17285 [130:32:49<24:02:33, 31.95s/it] 84%|████████▍ | 14577/17285 [130:33:21<23:59:47, 31.90s/it] 84%|████████▍ | 14578/17285 [130:33:50<23:19:16, 31.01s/it] 84%|████████▍ | 14579/17285 [130:34:25<24:14:26, 32.25s/it] 84%|████████▍ | 14580/17285 [130:35:00<24:53:55, 33.14s/it] {'loss': 1.2443, 'learning_rate': 1.4711001272157132e-05, 'epoch': 2.53} + 84%|████████▍ | 14580/17285 [130:35:00<24:53:55, 33.14s/it] 84%|████████▍ | 14581/17285 [130:35:30<24:12:08, 32.22s/it] 84%|████████▍ | 14582/17285 [130:36:04<24:32:15, 32.68s/it] 84%|████████▍ | 14583/17285 [130:36:41<25:26:05, 33.89s/it] 84%|████████▍ | 14584/17285 [130:37:18<26:11:56, 34.92s/it] 84%|████████▍ | 14585/17285 [130:37:49<25:21:24, 33.81s/it] 84%|████████▍ | 14586/17285 [130:38:28<26:23:36, 35.20s/it] 84%|████████▍ | 14587/17285 [130:38:55<24:36:57, 32.85s/it] 84%|████████▍ | 14588/17285 [130:39:25<23:52:48, 31.88s/it] 84%|████████▍ | 14589/17285 [130:39:53<23:07:32, 30.88s/it] 84%|████████▍ | 14590/17285 [130:40:24<23:05:32, 30.85s/it] {'loss': 1.3036, 'learning_rate': 1.4611267290599528e-05, 'epoch': 2.53} + 84%|████████▍ | 14590/17285 [130:40:24<23:05:32, 30.85s/it] 84%|████████▍ | 14591/17285 [130:40:50<22:01:12, 29.43s/it] 84%|████████▍ | 14592/17285 [130:41:20<22:04:04, 29.50s/it] 84%|████████▍ | 14593/17285 [130:41:52<22:33:34, 30.17s/it] 84%|████████▍ | 14594/17285 [130:42:20<22:11:32, 29.69s/it] 
84%|████████▍ | 14595/17285 [130:42:51<22:25:36, 30.01s/it] 84%|████████▍ | 14596/17285 [130:43:17<21:27:02, 28.72s/it] 84%|████████▍ | 14597/17285 [130:43:46<21:39:10, 29.00s/it] 84%|████████▍ | 14598/17285 [130:44:24<23:35:18, 31.60s/it] 84%|████████▍ | 14599/17285 [130:44:54<23:10:54, 31.07s/it] 84%|████████▍ | 14600/17285 [130:45:38<26:12:02, 35.13s/it] {'loss': 1.2934, 'learning_rate': 1.4511845883779607e-05, 'epoch': 2.53} + 84%|████████▍ | 14600/17285 [130:45:38<26:12:02, 35.13s/it] 84%|████████▍ | 14601/17285 [130:46:11<25:44:08, 34.52s/it] 84%|████████▍ | 14602/17285 [130:46:37<23:46:57, 31.91s/it] 84%|████████▍ | 14603/17285 [130:47:15<25:02:36, 33.62s/it] 84%|████████▍ | 14604/17285 [130:47:55<26:25:36, 35.49s/it] 84%|████████▍ | 14605/17285 [130:48:35<27:23:21, 36.79s/it] 85%|████████▍ | 14606/17285 [130:49:01<25:01:40, 33.63s/it] 85%|████████▍ | 14607/17285 [130:49:31<24:19:19, 32.70s/it] 85%|████████▍ | 14608/17285 [130:50:05<24:30:58, 32.97s/it] 85%|████████▍ | 14609/17285 [130:50:35<23:55:27, 32.19s/it] 85%|████████▍ | 14610/17285 [130:51:01<22:32:47, 30.34s/it] {'loss': 1.2485, 'learning_rate': 1.4412737415640232e-05, 'epoch': 2.54} + 85%|████████▍ | 14610/17285 [130:51:01<22:32:47, 30.34s/it] 85%|████████▍ | 14611/17285 [130:51:27<21:28:37, 28.91s/it] 85%|████████▍ | 14612/17285 [130:51:54<21:08:35, 28.48s/it] 85%|████████▍ | 14613/17285 [130:52:29<22:30:29, 30.33s/it] 85%|████████▍ | 14614/17285 [130:52:57<22:01:04, 29.68s/it] 85%|████████▍ | 14615/17285 [130:53:22<20:58:30, 28.28s/it] 85%|████████▍ | 14616/17285 [130:54:03<23:40:28, 31.93s/it] 85%|████████▍ | 14617/17285 [130:54:31<22:58:05, 30.99s/it] 85%|████████▍ | 14618/17285 [130:55:01<22:36:16, 30.51s/it] 85%|████████▍ | 14619/17285 [130:55:40<24:36:53, 33.24s/it] 85%|████████▍ | 14620/17285 [130:56:13<24:22:24, 32.92s/it] {'loss': 1.2625, 'learning_rate': 1.4313942248978752e-05, 'epoch': 2.54} + 85%|████████▍ | 14620/17285 [130:56:13<24:22:24, 32.92s/it] 85%|████████▍ | 14621/17285 
[130:56:50<25:16:54, 34.16s/it][2023-08-28 10:51:54,861] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 85%|████████▍ | 14622/17285 [130:57:17<23:46:47, 32.15s/it] 85%|████████▍ | 14623/17285 [130:57:48<23:34:39, 31.89s/it] 85%|████████▍ | 14624/17285 [130:58:19<23:20:57, 31.59s/it] 85%|████████▍ | 14625/17285 [130:58:46<22:17:24, 30.17s/it] 85%|████████▍ | 14626/17285 [130:59:17<22:28:30, 30.43s/it] 85%|████████▍ | 14627/17285 [130:59:48<22:28:21, 30.44s/it] 85%|████████▍ | 14628/17285 [131:00:25<23:56:04, 32.43s/it][2023-08-28 10:55:36,074] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 85%|████████▍ | 14629/17285 [131:00:58<24:11:19, 32.79s/it] 85%|████████▍ | 14630/17285 [131:01:36<25:15:36, 34.25s/it] {'loss': 1.272, 'learning_rate': 1.4235131935781309e-05, 'epoch': 2.54} + 85%|████████▍ | 14630/17285 [131:01:36<25:15:36, 34.25s/it] 85%|████████▍ | 14631/17285 [131:02:15<26:11:01, 35.52s/it] 85%|████████▍ | 14632/17285 [131:02:44<24:47:59, 33.65s/it] 85%|████████▍ | 14633/17285 [131:03:20<25:26:53, 34.54s/it] 85%|████████▍ | 14634/17285 [131:03:51<24:38:17, 33.46s/it] 85%|████████▍ | 14635/17285 [131:04:19<23:21:02, 31.72s/it] 85%|████████▍ | 14636/17285 [131:04:53<23:49:17, 32.37s/it] 85%|████████▍ | 14637/17285 [131:05:27<24:14:25, 32.96s/it] 85%|████████▍ | 14638/17285 [131:05:55<23:08:39, 31.48s/it] 85%|████████▍ | 14639/17285 [131:06:25<22:49:59, 31.07s/it] 85%|████████▍ | 14640/17285 [131:07:00<23:40:56, 32.23s/it] {'loss': 1.2825, 'learning_rate': 1.4136901622367581e-05, 'epoch': 2.54} + 85%|████████▍ | 14640/17285 [131:07:00<23:40:56, 32.23s/it] 85%|████████▍ | 14641/17285 [131:07:37<24:37:21, 33.53s/it] 85%|████████▍ | 14642/17285 [131:08:13<25:15:44, 34.41s/it] 85%|████████▍ | 14643/17285 [131:08:43<24:16:57, 33.09s/it] 85%|████████▍ | 14644/17285 
[131:09:15<23:55:22, 32.61s/it] 85%|████████▍ | 14645/17285 [131:09:43<22:53:32, 31.22s/it] 85%|████████▍ | 14646/17285 [131:10:19<23:52:22, 32.57s/it] 85%|████████▍ | 14647/17285 [131:10:48<23:14:47, 31.72s/it] 85%|████████▍ | 14648/17285 [131:11:21<23:27:51, 32.03s/it] 85%|████████▍ | 14649/17285 [131:11:52<23:18:21, 31.83s/it] 85%|████████▍ | 14650/17285 [131:12:31<24:46:06, 33.84s/it] {'loss': 1.2842, 'learning_rate': 1.403898562015863e-05, 'epoch': 2.54} + 85%|████████▍ | 14650/17285 [131:12:31<24:46:06, 33.84s/it] 85%|████████▍ | 14651/17285 [131:13:03<24:22:54, 33.32s/it] 85%|████████▍ | 14652/17285 [131:13:35<23:59:20, 32.80s/it] 85%|████████▍ | 14653/17285 [131:14:12<24:52:29, 34.02s/it] 85%|████████▍ | 14654/17285 [131:14:53<26:30:41, 36.28s/it] 85%|████████▍ | 14655/17285 [131:15:22<24:50:53, 34.01s/it] 85%|████████▍ | 14656/17285 [131:15:51<23:42:56, 32.47s/it] 85%|████████▍ | 14657/17285 [131:16:17<22:27:32, 30.77s/it] 85%|████████▍ | 14658/17285 [131:16:48<22:29:42, 30.83s/it] 85%|████████▍ | 14659/17285 [131:17:18<22:14:32, 30.49s/it] 85%|████████▍ | 14660/17285 [131:17:51<22:38:57, 31.06s/it] {'loss': 1.2833, 'learning_rate': 1.3941384287586633e-05, 'epoch': 2.54} + 85%|████████▍ | 14660/17285 [131:17:51<22:38:57, 31.06s/it] 85%|████████▍ | 14661/17285 [131:18:20<22:15:59, 30.55s/it] 85%|████████▍ | 14662/17285 [131:18:48<21:49:04, 29.94s/it] 85%|████████▍ | 14663/17285 [131:19:19<21:53:30, 30.06s/it] 85%|████████▍ | 14664/17285 [131:19:48<21:40:30, 29.77s/it] 85%|████████▍ | 14665/17285 [131:20:18<21:44:33, 29.88s/it] 85%|████████▍ | 14666/17285 [131:20:47<21:29:32, 29.54s/it] 85%|████████▍ | 14667/17285 [131:21:17<21:33:25, 29.64s/it] 85%|████████▍ | 14668/17285 [131:21:52<22:49:49, 31.41s/it] 85%|████████▍ | 14669/17285 [131:22:27<23:31:38, 32.38s/it] 85%|████████▍ | 14670/17285 [131:23:04<24:34:59, 33.84s/it] {'loss': 1.2957, 'learning_rate': 1.384409798193188e-05, 'epoch': 2.55} + 85%|████████▍ | 14670/17285 [131:23:04<24:34:59, 33.84s/it] 
85%|████████▍ | 14671/17285 [131:23:33<23:34:40, 32.47s/it] 85%|████████▍ | 14672/17285 [131:24:04<23:11:59, 31.96s/it] 85%|████████▍ | 14673/17285 [131:24:46<25:20:25, 34.93s/it] 85%|████████▍ | 14674/17285 [131:25:12<23:21:53, 32.22s/it] 85%|████████▍ | 14675/17285 [131:25:42<22:57:53, 31.68s/it] 85%|████████▍ | 14676/17285 [131:26:10<22:04:15, 30.45s/it] 85%|████████▍ | 14677/17285 [131:26:51<24:24:17, 33.69s/it] 85%|████████▍ | 14678/17285 [131:27:18<22:54:35, 31.64s/it] 85%|████████▍ | 14679/17285 [131:27:59<25:02:42, 34.60s/it] 85%|████████▍ | 14680/17285 [131:28:39<26:12:44, 36.22s/it] {'loss': 1.2412, 'learning_rate': 1.3747127059321474e-05, 'epoch': 2.55} + 85%|████████▍ | 14680/17285 [131:28:39<26:12:44, 36.22s/it] 85%|████████▍ | 14681/17285 [131:29:11<25:16:17, 34.94s/it] 85%|████████▍ | 14682/17285 [131:29:43<24:26:38, 33.81s/it] 85%|████████▍ | 14683/17285 [131:30:08<22:36:17, 31.27s/it] 85%|████████▍ | 14684/17285 [131:30:33<21:09:29, 29.28s/it] 85%|████████▍ | 14685/17285 [131:31:04<21:34:49, 29.88s/it] 85%|████████▍ | 14686/17285 [131:31:37<22:11:29, 30.74s/it] 85%|████████▍ | 14687/17285 [131:32:06<21:58:11, 30.44s/it] 85%|████████▍ | 14688/17285 [131:32:37<22:06:30, 30.65s/it] 85%|████████▍ | 14689/17285 [131:33:13<23:13:53, 32.22s/it] 85%|████████▍ | 14690/17285 [131:33:44<22:54:45, 31.79s/it] {'loss': 1.2911, 'learning_rate': 1.3650471874727967e-05, 'epoch': 2.55} + 85%|████████▍ | 14690/17285 [131:33:44<22:54:45, 31.79s/it] 85%|████████▍ | 14691/17285 [131:34:13<22:18:28, 30.96s/it] 85%|████████▍ | 14692/17285 [131:34:49<23:25:42, 32.53s/it] 85%|████████▌ | 14693/17285 [131:35:21<23:17:56, 32.36s/it] 85%|████████▌ | 14694/17285 [131:35:49<22:18:19, 30.99s/it] 85%|████████▌ | 14695/17285 [131:36:23<22:58:48, 31.94s/it] 85%|████████▌ | 14696/17285 [131:36:56<23:04:40, 32.09s/it] 85%|████████▌ | 14697/17285 [131:37:28<23:07:52, 32.18s/it] 85%|████████▌ | 14698/17285 [131:37:58<22:39:23, 31.53s/it] 85%|████████▌ | 14699/17285 
[131:38:31<22:59:42, 32.01s/it] 85%|████████▌ | 14700/17285 [131:39:08<23:59:07, 33.40s/it] {'loss': 1.3062, 'learning_rate': 1.3554132781968232e-05, 'epoch': 2.55} + 85%|████████▌ | 14700/17285 [131:39:08<23:59:07, 33.40s/it] 85%|████████▌ | 14701/17285 [131:39:36<22:56:38, 31.97s/it] 85%|████████▌ | 14702/17285 [131:40:05<22:06:04, 30.80s/it] 85%|████████▌ | 14703/17285 [131:40:30<21:02:48, 29.34s/it] 85%|████████▌ | 14704/17285 [131:41:05<22:08:11, 30.88s/it] 85%|████████▌ | 14705/17285 [131:41:31<21:00:29, 29.31s/it] 85%|████████▌ | 14706/17285 [131:42:11<23:18:32, 32.54s/it] 85%|████████▌ | 14707/17285 [131:42:38<22:07:35, 30.90s/it] 85%|████████▌ | 14708/17285 [131:43:09<22:13:49, 31.06s/it] 85%|████████▌ | 14709/17285 [131:43:40<22:13:17, 31.05s/it] 85%|████████▌ | 14710/17285 [131:44:19<23:58:09, 33.51s/it] {'loss': 1.2822, 'learning_rate': 1.3458110133701962e-05, 'epoch': 2.55} + 85%|████████▌ | 14710/17285 [131:44:19<23:58:09, 33.51s/it] 85%|████████▌ | 14711/17285 [131:44:50<23:21:05, 32.66s/it] 85%|████████▌ | 14712/17285 [131:45:23<23:26:56, 32.81s/it] 85%|████████▌ | 14713/17285 [131:46:04<25:13:54, 35.32s/it] 85%|████████▌ | 14714/17285 [131:46:32<23:30:16, 32.91s/it] 85%|████████▌ | 14715/17285 [131:46:59<22:22:39, 31.35s/it] 85%|████████▌ | 14716/17285 [131:47:32<22:35:33, 31.66s/it] 85%|████████▌ | 14717/17285 [131:48:05<22:49:52, 32.01s/it] 85%|████████▌ | 14718/17285 [131:48:35<22:29:01, 31.53s/it] 85%|████████▌ | 14719/17285 [131:49:10<23:17:02, 32.67s/it] 85%|████████▌ | 14720/17285 [131:49:43<23:12:37, 32.58s/it] {'loss': 1.2376, 'learning_rate': 1.3362404281430497e-05, 'epoch': 2.55} + 85%|████████▌ | 14720/17285 [131:49:43<23:12:37, 32.58s/it] 85%|████████▌ | 14721/17285 [131:50:12<22:33:11, 31.67s/it] 85%|████████▌ | 14722/17285 [131:50:45<22:41:30, 31.87s/it] 85%|████████▌ | 14723/17285 [131:51:10<21:20:41, 29.99s/it] 85%|████████▌ | 14724/17285 [131:51:42<21:39:40, 30.45s/it] 85%|████████▌ | 14725/17285 [131:52:10<21:14:20, 29.87s/it] 
85%|████████▌ | 14726/17285 [131:52:35<20:11:52, 28.41s/it] 85%|████████▌ | 14727/17285 [131:53:10<21:31:28, 30.29s/it] 85%|████████▌ | 14728/17285 [131:53:47<22:57:11, 32.32s/it] 85%|████████▌ | 14729/17285 [131:54:20<23:07:05, 32.56s/it] 85%|████████▌ | 14730/17285 [131:54:53<23:05:18, 32.53s/it] {'loss': 1.2577, 'learning_rate': 1.3267015575495512e-05, 'epoch': 2.56} + 85%|████████▌ | 14730/17285 [131:54:53<23:05:18, 32.53s/it] 85%|████████▌ | 14731/17285 [131:55:27<23:24:31, 33.00s/it] 85%|████████▌ | 14732/17285 [131:55:59<23:09:19, 32.65s/it] 85%|████████▌ | 14733/17285 [131:56:26<22:00:23, 31.04s/it] 85%|████████▌ | 14734/17285 [131:56:59<22:24:46, 31.63s/it] 85%|████████▌ | 14735/17285 [131:57:29<22:04:09, 31.16s/it] 85%|████████▌ | 14736/17285 [131:58:03<22:37:53, 31.96s/it] 85%|████████▌ | 14737/17285 [131:58:38<23:23:19, 33.05s/it] 85%|████████▌ | 14738/17285 [131:59:11<23:13:29, 32.83s/it] 85%|████████▌ | 14739/17285 [131:59:39<22:18:20, 31.54s/it] 85%|████████▌ | 14740/17285 [132:00:08<21:48:20, 30.84s/it] {'loss': 1.2595, 'learning_rate': 1.3171944365077748e-05, 'epoch': 2.56} + 85%|████████▌ | 14740/17285 [132:00:08<21:48:20, 30.84s/it] 85%|████████▌ | 14741/17285 [132:00:48<23:39:28, 33.48s/it] 85%|████████▌ | 14742/17285 [132:01:25<24:19:17, 34.43s/it] 85%|████████▌ | 14743/17285 [132:01:58<24:04:51, 34.10s/it] 85%|████████▌ | 14744/17285 [132:02:30<23:32:42, 33.36s/it] 85%|████████▌ | 14745/17285 [132:02:59<22:46:54, 32.29s/it] 85%|████████▌ | 14746/17285 [132:03:33<22:58:25, 32.57s/it] 85%|████████▌ | 14747/17285 [132:04:08<23:34:37, 33.44s/it] 85%|████████▌ | 14748/17285 [132:04:48<24:52:14, 35.29s/it] 85%|████████▌ | 14749/17285 [132:05:14<22:55:47, 32.55s/it] 85%|████████▌ | 14750/17285 [132:05:41<21:52:42, 31.07s/it] {'loss': 1.2946, 'learning_rate': 1.307719099819571e-05, 'epoch': 2.56} + 85%|████████▌ | 14750/17285 [132:05:41<21:52:42, 31.07s/it] 85%|████████▌ | 14751/17285 [132:06:14<22:06:30, 31.41s/it] 85%|████████▌ | 14752/17285 
[132:06:49<22:53:58, 32.55s/it] 85%|████████▌ | 14753/17285 [132:07:22<23:00:25, 32.71s/it] 85%|████████▌ | 14754/17285 [132:07:49<21:52:14, 31.11s/it] 85%|████████▌ | 14755/17285 [132:08:18<21:22:57, 30.43s/it] 85%|████████▌ | 14756/17285 [132:08:50<21:45:54, 30.98s/it] 85%|████████▌ | 14757/17285 [132:09:25<22:32:02, 32.09s/it] 85%|████████▌ | 14758/17285 [132:10:05<24:04:58, 34.31s/it] 85%|████████▌ | 14759/17285 [132:10:30<22:12:34, 31.65s/it] 85%|████████▌ | 14760/17285 [132:11:04<22:39:04, 32.29s/it] {'loss': 1.2915, 'learning_rate': 1.2982755821704372e-05, 'epoch': 2.56} + 85%|████████▌ | 14760/17285 [132:11:04<22:39:04, 32.29s/it] 85%|████████▌ | 14761/17285 [132:11:33<22:03:13, 31.46s/it] 85%|████████▌ | 14762/17285 [132:12:04<21:51:48, 31.20s/it] 85%|████████▌ | 14763/17285 [132:12:30<20:49:20, 29.72s/it] 85%|████████▌ | 14764/17285 [132:13:02<21:10:43, 30.24s/it] 85%|████████▌ | 14765/17285 [132:13:30<20:42:53, 29.59s/it] 85%|████████▌ | 14766/17285 [132:13:57<20:13:52, 28.91s/it] 85%|████████▌ | 14767/17285 [132:14:33<21:43:30, 31.06s/it] 85%|████████▌ | 14768/17285 [132:15:04<21:43:24, 31.07s/it] 85%|████████▌ | 14769/17285 [132:15:45<23:46:29, 34.02s/it] 85%|████████▌ | 14770/17285 [132:16:16<23:02:20, 32.98s/it] {'loss': 1.2599, 'learning_rate': 1.288863918129396e-05, 'epoch': 2.56} + 85%|████████▌ | 14770/17285 [132:16:16<23:02:20, 32.98s/it] 85%|████████▌ | 14771/17285 [132:16:47<22:42:12, 32.51s/it] 85%|████████▌ | 14772/17285 [132:17:21<23:02:18, 33.00s/it] 85%|████████▌ | 14773/17285 [132:17:55<23:16:26, 33.35s/it] 85%|████████▌ | 14774/17285 [132:18:27<22:57:29, 32.91s/it] 85%|████████▌ | 14775/17285 [132:19:02<23:24:29, 33.57s/it] 85%|████████▌ | 14776/17285 [132:19:32<22:39:08, 32.50s/it] 85%|████████▌ | 14777/17285 [132:20:11<23:53:35, 34.30s/it] 85%|████████▌ | 14778/17285 [132:20:43<23:18:50, 33.48s/it] 86%|████████▌ | 14779/17285 [132:21:17<23:31:46, 33.80s/it] 86%|████████▌ | 14780/17285 [132:21:46<22:24:32, 32.20s/it] {'loss': 1.2552, 
'learning_rate': 1.2794841421488679e-05, 'epoch': 2.57} + 86%|████████▌ | 14780/17285 [132:21:46<22:24:32, 32.20s/it][2023-08-28 12:17:02,690] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 86%|████████▌ | 14781/17285 [132:22:25<23:54:41, 34.38s/it] 86%|████████▌ | 14782/17285 [132:22:58<23:41:00, 34.06s/it] 86%|████████▌ | 14783/17285 [132:23:31<23:18:53, 33.55s/it] 86%|████████▌ | 14784/17285 [132:24:04<23:14:12, 33.45s/it] 86%|████████▌ | 14785/17285 [132:24:42<24:13:18, 34.88s/it] 86%|████████▌ | 14786/17285 [132:25:09<22:30:39, 32.43s/it] 86%|████████▌ | 14787/17285 [132:25:44<23:04:31, 33.26s/it] 86%|████████▌ | 14788/17285 [132:26:23<24:11:51, 34.89s/it] 86%|████████▌ | 14789/17285 [132:26:54<23:21:45, 33.70s/it] 86%|████████▌ | 14790/17285 [132:27:31<24:03:07, 34.70s/it] {'loss': 1.2647, 'learning_rate': 1.2710696364389941e-05, 'epoch': 2.57} + 86%|████████▌ | 14790/17285 [132:27:31<24:03:07, 34.70s/it] 86%|████████▌ | 14791/17285 [132:28:02<23:21:52, 33.73s/it] 86%|████████▌ | 14792/17285 [132:28:36<23:26:46, 33.86s/it][2023-08-28 12:23:42,176] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 86%|████████▌ | 14793/17285 [132:29:04<22:15:48, 32.16s/it] 86%|████████▌ | 14794/17285 [132:29:36<22:10:40, 32.05s/it] 86%|████████▌ | 14795/17285 [132:30:06<21:36:48, 31.25s/it] 86%|████████▌ | 14796/17285 [132:30:32<20:34:42, 29.76s/it] 86%|████████▌ | 14797/17285 [132:30:58<19:46:57, 28.62s/it] 86%|████████▌ | 14798/17285 [132:31:25<19:24:45, 28.10s/it] 86%|████████▌ | 14799/17285 [132:31:55<19:53:57, 28.82s/it] 86%|████████▌ | 14800/17285 [132:32:23<19:43:03, 28.56s/it] {'loss': 1.2425, 'learning_rate': 1.2626810128213363e-05, 'epoch': 2.57} + 86%|████████▌ | 14800/17285 [132:32:23<19:43:03, 28.56s/it] 86%|████████▌ | 14801/17285 [132:32:53<19:52:42, 28.81s/it] 86%|████████▌ | 14802/17285 [132:33:30<21:40:29, 31.43s/it] 86%|████████▌ | 14803/17285 [132:34:02<21:48:35, 31.63s/it] 86%|████████▌ | 14804/17285 [132:34:37<22:20:16, 32.41s/it] 86%|████████▌ | 14805/17285 [132:35:11<22:49:04, 33.12s/it] 86%|████████▌ | 14806/17285 [132:35:46<23:06:56, 33.57s/it] 86%|████████▌ | 14807/17285 [132:36:18<22:52:47, 33.24s/it] 86%|████████▌ | 14808/17285 [132:36:53<23:05:46, 33.57s/it] 86%|████████▌ | 14809/17285 [132:37:25<22:46:11, 33.11s/it] 86%|████████▌ | 14810/17285 [132:37:53<21:46:00, 31.66s/it] {'loss': 1.2571, 'learning_rate': 1.2533907057030315e-05, 'epoch': 2.57} + 86%|████████▌ | 14810/17285 [132:37:53<21:46:00, 31.66s/it] 86%|████████▌ | 14811/17285 [132:38:25<21:53:08, 31.85s/it] 86%|████████▌ | 14812/17285 [132:38:53<21:06:00, 30.72s/it] 86%|████████▌ | 14813/17285 [132:39:19<20:04:29, 29.24s/it] 86%|████████▌ | 14814/17285 [132:39:46<19:34:10, 28.51s/it] 86%|████████▌ | 14815/17285 [132:40:17<20:04:05, 29.25s/it] 86%|████████▌ | 14816/17285 [132:40:47<20:09:20, 29.39s/it] 86%|████████▌ | 14817/17285 [132:41:17<20:22:59, 29.73s/it] 86%|████████▌ | 14818/17285 [132:41:53<21:35:07, 31.50s/it] 86%|████████▌ | 14819/17285 [132:42:23<21:13:16, 30.98s/it] 86%|████████▌ | 14820/17285 [132:42:52<20:48:23, 30.39s/it] 
{'loss': 1.297, 'learning_rate': 1.244132416498789e-05, 'epoch': 2.57} + 86%|████████▌ | 14820/17285 [132:42:52<20:48:23, 30.39s/it] 86%|████████▌ | 14821/17285 [132:43:20<20:28:16, 29.91s/it] 86%|████████▌ | 14822/17285 [132:43:51<20:42:05, 30.26s/it] 86%|████████▌ | 14823/17285 [132:44:23<21:02:52, 30.78s/it] 86%|████████▌ | 14824/17285 [132:44:54<20:54:06, 30.58s/it] 86%|████████▌ | 14825/17285 [132:45:34<22:52:17, 33.47s/it] 86%|████████▌ | 14826/17285 [132:46:09<23:17:10, 34.09s/it] 86%|████████▌ | 14827/17285 [132:46:41<22:50:08, 33.45s/it] 86%|████████▌ | 14828/17285 [132:47:12<22:14:09, 32.58s/it] 86%|████████▌ | 14829/17285 [132:47:43<21:59:25, 32.23s/it] 86%|████████▌ | 14830/17285 [132:48:13<21:31:37, 31.57s/it] {'loss': 1.2686, 'learning_rate': 1.2349061790995841e-05, 'epoch': 2.57} + 86%|████████▌ | 14830/17285 [132:48:13<21:31:37, 31.57s/it] 86%|████████▌ | 14831/17285 [132:48:48<22:06:16, 32.43s/it] 86%|████████▌ | 14832/17285 [132:49:18<21:45:06, 31.92s/it] 86%|████████▌ | 14833/17285 [132:50:01<23:51:15, 35.02s/it] 86%|████████▌ | 14834/17285 [132:50:27<21:58:05, 32.27s/it] 86%|████████▌ | 14835/17285 [132:50:58<21:44:29, 31.95s/it] 86%|████████▌ | 14836/17285 [132:51:26<21:03:45, 30.96s/it] 86%|████████▌ | 14837/17285 [132:51:59<21:19:01, 31.35s/it] 86%|████████▌ | 14838/17285 [132:52:33<21:49:13, 32.10s/it] 86%|████████▌ | 14839/17285 [132:53:02<21:18:56, 31.37s/it] 86%|████████▌ | 14840/17285 [132:53:33<21:07:42, 31.11s/it] {'loss': 1.2944, 'learning_rate': 1.225712027279059e-05, 'epoch': 2.58} + 86%|████████▌ | 14840/17285 [132:53:33<21:07:42, 31.11s/it] 86%|████████▌ | 14841/17285 [132:54:03<20:58:10, 30.89s/it] 86%|████████▌ | 14842/17285 [132:54:38<21:47:19, 32.11s/it] 86%|████████▌ | 14843/17285 [132:55:12<22:07:41, 32.62s/it] 86%|████████▌ | 14844/17285 [132:55:42<21:31:19, 31.74s/it] 86%|████████▌ | 14845/17285 [132:56:06<19:57:32, 29.45s/it] 86%|████████▌ | 14846/17285 [132:56:41<21:06:49, 31.16s/it] 86%|████████▌ | 14847/17285 
[132:57:15<21:38:57, 31.97s/it] 86%|████████▌ | 14848/17285 [132:57:49<22:08:14, 32.70s/it] 86%|████████▌ | 14849/17285 [132:58:14<20:33:35, 30.38s/it] 86%|████████▌ | 14850/17285 [132:59:01<23:55:00, 35.36s/it] {'loss': 1.2394, 'learning_rate': 1.21654999469341e-05, 'epoch': 2.58} + 86%|████████▌ | 14850/17285 [132:59:01<23:55:00, 35.36s/it] 86%|████████▌ | 14851/17285 [132:59:39<24:23:29, 36.08s/it] 86%|████████▌ | 14852/17285 [133:00:17<24:52:13, 36.80s/it] 86%|████████▌ | 14853/17285 [133:00:45<23:00:43, 34.06s/it] 86%|████████▌ | 14854/17285 [133:01:17<22:30:56, 33.34s/it] 86%|████████▌ | 14855/17285 [133:01:49<22:14:58, 32.96s/it] 86%|████████▌ | 14856/17285 [133:02:19<21:46:15, 32.27s/it] 86%|████████▌ | 14857/17285 [133:02:53<22:03:52, 32.72s/it] 86%|████████▌ | 14858/17285 [133:03:33<23:26:52, 34.78s/it] 86%|████████▌ | 14859/17285 [133:04:14<24:44:14, 36.71s/it] 86%|████████▌ | 14860/17285 [133:04:40<22:41:14, 33.68s/it] {'loss': 1.2908, 'learning_rate': 1.2074201148812537e-05, 'epoch': 2.58} + 86%|████████▌ | 14860/17285 [133:04:40<22:41:14, 33.68s/it] 86%|████████▌ | 14861/17285 [133:05:10<21:45:32, 32.32s/it] 86%|████████▌ | 14862/17285 [133:05:35<20:27:02, 30.39s/it] 86%|████████▌ | 14863/17285 [133:06:12<21:35:17, 32.09s/it] 86%|████████▌ | 14864/17285 [133:06:42<21:20:25, 31.73s/it] 86%|████████▌ | 14865/17285 [133:07:13<21:01:49, 31.28s/it] 86%|████████▌ | 14866/17285 [133:07:51<22:26:33, 33.40s/it] 86%|████████▌ | 14867/17285 [133:08:16<20:49:49, 31.01s/it] 86%|████████▌ | 14868/17285 [133:08:59<23:07:47, 34.45s/it] 86%|████████▌ | 14869/17285 [133:09:33<23:05:15, 34.40s/it] 86%|████████▌ | 14870/17285 [133:10:11<23:48:10, 35.48s/it] {'loss': 1.2721, 'learning_rate': 1.1983224212635024e-05, 'epoch': 2.58} + 86%|████████▌ | 14870/17285 [133:10:11<23:48:10, 35.48s/it] 86%|████████▌ | 14871/17285 [133:10:40<22:32:15, 33.61s/it] 86%|████████▌ | 14872/17285 [133:11:07<21:05:11, 31.46s/it] 86%|████████▌ | 14873/17285 [133:11:38<21:00:06, 31.35s/it] 
86%|████████▌ | 14874/17285 [133:12:19<23:00:40, 34.36s/it] 86%|████████▌ | 14875/17285 [133:12:50<22:18:53, 33.33s/it] 86%|████████▌ | 14876/17285 [133:13:29<23:22:35, 34.93s/it] 86%|████████▌ | 14877/17285 [133:13:56<21:44:51, 32.51s/it] 86%|████████▌ | 14878/17285 [133:14:22<20:29:41, 30.65s/it] 86%|████████▌ | 14879/17285 [133:14:49<19:44:44, 29.54s/it] 86%|████████▌ | 14880/17285 [133:15:23<20:32:45, 30.76s/it] {'loss': 1.2818, 'learning_rate': 1.1892569471432557e-05, 'epoch': 2.58} + 86%|████████▌ | 14880/17285 [133:15:23<20:32:45, 30.76s/it] 86%|████████▌ | 14881/17285 [133:15:57<21:18:16, 31.90s/it] 86%|████████▌ | 14882/17285 [133:16:40<23:25:01, 35.08s/it] 86%|████████▌ | 14883/17285 [133:17:16<23:37:53, 35.42s/it] 86%|████████▌ | 14884/17285 [133:17:40<21:24:42, 32.10s/it] 86%|████████▌ | 14885/17285 [133:18:09<20:47:47, 31.19s/it] 86%|████████▌ | 14886/17285 [133:18:49<22:28:50, 33.73s/it] 86%|████████▌ | 14887/17285 [133:19:31<24:09:11, 36.26s/it] 86%|████████▌ | 14888/17285 [133:20:00<22:42:58, 34.12s/it] 86%|████████▌ | 14889/17285 [133:20:32<22:18:07, 33.51s/it] 86%|████████▌ | 14890/17285 [133:20:58<20:41:26, 31.10s/it] {'loss': 1.2811, 'learning_rate': 1.1802237257056659e-05, 'epoch': 2.58} + 86%|████████▌ | 14890/17285 [133:20:58<20:41:26, 31.10s/it] 86%|████████▌ | 14891/17285 [133:21:30<20:47:18, 31.26s/it] 86%|████████▌ | 14892/17285 [133:22:04<21:26:14, 32.25s/it] 86%|████████▌ | 14893/17285 [133:22:49<23:57:29, 36.06s/it] 86%|████████▌ | 14894/17285 [133:23:17<22:24:58, 33.75s/it] 86%|████████▌ | 14895/17285 [133:23:43<20:49:53, 31.38s/it] 86%|████████▌ | 14896/17285 [133:24:13<20:29:13, 30.87s/it] 86%|████████▌ | 14897/17285 [133:24:49<21:25:07, 32.29s/it] 86%|████████▌ | 14898/17285 [133:25:20<21:16:38, 32.09s/it] 86%|████████▌ | 14899/17285 [133:25:47<20:11:12, 30.46s/it] 86%|████████▌ | 14900/17285 [133:26:16<20:00:42, 30.21s/it] {'loss': 1.2835, 'learning_rate': 1.171222790017823e-05, 'epoch': 2.59} + 86%|████████▌ | 14900/17285 
[133:26:16<20:00:42, 30.21s/it] 86%|████████▌ | 14901/17285 [133:26:47<19:58:34, 30.17s/it] 86%|████████▌ | 14902/17285 [133:27:15<19:35:30, 29.60s/it] 86%|████████▌ | 14903/17285 [133:27:48<20:12:54, 30.55s/it] 86%|████████▌ | 14904/17285 [133:28:15<19:34:58, 29.61s/it] 86%|████████▌ | 14905/17285 [133:29:02<22:58:52, 34.76s/it] 86%|████████▌ | 14906/17285 [133:29:28<21:19:57, 32.28s/it] 86%|████████▌ | 14907/17285 [133:29:58<20:46:59, 31.46s/it] 86%|████████▌ | 14908/17285 [133:30:29<20:48:00, 31.50s/it] 86%|████████▋ | 14909/17285 [133:30:57<20:06:45, 30.47s/it] 86%|████████▋ | 14910/17285 [133:31:27<19:51:04, 30.09s/it] {'loss': 1.2731, 'learning_rate': 1.1622541730286296e-05, 'epoch': 2.59} + 86%|████████▋ | 14910/17285 [133:31:27<19:51:04, 30.09s/it] 86%|████████▋ | 14911/17285 [133:32:01<20:43:20, 31.42s/it] 86%|████████▋ | 14912/17285 [133:32:30<20:12:09, 30.65s/it] 86%|████████▋ | 14913/17285 [133:33:10<21:57:01, 33.31s/it] 86%|████████▋ | 14914/17285 [133:33:42<21:49:21, 33.13s/it] 86%|████████▋ | 14915/17285 [133:34:11<20:56:13, 31.80s/it] 86%|████████▋ | 14916/17285 [133:34:43<21:03:27, 32.00s/it] 86%|████████▋ | 14917/17285 [133:35:13<20:36:17, 31.33s/it] 86%|████████▋ | 14918/17285 [133:35:44<20:35:10, 31.31s/it] 86%|████████▋ | 14919/17285 [133:36:16<20:38:49, 31.42s/it] 86%|████████▋ | 14920/17285 [133:36:43<19:41:49, 29.98s/it] {'loss': 1.2946, 'learning_rate': 1.153317907568684e-05, 'epoch': 2.59} + 86%|████████▋ | 14920/17285 [133:36:43<19:41:49, 29.98s/it] 86%|████████▋ | 14921/17285 [133:37:09<19:01:14, 28.97s/it] 86%|████████▋ | 14922/17285 [133:37:49<21:01:49, 32.04s/it] 86%|████████▋ | 14923/17285 [133:38:23<21:35:01, 32.90s/it] 86%|████████▋ | 14924/17285 [133:39:00<22:14:45, 33.92s/it] 86%|████████▋ | 14925/17285 [133:39:36<22:43:11, 34.66s/it] 86%|████████▋ | 14926/17285 [133:40:08<22:06:27, 33.74s/it] 86%|████████▋ | 14927/17285 [133:40:37<21:09:22, 32.30s/it] 86%|████████▋ | 14928/17285 [133:41:11<21:29:54, 32.84s/it] 86%|████████▋ | 
14929/17285 [133:41:39<20:37:00, 31.50s/it] 86%|████████▋ | 14930/17285 [133:42:17<21:48:16, 33.33s/it] {'loss': 1.2726, 'learning_rate': 1.1444140263501591e-05, 'epoch': 2.59} + 86%|████████▋ | 14930/17285 [133:42:17<21:48:16, 33.33s/it] 86%|████████▋ | 14931/17285 [133:42:47<21:06:04, 32.27s/it] 86%|████████▋ | 14932/17285 [133:43:15<20:20:57, 31.13s/it] 86%|████████▋ | 14933/17285 [133:43:40<19:12:56, 29.41s/it] 86%|████████▋ | 14934/17285 [133:44:15<20:11:28, 30.92s/it] 86%|████████▋ | 14935/17285 [133:44:51<21:12:51, 32.50s/it] 86%|████████▋ | 14936/17285 [133:45:26<21:35:38, 33.09s/it] 86%|████████▋ | 14937/17285 [133:45:55<20:55:55, 32.09s/it] 86%|████████▋ | 14938/17285 [133:46:28<20:57:20, 32.14s/it] 86%|████████▋ | 14939/17285 [133:46:58<20:41:49, 31.76s/it] 86%|████████▋ | 14940/17285 [133:47:33<21:15:32, 32.64s/it] {'loss': 1.2807, 'learning_rate': 1.135542561966675e-05, 'epoch': 2.59} + 86%|████████▋ | 14940/17285 [133:47:33<21:15:32, 32.64s/it] 86%|████████▋ | 14941/17285 [133:48:13<22:41:06, 34.84s/it] 86%|████████▋ | 14942/17285 [133:48:43<21:46:13, 33.45s/it] 86%|████████▋ | 14943/17285 [133:49:15<21:28:07, 33.00s/it] 86%|████████▋ | 14944/17285 [133:49:45<20:46:37, 31.95s/it] 86%|████████▋ | 14945/17285 [133:50:22<21:53:01, 33.67s/it] 86%|████████▋ | 14946/17285 [133:50:53<21:18:48, 32.80s/it] 86%|████████▋ | 14947/17285 [133:51:28<21:45:44, 33.51s/it][2023-08-28 13:46:41,540] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 86%|████████▋ | 14948/17285 [133:52:04<22:08:05, 34.10s/it] 86%|████████▋ | 14949/17285 [133:52:32<20:55:21, 32.24s/it] 86%|████████▋ | 14950/17285 [133:52:58<19:43:57, 30.42s/it] {'loss': 1.2817, 'learning_rate': 1.1275859872585081e-05, 'epoch': 2.59} + 86%|████████▋ | 14950/17285 [133:52:58<19:43:57, 30.42s/it] 86%|████████▋ | 14951/17285 [133:53:31<20:14:55, 31.23s/it] 87%|████████▋ | 14952/17285 [133:54:02<20:06:54, 31.04s/it] 87%|████████▋ | 14953/17285 [133:54:38<21:09:14, 32.66s/it] 87%|████████▋ | 14954/17285 [133:55:04<19:44:41, 30.49s/it] 87%|████████▋ | 14955/17285 [133:55:31<19:04:13, 29.47s/it] 87%|████████▋ | 14956/17285 [133:55:57<18:25:57, 28.49s/it] 87%|████████▋ | 14957/17285 [133:56:28<18:55:50, 29.27s/it] 87%|████████▋ | 14958/17285 [133:57:04<20:10:02, 31.20s/it] 87%|████████▋ | 14959/17285 [133:57:40<21:07:36, 32.70s/it] 87%|████████▋ | 14960/17285 [133:58:16<21:48:16, 33.76s/it] {'loss': 1.2802, 'learning_rate': 1.1187762042319471e-05, 'epoch': 2.6} + 87%|████████▋ | 14960/17285 [133:58:16<21:48:16, 33.76s/it] 87%|████████▋ | 14961/17285 [133:58:48<21:21:35, 33.09s/it] 87%|████████▋ | 14962/17285 [133:59:23<21:50:23, 33.85s/it] 87%|████████▋ | 14963/17285 [133:59:56<21:36:25, 33.50s/it] 87%|████████▋ | 14964/17285 [134:00:27<21:02:35, 32.64s/it] 87%|████████▋ | 14965/17285 [134:00:57<20:42:13, 32.13s/it][2023-08-28 13:56:07,433] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 87%|████████▋ | 14966/17285 [134:01:30<20:43:52, 32.18s/it] 87%|████████▋ | 14967/17285 [134:02:01<20:33:19, 31.92s/it] 87%|████████▋ | 14968/17285 [134:02:29<19:51:57, 30.87s/it] 87%|████████▋ | 14969/17285 [134:03:01<19:54:16, 30.94s/it] 87%|████████▋ | 14970/17285 [134:03:26<18:55:18, 29.42s/it] {'loss': 1.2721, 'learning_rate': 1.1108751952271423e-05, 'epoch': 2.6} + 87%|████████▋ | 14970/17285 [134:03:26<18:55:18, 29.42s/it] 87%|████████▋ | 14971/17285 [134:03:59<19:33:09, 30.42s/it] 87%|████████▋ | 14972/17285 [134:04:29<19:27:23, 30.28s/it] 87%|████████▋ | 14973/17285 [134:05:03<20:03:17, 31.23s/it] 87%|████████▋ | 14974/17285 [134:05:38<20:45:42, 32.34s/it] 87%|████████▋ | 14975/17285 [134:06:15<21:40:58, 33.79s/it] 87%|████████▋ | 14976/17285 [134:06:45<20:58:28, 32.70s/it] 87%|████████▋ | 14977/17285 [134:07:11<19:39:57, 30.67s/it] 87%|████████▋ | 14978/17285 [134:07:36<18:40:59, 29.15s/it] 87%|████████▋ | 14979/17285 [134:08:13<20:08:23, 31.44s/it] 87%|████████▋ | 14980/17285 [134:08:47<20:39:14, 32.26s/it] {'loss': 1.2398, 'learning_rate': 1.1021272099769108e-05, 'epoch': 2.6} + 87%|████████▋ | 14980/17285 [134:08:47<20:39:14, 32.26s/it] 87%|████████▋ | 14981/17285 [134:09:22<21:11:32, 33.11s/it] 87%|████████▋ | 14982/17285 [134:09:56<21:19:23, 33.33s/it] 87%|████████▋ | 14983/17285 [134:10:38<22:53:55, 35.81s/it] 87%|████████▋ | 14984/17285 [134:11:08<21:46:30, 34.07s/it] 87%|████████▋ | 14985/17285 [134:11:38<21:03:13, 32.95s/it] 87%|████████▋ | 14986/17285 [134:12:08<20:22:41, 31.91s/it] 87%|████████▋ | 14987/17285 [134:12:42<20:48:40, 32.60s/it] 87%|████████▋ | 14988/17285 [134:13:10<19:59:32, 31.33s/it] 87%|████████▋ | 14989/17285 [134:13:44<20:30:34, 32.16s/it] 87%|████████▋ | 14990/17285 [134:14:18<20:42:43, 32.49s/it] {'loss': 1.2574, 'learning_rate': 1.093411796357211e-05, 'epoch': 2.6} + 87%|████████▋ | 14990/17285 [134:14:18<20:42:43, 32.49s/it] 87%|████████▋ | 14991/17285 [134:14:51<20:48:08, 
32.65s/it] 87%|████████▋ | 14992/17285 [134:15:31<22:12:00, 34.85s/it] 87%|████████▋ | 14993/17285 [134:16:01<21:16:53, 33.43s/it] 87%|████████▋ | 14994/17285 [134:16:28<20:04:04, 31.53s/it] 87%|████████▋ | 14995/17285 [134:16:53<18:45:23, 29.49s/it] 87%|████████▋ | 14996/17285 [134:17:30<20:14:05, 31.82s/it] 87%|████████▋ | 14997/17285 [134:18:06<20:57:52, 32.99s/it] 87%|████████▋ | 14998/17285 [134:18:34<20:00:07, 31.49s/it] 87%|████████▋ | 14999/17285 [134:19:05<19:53:35, 31.33s/it] 87%|████████▋ | 15000/17285 [134:19:30<18:49:24, 29.66s/it] {'loss': 1.2228, 'learning_rate': 1.0847289862717614e-05, 'epoch': 2.6} + 87%|████████▋ | 15000/17285 [134:19:30<18:49:24, 29.66s/it][INFO|trainer.py:3081] 2023-08-28 14:14:08,008 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-28 14:14:08,009 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-28 14:14:08,009 >> Batch size = 2 + + 0%| | 0/33 [00:00> Deleting older checkpoint [20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-12000] due to args.save_total_limit +Save Peft Config at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-15000 +[INFO|tokenization_utils_base.py:2210] 2023-08-28 14:15:32,803 >> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-15000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-28 14:15:32,807 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-15000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-15000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-15000 + 87%|████████▋ | 15001/17285 [134:21:34<36:44:49, 57.92s/it] 87%|████████▋ | 15002/17285 [134:22:08<32:08:30, 
50.68s/it] 87%|████████▋ | 15003/17285 [134:22:37<28:05:58, 44.33s/it] 87%|████████▋ | 15004/17285 [134:23:14<26:35:46, 41.98s/it] 87%|████████▋ | 15005/17285 [134:23:51<25:34:22, 40.38s/it] 87%|████████▋ | 15006/17285 [134:24:21<23:44:56, 37.51s/it] 87%|████████▋ | 15007/17285 [134:24:50<22:04:39, 34.89s/it] 87%|████████▋ | 15008/17285 [134:25:25<21:57:56, 34.73s/it] 87%|████████▋ | 15009/17285 [134:25:55<21:04:25, 33.33s/it] 87%|████████▋ | 15010/17285 [134:26:23<20:12:01, 31.97s/it] {'loss': 1.3108, 'learning_rate': 1.0760788115049313e-05, 'epoch': 2.61} + 87%|████████▋ | 15010/17285 [134:26:23<20:12:01, 31.97s/it] 87%|████████▋ | 15011/17285 [134:27:05<22:00:04, 34.83s/it] 87%|████████▋ | 15012/17285 [134:27:32<20:31:04, 32.50s/it] 87%|████████▋ | 15013/17285 [134:28:00<19:36:42, 31.08s/it] 87%|████████▋ | 15014/17285 [134:28:29<19:18:51, 30.62s/it] 87%|████████▋ | 15015/17285 [134:29:06<20:25:35, 32.39s/it] 87%|████████▋ | 15016/17285 [134:29:44<21:28:50, 34.08s/it] 87%|████████▋ | 15017/17285 [134:30:23<22:20:16, 35.46s/it] 87%|████████▋ | 15018/17285 [134:30:52<21:17:43, 33.82s/it] 87%|████████▋ | 15019/17285 [134:31:22<20:30:29, 32.58s/it] 87%|████████▋ | 15020/17285 [134:31:52<19:55:41, 31.67s/it] {'loss': 1.2385, 'learning_rate': 1.0674613037216263e-05, 'epoch': 2.61} + 87%|████████▋ | 15020/17285 [134:31:52<19:55:41, 31.67s/it] 87%|████████▋ | 15021/17285 [134:32:20<19:16:06, 30.64s/it] 87%|████████▋ | 15022/17285 [134:32:45<18:15:07, 29.04s/it] 87%|████████▋ | 15023/17285 [134:33:21<19:32:30, 31.10s/it] 87%|████████▋ | 15024/17285 [134:34:02<21:18:30, 33.93s/it] 87%|████████▋ | 15025/17285 [134:34:31<20:24:02, 32.50s/it] 87%|████████▋ | 15026/17285 [134:35:04<20:33:21, 32.76s/it] 87%|████████▋ | 15027/17285 [134:35:40<21:05:00, 33.61s/it] 87%|████████▋ | 15028/17285 [134:36:11<20:31:44, 32.74s/it] 87%|████████▋ | 15029/17285 [134:36:49<21:30:10, 34.31s/it] 87%|████████▋ | 15030/17285 [134:37:15<20:02:50, 32.00s/it] {'loss': 1.2627, 'learning_rate': 
1.0588764944671713e-05, 'epoch': 2.61} + 87%|████████▋ | 15030/17285 [134:37:15<20:02:50, 32.00s/it] 87%|████████▋ | 15031/17285 [134:37:55<21:28:06, 34.29s/it] 87%|████████▋ | 15032/17285 [134:38:29<21:25:41, 34.24s/it] 87%|████████▋ | 15033/17285 [134:38:56<20:05:09, 32.11s/it] 87%|████████▋ | 15034/17285 [134:39:24<19:22:10, 30.98s/it] 87%|████████▋ | 15035/17285 [134:39:52<18:40:02, 29.87s/it] 87%|████████▋ | 15036/17285 [134:40:23<18:52:12, 30.21s/it] 87%|████████▋ | 15037/17285 [134:40:53<18:51:25, 30.20s/it] 87%|████████▋ | 15038/17285 [134:41:19<18:10:01, 29.11s/it] 87%|████████▋ | 15039/17285 [134:41:49<18:12:46, 29.19s/it] 87%|████████▋ | 15040/17285 [134:42:23<19:03:18, 30.56s/it] {'loss': 1.2532, 'learning_rate': 1.0503244151671942e-05, 'epoch': 2.61} + 87%|████████▋ | 15040/17285 [134:42:23<19:03:18, 30.56s/it] 87%|████████▋ | 15041/17285 [134:42:54<19:17:51, 30.96s/it][2023-08-28 14:37:58,060] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 + 87%|████████▋ | 15042/17285 [134:43:20<18:21:17, 29.46s/it] 87%|████████▋ | 15043/17285 [134:43:48<18:00:26, 28.91s/it] 87%|████████▋ | 15044/17285 [134:44:18<18:13:45, 29.28s/it] 87%|████████▋ | 15045/17285 [134:44:44<17:34:02, 28.23s/it] 87%|████████▋ | 15046/17285 [134:45:14<17:48:51, 28.64s/it] 87%|████████▋ | 15047/17285 [134:45:42<17:49:02, 28.66s/it] 87%|████████▋ | 15048/17285 [134:46:12<17:57:55, 28.91s/it] 87%|████████▋ | 15049/17285 [134:46:51<19:57:47, 32.14s/it] 87%|████████▋ | 15050/17285 [134:47:26<20:27:27, 32.95s/it] {'loss': 1.2731, 'learning_rate': 1.0426555537850258e-05, 'epoch': 2.61} + 87%|████████▋ | 15050/17285 [134:47:26<20:27:27, 32.95s/it] 87%|████████▋ | 15051/17285 [134:47:52<19:04:22, 30.74s/it] 87%|████████▋ | 15052/17285 [134:48:27<19:50:24, 31.99s/it] 87%|████████▋ | 15053/17285 [134:49:00<20:05:55, 32.42s/it] 87%|████████▋ | 15054/17285 [134:49:25<18:39:13, 30.10s/it] 87%|████████▋ | 15055/17285 [134:49:59<19:28:38, 31.44s/it] 87%|████████▋ | 15056/17285 [134:50:32<19:40:56, 31.79s/it] 87%|████████▋ | 15057/17285 [134:50:57<18:26:13, 29.79s/it] 87%|████████▋ | 15058/17285 [134:51:24<17:48:03, 28.78s/it] 87%|████████▋ | 15059/17285 [134:52:03<19:44:43, 31.93s/it] 87%|████████▋ | 15060/17285 [134:52:31<19:00:02, 30.74s/it] {'loss': 1.2618, 'learning_rate': 1.034165747546959e-05, 'epoch': 2.61} + 87%|████████▋ | 15060/17285 [134:52:31<19:00:02, 30.74s/it] 87%|████████▋ | 15061/17285 [134:53:00<18:41:36, 30.26s/it] 87%|████████▋ | 15062/17285 [134:53:36<19:42:43, 31.92s/it] 87%|████████▋ | 15063/17285 [134:54:05<19:09:44, 31.05s/it] 87%|████████▋ | 15064/17285 [134:54:31<18:18:02, 29.66s/it] 87%|████████▋ | 15065/17285 [134:55:03<18:36:35, 30.18s/it] 87%|████████▋ | 15066/17285 [134:55:32<18:28:04, 29.96s/it] 87%|████████▋ | 15067/17285 [134:56:16<20:57:25, 34.02s/it] 87%|████████▋ | 15068/17285 [134:56:58<22:26:35, 36.44s/it] 87%|████████▋ | 15069/17285 [134:57:38<23:04:27, 37.49s/it] 
87%|████████▋ | 15070/17285 [134:58:09<21:54:45, 35.61s/it] {'loss': 1.2941, 'learning_rate': 1.0257087617197447e-05, 'epoch': 2.62} + 87%|████████▋ | 15070/17285 [134:58:09<21:54:45, 35.61s/it] 87%|████████▋ | 15071/17285 [134:58:42<21:30:46, 34.98s/it] 87%|████████▋ | 15072/17285 [134:59:07<19:39:42, 31.98s/it] 87%|████████▋ | 15073/17285 [134:59:38<19:27:14, 31.66s/it] 87%|████████▋ | 15074/17285 [135:00:16<20:37:00, 33.57s/it] 87%|████████▋ | 15075/17285 [135:00:56<21:46:25, 35.47s/it] 87%|████████▋ | 15076/17285 [135:01:25<20:34:13, 33.52s/it] 87%|████████▋ | 15077/17285 [135:01:56<20:06:54, 32.80s/it] 87%|████████▋ | 15078/17285 [135:02:29<20:05:33, 32.77s/it] 87%|████████▋ | 15079/17285 [135:02:58<19:20:22, 31.56s/it] 87%|████████▋ | 15080/17285 [135:03:33<20:03:24, 32.75s/it] {'loss': 1.229, 'learning_rate': 1.017284627261097e-05, 'epoch': 2.62} + 87%|████████▋ | 15080/17285 [135:03:33<20:03:24, 32.75s/it] 87%|████████▋ | 15081/17285 [135:04:07<20:12:55, 33.02s/it] 87%|████████▋ | 15082/17285 [135:04:35<19:17:17, 31.52s/it] 87%|████████▋ | 15083/17285 [135:05:04<18:54:20, 30.91s/it] 87%|████████▋ | 15084/17285 [135:05:29<17:50:17, 29.18s/it] 87%|████████▋ | 15085/17285 [135:05:55<17:11:38, 28.14s/it] 87%|████████▋ | 15086/17285 [135:06:27<17:48:29, 29.15s/it] 87%|████████▋ | 15087/17285 [135:06:53<17:19:06, 28.37s/it] 87%|████████▋ | 15088/17285 [135:07:26<18:06:52, 29.68s/it] 87%|████████▋ | 15089/17285 [135:07:53<17:37:33, 28.90s/it] 87%|████████▋ | 15090/17285 [135:08:23<17:48:19, 29.20s/it] {'loss': 1.3288, 'learning_rate': 1.008893375008475e-05, 'epoch': 2.62} + 87%|████████▋ | 15090/17285 [135:08:23<17:48:19, 29.20s/it] 87%|████████▋ | 15091/17285 [135:08:54<18:07:43, 29.75s/it] 87%|████████▋ | 15092/17285 [135:09:25<18:18:04, 30.04s/it] 87%|████████▋ | 15093/17285 [135:10:07<20:27:34, 33.60s/it] 87%|████████▋ | 15094/17285 [135:10:38<20:07:51, 33.08s/it] 87%|████████▋ | 15095/17285 [135:11:12<20:15:03, 33.29s/it] 87%|████████▋ | 15096/17285 
[135:11:42<19:33:06, 32.15s/it] 87%|████████▋ | 15097/17285 [135:12:15<19:49:36, 32.62s/it] 87%|████████▋ | 15098/17285 [135:12:56<21:14:02, 34.95s/it] 87%|████████▋ | 15099/17285 [135:13:27<20:28:18, 33.71s/it] 87%|████████▋ | 15100/17285 [135:14:08<21:55:24, 36.12s/it] {'loss': 1.2818, 'learning_rate': 1.0005350356789733e-05, 'epoch': 2.62} + 87%|████████▋ | 15100/17285 [135:14:08<21:55:24, 36.12s/it] 87%|████████▋ | 15101/17285 [135:14:43<21:40:30, 35.73s/it] 87%|████████▋ | 15102/17285 [135:15:14<20:43:26, 34.18s/it] 87%|████████▋ | 15103/17285 [135:15:46<20:18:36, 33.51s/it] 87%|████████▋ | 15104/17285 [135:16:16<19:37:35, 32.40s/it] 87%|████████▋ | 15105/17285 [135:16:51<20:05:43, 33.19s/it] 87%|████████▋ | 15106/17285 [135:17:25<20:15:06, 33.46s/it] 87%|████████▋ | 15107/17285 [135:18:03<21:12:58, 35.07s/it] 87%|████████▋ | 15108/17285 [135:18:36<20:41:22, 34.21s/it] 87%|████████▋ | 15109/17285 [135:19:25<23:25:04, 38.74s/it] 87%|████████▋ | 15110/17285 [135:20:00<22:47:22, 37.72s/it] {'loss': 1.2817, 'learning_rate': 9.922096398692005e-06, 'epoch': 2.62} + 87%|████████▋ | 15110/17285 [135:20:00<22:47:22, 37.72s/it] 87%|████████▋ | 15111/17285 [135:20:26<20:32:48, 34.02s/it] 87%|████████▋ | 15112/17285 [135:21:00<20:39:27, 34.22s/it] 87%|████████▋ | 15113/17285 [135:21:34<20:34:57, 34.11s/it] 87%|████████▋ | 15114/17285 [135:22:11<21:01:25, 34.86s/it] 87%|████████▋ | 15115/17285 [135:22:38<19:36:19, 32.52s/it] 87%|████████▋ | 15116/17285 [135:23:24<22:01:46, 36.56s/it] 87%|████████▋ | 15117/17285 [135:23:53<20:37:56, 34.26s/it] 87%|████████▋ | 15118/17285 [135:24:22<19:43:00, 32.76s/it] 87%|████████▋ | 15119/17285 [135:24:58<20:21:16, 33.83s/it] 87%|████████▋ | 15120/17285 [135:25:29<19:45:59, 32.87s/it] {'loss': 1.2831, 'learning_rate': 9.839172180551736e-06, 'epoch': 2.62} + 87%|████████▋ | 15120/17285 [135:25:29<19:45:59, 32.87s/it] 87%|████████▋ | 15121/17285 [135:26:00<19:22:12, 32.22s/it] 87%|████████▋ | 15122/17285 [135:26:31<19:06:53, 31.81s/it] 
87%|████████▋ | 15123/17285 [135:26:59<18:33:15, 30.90s/it] 87%|████████▋ | 15124/17285 [135:27:31<18:39:15, 31.08s/it] 88%|████████▊ | 15125/17285 [135:28:01<18:30:11, 30.84s/it] 88%|████████▊ | 15126/17285 [135:28:30<18:10:44, 30.31s/it] 88%|████████▊ | 15127/17285 [135:28:59<17:49:53, 29.75s/it] 88%|████████▊ | 15128/17285 [135:29:26<17:25:51, 29.09s/it] 88%|████████▊ | 15129/17285 [135:29:54<17:13:17, 28.76s/it] 88%|████████▊ | 15130/17285 [135:30:30<18:32:22, 30.97s/it] {'loss': 1.2657, 'learning_rate': 9.756578005922001e-06, 'epoch': 2.63} + 88%|████████▊ | 15130/17285 [135:30:30<18:32:22, 30.97s/it] 88%|████████▊ | 15131/17285 [135:31:01<18:24:28, 30.77s/it] 88%|████████▊ | 15132/17285 [135:31:41<20:05:15, 33.59s/it] 88%|████████▊ | 15133/17285 [135:32:10<19:19:31, 32.33s/it] 88%|████████▊ | 15134/17285 [135:32:39<18:44:26, 31.37s/it] 88%|████████▊ | 15135/17285 [135:33:09<18:25:59, 30.86s/it] 88%|████████▊ | 15136/17285 [135:33:43<18:59:09, 31.81s/it] 88%|████████▊ | 15137/17285 [135:34:16<19:08:50, 32.09s/it] 88%|████████▊ | 15138/17285 [135:34:54<20:12:28, 33.88s/it] 88%|████████▊ | 15139/17285 [135:35:32<21:02:48, 35.31s/it] 88%|████████▊ | 15140/17285 [135:36:00<19:43:05, 33.09s/it] {'loss': 1.2788, 'learning_rate': 9.674314177147791e-06, 'epoch': 2.63} + 88%|████████▊ | 15140/17285 [135:36:00<19:43:05, 33.09s/it] 88%|████████▊ | 15141/17285 [135:36:33<19:42:11, 33.08s/it] 88%|████████▊ | 15142/17285 [135:37:07<19:43:35, 33.14s/it] 88%|████████▊ | 15143/17285 [135:37:50<21:36:20, 36.31s/it] 88%|████████▊ | 15144/17285 [135:38:17<19:51:35, 33.39s/it] 88%|████████▊ | 15145/17285 [135:38:53<20:15:58, 34.09s/it] 88%|████████▊ | 15146/17285 [135:39:25<19:53:18, 33.47s/it] 88%|████████▊ | 15147/17285 [135:39:54<19:10:43, 32.29s/it] 88%|████████▊ | 15148/17285 [135:40:25<18:49:49, 31.72s/it] 88%|████████▊ | 15149/17285 [135:40:55<18:37:49, 31.40s/it] 88%|████████▊ | 15150/17285 [135:41:27<18:35:57, 31.36s/it] {'loss': 1.2736, 'learning_rate': 
9.592380995364781e-06, 'epoch': 2.63} + 88%|████████▊ | 15150/17285 [135:41:27<18:35:57, 31.36s/it] 88%|████████▊ | 15151/17285 [135:41:56<18:13:59, 30.76s/it] 88%|████████▊ | 15152/17285 [135:42:30<18:44:00, 31.62s/it] 88%|████████▊ | 15153/17285 [135:43:01<18:45:59, 31.69s/it] 88%|████████▊ | 15154/17285 [135:43:28<17:46:26, 30.03s/it] 88%|████████▊ | 15155/17285 [135:44:02<18:35:45, 31.43s/it] 88%|████████▊ | 15156/17285 [135:44:35<18:50:50, 31.87s/it] 88%|████████▊ | 15157/17285 [135:45:12<19:41:50, 33.32s/it] 88%|████████▊ | 15158/17285 [135:45:41<18:51:44, 31.93s/it] 88%|████████▊ | 15159/17285 [135:46:07<17:55:00, 30.34s/it] 88%|████████▊ | 15160/17285 [135:46:38<18:01:11, 30.53s/it] {'loss': 1.262, 'learning_rate': 9.510778760498273e-06, 'epoch': 2.63} + 88%|████████▊ | 15160/17285 [135:46:38<18:01:11, 30.53s/it] 88%|████████▊ | 15161/17285 [135:47:11<18:19:57, 31.07s/it] 88%|████████▊ | 15162/17285 [135:47:42<18:19:54, 31.09s/it] 88%|████████▊ | 15163/17285 [135:48:09<17:41:03, 30.00s/it] 88%|████████▊ | 15164/17285 [135:48:49<19:23:49, 32.92s/it] 88%|████████▊ | 15165/17285 [135:49:20<18:59:28, 32.25s/it] 88%|████████▊ | 15166/17285 [135:49:51<18:53:34, 32.10s/it] 88%|████████▊ | 15167/17285 [135:50:30<19:58:47, 33.96s/it] 88%|████████▊ | 15168/17285 [135:50:59<19:13:09, 32.68s/it] 88%|████████▊ | 15169/17285 [135:51:42<21:01:31, 35.77s/it] 88%|████████▊ | 15170/17285 [135:52:16<20:42:35, 35.25s/it] {'loss': 1.2497, 'learning_rate': 9.429507771262148e-06, 'epoch': 2.63} + 88%|████████▊ | 15170/17285 [135:52:16<20:42:35, 35.25s/it] 88%|████████▊ | 15171/17285 [135:52:43<19:15:02, 32.78s/it] 88%|████████▊ | 15172/17285 [135:53:17<19:25:43, 33.10s/it] 88%|████████▊ | 15173/17285 [135:53:47<18:47:53, 32.04s/it] 88%|████████▊ | 15174/17285 [135:54:15<18:04:15, 30.82s/it] 88%|████████▊ | 15175/17285 [135:54:47<18:19:58, 31.28s/it] 88%|████████▊ | 15176/17285 [135:55:18<18:15:22, 31.16s/it] 88%|████████▊ | 15177/17285 [135:55:56<19:32:48, 33.38s/it] 
88%|████████▊ | 15178/17285 [135:56:28<19:10:48, 32.77s/it] 88%|████████▊ | 15179/17285 [135:57:03<19:40:38, 33.64s/it] 88%|████████▊ | 15180/17285 [135:57:35<19:12:42, 32.86s/it] {'loss': 1.2698, 'learning_rate': 9.348568325157681e-06, 'epoch': 2.63} + 88%|████████▊ | 15180/17285 [135:57:35<19:12:42, 32.86s/it] 88%|████████▊ | 15181/17285 [135:58:00<17:54:20, 30.64s/it] 88%|████████▊ | 15182/17285 [135:58:31<17:56:47, 30.72s/it] 88%|████████▊ | 15183/17285 [135:59:06<18:43:25, 32.07s/it] 88%|████████▊ | 15184/17285 [135:59:37<18:27:25, 31.63s/it] 88%|████████▊ | 15185/17285 [136:00:08<18:20:48, 31.45s/it] 88%|████████▊ | 15186/17285 [136:00:34<17:25:31, 29.89s/it] 88%|████████▊ | 15187/17285 [136:01:14<19:11:51, 32.94s/it] 88%|████████▊ | 15188/17285 [136:01:42<18:18:45, 31.44s/it] 88%|████████▊ | 15189/17285 [136:02:09<17:33:09, 30.15s/it] 88%|████████▊ | 15190/17285 [136:02:42<17:59:10, 30.91s/it] {'loss': 1.2894, 'learning_rate': 9.267960718472513e-06, 'epoch': 2.64} + 88%|████████▊ | 15190/17285 [136:02:42<17:59:10, 30.91s/it] 88%|████████▊ | 15191/17285 [136:03:06<16:50:37, 28.96s/it] 88%|████████▊ | 15192/17285 [136:03:35<16:43:39, 28.77s/it] 88%|████████▊ | 15193/17285 [136:04:12<18:11:50, 31.31s/it] 88%|████████▊ | 15194/17285 [136:04:50<19:22:22, 33.35s/it] 88%|████████▊ | 15195/17285 [136:05:21<18:54:58, 32.58s/it] 88%|████████▊ | 15196/17285 [136:05:49<18:06:50, 31.22s/it] 88%|████████▊ | 15197/17285 [136:06:28<19:33:53, 33.73s/it] 88%|████████▊ | 15198/17285 [136:06:59<18:59:25, 32.76s/it] 88%|████████▊ | 15199/17285 [136:07:25<17:54:56, 30.92s/it] 88%|████████▊ | 15200/17285 [136:07:50<16:52:00, 29.12s/it] {'loss': 1.277, 'learning_rate': 9.187685246279565e-06, 'epoch': 2.64} + 88%|████████▊ | 15200/17285 [136:07:50<16:52:00, 29.12s/it] 88%|████████▊ | 15201/17285 [136:08:28<18:23:03, 31.76s/it] 88%|████████▊ | 15202/17285 [136:09:04<19:00:42, 32.86s/it] 88%|████████▊ | 15203/17285 [136:09:33<18:24:15, 31.82s/it] 88%|████████▊ | 15204/17285 
[136:10:04<18:14:49, 31.57s/it] 88%|████████▊ | 15205/17285 [136:10:31<17:26:45, 30.19s/it] 88%|████████▊ | 15206/17285 [136:11:10<18:53:08, 32.70s/it] 88%|████████▊ | 15207/17285 [136:11:40<18:29:47, 32.04s/it] 88%|████████▊ | 15208/17285 [136:12:15<18:56:01, 32.82s/it] 88%|████████▊ | 15209/17285 [136:12:46<18:36:39, 32.27s/it] 88%|████████▊ | 15210/17285 [136:13:19<18:42:21, 32.45s/it] {'loss': 1.2803, 'learning_rate': 9.107742202435876e-06, 'epoch': 2.64} + 88%|████████▊ | 15210/17285 [136:13:19<18:42:21, 32.45s/it] 88%|████████▊ | 15211/17285 [136:13:58<19:50:27, 34.44s/it] 88%|████████▊ | 15212/17285 [136:14:30<19:31:57, 33.92s/it] 88%|████████▊ | 15213/17285 [136:15:12<20:55:29, 36.36s/it] 88%|████████▊ | 15214/17285 [136:15:52<21:23:43, 37.19s/it] 88%|████████▊ | 15215/17285 [136:16:27<21:03:05, 36.61s/it] 88%|████████▊ | 15216/17285 [136:16:59<20:11:33, 35.13s/it] 88%|████████▊ | 15217/17285 [136:17:23<18:23:06, 32.01s/it] 88%|████████▊ | 15218/17285 [136:17:55<18:16:01, 31.81s/it] 88%|████████▊ | 15219/17285 [136:18:21<17:22:58, 30.29s/it] 88%|████████▊ | 15220/17285 [136:18:52<17:22:11, 30.28s/it] {'loss': 1.2451, 'learning_rate': 9.028131879581714e-06, 'epoch': 2.64} + 88%|████████▊ | 15220/17285 [136:18:52<17:22:11, 30.28s/it] 88%|████████▊ | 15221/17285 [136:19:29<18:34:11, 32.39s/it] 88%|████████▊ | 15222/17285 [136:20:00<18:19:15, 31.97s/it] 88%|████████▊ | 15223/17285 [136:20:27<17:27:33, 30.48s/it] 88%|████████▊ | 15224/17285 [136:21:01<18:01:26, 31.48s/it] 88%|████████▊ | 15225/17285 [136:21:30<17:34:17, 30.71s/it] 88%|████████▊ | 15226/17285 [136:22:00<17:32:30, 30.67s/it] 88%|████████▊ | 15227/17285 [136:22:38<18:45:38, 32.82s/it] 88%|████████▊ | 15228/17285 [136:23:05<17:41:12, 30.95s/it] 88%|████████▊ | 15229/17285 [136:23:41<18:37:36, 32.62s/it] 88%|████████▊ | 15230/17285 [136:24:10<17:56:36, 31.43s/it] {'loss': 1.241, 'learning_rate': 8.948854569139287e-06, 'epoch': 2.64} + 88%|████████▊ | 15230/17285 [136:24:10<17:56:36, 31.43s/it] 
 88%|████████▊ | 15231/17285 [136:24:44<18:26:16, 32.32s/it] 88%|████████▊ | 15232/17285 [136:25:14<18:01:58, 31.62s/it] 88%|████████▊ | 15233/17285 [136:25:46<17:59:18, 31.56s/it] 88%|████████▊ | 15234/17285 [136:26:11<16:52:15, 29.61s/it] 88%|████████▊ | 15235/17285 [136:26:43<17:15:21, 30.30s/it] 88%|████████▊ | 15236/17285 [136:27:18<18:11:48, 31.97s/it] 88%|████████▊ | 15237/17285 [136:27:57<19:16:33, 33.88s/it] 88%|████████▊ | 15238/17285 [136:28:27<18:42:16, 32.90s/it] 88%|████████▊ | 15239/17285 [136:28:56<17:58:19, 31.62s/it] 88%|████████▊ | 15240/17285 [136:29:35<19:08:56, 33.71s/it] {'loss': 1.2558, 'learning_rate': 8.8699105613118e-06, 'epoch': 2.65} + 88%|████████▊ | 15240/17285 [136:29:35<19:08:56, 33.71s/it] 88%|████████▊ | 15241/17285 [136:30:07<18:50:08, 33.17s/it] 88%|████████▊ | 15242/17285 [136:30:41<19:00:04, 33.48s/it] 88%|████████▊ | 15243/17285 [136:31:07<17:47:17, 31.36s/it][2023-08-28 16:26:15,015] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 88%|████████▊ | 15244/17285 [136:31:37<17:34:32, 31.00s/it] 88%|████████▊ | 15245/17285 [136:32:16<18:54:26, 33.37s/it] 88%|████████▊ | 15246/17285 [136:32:57<20:14:08, 35.73s/it] 88%|████████▊ | 15247/17285 [136:33:26<19:05:23, 33.72s/it] 88%|████████▊ | 15248/17285 [136:33:58<18:38:49, 32.95s/it] 88%|████████▊ | 15249/17285 [136:34:31<18:45:28, 33.17s/it] 88%|████████▊ | 15250/17285 [136:34:57<17:29:24, 30.94s/it] {'loss': 1.2357, 'learning_rate': 8.79914616687264e-06, 'epoch': 2.65} + 88%|████████▊ | 15250/17285 [136:34:57<17:29:24, 30.94s/it] 88%|████████▊ | 15251/17285 [136:35:22<16:27:32, 29.13s/it] 88%|████████▊ | 15252/17285 [136:35:55<17:05:36, 30.27s/it] 88%|████████▊ | 15253/17285 [136:36:30<17:53:39, 31.70s/it] 88%|████████▊ | 15254/17285 [136:37:02<17:52:32, 31.69s/it] 88%|████████▊ | 15255/17285 [136:37:30<17:13:51, 30.56s/it] 88%|████████▊ | 15256/17285 [136:38:02<17:35:25, 31.21s/it] 88%|████████▊ | 15257/17285 [136:38:35<17:46:05, 31.54s/it] 88%|████████▊ | 15258/17285 [136:39:01<16:52:54, 29.98s/it] 88%|████████▊ | 15259/17285 [136:39:27<16:09:24, 28.71s/it] 88%|████████▊ | 15260/17285 [136:39:58<16:38:32, 29.59s/it] {'loss': 1.2819, 'learning_rate': 8.720836229152817e-06, 'epoch': 2.65} + 88%|████████▊ | 15260/17285 [136:39:58<16:38:32, 29.59s/it] 88%|████████▊ | 15261/17285 [136:40:33<17:28:05, 31.07s/it] 88%|████████▊ | 15262/17285 [136:41:01<16:56:31, 30.15s/it] 88%|████████▊ | 15263/17285 [136:41:38<18:05:20, 32.21s/it] 88%|████████▊ | 15264/17285 [136:42:08<17:48:46, 31.73s/it] 88%|████████▊ | 15265/17285 [136:42:41<17:52:09, 31.85s/it] 88%|████████▊ | 15266/17285 [136:43:12<17:46:35, 31.70s/it] 88%|████████▊ | 15267/17285 [136:43:51<19:02:40, 33.97s/it] 88%|████████▊ | 15268/17285 [136:44:22<18:34:39, 33.16s/it] 88%|████████▊ | 15269/17285 [136:44:52<17:53:35, 31.95s/it] 88%|████████▊ | 15270/17285 [136:45:17<16:45:47, 29.95s/it] {'loss': 1.288, 'learning_rate': 8.642860428733857e-06, 'epoch': 2.65} + 
88%|████████▊ | 15270/17285 [136:45:17<16:45:47, 29.95s/it] 88%|████████▊ | 15271/17285 [136:45:55<18:08:29, 32.43s/it] 88%|████████▊ | 15272/17285 [136:46:20<16:50:53, 30.13s/it] 88%|████████▊ | 15273/17285 [136:46:51<17:02:43, 30.50s/it] 88%|████████▊ | 15274/17285 [136:47:25<17:33:17, 31.43s/it] 88%|████████▊ | 15275/17285 [136:47:55<17:20:56, 31.07s/it] 88%|████████▊ | 15276/17285 [136:48:34<18:42:18, 33.52s/it] 88%|████████▊ | 15277/17285 [136:49:01<17:37:00, 31.58s/it] 88%|████████▊ | 15278/17285 [136:49:33<17:37:53, 31.63s/it] 88%|████████▊ | 15279/17285 [136:50:02<17:09:37, 30.80s/it] 88%|████████▊ | 15280/17285 [136:50:27<16:15:26, 29.19s/it] {'loss': 1.283, 'learning_rate': 8.565219051054663e-06, 'epoch': 2.65} + 88%|████████▊ | 15280/17285 [136:50:27<16:15:26, 29.19s/it] 88%|████████▊ | 15281/17285 [136:51:06<17:47:51, 31.97s/it] 88%|████████▊ | 15282/17285 [136:51:34<17:09:47, 30.85s/it][2023-08-28 16:46:48,564] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 88%|████████▊ | 15283/17285 [136:52:11<18:09:12, 32.64s/it] 88%|████████▊ | 15284/17285 [136:52:39<17:27:56, 31.42s/it] 88%|████████▊ | 15285/17285 [136:53:10<17:15:09, 31.05s/it] 88%|████████▊ | 15286/17285 [136:53:43<17:39:24, 31.80s/it] 88%|████████▊ | 15287/17285 [136:54:12<17:10:43, 30.95s/it] 88%|████████▊ | 15288/17285 [136:54:41<16:44:57, 30.19s/it] 88%|████████▊ | 15289/17285 [136:55:13<17:01:54, 30.72s/it] 88%|████████▊ | 15290/17285 [136:55:37<16:01:33, 28.92s/it] {'loss': 1.2858, 'learning_rate': 8.495627977514654e-06, 'epoch': 2.65} + 88%|████████▊ | 15290/17285 [136:55:37<16:01:33, 28.92s/it] 88%|████████▊ | 15291/17285 [136:56:07<16:09:56, 29.19s/it] 88%|████████▊ | 15292/17285 [136:56:39<16:33:41, 29.92s/it] 88%|████████▊ | 15293/17285 [136:57:08<16:27:55, 29.76s/it] 88%|████████▊ | 15294/17285 [136:57:49<18:20:52, 33.18s/it] 88%|████████▊ | 15295/17285 [136:58:16<17:13:43, 31.17s/it] 88%|████████▊ | 15296/17285 [136:58:44<16:40:30, 30.18s/it] 88%|████████▊ | 15297/17285 [136:59:22<18:01:30, 32.64s/it] 89%|████████▊ | 15298/17285 [136:59:52<17:31:12, 31.74s/it] 89%|████████▊ | 15299/17285 [137:00:31<18:47:43, 34.07s/it] 89%|████████▊ | 15300/17285 [137:01:04<18:33:15, 33.65s/it] {'loss': 1.2931, 'learning_rate': 8.41862278503991e-06, 'epoch': 2.66} + 89%|████████▊ | 15300/17285 [137:01:04<18:33:15, 33.65s/it] 89%|████████▊ | 15301/17285 [137:01:33<17:48:49, 32.32s/it] 89%|████████▊ | 15302/17285 [137:02:12<18:55:48, 34.37s/it] 89%|████████▊ | 15303/17285 [137:02:43<18:24:42, 33.44s/it] 89%|████████▊ | 15304/17285 [137:03:09<17:06:12, 31.08s/it] 89%|████████▊ | 15305/17285 [137:03:41<17:10:37, 31.23s/it] 89%|████████▊ | 15306/17285 [137:04:08<16:28:38, 29.97s/it] 89%|████████▊ | 15307/17285 [137:04:40<16:47:44, 30.57s/it] 89%|████████▊ | 15308/17285 [137:05:19<18:18:28, 33.34s/it] 89%|████████▊ | 15309/17285 [137:05:50<17:55:54, 32.67s/it] 89%|████████▊ | 15310/17285 [137:06:25<18:16:44, 33.32s/it] 
{'loss': 1.2803, 'learning_rate': 8.341952836151169e-06, 'epoch': 2.66} + 89%|████████▊ | 15310/17285 [137:06:25<18:16:44, 33.32s/it] 89%|████████▊ | 15311/17285 [137:07:00<18:25:48, 33.61s/it] 89%|████████▊ | 15312/17285 [137:07:33<18:21:16, 33.49s/it] 89%|████████▊ | 15313/17285 [137:07:58<17:01:24, 31.08s/it] 89%|████████▊ | 15314/17285 [137:08:30<17:09:58, 31.35s/it] 89%|████████▊ | 15315/17285 [137:09:01<17:02:11, 31.13s/it] 89%|████████▊ | 15316/17285 [137:09:34<17:21:08, 31.73s/it] 89%|████████▊ | 15317/17285 [137:10:12<18:22:19, 33.61s/it] 89%|████████▊ | 15318/17285 [137:10:43<17:58:55, 32.91s/it] 89%|████████▊ | 15319/17285 [137:11:25<19:21:36, 35.45s/it] 89%|████████▊ | 15320/17285 [137:11:51<17:50:02, 32.67s/it] {'loss': 1.2773, 'learning_rate': 8.265618411507148e-06, 'epoch': 2.66} + 89%|████████▊ | 15320/17285 [137:11:51<17:50:02, 32.67s/it] 89%|████████▊ | 15321/17285 [137:12:21<17:23:47, 31.89s/it] 89%|████████▊ | 15322/17285 [137:12:52<17:15:54, 31.66s/it] 89%|████████▊ | 15323/17285 [137:13:20<16:43:42, 30.69s/it] 89%|████████▊ | 15324/17285 [137:13:52<16:51:59, 30.96s/it] 89%|████████▊ | 15325/17285 [137:14:26<17:19:21, 31.82s/it] 89%|████████▊ | 15326/17285 [137:15:01<17:55:57, 32.95s/it] 89%|████████▊ | 15327/17285 [137:15:35<18:02:45, 33.18s/it] 89%|████████▊ | 15328/17285 [137:16:13<18:43:49, 34.46s/it] 89%|████████▊ | 15329/17285 [137:16:46<18:31:20, 34.09s/it] 89%|████████▊ | 15330/17285 [137:17:12<17:08:58, 31.58s/it] {'loss': 1.2717, 'learning_rate': 8.189619790538295e-06, 'epoch': 2.66} + 89%|████████▊ | 15330/17285 [137:17:12<17:08:58, 31.58s/it] 89%|████████▊ | 15331/17285 [137:17:45<17:25:22, 32.10s/it] 89%|████████▊ | 15332/17285 [137:18:16<17:16:38, 31.85s/it] 89%|████████▊ | 15333/17285 [137:18:52<17:59:36, 33.18s/it] 89%|████████▊ | 15334/17285 [137:19:20<16:59:12, 31.34s/it] 89%|████████▊ | 15335/17285 [137:19:48<16:35:00, 30.62s/it] 89%|████████▊ | 15336/17285 [137:20:23<17:11:30, 31.76s/it] 89%|████████▊ | 15337/17285 
[137:20:49<16:16:38, 30.08s/it] 89%|████████▊ | 15338/17285 [137:21:26<17:23:29, 32.16s/it] 89%|████████▊ | 15339/17285 [137:21:58<17:24:56, 32.22s/it] 89%|████████▊ | 15340/17285 [137:22:27<16:52:41, 31.24s/it] {'loss': 1.2474, 'learning_rate': 8.113957251445836e-06, 'epoch': 2.66} + 89%|████████▊ | 15340/17285 [137:22:27<16:52:41, 31.24s/it] 89%|████████▉ | 15341/17285 [137:23:04<17:46:08, 32.91s/it] 89%|████████▉ | 15342/17285 [137:23:33<17:07:01, 31.71s/it] 89%|████████▉ | 15343/17285 [137:24:05<17:08:00, 31.76s/it] 89%|████████▉ | 15344/17285 [137:24:49<19:06:32, 35.44s/it] 89%|████████▉ | 15345/17285 [137:25:28<19:42:41, 36.58s/it] 89%|████████▉ | 15346/17285 [137:25:58<18:38:09, 34.60s/it] 89%|████████▉ | 15347/17285 [137:26:31<18:16:40, 33.95s/it] 89%|████████▉ | 15348/17285 [137:27:02<17:52:46, 33.23s/it] 89%|████████▉ | 15349/17285 [137:27:28<16:38:06, 30.93s/it] 89%|████████▉ | 15350/17285 [137:28:01<17:00:14, 31.64s/it] {'loss': 1.2828, 'learning_rate': 8.038631071200698e-06, 'epoch': 2.66} + 89%|████████▉ | 15350/17285 [137:28:01<17:00:14, 31.64s/it] 89%|████████▉ | 15351/17285 [137:28:31<16:45:43, 31.20s/it] 89%|████████▉ | 15352/17285 [137:29:04<17:01:02, 31.69s/it] 89%|████████▉ | 15353/17285 [137:29:37<17:09:59, 31.99s/it] 89%|████████▉ | 15354/17285 [137:30:09<17:08:17, 31.95s/it] 89%|████████▉ | 15355/17285 [137:30:42<17:20:07, 32.34s/it] 89%|████████▉ | 15356/17285 [137:31:11<16:46:16, 31.30s/it] 89%|████████▉ | 15357/17285 [137:31:39<16:15:54, 30.37s/it] 89%|████████▉ | 15358/17285 [137:32:15<17:08:48, 32.03s/it] 89%|████████▉ | 15359/17285 [137:32:44<16:38:28, 31.11s/it] 89%|████████▉ | 15360/17285 [137:33:15<16:36:44, 31.07s/it] {'loss': 1.2829, 'learning_rate': 7.963641525542564e-06, 'epoch': 2.67} + 89%|████████▉ | 15360/17285 [137:33:15<16:36:44, 31.07s/it] 89%|████████▉ | 15361/17285 [137:33:52<17:33:27, 32.85s/it] 89%|████████▉ | 15362/17285 [137:34:36<19:20:10, 36.20s/it] 89%|████████▉ | 15363/17285 [137:35:06<18:24:01, 34.46s/it] 
89%|████████▉ | 15364/17285 [137:35:44<18:57:31, 35.53s/it] 89%|████████▉ | 15365/17285 [137:36:13<17:52:50, 33.53s/it] 89%|████████▉ | 15366/17285 [137:36:42<17:11:28, 32.25s/it] 89%|████████▉ | 15367/17285 [137:37:11<16:33:20, 31.07s/it] 89%|████████▉ | 15368/17285 [137:37:43<16:47:46, 31.54s/it] 89%|████████▉ | 15369/17285 [137:38:10<16:00:23, 30.07s/it] 89%|████████▉ | 15370/17285 [137:38:38<15:44:55, 29.61s/it] {'loss': 1.2845, 'learning_rate': 7.888988888978833e-06, 'epoch': 2.67} + 89%|████████▉ | 15370/17285 [137:38:38<15:44:55, 29.61s/it] 89%|████████▉ | 15371/17285 [137:39:06<15:27:19, 29.07s/it] 89%|████████▉ | 15372/17285 [137:39:31<14:49:12, 27.89s/it] 89%|████████▉ | 15373/17285 [137:40:03<15:23:22, 28.98s/it] 89%|████████▉ | 15374/17285 [137:40:34<15:41:01, 29.55s/it] 89%|████████▉ | 15375/17285 [137:41:04<15:47:55, 29.78s/it] 89%|████████▉ | 15376/17285 [137:41:35<15:59:18, 30.15s/it] 89%|████████▉ | 15377/17285 [137:42:02<15:22:55, 29.02s/it] 89%|████████▉ | 15378/17285 [137:42:31<15:29:45, 29.25s/it] 89%|████████▉ | 15379/17285 [137:42:56<14:49:44, 28.01s/it] 89%|████████▉ | 15380/17285 [137:43:26<15:06:04, 28.54s/it] {'loss': 1.2726, 'learning_rate': 7.814673434783604e-06, 'epoch': 2.67} + 89%|████████▉ | 15380/17285 [137:43:26<15:06:04, 28.54s/it] 89%|████████▉ | 15381/17285 [137:44:01<16:07:18, 30.48s/it] 89%|████████▉ | 15382/17285 [137:44:38<17:05:13, 32.32s/it] 89%|████████▉ | 15383/17285 [137:45:19<18:29:06, 34.99s/it] 89%|████████▉ | 15384/17285 [137:45:49<17:39:17, 33.43s/it] 89%|████████▉ | 15385/17285 [137:46:19<17:09:48, 32.52s/it] 89%|████████▉ | 15386/17285 [137:46:51<17:04:37, 32.37s/it] 89%|████████▉ | 15387/17285 [137:47:19<16:24:33, 31.12s/it] 89%|████████▉ | 15388/17285 [137:47:51<16:32:36, 31.40s/it] 89%|████████▉ | 15389/17285 [137:48:22<16:26:05, 31.21s/it] 89%|████████▉ | 15390/17285 [137:48:54<16:34:30, 31.49s/it] {'loss': 1.2498, 'learning_rate': 7.740695434996626e-06, 'epoch': 2.67} + 89%|████████▉ | 15390/17285 
[137:48:54<16:34:30, 31.49s/it] 89%|████████▉ | 15391/17285 [137:49:21<15:45:05, 29.94s/it] 89%|████████▉ | 15392/17285 [137:49:50<15:39:32, 29.78s/it] 89%|████████▉ | 15393/17285 [137:50:16<15:06:07, 28.74s/it] 89%|████████▉ | 15394/17285 [137:50:44<14:55:03, 28.40s/it] 89%|████████▉ | 15395/17285 [137:51:11<14:44:37, 28.08s/it] 89%|████████▉ | 15396/17285 [137:51:42<15:05:03, 28.75s/it] 89%|████████▉ | 15397/17285 [137:52:12<15:19:40, 29.23s/it] 89%|████████▉ | 15398/17285 [137:52:52<17:01:47, 32.49s/it] 89%|████████▉ | 15399/17285 [137:53:32<18:11:26, 34.72s/it] 89%|████████▉ | 15400/17285 [137:54:01<17:13:05, 32.88s/it] {'loss': 1.2746, 'learning_rate': 7.667055160422431e-06, 'epoch': 2.67} + 89%|████████▉ | 15400/17285 [137:54:01<17:13:05, 32.88s/it] 89%|████████▉ | 15401/17285 [137:54:35<17:28:06, 33.38s/it] 89%|████████▉ | 15402/17285 [137:55:06<16:59:31, 32.49s/it] 89%|████████▉ | 15403/17285 [137:55:41<17:26:21, 33.36s/it] 89%|████████▉ | 15404/17285 [137:56:18<18:03:49, 34.57s/it] 89%|████████▉ | 15405/17285 [137:56:50<17:34:08, 33.64s/it] 89%|████████▉ | 15406/17285 [137:57:16<16:25:43, 31.48s/it] 89%|████████▉ | 15407/17285 [137:57:44<15:51:10, 30.39s/it] 89%|████████▉ | 15408/17285 [137:58:14<15:47:51, 30.30s/it] 89%|████████▉ | 15409/17285 [137:58:47<16:07:10, 30.93s/it] 89%|████████▉ | 15410/17285 [137:59:12<15:15:14, 29.29s/it] {'loss': 1.271, 'learning_rate': 7.593752880629257e-06, 'epoch': 2.67} + 89%|████████▉ | 15410/17285 [137:59:12<15:15:14, 29.29s/it] 89%|████████▉ | 15411/17285 [137:59:45<15:44:39, 30.25s/it] 89%|████████▉ | 15412/17285 [138:00:16<15:56:52, 30.65s/it] 89%|████████▉ | 15413/17285 [138:00:46<15:46:13, 30.33s/it] 89%|████████▉ | 15414/17285 [138:01:17<15:52:12, 30.54s/it] 89%|████████▉ | 15415/17285 [138:01:49<16:03:52, 30.93s/it] 89%|████████▉ | 15416/17285 [138:02:16<15:30:10, 29.86s/it] 89%|████████▉ | 15417/17285 [138:02:45<15:23:54, 29.68s/it] 89%|████████▉ | 15418/17285 [138:03:24<16:47:31, 32.38s/it] 89%|████████▉ | 
15419/17285 [138:03:51<15:54:22, 30.69s/it] 89%|████████▉ | 15420/17285 [138:04:23<16:08:01, 31.14s/it] {'loss': 1.256, 'learning_rate': 7.52078886394807e-06, 'epoch': 2.68} + 89%|████████▉ | 15420/17285 [138:04:23<16:08:01, 31.14s/it] 89%|████████▉ | 15421/17285 [138:04:56<16:30:29, 31.88s/it] 89%|████████▉ | 15422/17285 [138:05:24<15:49:10, 30.57s/it] 89%|████████▉ | 15423/17285 [138:05:51<15:20:05, 29.65s/it] 89%|████████▉ | 15424/17285 [138:06:26<16:03:39, 31.07s/it] 89%|████████▉ | 15425/17285 [138:06:54<15:37:40, 30.25s/it] 89%|████████▉ | 15426/17285 [138:07:22<15:17:22, 29.61s/it] 89%|████████▉ | 15427/17285 [138:08:01<16:43:39, 32.41s/it] 89%|████████▉ | 15428/17285 [138:08:27<15:37:31, 30.29s/it] 89%|████████▉ | 15429/17285 [138:09:02<16:22:22, 31.76s/it] 89%|████████▉ | 15430/17285 [138:09:42<17:37:42, 34.21s/it] {'loss': 1.2778, 'learning_rate': 7.448163377471562e-06, 'epoch': 2.68} + 89%|████████▉ | 15430/17285 [138:09:42<17:37:42, 34.21s/it] 89%|████████▉ | 15431/17285 [138:10:15<17:27:42, 33.91s/it] 89%|████████▉ | 15432/17285 [138:10:42<16:28:21, 32.00s/it] 89%|████████▉ | 15433/17285 [138:11:11<15:55:55, 30.97s/it] 89%|████████▉ | 15434/17285 [138:11:44<16:15:24, 31.62s/it] 89%|████████▉ | 15435/17285 [138:12:15<16:03:33, 31.25s/it] 89%|████████▉ | 15436/17285 [138:12:38<14:51:57, 28.94s/it] 89%|████████▉ | 15437/17285 [138:13:10<15:15:24, 29.72s/it] 89%|████████▉ | 15438/17285 [138:13:46<16:16:22, 31.72s/it] 89%|████████▉ | 15439/17285 [138:14:13<15:33:39, 30.35s/it] 89%|████████▉ | 15440/17285 [138:14:52<16:50:54, 32.87s/it] {'loss': 1.2898, 'learning_rate': 7.375876687053251e-06, 'epoch': 2.68} + 89%|████████▉ | 15440/17285 [138:14:52<16:50:54, 32.87s/it] 89%|████████▉ | 15441/17285 [138:15:23<16:35:32, 32.39s/it] 89%|████████▉ | 15442/17285 [138:15:48<15:25:41, 30.14s/it] 89%|████████▉ | 15443/17285 [138:16:31<17:26:36, 34.09s/it] 89%|████████▉ | 15444/17285 [138:17:06<17:32:31, 34.30s/it] 89%|████████▉ | 15445/17285 [138:17:39<17:14:57, 
33.75s/it] 89%|████████▉ | 15446/17285 [138:18:05<16:02:50, 31.41s/it] 89%|████████▉ | 15447/17285 [138:18:36<15:58:59, 31.31s/it] 89%|████████▉ | 15448/17285 [138:19:08<16:04:49, 31.51s/it] 89%|████████▉ | 15449/17285 [138:19:39<16:01:38, 31.43s/it] 89%|████████▉ | 15450/17285 [138:20:15<16:40:31, 32.71s/it] {'loss': 1.2512, 'learning_rate': 7.303929057306414e-06, 'epoch': 2.68} + 89%|████████▉ | 15450/17285 [138:20:15<16:40:31, 32.71s/it] 89%|████████▉ | 15451/17285 [138:20:55<17:52:54, 35.10s/it] 89%|████████▉ | 15452/17285 [138:21:26<17:16:27, 33.93s/it] 89%|████████▉ | 15453/17285 [138:21:57<16:48:16, 33.02s/it] 89%|████████▉ | 15454/17285 [138:22:29<16:33:43, 32.56s/it] 89%|████████▉ | 15455/17285 [138:23:01<16:25:46, 32.32s/it] 89%|████████▉ | 15456/17285 [138:23:36<16:51:04, 33.17s/it] 89%|████████▉ | 15457/17285 [138:24:06<16:25:21, 32.34s/it] 89%|████████▉ | 15458/17285 [138:24:35<15:56:56, 31.43s/it] 89%|████████▉ | 15459/17285 [138:25:12<16:45:37, 33.04s/it] 89%|████████▉ | 15460/17285 [138:25:41<16:01:33, 31.61s/it] {'loss': 1.2936, 'learning_rate': 7.23232075160315e-06, 'epoch': 2.68} + 89%|████████▉ | 15460/17285 [138:25:41<16:01:33, 31.61s/it] 89%|████████▉ | 15461/17285 [138:26:10<15:44:51, 31.08s/it] 89%|████████▉ | 15462/17285 [138:26:41<15:40:36, 30.96s/it] 89%|████████▉ | 15463/17285 [138:27:14<15:57:16, 31.52s/it] 89%|████████▉ | 15464/17285 [138:27:42<15:23:30, 30.43s/it] 89%|████████▉ | 15465/17285 [138:28:18<16:17:18, 32.22s/it] 89%|████████▉ | 15466/17285 [138:28:54<16:51:02, 33.35s/it] 89%|████████▉ | 15467/17285 [138:29:18<15:23:15, 30.47s/it] 89%|████████▉ | 15468/17285 [138:29:45<14:55:43, 29.58s/it] 89%|████████▉ | 15469/17285 [138:30:13<14:34:14, 28.88s/it] 89%|████████▉ | 15470/17285 [138:30:42<14:39:58, 29.09s/it] {'loss': 1.2946, 'learning_rate': 7.161052032073445e-06, 'epoch': 2.68} + 89%|████████▉ | 15470/17285 [138:30:42<14:39:58, 29.09s/it] 90%|████████▉ | 15471/17285 [138:31:08<14:08:49, 28.08s/it] 90%|████████▉ | 15472/17285 
[138:31:41<14:57:31, 29.70s/it] 90%|████████▉ | 15473/17285 [138:32:18<16:02:15, 31.86s/it] 90%|████████▉ | 15474/17285 [138:32:45<15:11:43, 30.21s/it] 90%|████████▉ | 15475/17285 [138:33:19<15:51:56, 31.56s/it] 90%|████████▉ | 15476/17285 [138:33:54<16:16:54, 32.40s/it] 90%|████████▉ | 15477/17285 [138:34:25<16:05:29, 32.04s/it] 90%|████████▉ | 15478/17285 [138:34:56<15:51:05, 31.58s/it] 90%|████████▉ | 15479/17285 [138:35:32<16:35:07, 33.06s/it][2023-08-28 18:30:40,896] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 90%|████████▉ | 15480/17285 [138:36:03<16:17:43, 32.50s/it] {'loss': 1.2764, 'learning_rate': 7.097200746323862e-06, 'epoch': 2.69} + 90%|████████▉ | 15480/17285 [138:36:03<16:17:43, 32.50s/it] 90%|████████▉ | 15481/17285 [138:36:30<15:22:09, 30.67s/it] 90%|████████▉ | 15482/17285 [138:37:06<16:08:49, 32.24s/it] 90%|████████▉ | 15483/17285 [138:37:38<16:08:52, 32.26s/it] 90%|████████▉ | 15484/17285 [138:38:13<16:37:12, 33.22s/it] 90%|████████▉ | 15485/17285 [138:38:46<16:35:08, 33.17s/it] 90%|████████▉ | 15486/17285 [138:39:14<15:49:11, 31.66s/it] 90%|████████▉ | 15487/17285 [138:39:48<16:04:45, 32.19s/it] 90%|████████▉ | 15488/17285 [138:40:19<15:50:26, 31.73s/it] 90%|████████▉ | 15489/17285 [138:40:49<15:41:17, 31.45s/it] 90%|████████▉ | 15490/17285 [138:41:21<15:39:01, 31.39s/it] {'loss': 1.2406, 'learning_rate': 7.026577958239167e-06, 'epoch': 2.69} + 90%|████████▉ | 15490/17285 [138:41:21<15:39:01, 31.39s/it] 90%|████████▉ | 15491/17285 [138:41:50<15:17:45, 30.69s/it] 90%|████████▉ | 15492/17285 [138:42:26<16:07:59, 32.39s/it] 90%|████████▉ | 15493/17285 [138:43:02<16:41:57, 33.55s/it] 90%|████████▉ | 15494/17285 [138:43:33<16:13:44, 32.62s/it] 90%|████████▉ | 15495/17285 [138:44:04<16:02:55, 32.28s/it] 90%|████████▉ | 15496/17285 [138:44:39<16:28:28, 33.15s/it] 90%|████████▉ | 15497/17285 [138:45:13<16:36:08, 33.43s/it] 
90%|████████▉ | 15498/17285 [138:45:49<16:55:00, 34.08s/it] 90%|████████▉ | 15499/17285 [138:46:23<16:51:01, 33.97s/it] 90%|████████▉ | 15500/17285 [138:46:54<16:21:55, 33.01s/it] {'loss': 1.2662, 'learning_rate': 6.956295509471921e-06, 'epoch': 2.69} + 90%|████████▉ | 15500/17285 [138:46:54<16:21:55, 33.01s/it] 90%|████████▉ | 15501/17285 [138:47:27<16:28:46, 33.26s/it] 90%|████████▉ | 15502/17285 [138:48:01<16:27:29, 33.23s/it] 90%|████████▉ | 15503/17285 [138:48:30<15:54:37, 32.14s/it] 90%|████████▉ | 15504/17285 [138:49:01<15:43:42, 31.79s/it] 90%|████████▉ | 15505/17285 [138:49:35<16:02:53, 32.46s/it] 90%|████████▉ | 15506/17285 [138:50:04<15:27:41, 31.29s/it] 90%|████████▉ | 15507/17285 [138:50:30<14:45:11, 29.87s/it] 90%|████████▉ | 15508/17285 [138:50:59<14:35:27, 29.56s/it] 90%|████████▉ | 15509/17285 [138:51:25<13:58:19, 28.32s/it] 90%|████████▉ | 15510/17285 [138:51:54<14:07:48, 28.66s/it] {'loss': 1.244, 'learning_rate': 6.88635365729865e-06, 'epoch': 2.69} + 90%|████████▉ | 15510/17285 [138:51:54<14:07:48, 28.66s/it] 90%|████████▉ | 15511/17285 [138:52:25<14:26:12, 29.30s/it] 90%|████████▉ | 15512/17285 [138:52:58<14:56:39, 30.34s/it] 90%|████████▉ | 15513/17285 [138:53:34<15:54:16, 32.31s/it] 90%|████████▉ | 15514/17285 [138:54:02<15:14:17, 30.98s/it] 90%|████████▉ | 15515/17285 [138:54:32<15:02:34, 30.60s/it] 90%|████████▉ | 15516/17285 [138:55:08<15:51:21, 32.27s/it] 90%|████████▉ | 15517/17285 [138:55:40<15:47:18, 32.15s/it] 90%|████████▉ | 15518/17285 [138:56:10<15:29:46, 31.57s/it] 90%|████████▉ | 15519/17285 [138:56:42<15:27:50, 31.52s/it] 90%|████████▉ | 15520/17285 [138:57:17<15:58:59, 32.60s/it] {'loss': 1.2808, 'learning_rate': 6.8167526577491034e-06, 'epoch': 2.69} + 90%|████████▉ | 15520/17285 [138:57:17<15:58:59, 32.60s/it] 90%|████████▉ | 15521/17285 [138:57:44<15:13:31, 31.07s/it] 90%|████████▉ | 15522/17285 [138:58:12<14:44:43, 30.11s/it] 90%|████████▉ | 15523/17285 [138:58:45<15:07:09, 30.89s/it] 90%|████████▉ | 15524/17285 
[138:59:12<14:30:13, 29.65s/it] 90%|████████▉ | 15525/17285 [138:59:47<15:18:26, 31.31s/it] 90%|████████▉ | 15526/17285 [139:00:18<15:20:22, 31.39s/it] 90%|████████▉ | 15527/17285 [139:00:47<14:52:20, 30.46s/it] 90%|████████▉ | 15528/17285 [139:01:14<14:20:28, 29.38s/it] 90%|████████▉ | 15529/17285 [139:01:43<14:23:33, 29.51s/it] 90%|████████▉ | 15530/17285 [139:02:18<15:04:13, 30.91s/it] {'loss': 1.3011, 'learning_rate': 6.747492765605312e-06, 'epoch': 2.7} + 90%|████████▉ | 15530/17285 [139:02:18<15:04:13, 30.91s/it] 90%|████████▉ | 15531/17285 [139:02:50<15:19:40, 31.46s/it] 90%|████████▉ | 15532/17285 [139:03:17<14:38:19, 30.06s/it] 90%|████████▉ | 15533/17285 [139:03:51<15:13:04, 31.27s/it] 90%|████████▉ | 15534/17285 [139:04:26<15:46:59, 32.45s/it] 90%|████████▉ | 15535/17285 [139:04:59<15:51:34, 32.63s/it] 90%|████████▉ | 15536/17285 [139:05:33<15:59:52, 32.93s/it] 90%|████████▉ | 15537/17285 [139:06:08<16:21:12, 33.68s/it] 90%|████████▉ | 15538/17285 [139:06:45<16:42:49, 34.44s/it] 90%|████████▉ | 15539/17285 [139:07:22<17:03:45, 35.18s/it] 90%|████████▉ | 15540/17285 [139:07:51<16:15:59, 33.56s/it] {'loss': 1.2447, 'learning_rate': 6.678574234400659e-06, 'epoch': 2.7} + 90%|████████▉ | 15540/17285 [139:07:51<16:15:59, 33.56s/it] 90%|████████▉ | 15541/17285 [139:08:25<16:20:13, 33.72s/it] 90%|████████▉ | 15542/17285 [139:09:00<16:28:34, 34.03s/it] 90%|████████▉ | 15543/17285 [139:09:34<16:22:39, 33.85s/it] 90%|████████▉ | 15544/17285 [139:10:11<16:53:09, 34.92s/it] 90%|████████▉ | 15545/17285 [139:10:42<16:16:52, 33.69s/it] 90%|████████▉ | 15546/17285 [139:11:12<15:48:32, 32.73s/it] 90%|████████▉ | 15547/17285 [139:11:42<15:19:09, 31.73s/it] 90%|████████▉ | 15548/17285 [139:12:14<15:20:30, 31.80s/it] 90%|████████▉ | 15549/17285 [139:12:47<15:30:49, 32.17s/it] 90%|████████▉ | 15550/17285 [139:13:29<16:55:10, 35.11s/it] {'loss': 1.2038, 'learning_rate': 6.60999731641887e-06, 'epoch': 2.7} + 90%|████████▉ | 15550/17285 [139:13:29<16:55:10, 35.11s/it] 
90%|████████▉ | 15551/17285 [139:13:58<16:02:43, 33.31s/it] 90%|████████▉ | 15552/17285 [139:14:33<16:16:01, 33.79s/it] 90%|████████▉ | 15553/17285 [139:15:00<15:18:23, 31.81s/it] 90%|████████▉ | 15554/17285 [139:15:26<14:26:11, 30.02s/it] 90%|████████▉ | 15555/17285 [139:15:55<14:14:27, 29.63s/it] 90%|████████▉ | 15556/17285 [139:16:36<15:58:17, 33.25s/it][2023-08-28 19:11:51,566] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 90%|█████████ | 15557/17285 [139:17:14<16:35:37, 34.57s/it] 90%|█████████ | 15558/17285 [139:17:45<16:06:43, 33.59s/it] 90%|█████████ | 15559/17285 [139:18:16<15:40:17, 32.69s/it] 90%|█████████ | 15560/17285 [139:18:50<15:53:17, 33.16s/it] {'loss': 1.3007, 'learning_rate': 6.548570377045693e-06, 'epoch': 2.7} + 90%|█████████ | 15560/17285 [139:18:50<15:53:17, 33.16s/it] 90%|█████████ | 15561/17285 [139:19:24<16:00:59, 33.45s/it] 90%|█████████ | 15562/17285 [139:19:57<15:57:58, 33.36s/it] 90%|█████████ | 15563/17285 [139:20:22<14:39:21, 30.64s/it] 90%|█████████ | 15564/17285 [139:21:01<15:50:50, 33.15s/it] 90%|█████████ | 15565/17285 [139:21:32<15:36:42, 32.68s/it] 90%|█████████ | 15566/17285 [139:22:12<16:37:03, 34.80s/it] 90%|█████████ | 15567/17285 [139:22:38<15:19:21, 32.11s/it] 90%|█████████ | 15568/17285 [139:23:08<15:01:26, 31.50s/it] 90%|█████████ | 15569/17285 [139:23:41<15:16:21, 32.04s/it] 90%|█████████ | 15570/17285 [139:24:08<14:32:19, 30.52s/it] {'loss': 1.2823, 'learning_rate': 6.480643214749759e-06, 'epoch': 2.7} + 90%|█████████ | 15570/17285 [139:24:08<14:32:19, 30.52s/it] 90%|█████████ | 15571/17285 [139:24:46<15:31:19, 32.60s/it] 90%|█████████ | 15572/17285 [139:25:18<15:28:29, 32.52s/it] 90%|█████████ | 15573/17285 [139:25:46<14:53:43, 31.32s/it] 90%|█████████ | 15574/17285 [139:26:20<15:09:07, 31.88s/it] 90%|█████████ | 15575/17285 [139:27:02<16:35:27, 34.93s/it] 90%|█████████ | 15576/17285 [139:27:35<16:20:12, 34.41s/it] 90%|█████████ 
| 15577/17285 [139:28:13<16:51:27, 35.53s/it] 90%|█████████ | 15578/17285 [139:28:54<17:38:53, 37.22s/it] 90%|█████████ | 15579/17285 [139:29:29<17:15:18, 36.41s/it] 90%|█████████ | 15580/17285 [139:30:01<16:35:42, 35.04s/it] {'loss': 1.2388, 'learning_rate': 6.413058390224724e-06, 'epoch': 2.7} + 90%|█████████ | 15580/17285 [139:30:01<16:35:42, 35.04s/it] 90%|█████████ | 15581/17285 [139:30:28<15:32:44, 32.84s/it] 90%|█████████ | 15582/17285 [139:31:03<15:52:46, 33.57s/it] 90%|█████████ | 15583/17285 [139:31:36<15:44:03, 33.28s/it] 90%|█████████ | 15584/17285 [139:32:08<15:35:50, 33.01s/it] 90%|█████████ | 15585/17285 [139:32:43<15:52:08, 33.61s/it] 90%|█████████ | 15586/17285 [139:33:18<15:57:35, 33.82s/it] 90%|█████████ | 15587/17285 [139:33:44<14:51:24, 31.50s/it] 90%|█████████ | 15588/17285 [139:34:11<14:10:57, 30.09s/it] 90%|█████████ | 15589/17285 [139:34:46<14:51:02, 31.52s/it] 90%|█████████ | 15590/17285 [139:35:11<13:59:30, 29.72s/it] {'loss': 1.2874, 'learning_rate': 6.345816150872197e-06, 'epoch': 2.71} + 90%|█████████ | 15590/17285 [139:35:11<13:59:30, 29.72s/it] 90%|█████████ | 15591/17285 [139:35:38<13:35:04, 28.87s/it] 90%|█████████ | 15592/17285 [139:36:12<14:14:56, 30.30s/it] 90%|█████████ | 15593/17285 [139:36:44<14:28:58, 30.81s/it] 90%|█████████ | 15594/17285 [139:37:12<14:11:48, 30.22s/it] 90%|█████████ | 15595/17285 [139:37:42<14:01:45, 29.88s/it] 90%|█████████ | 15596/17285 [139:38:14<14:25:36, 30.75s/it] 90%|█████████ | 15597/17285 [139:38:48<14:51:29, 31.69s/it] 90%|█████████ | 15598/17285 [139:39:16<14:17:52, 30.51s/it] 90%|█████████ | 15599/17285 [139:39:54<15:17:19, 32.65s/it] 90%|█████████ | 15600/17285 [139:40:34<16:23:03, 35.01s/it] {'loss': 1.2493, 'learning_rate': 6.278916742839691e-06, 'epoch': 2.71} + 90%|█████████ | 15600/17285 [139:40:34<16:23:03, 35.01s/it] 90%|█████████ | 15601/17285 [139:41:04<15:43:15, 33.61s/it] 90%|█████████ | 15602/17285 [139:41:33<15:03:07, 32.20s/it] 90%|█████████ | 15603/17285 [139:42:06<15:06:57, 
32.35s/it] 90%|█████████ | 15604/17285 [139:42:30<13:57:53, 29.91s/it] 90%|█████████ | 15605/17285 [139:43:03<14:17:52, 30.64s/it] 90%|█████████ | 15606/17285 [139:43:28<13:33:09, 29.06s/it] 90%|█████████ | 15607/17285 [139:44:01<14:04:17, 30.19s/it] 90%|█████████ | 15608/17285 [139:44:31<14:01:53, 30.12s/it] 90%|█████████ | 15609/17285 [139:45:02<14:13:45, 30.56s/it] 90%|█████████ | 15610/17285 [139:45:33<14:14:39, 30.61s/it] {'loss': 1.282, 'learning_rate': 6.2123604110197686e-06, 'epoch': 2.71} + 90%|█████████ | 15610/17285 [139:45:33<14:14:39, 30.61s/it] 90%|█████████ | 15611/17285 [139:46:04<14:16:35, 30.70s/it] 90%|█████████ | 15612/17285 [139:46:32<13:51:43, 29.83s/it] 90%|█████████ | 15613/17285 [139:47:06<14:28:30, 31.17s/it] 90%|█████████ | 15614/17285 [139:47:44<15:20:46, 33.06s/it] 90%|█████████ | 15615/17285 [139:48:13<14:53:55, 32.12s/it] 90%|█████████ | 15616/17285 [139:48:47<15:06:21, 32.58s/it] 90%|█████████ | 15617/17285 [139:49:19<15:00:43, 32.40s/it] 90%|█████████ | 15618/17285 [139:49:56<15:35:40, 33.68s/it] 90%|█████████ | 15619/17285 [139:50:33<16:07:04, 34.83s/it] 90%|█████████ | 15620/17285 [139:51:07<15:59:17, 34.57s/it] {'loss': 1.2575, 'learning_rate': 6.146147399049107e-06, 'epoch': 2.71} + 90%|█████████ | 15620/17285 [139:51:07<15:59:17, 34.57s/it] 90%|█████████ | 15621/17285 [139:51:33<14:48:12, 32.03s/it] 90%|█████████ | 15622/17285 [139:52:06<14:53:22, 32.23s/it] 90%|█████████ | 15623/17285 [139:52:38<14:48:10, 32.06s/it] 90%|█████████ | 15624/17285 [139:53:03<13:48:34, 29.93s/it] 90%|█████████ | 15625/17285 [139:53:29<13:19:46, 28.91s/it] 90%|█████████ | 15626/17285 [139:54:02<13:52:10, 30.10s/it] 90%|█████████ | 15627/17285 [139:54:37<14:30:02, 31.49s/it] 90%|█████████ | 15628/17285 [139:55:11<14:54:38, 32.40s/it] 90%|█████████ | 15629/17285 [139:55:37<14:01:13, 30.48s/it] 90%|█████████ | 15630/17285 [139:56:07<13:53:08, 30.20s/it] {'loss': 1.286, 'learning_rate': 6.0802779493076665e-06, 'epoch': 2.71} + 90%|█████████ | 
15630/17285 [139:56:07<13:53:08, 30.20s/it] 90%|█████████ | 15631/17285 [139:56:38<13:56:13, 30.33s/it] 90%|█████████ | 15632/17285 [139:57:08<13:59:36, 30.48s/it] 90%|█████████ | 15633/17285 [139:57:33<13:15:12, 28.88s/it] 90%|█████████ | 15634/17285 [139:58:03<13:21:11, 29.12s/it] 90%|█████████ | 15635/17285 [139:58:29<12:57:42, 28.28s/it] 90%|█████████ | 15636/17285 [139:58:57<12:50:05, 28.02s/it] 90%|█████████ | 15637/17285 [139:59:27<13:09:06, 28.73s/it] 90%|█████████ | 15638/17285 [140:00:04<14:18:05, 31.26s/it] 90%|█████████ | 15639/17285 [140:00:40<14:51:52, 32.51s/it] 90%|█████████ | 15640/17285 [140:01:13<14:57:16, 32.73s/it] {'loss': 1.281, 'learning_rate': 6.014752302917681e-06, 'epoch': 2.71} + 90%|█████████ | 15640/17285 [140:01:13<14:57:16, 32.73s/it] 90%|█████████ | 15641/17285 [140:01:42<14:22:02, 31.46s/it] 90%|█████████ | 15642/17285 [140:02:08<13:37:38, 29.86s/it] 91%|█████████ | 15643/17285 [140:02:33<13:02:47, 28.60s/it] 91%|█████████ | 15644/17285 [140:03:06<13:31:44, 29.68s/it] 91%|█████████ | 15645/17285 [140:03:37<13:41:28, 30.05s/it] 91%|█████████ | 15646/17285 [140:04:09<14:01:41, 30.81s/it] 91%|█████████ | 15647/17285 [140:04:39<13:51:41, 30.46s/it] 91%|█████████ | 15648/17285 [140:05:09<13:46:26, 30.29s/it] 91%|█████████ | 15649/17285 [140:05:39<13:49:46, 30.43s/it] 91%|█████████ | 15650/17285 [140:06:16<14:41:06, 32.33s/it] {'loss': 1.2855, 'learning_rate': 5.949570699742935e-06, 'epoch': 2.72} + 91%|█████████ | 15650/17285 [140:06:16<14:41:06, 32.33s/it] 91%|█████████ | 15651/17285 [140:06:52<15:05:03, 33.23s/it] 91%|█████████ | 15652/17285 [140:07:28<15:28:22, 34.11s/it] 91%|█████████ | 15653/17285 [140:08:03<15:37:39, 34.47s/it] 91%|█████████ | 15654/17285 [140:08:33<15:03:34, 33.24s/it] 91%|█████████ | 15655/17285 [140:09:14<16:06:11, 35.57s/it] 91%|█████████ | 15656/17285 [140:09:44<15:18:36, 33.83s/it] 91%|█████████ | 15657/17285 [140:10:21<15:46:43, 34.89s/it] 91%|█████████ | 15658/17285 [140:10:55<15:34:43, 34.47s/it] 
91%|█████████ | 15659/17285 [140:11:25<14:57:44, 33.13s/it] 91%|█████████ | 15660/17285 [140:12:08<16:19:07, 36.15s/it] {'loss': 1.2316, 'learning_rate': 5.8847333783877635e-06, 'epoch': 2.72} + 91%|█████████ | 15660/17285 [140:12:08<16:19:07, 36.15s/it] 91%|█████████ | 15661/17285 [140:12:44<16:19:08, 36.17s/it] 91%|█████████ | 15662/17285 [140:13:26<17:02:56, 37.82s/it] 91%|█████████ | 15663/17285 [140:14:05<17:07:20, 38.00s/it] 91%|█████████ | 15664/17285 [140:14:30<15:24:35, 34.22s/it] 91%|█████████ | 15665/17285 [140:15:04<15:25:02, 34.26s/it] 91%|█████████ | 15666/17285 [140:15:41<15:46:11, 35.07s/it] 91%|█████████ | 15667/17285 [140:16:14<15:24:14, 34.27s/it] 91%|█████████ | 15668/17285 [140:16:47<15:14:34, 33.94s/it] 91%|█████████ | 15669/17285 [140:17:13<14:12:09, 31.64s/it] 91%|█████████ | 15670/17285 [140:17:44<14:04:41, 31.38s/it] {'loss': 1.2645, 'learning_rate': 5.820240576196223e-06, 'epoch': 2.72} + 91%|█████████ | 15670/17285 [140:17:44<14:04:41, 31.38s/it] 91%|█████████ | 15671/17285 [140:18:18<14:24:39, 32.14s/it] 91%|█████████ | 15672/17285 [140:18:49<14:15:57, 31.84s/it] 91%|█████████ | 15673/17285 [140:19:18<13:52:43, 30.99s/it] 91%|█████████ | 15674/17285 [140:19:52<14:17:04, 31.92s/it] 91%|█████████ | 15675/17285 [140:20:34<15:34:40, 34.83s/it] 91%|█████████ | 15676/17285 [140:21:06<15:15:12, 34.13s/it] 91%|█████████ | 15677/17285 [140:21:47<16:05:16, 36.02s/it] 91%|█████████ | 15678/17285 [140:22:23<16:07:16, 36.12s/it] 91%|█████████ | 15679/17285 [140:22:49<14:46:14, 33.11s/it] 91%|█████████ | 15680/17285 [140:23:24<14:57:11, 33.54s/it] {'loss': 1.2897, 'learning_rate': 5.7560925292512335e-06, 'epoch': 2.72} + 91%|█████████ | 15680/17285 [140:23:24<14:57:11, 33.54s/it] 91%|█████████ | 15681/17285 [140:24:03<15:42:39, 35.26s/it] 91%|█████████ | 15682/17285 [140:24:36<15:24:35, 34.61s/it] 91%|█████████ | 15683/17285 [140:25:07<14:59:51, 33.70s/it] 91%|█████████ | 15684/17285 [140:25:38<14:30:20, 32.62s/it] 91%|█████████ | 15685/17285 
[140:26:06<13:59:08, 31.47s/it] 91%|█████████ | 15686/17285 [140:26:37<13:53:49, 31.29s/it] 91%|█████████ | 15687/17285 [140:27:12<14:22:01, 32.37s/it] 91%|█████████ | 15688/17285 [140:27:49<14:56:58, 33.70s/it] 91%|█████████ | 15689/17285 [140:28:23<14:58:36, 33.78s/it] 91%|█████████ | 15690/17285 [140:28:54<14:39:51, 33.10s/it] {'loss': 1.2823, 'learning_rate': 5.69228947237368e-06, 'epoch': 2.72} + 91%|█████████ | 15690/17285 [140:28:54<14:39:51, 33.10s/it] 91%|█████████ | 15691/17285 [140:29:28<14:45:17, 33.32s/it] 91%|█████████ | 15692/17285 [140:30:01<14:41:18, 33.19s/it] 91%|█████████ | 15693/17285 [140:30:35<14:44:33, 33.34s/it] 91%|█████████ | 15694/17285 [140:31:07<14:31:34, 32.87s/it] 91%|█████████ | 15695/17285 [140:31:37<14:14:05, 32.23s/it] 91%|█████████ | 15696/17285 [140:32:04<13:30:56, 30.62s/it] 91%|█████████ | 15697/17285 [140:32:34<13:26:18, 30.47s/it] 91%|█████████ | 15698/17285 [140:33:10<14:05:17, 31.96s/it][2023-08-28 20:28:13,884] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 91%|█████████ | 15699/17285 [140:33:36<13:21:23, 30.32s/it] 91%|█████████ | 15700/17285 [140:34:02<12:42:36, 28.87s/it] {'loss': 1.2788, 'learning_rate': 5.635161880753381e-06, 'epoch': 2.72} + 91%|█████████ | 15700/17285 [140:34:02<12:42:36, 28.87s/it] 91%|█████████ | 15701/17285 [140:34:35<13:17:59, 30.23s/it] 91%|█████████ | 15702/17285 [140:35:05<13:12:41, 30.05s/it] 91%|█████████ | 15703/17285 [140:35:38<13:40:30, 31.12s/it] 91%|█████████ | 15704/17285 [140:36:09<13:39:32, 31.10s/it] 91%|█████████ | 15705/17285 [140:36:48<14:38:16, 33.35s/it] 91%|█████████ | 15706/17285 [140:37:19<14:16:26, 32.54s/it] 91%|█████████ | 15707/17285 [140:37:47<13:39:49, 31.17s/it] 91%|█████████ | 15708/17285 [140:38:16<13:24:33, 30.61s/it] 91%|█████████ | 15709/17285 [140:38:43<12:52:54, 29.43s/it] 91%|█████████ | 15710/17285 [140:39:18<13:40:14, 31.25s/it] {'loss': 1.2814, 'learning_rate': 5.572014947411885e-06, 'epoch': 2.73} + 91%|█████████ | 15710/17285 [140:39:18<13:40:14, 31.25s/it] 91%|█████████ | 15711/17285 [140:39:53<14:05:32, 32.23s/it] 91%|█████████ | 15712/17285 [140:40:18<13:11:37, 30.20s/it] 91%|█████████ | 15713/17285 [140:40:53<13:49:01, 31.64s/it] 91%|█████████ | 15714/17285 [140:41:34<14:58:58, 34.33s/it] 91%|█████████ | 15715/17285 [140:42:04<14:28:49, 33.20s/it] 91%|█████████ | 15716/17285 [140:42:34<13:59:52, 32.12s/it] 91%|█████████ | 15717/17285 [140:43:06<13:57:43, 32.06s/it] 91%|█████████ | 15718/17285 [140:43:32<13:09:34, 30.23s/it] 91%|█████████ | 15719/17285 [140:44:12<14:25:52, 33.17s/it][2023-08-28 20:39:18,539] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 91%|█████████ | 15720/17285 [140:44:41<13:53:19, 31.95s/it] {'loss': 1.2719, 'learning_rate': 5.515478243480177e-06, 'epoch': 2.73} + 91%|█████████ | 15720/17285 [140:44:41<13:53:19, 31.95s/it] 91%|█████████ | 15721/17285 [140:45:17<14:22:32, 33.09s/it] 91%|█████████ | 15722/17285 [140:45:50<14:25:26, 33.22s/it] 91%|█████████ | 15723/17285 [140:46:22<14:16:34, 32.90s/it] 91%|█████████ | 15724/17285 [140:46:52<13:51:52, 31.97s/it] 91%|█████████ | 15725/17285 [140:47:26<14:05:12, 32.51s/it] 91%|█████████ | 15726/17285 [140:47:52<13:15:52, 30.63s/it] 91%|█████████ | 15727/17285 [140:48:18<12:40:29, 29.29s/it] 91%|█████████ | 15728/17285 [140:48:49<12:52:04, 29.75s/it] 91%|█████████ | 15729/17285 [140:49:28<13:59:46, 32.38s/it] 91%|█████████ | 15730/17285 [140:49:57<13:35:33, 31.47s/it] {'loss': 1.2618, 'learning_rate': 5.452988268147996e-06, 'epoch': 2.73} + 91%|█████████ | 15730/17285 [140:49:57<13:35:33, 31.47s/it] 91%|█████████ | 15731/17285 [140:50:22<12:46:37, 29.60s/it] 91%|█████████ | 15732/17285 [140:50:58<13:33:54, 31.45s/it] 91%|█████████ | 15733/17285 [140:51:23<12:41:52, 29.45s/it] 91%|█████████ | 15734/17285 [140:51:57<13:18:14, 30.88s/it] 91%|█████████ | 15735/17285 [140:52:23<12:43:09, 29.54s/it] 91%|█████████ | 15736/17285 [140:52:56<13:03:19, 30.34s/it] 91%|█████████ | 15737/17285 [140:53:30<13:30:53, 31.43s/it] 91%|█████████ | 15738/17285 [140:54:01<13:29:41, 31.40s/it] 91%|█████████ | 15739/17285 [140:54:31<13:17:59, 30.97s/it] 91%|█████████ | 15740/17285 [140:55:00<13:05:28, 30.50s/it] {'loss': 1.3436, 'learning_rate': 5.390844392429362e-06, 'epoch': 2.73} + 91%|█████████ | 15740/17285 [140:55:00<13:05:28, 30.50s/it] 91%|█████████ | 15741/17285 [140:55:33<13:18:47, 31.04s/it] 91%|█████████ | 15742/17285 [140:56:03<13:11:39, 30.78s/it] 91%|█████████ | 15743/17285 [140:56:36<13:30:37, 31.54s/it] 91%|█████████ | 15744/17285 [140:57:07<13:24:19, 31.32s/it] 91%|█████████ | 15745/17285 
[140:57:40<13:36:49, 31.82s/it] 91%|█████████ | 15746/17285 [140:58:09<13:17:35, 31.10s/it] 91%|█████████ | 15747/17285 [140:58:47<14:10:45, 33.19s/it] 91%|█████████ | 15748/17285 [140:59:22<14:17:58, 33.49s/it] 91%|█████████ | 15749/17285 [140:59:58<14:36:26, 34.24s/it] 91%|█████████ | 15750/17285 [141:00:36<15:09:00, 35.53s/it] {'loss': 1.2658, 'learning_rate': 5.329046843808683e-06, 'epoch': 2.73} + 91%|█████████ | 15750/17285 [141:00:36<15:09:00, 35.53s/it] 91%|█████████ | 15751/17285 [141:01:05<14:17:36, 33.54s/it] 91%|█████████ | 15752/17285 [141:01:30<13:11:00, 30.96s/it] 91%|█████████ | 15753/17285 [141:02:01<13:09:06, 30.91s/it] 91%|█████████ | 15754/17285 [141:02:33<13:15:56, 31.19s/it] 91%|█████████ | 15755/17285 [141:03:03<13:11:08, 31.03s/it] 91%|█████████ | 15756/17285 [141:03:31<12:47:20, 30.11s/it] 91%|█████████ | 15757/17285 [141:04:10<13:51:59, 32.67s/it] 91%|█████████ | 15758/17285 [141:04:37<13:10:45, 31.07s/it] 91%|█████████ | 15759/17285 [141:05:08<13:11:13, 31.11s/it] 91%|█████████ | 15760/17285 [141:05:52<14:46:34, 34.88s/it] {'loss': 1.2742, 'learning_rate': 5.267595848502604e-06, 'epoch': 2.74} + 91%|█████████ | 15760/17285 [141:05:52<14:46:34, 34.88s/it] 91%|█████████ | 15761/17285 [141:06:30<15:07:52, 35.74s/it] 91%|█████████ | 15762/17285 [141:07:06<15:07:44, 35.76s/it] 91%|█████████ | 15763/17285 [141:07:30<13:44:00, 32.48s/it] 91%|█████████ | 15764/17285 [141:08:06<14:08:01, 33.45s/it] 91%|█████████ | 15765/17285 [141:08:41<14:15:32, 33.77s/it] 91%|█████████ | 15766/17285 [141:09:16<14:29:20, 34.34s/it] 91%|█████████ | 15767/17285 [141:09:53<14:45:10, 34.99s/it] 91%|█████████ | 15768/17285 [141:10:28<14:48:56, 35.16s/it] 91%|█████████ | 15769/17285 [141:11:06<15:09:05, 35.98s/it] 91%|█████████ | 15770/17285 [141:11:36<14:19:10, 34.03s/it] {'loss': 1.2553, 'learning_rate': 5.2064916314591646e-06, 'epoch': 2.74} + 91%|█████████ | 15770/17285 [141:11:36<14:19:10, 34.03s/it] 91%|█████████ | 15771/17285 [141:12:11<14:29:28, 34.46s/it] 
91%|█████████ | 15772/17285 [141:12:45<14:26:07, 34.35s/it] 91%|█████████▏| 15773/17285 [141:13:18<14:12:30, 33.83s/it] 91%|█████████▏| 15774/17285 [141:13:54<14:29:43, 34.54s/it] 91%|█████████▏| 15775/17285 [141:14:30<14:38:46, 34.92s/it] 91%|█████████▏| 15776/17285 [141:15:06<14:48:48, 35.34s/it] 91%|█████████▏| 15777/17285 [141:15:41<14:40:51, 35.05s/it] 91%|█████████▏| 15778/17285 [141:16:10<14:01:15, 33.49s/it] 91%|█████████▏| 15779/17285 [141:16:41<13:36:15, 32.52s/it] 91%|█████████▏| 15780/17285 [141:17:14<13:41:38, 32.76s/it] {'loss': 1.2679, 'learning_rate': 5.145734416356996e-06, 'epoch': 2.74} + 91%|█████████▏| 15780/17285 [141:17:14<13:41:38, 32.76s/it] 91%|█████████▏| 15781/17285 [141:17:43<13:09:42, 31.50s/it] 91%|█████████▏| 15782/17285 [141:18:15<13:14:49, 31.73s/it] 91%|█████████▏| 15783/17285 [141:19:03<15:15:37, 36.58s/it] 91%|█████████▏| 15784/17285 [141:19:41<15:27:45, 37.09s/it] 91%|█████████▏| 15785/17285 [141:20:15<15:04:41, 36.19s/it] 91%|█████████▏| 15786/17285 [141:20:41<13:44:44, 33.01s/it] 91%|█████████▏| 15787/17285 [141:21:17<14:05:54, 33.88s/it] 91%|█████████▏| 15788/17285 [141:21:54<14:33:35, 35.01s/it] 91%|█████████▏| 15789/17285 [141:22:22<13:40:01, 32.89s/it] 91%|█████████▏| 15790/17285 [141:22:52<13:14:46, 31.90s/it] {'loss': 1.2254, 'learning_rate': 5.085324425604499e-06, 'epoch': 2.74} + 91%|█████████▏| 15790/17285 [141:22:52<13:14:46, 31.90s/it] 91%|█████████▏| 15791/17285 [141:23:27<13:37:14, 32.82s/it] 91%|█████████▏| 15792/17285 [141:24:00<13:40:08, 32.96s/it] 91%|█████████▏| 15793/17285 [141:24:36<14:05:19, 33.99s/it] 91%|█████████▏| 15794/17285 [141:25:06<13:31:08, 32.64s/it] 91%|█████████▏| 15795/17285 [141:25:40<13:42:37, 33.13s/it] 91%|█████████▏| 15796/17285 [141:26:07<12:52:32, 31.13s/it] 91%|█████████▏| 15797/17285 [141:26:38<12:53:37, 31.19s/it] 91%|█████████▏| 15798/17285 [141:27:11<13:10:02, 31.88s/it] 91%|█████████▏| 15799/17285 [141:27:42<13:02:37, 31.60s/it] 91%|█████████▏| 15800/17285 [141:28:12<12:46:34, 
30.97s/it] {'loss': 1.2656, 'learning_rate': 5.025261880338994e-06, 'epoch': 2.74} + 91%|█████████▏| 15800/17285 [141:28:12<12:46:34, 30.97s/it] 91%|█████████▏| 15801/17285 [141:28:51<13:45:08, 33.36s/it] 91%|█████████▏| 15802/17285 [141:29:25<13:48:51, 33.53s/it] 91%|█████████▏| 15803/17285 [141:29:52<13:01:14, 31.63s/it] 91%|█████████▏| 15804/17285 [141:30:24<13:04:32, 31.78s/it] 91%|█████████▏| 15805/17285 [141:31:03<13:52:43, 33.76s/it] 91%|█████████▏| 15806/17285 [141:31:34<13:36:36, 33.13s/it] 91%|█████████▏| 15807/17285 [141:32:04<13:08:42, 32.02s/it] 91%|█████████▏| 15808/17285 [141:32:40<13:37:30, 33.21s/it] 91%|█████████▏| 15809/17285 [141:33:09<13:10:53, 32.15s/it] 91%|█████████▏| 15810/17285 [141:33:41<13:04:32, 31.91s/it] {'loss': 1.2524, 'learning_rate': 4.965547000425985e-06, 'epoch': 2.74} + 91%|█████████▏| 15810/17285 [141:33:41<13:04:32, 31.91s/it] 91%|█████████▏| 15811/17285 [141:34:10<12:44:10, 31.11s/it] 91%|█████████▏| 15812/17285 [141:34:40<12:37:21, 30.85s/it] 91%|█████████▏| 15813/17285 [141:35:11<12:40:25, 31.00s/it] 91%|█████████▏| 15814/17285 [141:35:37<11:58:48, 29.32s/it] 91%|█████████▏| 15815/17285 [141:36:06<11:58:03, 29.31s/it] 92%|█████████▏| 15816/17285 [141:36:37<12:08:07, 29.74s/it] 92%|█████████▏| 15817/17285 [141:37:05<11:58:26, 29.36s/it] 92%|█████████▏| 15818/17285 [141:37:38<12:25:00, 30.47s/it] 92%|█████████▏| 15819/17285 [141:38:09<12:24:49, 30.48s/it] 92%|█████████▏| 15820/17285 [141:38:36<11:57:14, 29.38s/it] {'loss': 1.2899, 'learning_rate': 4.9061800044582385e-06, 'epoch': 2.75} + 92%|█████████▏| 15820/17285 [141:38:36<11:57:14, 29.38s/it] 92%|█████████▏| 15821/17285 [141:39:09<12:25:57, 30.57s/it] 92%|█████████▏| 15822/17285 [141:39:38<12:12:43, 30.05s/it] 92%|█████████▏| 15823/17285 [141:40:06<11:59:36, 29.53s/it][2023-08-28 21:35:27,733] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 92%|█████████▏| 15824/17285 [141:40:50<13:43:29, 33.82s/it] 92%|█████████▏| 15825/17285 [141:41:16<12:42:47, 31.35s/it] 92%|█████████▏| 15826/17285 [141:41:46<12:37:10, 31.14s/it] 92%|█████████▏| 15827/17285 [141:42:23<13:18:28, 32.86s/it] 92%|█████████▏| 15828/17285 [141:42:54<13:04:37, 32.31s/it] 92%|█████████▏| 15829/17285 [141:43:24<12:44:12, 31.49s/it] 92%|█████████▏| 15830/17285 [141:43:48<11:52:15, 29.37s/it] {'loss': 1.2837, 'learning_rate': 4.853047328501259e-06, 'epoch': 2.75} + 92%|█████████▏| 15830/17285 [141:43:48<11:52:15, 29.37s/it] 92%|█████████▏| 15831/17285 [141:44:24<12:41:38, 31.43s/it] 92%|█████████▏| 15832/17285 [141:45:00<13:13:29, 32.77s/it] 92%|█████████▏| 15833/17285 [141:45:31<12:55:33, 32.05s/it] 92%|█████████▏| 15834/17285 [141:46:08<13:31:25, 33.55s/it] 92%|█████████▏| 15835/17285 [141:46:41<13:26:27, 33.37s/it] 92%|█████████▏| 15836/17285 [141:47:21<14:13:08, 35.33s/it] 92%|█████████▏| 15837/17285 [141:47:55<14:08:38, 35.16s/it] 92%|█████████▏| 15838/17285 [141:48:32<14:16:44, 35.53s/it] 92%|█████████▏| 15839/17285 [141:49:02<13:38:48, 33.98s/it] 92%|█████████▏| 15840/17285 [141:49:37<13:43:52, 34.21s/it] {'loss': 1.2689, 'learning_rate': 4.794341909691191e-06, 'epoch': 2.75} + 92%|█████████▏| 15840/17285 [141:49:37<13:43:52, 34.21s/it] 92%|█████████▏| 15841/17285 [141:50:08<13:18:09, 33.16s/it] 92%|█████████▏| 15842/17285 [141:50:49<14:14:11, 35.52s/it] 92%|█████████▏| 15843/17285 [141:51:14<13:03:46, 32.61s/it] 92%|█████████▏| 15844/17285 [141:51:47<13:00:06, 32.48s/it] 92%|█████████▏| 15845/17285 [141:52:15<12:28:34, 31.19s/it] 92%|█████████▏| 15846/17285 [141:52:47<12:37:13, 31.57s/it] 92%|█████████▏| 15847/17285 [141:53:23<13:05:14, 32.76s/it] 92%|█████████▏| 15848/17285 [141:53:55<12:58:37, 32.51s/it] 92%|█████████▏| 15849/17285 [141:54:27<12:59:47, 32.58s/it] 92%|█████████▏| 15850/17285 [141:55:02<13:15:07, 33.25s/it] {'loss': 1.2794, 'learning_rate': 4.735985001541243e-06, 'epoch': 2.75} + 
92%|█████████▏| 15850/17285 [141:55:02<13:15:07, 33.25s/it] 92%|█████████▏| 15851/17285 [141:55:37<13:24:58, 33.68s/it] 92%|█████████▏| 15852/17285 [141:56:05<12:46:14, 32.08s/it] 92%|█████████▏| 15853/17285 [141:56:44<13:34:30, 34.13s/it] 92%|█████████▏| 15854/17285 [141:57:21<13:56:25, 35.07s/it] 92%|█████████▏| 15855/17285 [141:58:08<15:19:28, 38.58s/it] 92%|█████████▏| 15856/17285 [141:58:39<14:24:47, 36.31s/it] 92%|█████████▏| 15857/17285 [141:59:09<13:37:12, 34.34s/it] 92%|█████████▏| 15858/17285 [141:59:46<13:57:22, 35.21s/it] 92%|█████████▏| 15859/17285 [142:00:18<13:29:18, 34.05s/it] 92%|█████████▏| 15860/17285 [142:01:00<14:28:34, 36.57s/it] {'loss': 1.2599, 'learning_rate': 4.677976817673235e-06, 'epoch': 2.75} + 92%|█████████▏| 15860/17285 [142:01:00<14:28:34, 36.57s/it] 92%|█████████▏| 15861/17285 [142:01:27<13:22:51, 33.83s/it] 92%|█████████▏| 15862/17285 [142:01:56<12:41:56, 32.13s/it] 92%|█████████▏| 15863/17285 [142:02:32<13:12:31, 33.44s/it] 92%|█████████▏| 15864/17285 [142:03:06<13:16:00, 33.61s/it] 92%|█████████▏| 15865/17285 [142:03:44<13:45:09, 34.87s/it] 92%|█████████▏| 15866/17285 [142:04:15<13:18:10, 33.75s/it] 92%|█████████▏| 15867/17285 [142:04:47<13:07:53, 33.34s/it] 92%|█████████▏| 15868/17285 [142:05:16<12:30:19, 31.77s/it] 92%|█████████▏| 15869/17285 [142:05:44<12:04:05, 30.68s/it] 92%|█████████▏| 15870/17285 [142:06:12<11:49:45, 30.10s/it] {'loss': 1.2905, 'learning_rate': 4.62031757043242e-06, 'epoch': 2.75} + 92%|█████████▏| 15870/17285 [142:06:12<11:49:45, 30.10s/it] 92%|█████████▏| 15871/17285 [142:06:39<11:27:13, 29.16s/it] 92%|█████████▏| 15872/17285 [142:07:06<11:08:20, 28.38s/it] 92%|█████████▏| 15873/17285 [142:07:35<11:10:57, 28.51s/it] 92%|█████████▏| 15874/17285 [142:08:05<11:24:43, 29.12s/it] 92%|█████████▏| 15875/17285 [142:08:35<11:28:07, 29.28s/it] 92%|█████████▏| 15876/17285 [142:09:07<11:49:19, 30.21s/it] 92%|█████████▏| 15877/17285 [142:09:37<11:46:37, 30.11s/it] 92%|█████████▏| 15878/17285 [142:10:09<11:58:36, 
30.64s/it] 92%|█████████▏| 15879/17285 [142:10:39<11:54:40, 30.50s/it] 92%|█████████▏| 15880/17285 [142:11:14<12:26:06, 31.86s/it] {'loss': 1.2726, 'learning_rate': 4.563007470886749e-06, 'epoch': 2.76} + 92%|█████████▏| 15880/17285 [142:11:14<12:26:06, 31.86s/it] 92%|█████████▏| 15881/17285 [142:11:55<13:28:48, 34.56s/it] 92%|█████████▏| 15882/17285 [142:12:32<13:46:47, 35.36s/it] 92%|█████████▏| 15883/17285 [142:13:02<13:07:35, 33.71s/it] 92%|█████████▏| 15884/17285 [142:13:32<12:36:22, 32.39s/it] 92%|█████████▏| 15885/17285 [142:14:12<13:29:03, 34.67s/it] 92%|█████████▏| 15886/17285 [142:14:44<13:14:44, 34.08s/it] 92%|█████████▏| 15887/17285 [142:15:10<12:17:22, 31.65s/it] 92%|█████████▏| 15888/17285 [142:15:44<12:33:27, 32.36s/it] 92%|█████████▏| 15889/17285 [142:16:18<12:43:01, 32.80s/it] 92%|█████████▏| 15890/17285 [142:16:47<12:17:01, 31.70s/it] {'loss': 1.2318, 'learning_rate': 4.506046728826075e-06, 'epoch': 2.76} + 92%|█████████▏| 15890/17285 [142:16:47<12:17:01, 31.70s/it] 92%|█████████▏| 15891/17285 [142:17:25<12:56:30, 33.42s/it] 92%|█████████▏| 15892/17285 [142:17:54<12:29:04, 32.26s/it] 92%|█████████▏| 15893/17285 [142:18:21<11:49:00, 30.56s/it] 92%|█████████▏| 15894/17285 [142:18:57<12:30:34, 32.38s/it] 92%|█████████▏| 15895/17285 [142:19:29<12:26:17, 32.21s/it] 92%|█████████▏| 15896/17285 [142:19:55<11:39:01, 30.20s/it] 92%|█████████▏| 15897/17285 [142:20:24<11:32:51, 29.95s/it] 92%|█████████▏| 15898/17285 [142:20:56<11:48:08, 30.63s/it] 92%|█████████▏| 15899/17285 [142:21:31<12:14:47, 31.81s/it] 92%|█████████▏| 15900/17285 [142:22:02<12:06:36, 31.48s/it] {'loss': 1.2712, 'learning_rate': 4.449435552761372e-06, 'epoch': 2.76} + 92%|█████████▏| 15900/17285 [142:22:02<12:06:36, 31.48s/it] 92%|█████████▏| 15901/17285 [142:22:31<11:51:15, 30.84s/it][2023-08-28 22:17:40,706] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 92%|█████████▏| 15902/17285 [142:23:03<11:59:23, 31.21s/it] 92%|█████████▏| 15903/17285 [142:23:33<11:48:01, 30.74s/it] 92%|█████████▏| 15904/17285 [142:23:59<11:17:43, 29.44s/it] 92%|█████████▏| 15905/17285 [142:24:28<11:12:33, 29.24s/it] 92%|█████████▏| 15906/17285 [142:25:04<11:56:27, 31.17s/it] 92%|█████████▏| 15907/17285 [142:25:38<12:18:56, 32.17s/it] 92%|█████████▏| 15908/17285 [142:26:13<12:37:54, 33.02s/it] 92%|█████████▏| 15909/17285 [142:26:41<12:04:22, 31.59s/it] 92%|█████████▏| 15910/17285 [142:27:20<12:55:31, 33.84s/it] {'loss': 1.3048, 'learning_rate': 4.398784544532874e-06, 'epoch': 2.76} + 92%|█████████▏| 15910/17285 [142:27:20<12:55:31, 33.84s/it] 92%|█████████▏| 15911/17285 [142:27:49<12:20:17, 32.33s/it] 92%|█████████▏| 15912/17285 [142:28:16<11:40:51, 30.63s/it] 92%|█████████▏| 15913/17285 [142:28:50<12:07:19, 31.81s/it] 92%|█████████▏| 15914/17285 [142:29:20<11:52:09, 31.17s/it] 92%|█████████▏| 15915/17285 [142:29:55<12:15:44, 32.22s/it] 92%|█████████▏| 15916/17285 [142:30:29<12:32:04, 32.96s/it] 92%|█████████▏| 15917/17285 [142:31:11<13:30:03, 35.53s/it] 92%|█████████▏| 15918/17285 [142:31:37<12:23:59, 32.65s/it] 92%|█████████▏| 15919/17285 [142:32:13<12:48:12, 33.74s/it] 92%|█████████▏| 15920/17285 [142:32:49<12:58:50, 34.23s/it] {'loss': 1.2803, 'learning_rate': 4.342838113724712e-06, 'epoch': 2.76} + 92%|█████████▏| 15920/17285 [142:32:49<12:58:50, 34.23s/it] 92%|█████████▏| 15921/17285 [142:33:15<12:07:09, 31.99s/it] 92%|█████████▏| 15922/17285 [142:33:42<11:32:30, 30.48s/it] 92%|█████████▏| 15923/17285 [142:34:24<12:47:53, 33.83s/it] 92%|█████████▏| 15924/17285 [142:34:54<12:21:57, 32.71s/it] 92%|█████████▏| 15925/17285 [142:35:20<11:37:54, 30.79s/it] 92%|█████████▏| 15926/17285 [142:35:53<11:47:53, 31.25s/it] 92%|█████████▏| 15927/17285 [142:36:20<11:19:29, 30.02s/it] 92%|█████████▏| 15928/17285 [142:36:53<11:42:23, 31.06s/it] 92%|█████████▏| 15929/17285 [142:37:28<12:04:50, 32.07s/it] 
92%|█████████▏| 15930/17285 [142:38:01<12:12:06, 32.42s/it] {'loss': 1.3073, 'learning_rate': 4.2872418463554055e-06, 'epoch': 2.76} + 92%|█████████▏| 15930/17285 [142:38:01<12:12:06, 32.42s/it] 92%|█████████▏| 15931/17285 [142:38:37<12:37:30, 33.57s/it] 92%|█████████▏| 15932/17285 [142:39:11<12:40:26, 33.72s/it] 92%|█████████▏| 15933/17285 [142:39:50<13:12:36, 35.18s/it] 92%|█████████▏| 15934/17285 [142:40:15<12:01:46, 32.05s/it] 92%|█████████▏| 15935/17285 [142:40:40<11:18:28, 30.15s/it] 92%|█████████▏| 15936/17285 [142:41:14<11:43:51, 31.31s/it] 92%|█████████▏| 15937/17285 [142:41:46<11:47:04, 31.47s/it] 92%|█████████▏| 15938/17285 [142:42:11<11:01:38, 29.47s/it] 92%|█████████▏| 15939/17285 [142:42:42<11:09:06, 29.83s/it] 92%|█████████▏| 15940/17285 [142:43:16<11:37:42, 31.12s/it] {'loss': 1.2495, 'learning_rate': 4.231995945941125e-06, 'epoch': 2.77} + 92%|█████████▏| 15940/17285 [142:43:16<11:37:42, 31.12s/it] 92%|█████████▏| 15941/17285 [142:43:51<12:04:49, 32.36s/it] 92%|█████████▏| 15942/17285 [142:44:19<11:37:09, 31.15s/it] 92%|█████████▏| 15943/17285 [142:44:54<11:58:46, 32.14s/it] 92%|█████████▏| 15944/17285 [142:45:27<12:03:23, 32.37s/it] 92%|█████████▏| 15945/17285 [142:46:02<12:19:42, 33.12s/it] 92%|█████████▏| 15946/17285 [142:46:33<12:06:06, 32.54s/it] 92%|█████████▏| 15947/17285 [142:47:03<11:51:16, 31.90s/it] 92%|█████████▏| 15948/17285 [142:47:30<11:16:24, 30.35s/it] 92%|█████████▏| 15949/17285 [142:47:58<11:02:09, 29.74s/it] 92%|█████████▏| 15950/17285 [142:48:31<11:22:29, 30.67s/it] {'loss': 1.2985, 'learning_rate': 4.1771006147155015e-06, 'epoch': 2.77} + 92%|█████████▏| 15950/17285 [142:48:31<11:22:29, 30.67s/it] 92%|█████████▏| 15951/17285 [142:49:07<11:54:05, 32.12s/it] 92%|█████████▏| 15952/17285 [142:49:43<12:21:43, 33.39s/it] 92%|█████████▏| 15953/17285 [142:50:18<12:31:36, 33.86s/it] 92%|█████████▏| 15954/17285 [142:50:54<12:48:06, 34.63s/it] 92%|█████████▏| 15955/17285 [142:51:22<12:02:44, 32.61s/it] 92%|█████████▏| 15956/17285 
[142:51:51<11:35:09, 31.38s/it] 92%|█████████▏| 15957/17285 [142:52:16<10:55:10, 29.60s/it] 92%|█████████▏| 15958/17285 [142:52:49<11:13:10, 30.44s/it] 92%|█████████▏| 15959/17285 [142:53:21<11:24:34, 30.98s/it] 92%|█████████▏| 15960/17285 [142:53:47<10:55:33, 29.69s/it] {'loss': 1.2603, 'learning_rate': 4.122556053628868e-06, 'epoch': 2.77} + 92%|█████████▏| 15960/17285 [142:53:47<10:55:33, 29.69s/it] 92%|█████████▏| 15961/17285 [142:54:18<10:58:29, 29.84s/it] 92%|█████████▏| 15962/17285 [142:54:47<10:53:42, 29.65s/it] 92%|█████████▏| 15963/17285 [142:55:19<11:06:42, 30.26s/it] 92%|█████████▏| 15964/17285 [142:55:53<11:31:10, 31.39s/it] 92%|█████████▏| 15965/17285 [142:56:27<11:49:10, 32.24s/it] 92%|█████████▏| 15966/17285 [142:56:57<11:37:39, 31.74s/it] 92%|█████████▏| 15967/17285 [142:57:29<11:39:06, 31.83s/it] 92%|█████████▏| 15968/17285 [142:58:04<11:54:14, 32.54s/it] 92%|█████████▏| 15969/17285 [142:58:35<11:43:19, 32.07s/it] 92%|█████████▏| 15970/17285 [142:59:06<11:39:34, 31.92s/it] {'loss': 1.2751, 'learning_rate': 4.068362462347508e-06, 'epoch': 2.77} + 92%|█████████▏| 15970/17285 [142:59:06<11:39:34, 31.92s/it] 92%|█████████▏| 15971/17285 [142:59:43<12:14:47, 33.55s/it] 92%|█████████▏| 15972/17285 [143:00:12<11:40:47, 32.02s/it] 92%|█████████▏| 15973/17285 [143:00:40<11:12:14, 30.74s/it] 92%|█████████▏| 15974/17285 [143:01:06<10:40:33, 29.32s/it] 92%|█████████▏| 15975/17285 [143:01:37<10:51:31, 29.84s/it] 92%|█████████▏| 15976/17285 [143:02:15<11:46:00, 32.36s/it] 92%|█████████▏| 15977/17285 [143:02:45<11:30:10, 31.66s/it] 92%|█████████▏| 15978/17285 [143:03:22<12:01:44, 33.13s/it] 92%|█████████▏| 15979/17285 [143:03:51<11:35:52, 31.97s/it] 92%|█████████▏| 15980/17285 [143:04:25<11:49:10, 32.61s/it] {'loss': 1.2502, 'learning_rate': 4.014520039252956e-06, 'epoch': 2.77} + 92%|█████████▏| 15980/17285 [143:04:25<11:49:10, 32.61s/it] 92%|█████████▏| 15981/17285 [143:05:01<12:11:36, 33.66s/it] 92%|█████████▏| 15982/17285 [143:05:46<13:25:11, 37.08s/it] 
92%|█████████▏| 15983/17285 [143:06:17<12:47:03, 35.35s/it] 92%|█████████▏| 15984/17285 [143:06:46<11:59:47, 33.20s/it] 92%|█████████▏| 15985/17285 [143:07:22<12:19:01, 34.11s/it] 92%|█████████▏| 15986/17285 [143:07:54<12:04:55, 33.48s/it] 92%|█████████▏| 15987/17285 [143:08:28<12:10:50, 33.78s/it] 92%|█████████▏| 15988/17285 [143:08:55<11:20:52, 31.50s/it] 93%|█████████▎| 15989/17285 [143:09:25<11:13:10, 31.17s/it] 93%|█████████▎| 15990/17285 [143:09:51<10:42:01, 29.75s/it] {'loss': 1.2988, 'learning_rate': 3.961028981441251e-06, 'epoch': 2.78} + 93%|█████████▎| 15990/17285 [143:09:51<10:42:01, 29.75s/it] 93%|█████████▎| 15991/17285 [143:10:24<10:57:16, 30.48s/it] 93%|█████████▎| 15992/17285 [143:10:49<10:26:26, 29.07s/it] 93%|█████████▎| 15993/17285 [143:11:23<10:56:38, 30.49s/it] 93%|█████████▎| 15994/17285 [143:11:50<10:31:07, 29.33s/it] 93%|█████████▎| 15995/17285 [143:12:21<10:39:59, 29.77s/it] 93%|█████████▎| 15996/17285 [143:12:56<11:18:29, 31.58s/it] 93%|█████████▎| 15997/17285 [143:13:23<10:47:44, 30.17s/it] 93%|█████████▎| 15998/17285 [143:14:04<11:58:22, 33.49s/it] 93%|█████████▎| 15999/17285 [143:14:35<11:40:09, 32.67s/it] 93%|█████████▎| 16000/17285 [143:15:13<12:10:27, 34.11s/it] {'loss': 1.2901, 'learning_rate': 3.907889484722238e-06, 'epoch': 2.78} + 93%|█████████▎| 16000/17285 [143:15:13<12:10:27, 34.11s/it][INFO|trainer.py:3081] 2023-08-28 23:09:50,385 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-28 23:09:50,385 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-28 23:09:50,385 >> Batch size = 2 + + 0%| | 0/33 [00:00> Deleting older checkpoint [20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-13000] due to args.save_total_limit +Save Peft Config at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-16000 +[INFO|tokenization_utils_base.py:2210] 2023-08-28 23:11:15,621 >> tokenizer config file saved in 
20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-16000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-28 23:11:15,625 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-16000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-16000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-16000 + 93%|█████████▎| 16001/17285 [143:17:11<21:09:14, 59.31s/it] 93%|█████████▎| 16002/17285 [143:17:43<18:11:13, 51.03s/it] 93%|█████████▎| 16003/17285 [143:18:20<16:41:23, 46.87s/it] 93%|█████████▎| 16004/17285 [143:18:54<15:22:40, 43.22s/it] 93%|█████████▎| 16005/17285 [143:19:25<13:58:40, 39.31s/it] 93%|█████████▎| 16006/17285 [143:19:58<13:19:05, 37.49s/it] 93%|█████████▎| 16007/17285 [143:20:36<13:25:29, 37.82s/it] 93%|█████████▎| 16008/17285 [143:21:09<12:50:47, 36.22s/it] 93%|█████████▎| 16009/17285 [143:21:37<11:57:22, 33.73s/it] 93%|█████████▎| 16010/17285 [143:22:15<12:25:24, 35.08s/it] {'loss': 1.278, 'learning_rate': 3.855101743618806e-06, 'epoch': 2.78} + 93%|█████████▎| 16010/17285 [143:22:15<12:25:24, 35.08s/it] 93%|█████████▎| 16011/17285 [143:22:53<12:45:29, 36.05s/it] 93%|█████████▎| 16012/17285 [143:23:19<11:38:11, 32.91s/it] 93%|█████████▎| 16013/17285 [143:23:49<11:18:10, 31.99s/it] 93%|█████████▎| 16014/17285 [143:24:20<11:14:02, 31.82s/it] 93%|█████████▎| 16015/17285 [143:24:48<10:46:55, 30.56s/it] 93%|█████████▎| 16016/17285 [143:25:18<10:45:07, 30.50s/it] 93%|█████████▎| 16017/17285 [143:25:47<10:36:27, 30.12s/it] 93%|█████████▎| 16018/17285 [143:26:20<10:49:09, 30.74s/it] 93%|█████████▎| 16019/17285 [143:26:55<11:21:03, 32.28s/it] 93%|█████████▎| 16020/17285 [143:27:26<11:07:53, 31.68s/it] {'loss': 1.2782, 'learning_rate': 3.8026659513662353e-06, 'epoch': 2.78} + 
93%|█████████▎| 16020/17285 [143:27:26<11:07:53, 31.68s/it] 93%|█████████▎| 16021/17285 [143:28:00<11:22:54, 32.42s/it] 93%|█████████▎| 16022/17285 [143:28:30<11:05:09, 31.60s/it] 93%|█████████▎| 16023/17285 [143:29:06<11:35:31, 33.07s/it] 93%|█████████▎| 16024/17285 [143:29:36<11:16:22, 32.18s/it] 93%|█████████▎| 16025/17285 [143:30:16<12:06:19, 34.59s/it] 93%|█████████▎| 16026/17285 [143:30:46<11:31:38, 32.96s/it] 93%|█████████▎| 16027/17285 [143:31:23<11:56:22, 34.17s/it] 93%|█████████▎| 16028/17285 [143:31:54<11:40:28, 33.44s/it] 93%|█████████▎| 16029/17285 [143:32:27<11:38:44, 33.38s/it] 93%|█████████▎| 16030/17285 [143:32:59<11:24:33, 32.73s/it] {'loss': 1.266, 'learning_rate': 3.7505822999114206e-06, 'epoch': 2.78} + 93%|█████████▎| 16030/17285 [143:32:59<11:24:33, 32.73s/it] 93%|█████████▎| 16031/17285 [143:33:34<11:38:16, 33.41s/it] 93%|█████████▎| 16032/17285 [143:34:13<12:13:23, 35.12s/it] 93%|█████████▎| 16033/17285 [143:34:44<11:49:42, 34.01s/it] 93%|█████████▎| 16034/17285 [143:35:16<11:35:15, 33.35s/it] 93%|█████████▎| 16035/17285 [143:35:49<11:29:25, 33.09s/it] 93%|█████████▎| 16036/17285 [143:36:14<10:38:45, 30.68s/it] 93%|█████████▎| 16037/17285 [143:36:53<11:31:07, 33.23s/it] 93%|█████████▎| 16038/17285 [143:37:21<10:59:50, 31.75s/it] 93%|█████████▎| 16039/17285 [143:38:00<11:41:19, 33.77s/it] 93%|█████████▎| 16040/17285 [143:38:27<10:59:03, 31.76s/it] {'loss': 1.2606, 'learning_rate': 3.6988509799122494e-06, 'epoch': 2.78} + 93%|█████████▎| 16040/17285 [143:38:27<10:59:03, 31.76s/it] 93%|█████████▎| 16041/17285 [143:39:03<11:29:58, 33.28s/it] 93%|█████████▎| 16042/17285 [143:39:43<12:08:27, 35.16s/it] 93%|█████████▎| 16043/17285 [143:40:11<11:23:24, 33.01s/it] 93%|█████████▎| 16044/17285 [143:40:43<11:17:22, 32.75s/it] 93%|█████████▎| 16045/17285 [143:41:15<11:13:15, 32.58s/it] 93%|█████████▎| 16046/17285 [143:41:52<11:41:06, 33.95s/it] 93%|█████████▎| 16047/17285 [143:42:27<11:45:26, 34.19s/it] 93%|█████████▎| 16048/17285 [143:43:01<11:41:28, 
34.02s/it] 93%|█████████▎| 16049/17285 [143:43:33<11:31:14, 33.56s/it] 93%|█████████▎| 16050/17285 [143:44:07<11:30:56, 33.57s/it] {'loss': 1.2544, 'learning_rate': 3.647472180736833e-06, 'epoch': 2.79} + 93%|█████████▎| 16050/17285 [143:44:07<11:30:56, 33.57s/it] 93%|█████████▎| 16051/17285 [143:44:37<11:09:46, 32.57s/it] 93%|█████████▎| 16052/17285 [143:45:03<10:28:47, 30.60s/it] 93%|█████████▎| 16053/17285 [143:45:33<10:21:05, 30.25s/it] 93%|█████████▎| 16054/17285 [143:46:15<11:35:48, 33.91s/it] 93%|█████████▎| 16055/17285 [143:46:47<11:23:36, 33.35s/it] 93%|█████████▎| 16056/17285 [143:47:13<10:36:00, 31.05s/it] 93%|█████████▎| 16057/17285 [143:47:44<10:33:59, 30.98s/it] 93%|█████████▎| 16058/17285 [143:48:10<10:05:53, 29.63s/it] 93%|█████████▎| 16059/17285 [143:48:48<10:58:45, 32.24s/it] 93%|█████████▎| 16060/17285 [143:49:17<10:36:56, 31.20s/it] {'loss': 1.2632, 'learning_rate': 3.5964460904628685e-06, 'epoch': 2.79} + 93%|█████████▎| 16060/17285 [143:49:17<10:36:56, 31.20s/it] 93%|█████████▎| 16061/17285 [143:49:50<10:44:29, 31.59s/it] 93%|█████████▎| 16062/17285 [143:50:20<10:34:17, 31.12s/it] 93%|█████████▎| 16063/17285 [143:50:53<10:50:17, 31.93s/it] 93%|█████████▎| 16064/17285 [143:51:18<10:06:12, 29.79s/it] 93%|█████████▎| 16065/17285 [143:51:51<10:23:33, 30.67s/it] 93%|█████████▎| 16066/17285 [143:52:17<9:57:17, 29.40s/it] 93%|█████████▎| 16067/17285 [143:52:53<10:33:06, 31.19s/it] 93%|█████████▎| 16068/17285 [143:53:21<10:13:59, 30.27s/it] 93%|█████████▎| 16069/17285 [143:53:53<10:22:24, 30.71s/it] 93%|█████████▎| 16070/17285 [143:54:30<11:02:05, 32.70s/it] {'loss': 1.2793, 'learning_rate': 3.5457728958768642e-06, 'epoch': 2.79} + 93%|█████████▎| 16070/17285 [143:54:30<11:02:05, 32.70s/it] 93%|█████████▎| 16071/17285 [143:54:59<10:38:37, 31.56s/it] 93%|█████████▎| 16072/17285 [143:55:38<11:24:19, 33.85s/it] 93%|█████████▎| 16073/17285 [143:56:08<11:00:52, 32.72s/it] 93%|█████████▎| 16074/17285 [143:56:43<11:11:04, 33.25s/it] 93%|█████████▎| 
16075/17285 [143:57:09<10:29:13, 31.20s/it] 93%|█████████▎| 16076/17285 [143:57:44<10:53:53, 32.45s/it] 93%|█████████▎| 16077/17285 [143:58:19<11:08:35, 33.21s/it] 93%|█████████▎| 16078/17285 [143:58:46<10:27:32, 31.19s/it] 93%|█████████▎| 16079/17285 [143:59:18<10:34:57, 31.59s/it] 93%|█████████▎| 16080/17285 [143:59:59<11:30:57, 34.40s/it] {'loss': 1.2691, 'learning_rate': 3.495452782473596e-06, 'epoch': 2.79} + 93%|█████████▎| 16080/17285 [143:59:59<11:30:57, 34.40s/it] 93%|█████████▎| 16081/17285 [144:00:27<10:48:35, 32.32s/it] 93%|█████████▎| 16082/17285 [144:01:03<11:10:12, 33.43s/it] 93%|█████████▎| 16083/17285 [144:01:30<10:30:54, 31.49s/it] 93%|█████████▎| 16084/17285 [144:01:57<10:06:55, 30.32s/it] 93%|█████████▎| 16085/17285 [144:02:27<10:00:24, 30.02s/it] 93%|█████████▎| 16086/17285 [144:02:58<10:04:47, 30.26s/it] 93%|█████████▎| 16087/17285 [144:03:31<10:23:06, 31.21s/it] 93%|█████████▎| 16088/17285 [144:04:00<10:12:13, 30.69s/it] 93%|█████████▎| 16089/17285 [144:04:31<10:10:41, 30.64s/it] 93%|█████████▎| 16090/17285 [144:05:00<10:00:09, 30.13s/it] {'loss': 1.2889, 'learning_rate': 3.4454859344552835e-06, 'epoch': 2.79} + 93%|█████████▎| 16090/17285 [144:05:00<10:00:09, 30.13s/it] 93%|█████████▎| 16091/17285 [144:05:28<9:46:17, 29.46s/it] 93%|█████████▎| 16092/17285 [144:05:59<9:57:43, 30.06s/it] 93%|█████████▎| 16093/17285 [144:06:30<9:59:44, 30.19s/it] 93%|█████████▎| 16094/17285 [144:07:07<10:38:33, 32.17s/it][2023-08-29 00:02:20,283] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 93%|█████████▎| 16095/17285 [144:07:43<11:00:54, 33.32s/it] 93%|█████████▎| 16096/17285 [144:08:19<11:19:03, 34.27s/it] 93%|█████████▎| 16097/17285 [144:08:50<11:00:38, 33.37s/it] 93%|█████████▎| 16098/17285 [144:09:22<10:48:51, 32.80s/it] 93%|█████████▎| 16099/17285 [144:09:54<10:43:36, 32.56s/it] 93%|█████████▎| 16100/17285 [144:10:29<10:57:08, 33.27s/it] {'loss': 1.2798, 'learning_rate': 3.4008179643440496e-06, 'epoch': 2.79} + 93%|█████████▎| 16100/17285 [144:10:29<10:57:08, 33.27s/it] 93%|█████████▎| 16101/17285 [144:11:00<10:46:02, 32.74s/it] 93%|█████████▎| 16102/17285 [144:11:31<10:35:11, 32.22s/it] 93%|█████████▎| 16103/17285 [144:11:57<9:57:41, 30.34s/it] 93%|█████████▎| 16104/17285 [144:12:27<9:53:14, 30.14s/it] 93%|█████████▎| 16105/17285 [144:12:56<9:47:18, 29.86s/it] 93%|█████████▎| 16106/17285 [144:13:27<9:51:56, 30.12s/it] 93%|█████████▎| 16107/17285 [144:14:04<10:33:55, 32.29s/it] 93%|█████████▎| 16108/17285 [144:14:34<10:20:20, 31.62s/it] 93%|█████████▎| 16109/17285 [144:15:06<10:22:48, 31.78s/it] 93%|█████████▎| 16110/17285 [144:15:41<10:40:52, 32.73s/it] {'loss': 1.2751, 'learning_rate': 3.3515228234023422e-06, 'epoch': 2.8} + 93%|█████████▎| 16110/17285 [144:15:41<10:40:52, 32.73s/it][2023-08-29 00:10:49,819] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 93%|█████████▎| 16111/17285 [144:16:12<10:29:12, 32.16s/it] 93%|█████████▎| 16112/17285 [144:16:42<10:17:20, 31.58s/it] 93%|█████████▎| 16113/17285 [144:17:11<10:01:11, 30.78s/it] 93%|█████████▎| 16114/17285 [144:17:38<9:37:49, 29.61s/it] 93%|█████████▎| 16115/17285 [144:18:08<9:40:33, 29.77s/it] 93%|█████████▎| 16116/17285 [144:18:37<9:33:49, 29.45s/it] 93%|█████████▎| 16117/17285 [144:19:08<9:40:49, 29.84s/it] 93%|█████████▎| 16118/17285 [144:19:39<9:46:02, 30.13s/it] 93%|█████████▎| 16119/17285 [144:20:10<9:55:06, 30.62s/it] 93%|█████████▎| 16120/17285 [144:20:37<9:30:26, 29.38s/it] {'loss': 1.2461, 'learning_rate': 3.307459683817815e-06, 'epoch': 2.8} + 93%|█████████▎| 16120/17285 [144:20:37<9:30:26, 29.38s/it] 93%|█████████▎| 16121/17285 [144:21:07<9:32:25, 29.51s/it] 93%|█████████▎| 16122/17285 [144:21:36<9:33:07, 29.57s/it] 93%|█████████▎| 16123/17285 [144:22:06<9:34:07, 29.64s/it] 93%|█████████▎| 16124/17285 [144:22:45<10:24:45, 32.29s/it] 93%|█████████▎| 16125/17285 [144:23:12<9:57:17, 30.89s/it] 93%|█████████▎| 16126/17285 [144:23:42<9:52:05, 30.65s/it] 93%|█████████▎| 16127/17285 [144:24:17<10:12:48, 31.75s/it] 93%|█████████▎| 16128/17285 [144:24:46<9:56:36, 30.94s/it] 93%|█████████▎| 16129/17285 [144:25:16<9:49:48, 30.61s/it] 93%|█████████▎| 16130/17285 [144:25:51<10:15:36, 31.98s/it] {'loss': 1.2488, 'learning_rate': 3.2588369013774933e-06, 'epoch': 2.8} + 93%|█████████▎| 16130/17285 [144:25:51<10:15:36, 31.98s/it] 93%|█████████▎| 16131/17285 [144:26:23<10:15:53, 32.02s/it] 93%|█████████▎| 16132/17285 [144:27:04<11:07:41, 34.75s/it] 93%|█████████▎| 16133/17285 [144:27:32<10:29:05, 32.77s/it] 93%|█████████▎| 16134/17285 [144:28:06<10:37:17, 33.22s/it] 93%|█████████▎| 16135/17285 [144:28:35<10:09:02, 31.78s/it] 93%|█████████▎| 16136/17285 [144:29:07<10:09:00, 31.80s/it] 93%|█████████▎| 16137/17285 [144:29:50<11:13:09, 35.18s/it] 93%|█████████▎| 16138/17285 [144:30:16<10:21:16, 32.50s/it] 93%|█████████▎| 
16139/17285 [144:31:00<11:27:46, 36.01s/it] 93%|█████████▎| 16140/17285 [144:31:35<11:19:57, 35.63s/it] {'loss': 1.2522, 'learning_rate': 3.210568250480306e-06, 'epoch': 2.8} + 93%|█████████▎| 16140/17285 [144:31:35<11:19:57, 35.63s/it] 93%|█████████▎| 16141/17285 [144:32:01<10:24:09, 32.74s/it] 93%|█████████▎| 16142/17285 [144:32:32<10:13:34, 32.21s/it] 93%|█████████▎| 16143/17285 [144:33:01<9:55:21, 31.28s/it] 93%|█████████▎| 16144/17285 [144:33:27<9:26:15, 29.78s/it] 93%|█████████▎| 16145/17285 [144:34:11<10:45:33, 33.98s/it] 93%|█████████▎| 16146/17285 [144:34:44<10:37:00, 33.56s/it] 93%|█████████▎| 16147/17285 [144:35:10<9:53:57, 31.32s/it] 93%|█████████▎| 16148/17285 [144:35:35<9:19:05, 29.50s/it] 93%|█████████▎| 16149/17285 [144:36:07<9:30:34, 30.14s/it] 93%|█████████▎| 16150/17285 [144:36:32<9:03:00, 28.71s/it] {'loss': 1.2958, 'learning_rate': 3.1626539078188687e-06, 'epoch': 2.8} + 93%|█████████▎| 16150/17285 [144:36:32<9:03:00, 28.71s/it] 93%|█████████▎| 16151/17285 [144:37:05<9:29:10, 30.12s/it] 93%|█████████▎| 16152/17285 [144:37:37<9:38:12, 30.62s/it] 93%|█████████▎| 16153/17285 [144:38:16<10:25:44, 33.17s/it] 93%|█████████▎| 16154/17285 [144:38:42<9:43:13, 30.94s/it] 93%|█████████▎| 16155/17285 [144:39:21<10:28:30, 33.37s/it] 93%|█████████▎| 16156/17285 [144:39:57<10:44:01, 34.23s/it] 93%|█████████▎| 16157/17285 [144:40:32<10:44:25, 34.28s/it] 93%|█████████▎| 16158/17285 [144:41:02<10:19:14, 32.97s/it] 93%|█████████▎| 16159/17285 [144:41:38<10:37:45, 33.98s/it] 93%|█████████▎| 16160/17285 [144:42:13<10:40:39, 34.17s/it] {'loss': 1.2353, 'learning_rate': 3.1150940487888804e-06, 'epoch': 2.8} + 93%|█████████▎| 16160/17285 [144:42:13<10:40:39, 34.17s/it] 93%|█████████▎| 16161/17285 [144:42:47<10:42:34, 34.30s/it] 94%|█████████▎| 16162/17285 [144:43:22<10:47:46, 34.61s/it] 94%|█████████▎| 16163/17285 [144:43:50<10:05:32, 32.38s/it] 94%|█████████▎| 16164/17285 [144:44:28<10:36:10, 34.05s/it] 94%|█████████▎| 16165/17285 [144:44:58<10:18:00, 33.11s/it] 
94%|█████████▎| 16166/17285 [144:45:41<11:09:15, 35.89s/it] 94%|█████████▎| 16167/17285 [144:46:11<10:38:05, 34.24s/it] 94%|█████████▎| 16168/17285 [144:46:39<10:03:17, 32.41s/it] 94%|█████████▎| 16169/17285 [144:47:18<10:39:56, 34.41s/it] 94%|█████████▎| 16170/17285 [144:47:54<10:43:23, 34.62s/it] {'loss': 1.2498, 'learning_rate': 3.0678888474883316e-06, 'epoch': 2.81} + 94%|█████████▎| 16170/17285 [144:47:54<10:43:23, 34.62s/it] 94%|█████████▎| 16171/17285 [144:48:27<10:36:51, 34.30s/it] 94%|█████████▎| 16172/17285 [144:49:08<11:14:32, 36.36s/it] 94%|█████████▎| 16173/17285 [144:49:44<11:08:08, 36.05s/it] 94%|█████████▎| 16174/17285 [144:50:24<11:33:19, 37.44s/it] 94%|█████████▎| 16175/17285 [144:51:00<11:23:05, 36.92s/it] 94%|█████████▎| 16176/17285 [144:51:36<11:14:34, 36.50s/it] 94%|█████████▎| 16177/17285 [144:52:14<11:26:14, 37.16s/it] 94%|█████████▎| 16178/17285 [144:52:49<11:13:15, 36.49s/it] 94%|█████████▎| 16179/17285 [144:53:26<11:14:23, 36.59s/it] 94%|█████████▎| 16180/17285 [144:53:51<10:08:41, 33.05s/it] {'loss': 1.2708, 'learning_rate': 3.0210384767169975e-06, 'epoch': 2.81} + 94%|█████████▎| 16180/17285 [144:53:51<10:08:41, 33.05s/it] 94%|█████████▎| 16181/17285 [144:54:21<9:49:59, 32.06s/it] 94%|█████████▎| 16182/17285 [144:54:59<10:24:44, 33.98s/it] 94%|█████████▎| 16183/17285 [144:55:26<9:48:14, 32.03s/it] 94%|█████████▎| 16184/17285 [144:55:56<9:34:18, 31.30s/it] 94%|█████████▎| 16185/17285 [144:56:22<9:02:14, 29.58s/it] 94%|█████████▎| 16186/17285 [144:56:53<9:09:43, 30.01s/it] 94%|█████████▎| 16187/17285 [144:57:27<9:30:42, 31.19s/it] 94%|█████████▎| 16188/17285 [144:57:57<9:28:03, 31.07s/it] 94%|█████████▎| 16189/17285 [144:58:35<10:05:55, 33.17s/it] 94%|█████████▎| 16190/17285 [144:59:06<9:48:21, 32.24s/it] {'loss': 1.2368, 'learning_rate': 2.97454310797578e-06, 'epoch': 2.81} + 94%|█████████▎| 16190/17285 [144:59:06<9:48:21, 32.24s/it] 94%|█████████▎| 16191/17285 [144:59:38<9:51:42, 32.45s/it] 94%|█████████▎| 16192/17285 
[145:00:10<9:45:45, 32.15s/it] 94%|█████████▎| 16193/17285 [145:00:36<9:12:19, 30.35s/it] 94%|█████████▎| 16194/17285 [145:01:07<9:13:47, 30.46s/it] 94%|█████████▎| 16195/17285 [145:01:38<9:19:31, 30.80s/it] 94%|█████████▎| 16196/17285 [145:02:09<9:17:54, 30.74s/it] 94%|█████████▎| 16197/17285 [145:02:36<8:55:04, 29.51s/it] 94%|█████████▎| 16198/17285 [145:03:14<9:45:37, 32.33s/it] 94%|█████████▎| 16199/17285 [145:03:47<9:47:10, 32.44s/it] 94%|█████████▎| 16200/17285 [145:04:12<9:03:18, 30.04s/it] {'loss': 1.2822, 'learning_rate': 2.9284029114660107e-06, 'epoch': 2.81} + 94%|█████████▎| 16200/17285 [145:04:12<9:03:18, 30.04s/it] 94%|█████████▎| 16201/17285 [145:04:42<9:02:36, 30.03s/it] 94%|█████████▎| 16202/17285 [145:05:08<8:41:20, 28.88s/it] 94%|█████████▎| 16203/17285 [145:05:40<8:56:50, 29.77s/it] 94%|█████████▎| 16204/17285 [145:06:24<10:17:29, 34.27s/it] 94%|█████████▍| 16205/17285 [145:06:57<10:06:08, 33.67s/it] 94%|█████████▍| 16206/17285 [145:07:27<9:46:59, 32.64s/it] 94%|█████████▍| 16207/17285 [145:07:58<9:37:43, 32.16s/it] 94%|█████████▍| 16208/17285 [145:08:36<10:09:00, 33.93s/it] 94%|█████████▍| 16209/17285 [145:09:12<10:16:36, 34.38s/it] 94%|█████████▍| 16210/17285 [145:09:46<10:14:34, 34.30s/it] {'loss': 1.2863, 'learning_rate': 2.8826180560888927e-06, 'epoch': 2.81} + 94%|█████████▍| 16210/17285 [145:09:46<10:14:34, 34.30s/it] 94%|█████████▍| 16211/17285 [145:10:25<10:38:42, 35.68s/it] 94%|█████████▍| 16212/17285 [145:10:56<10:15:48, 34.43s/it] 94%|█████████▍| 16213/17285 [145:11:30<10:09:52, 34.14s/it] 94%|█████████▍| 16214/17285 [145:12:02<10:03:06, 33.79s/it] 94%|█████████▍| 16215/17285 [145:12:39<10:15:42, 34.53s/it] 94%|█████████▍| 16216/17285 [145:13:08<9:46:02, 32.89s/it] 94%|█████████▍| 16217/17285 [145:13:39<9:34:19, 32.27s/it] 94%|█████████▍| 16218/17285 [145:14:15<9:53:23, 33.37s/it] 94%|█████████▍| 16219/17285 [145:14:40<9:09:07, 30.91s/it] 94%|█████████▍| 16220/17285 [145:15:11<9:12:54, 31.15s/it] {'loss': 1.2477, 'learning_rate': 
2.837188709444882e-06, 'epoch': 2.82} + 94%|█████████▍| 16220/17285 [145:15:11<9:12:54, 31.15s/it] 94%|█████████▍| 16221/17285 [145:15:50<9:51:27, 33.35s/it] 94%|█████████▍| 16222/17285 [145:16:25<9:58:43, 33.79s/it] 94%|█████████▍| 16223/17285 [145:16:57<9:51:14, 33.40s/it] 94%|█████████▍| 16224/17285 [145:17:39<10:33:08, 35.80s/it] 94%|█████████▍| 16225/17285 [145:18:08<9:57:18, 33.81s/it] 94%|█████████▍| 16226/17285 [145:18:38<9:38:41, 32.79s/it] 94%|█████████▍| 16227/17285 [145:19:11<9:38:20, 32.80s/it] 94%|█████████▍| 16228/17285 [145:19:41<9:23:38, 31.99s/it] 94%|█████████▍| 16229/17285 [145:20:07<8:49:10, 30.07s/it] 94%|█████████▍| 16230/17285 [145:20:44<9:28:00, 32.30s/it] {'loss': 1.2577, 'learning_rate': 2.792115037833032e-06, 'epoch': 2.82} + 94%|█████████▍| 16230/17285 [145:20:44<9:28:00, 32.30s/it] 94%|█████████▍| 16231/17285 [145:21:20<9:46:43, 33.40s/it] 94%|█████████▍| 16232/17285 [145:21:52<9:37:02, 32.88s/it] 94%|█████████▍| 16233/17285 [145:22:27<9:48:01, 33.54s/it] 94%|█████████▍| 16234/17285 [145:23:00<9:47:09, 33.52s/it] 94%|█████████▍| 16235/17285 [145:23:32<9:34:39, 32.84s/it] 94%|█████████▍| 16236/17285 [145:24:04<9:32:02, 32.72s/it] 94%|█████████▍| 16237/17285 [145:24:34<9:17:05, 31.89s/it] 94%|█████████▍| 16238/17285 [145:25:02<8:53:33, 30.58s/it] 94%|█████████▍| 16239/17285 [145:25:40<9:31:51, 32.80s/it] 94%|█████████▍| 16240/17285 [145:26:10<9:18:20, 32.06s/it] {'loss': 1.2445, 'learning_rate': 2.7473972062503905e-06, 'epoch': 2.82} + 94%|█████████▍| 16240/17285 [145:26:10<9:18:20, 32.06s/it] 94%|█████████▍| 16241/17285 [145:26:43<9:24:57, 32.47s/it] 94%|█████████▍| 16242/17285 [145:27:13<9:10:42, 31.68s/it] 94%|█████████▍| 16243/17285 [145:27:45<9:12:30, 31.81s/it] 94%|█████████▍| 16244/17285 [145:28:13<8:53:11, 30.73s/it] 94%|█████████▍| 16245/17285 [145:28:44<8:53:50, 30.80s/it] 94%|█████████▍| 16246/17285 [145:29:17<9:02:40, 31.34s/it] 94%|█████████▍| 16247/17285 [145:29:52<9:23:22, 32.56s/it][2023-08-29 01:25:09,289] [INFO] 
[loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 94%|█████████▍| 16248/17285 [145:30:32<9:56:52, 34.53s/it] 94%|█████████▍| 16249/17285 [145:31:08<10:03:55, 34.98s/it] 94%|█████████▍| 16250/17285 [145:31:36<9:31:30, 33.13s/it] {'loss': 1.2237, 'learning_rate': 2.707455536371439e-06, 'epoch': 2.82} + 94%|█████████▍| 16250/17285 [145:31:36<9:31:30, 33.13s/it] 94%|█████████▍| 16251/17285 [145:32:05<9:08:43, 31.84s/it] 94%|█████████▍| 16252/17285 [145:32:38<9:15:00, 32.24s/it] 94%|█████████▍| 16253/17285 [145:33:11<9:16:30, 32.36s/it] 94%|█████████▍| 16254/17285 [145:33:45<9:25:06, 32.89s/it] 94%|█████████▍| 16255/17285 [145:34:16<9:14:23, 32.29s/it] 94%|█████████▍| 16256/17285 [145:34:42<8:41:03, 30.38s/it] 94%|█████████▍| 16257/17285 [145:35:17<9:05:46, 31.85s/it] 94%|█████████▍| 16258/17285 [145:35:53<9:23:24, 32.92s/it] 94%|█████████▍| 16259/17285 [145:36:24<9:12:31, 32.31s/it] 94%|█████████▍| 16260/17285 [145:37:02<9:43:17, 34.14s/it] {'loss': 1.2587, 'learning_rate': 2.6634142507455885e-06, 'epoch': 2.82} + 94%|█████████▍| 16260/17285 [145:37:02<9:43:17, 34.14s/it] 94%|█████████▍| 16261/17285 [145:37:37<9:49:17, 34.53s/it] 94%|█████████▍| 16262/17285 [145:38:05<9:14:17, 32.51s/it] 94%|█████████▍| 16263/17285 [145:38:39<9:20:24, 32.90s/it] 94%|█████████▍| 16264/17285 [145:39:22<10:12:32, 36.00s/it] 94%|█████████▍| 16265/17285 [145:40:01<10:24:29, 36.73s/it] 94%|█████████▍| 16266/17285 [145:40:38<10:25:33, 36.83s/it] 94%|█████████▍| 16267/17285 [145:41:17<10:34:41, 37.41s/it][2023-08-29 01:36:22,161] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 94%|█████████▍| 16268/17285 [145:41:44<9:45:51, 34.56s/it] 94%|█████████▍| 16269/17285 [145:42:13<9:13:24, 32.68s/it] 94%|█████████▍| 16270/17285 [145:42:57<10:10:31, 36.09s/it] {'loss': 1.2759, 'learning_rate': 2.624081735149897e-06, 'epoch': 2.82} + 94%|█████████▍| 16270/17285 [145:42:57<10:10:31, 36.09s/it] 94%|█████████▍| 16271/17285 [145:43:31<9:57:47, 35.37s/it] 94%|█████████▍| 16272/17285 [145:44:17<10:54:30, 38.77s/it] 94%|█████████▍| 16273/17285 [145:44:52<10:32:39, 37.51s/it] 94%|█████████▍| 16274/17285 [145:45:20<9:44:21, 34.68s/it] 94%|█████████▍| 16275/17285 [145:45:50<9:21:55, 33.38s/it] 94%|█████████▍| 16276/17285 [145:46:24<9:22:42, 33.46s/it] 94%|█████████▍| 16277/17285 [145:46:51<8:48:34, 31.46s/it] 94%|█████████▍| 16278/17285 [145:47:18<8:27:00, 30.21s/it] 94%|█████████▍| 16279/17285 [145:47:56<9:04:00, 32.45s/it] 94%|█████████▍| 16280/17285 [145:48:33<9:30:52, 34.08s/it] {'loss': 1.2786, 'learning_rate': 2.580717577477021e-06, 'epoch': 2.83} + 94%|█████████▍| 16280/17285 [145:48:34<9:30:52, 34.08s/it] 94%|█████████▍| 16281/17285 [145:49:02<9:01:58, 32.39s/it] 94%|█████████▍| 16282/17285 [145:49:37<9:13:54, 33.14s/it] 94%|█████████▍| 16283/17285 [145:50:06<8:53:00, 31.92s/it] 94%|█████████▍| 16284/17285 [145:50:36<8:44:36, 31.45s/it] 94%|█████████▍| 16285/17285 [145:51:04<8:25:39, 30.34s/it] 94%|█████████▍| 16286/17285 [145:51:32<8:11:51, 29.54s/it] 94%|█████████▍| 16287/17285 [145:52:00<8:04:00, 29.10s/it] 94%|█████████▍| 16288/17285 [145:52:37<8:44:50, 31.59s/it] 94%|█████████▍| 16289/17285 [145:53:08<8:43:14, 31.52s/it] 94%|█████████▍| 16290/17285 [145:53:45<9:07:19, 33.00s/it] {'loss': 1.272, 'learning_rate': 2.5377100336767545e-06, 'epoch': 2.83} + 94%|█████████▍| 16290/17285 [145:53:45<9:07:19, 33.00s/it] 94%|█████████▍| 16291/17285 [145:54:16<8:55:23, 32.32s/it] 94%|█████████▍| 16292/17285 [145:54:51<9:08:40, 33.15s/it] 94%|█████████▍| 16293/17285 [145:55:24<9:07:27, 33.11s/it] 
94%|█████████▍| 16294/17285 [145:56:01<9:25:48, 34.26s/it] 94%|█████████▍| 16295/17285 [145:56:34<9:22:38, 34.10s/it] 94%|█████████▍| 16296/17285 [145:57:13<9:44:18, 35.45s/it] 94%|█████████▍| 16297/17285 [145:57:43<9:14:43, 33.69s/it] 94%|█████████▍| 16298/17285 [145:58:15<9:08:01, 33.31s/it] 94%|█████████▍| 16299/17285 [145:58:57<9:48:41, 35.82s/it] 94%|█████████▍| 16300/17285 [145:59:30<9:33:37, 34.94s/it] {'loss': 1.2404, 'learning_rate': 2.495059261182886e-06, 'epoch': 2.83} + 94%|█████████▍| 16300/17285 [145:59:30<9:33:37, 34.94s/it] 94%|█████████▍| 16301/17285 [146:00:06<9:41:08, 35.44s/it] 94%|█████████▍| 16302/17285 [146:00:38<9:21:52, 34.30s/it] 94%|█████████▍| 16303/17285 [146:01:12<9:22:57, 34.40s/it] 94%|█████████▍| 16304/17285 [146:01:41<8:54:11, 32.67s/it] 94%|█████████▍| 16305/17285 [146:02:10<8:32:42, 31.39s/it] 94%|█████████▍| 16306/17285 [146:02:36<8:07:19, 29.87s/it] 94%|█████████▍| 16307/17285 [146:03:09<8:21:55, 30.79s/it] 94%|█████████▍| 16308/17285 [146:03:38<8:14:03, 30.34s/it] 94%|█████████▍| 16309/17285 [146:04:05<7:55:43, 29.25s/it] 94%|█████████▍| 16310/17285 [146:04:34<7:57:05, 29.36s/it] {'loss': 1.2751, 'learning_rate': 2.452765416123215e-06, 'epoch': 2.83} + 94%|█████████▍| 16310/17285 [146:04:34<7:57:05, 29.36s/it] 94%|█████████▍| 16311/17285 [146:05:06<8:09:07, 30.13s/it] 94%|█████████▍| 16312/17285 [146:05:38<8:15:11, 30.54s/it] 94%|█████████▍| 16313/17285 [146:06:10<8:22:17, 31.01s/it] 94%|█████████▍| 16314/17285 [146:06:45<8:42:05, 32.26s/it] 94%|█████████▍| 16315/17285 [146:07:11<8:09:36, 30.29s/it] 94%|█████████▍| 16316/17285 [146:07:39<7:57:30, 29.57s/it] 94%|█████████▍| 16317/17285 [146:08:12<8:16:36, 30.78s/it] 94%|█████████▍| 16318/17285 [146:08:41<8:06:33, 30.19s/it] 94%|█████████▍| 16319/17285 [146:09:18<8:36:12, 32.06s/it] 94%|█████████▍| 16320/17285 [146:09:43<8:04:55, 30.15s/it] {'loss': 1.2624, 'learning_rate': 2.4108286533189527e-06, 'epoch': 2.83} + 94%|█████████▍| 16320/17285 [146:09:43<8:04:55, 30.15s/it] 
94%|█████████▍| 16321/17285 [146:10:14<8:07:53, 30.37s/it] 94%|█████████▍| 16322/17285 [146:10:47<8:18:56, 31.09s/it] 94%|█████████▍| 16323/17285 [146:11:21<8:35:07, 32.13s/it] 94%|█████████▍| 16324/17285 [146:11:51<8:24:29, 31.50s/it] 94%|█████████▍| 16325/17285 [146:12:27<8:44:32, 32.78s/it] 94%|█████████▍| 16326/17285 [146:13:11<9:35:30, 36.01s/it] 94%|█████████▍| 16327/17285 [146:13:35<8:38:44, 32.49s/it] 94%|█████████▍| 16328/17285 [146:14:10<8:51:50, 33.34s/it] 94%|█████████▍| 16329/17285 [146:14:37<8:18:02, 31.26s/it] 94%|█████████▍| 16330/17285 [146:15:09<8:22:20, 31.56s/it] {'loss': 1.2965, 'learning_rate': 2.3692491262841785e-06, 'epoch': 2.83} + 94%|█████████▍| 16330/17285 [146:15:09<8:22:20, 31.56s/it] 94%|█████████▍| 16331/17285 [146:15:35<7:54:00, 29.81s/it] 94%|█████████▍| 16332/17285 [146:16:15<8:42:12, 32.88s/it] 94%|█████████▍| 16333/17285 [146:16:41<8:12:13, 31.02s/it] 94%|█████████▍| 16334/17285 [146:17:18<8:36:44, 32.60s/it] 95%|█████████▍| 16335/17285 [146:17:50<8:32:40, 32.38s/it] 95%|█████████▍| 16336/17285 [146:18:19<8:19:40, 31.59s/it] 95%|█████████▍| 16337/17285 [146:18:57<8:49:27, 33.51s/it] 95%|█████████▍| 16338/17285 [146:19:27<8:32:16, 32.46s/it] 95%|█████████▍| 16339/17285 [146:19:58<8:24:22, 31.99s/it] 95%|█████████▍| 16340/17285 [146:20:33<8:35:52, 32.75s/it] {'loss': 1.2947, 'learning_rate': 2.3280269872252847e-06, 'epoch': 2.84} + 95%|█████████▍| 16340/17285 [146:20:33<8:35:52, 32.75s/it] 95%|█████████▍| 16341/17285 [146:21:16<9:26:22, 36.00s/it] 95%|█████████▍| 16342/17285 [146:21:44<8:47:32, 33.57s/it] 95%|█████████▍| 16343/17285 [146:22:13<8:24:03, 32.11s/it] 95%|█████████▍| 16344/17285 [146:22:45<8:21:36, 31.98s/it] 95%|█████████▍| 16345/17285 [146:23:19<8:33:15, 32.76s/it] 95%|█████████▍| 16346/17285 [146:23:55<8:48:38, 33.78s/it] 95%|█████████▍| 16347/17285 [146:24:24<8:24:16, 32.26s/it] 95%|█████████▍| 16348/17285 [146:24:53<8:06:41, 31.16s/it] 95%|█████████▍| 16349/17285 [146:25:20<7:45:40, 29.85s/it] 95%|█████████▍| 
16350/17285 [146:25:48<7:38:43, 29.44s/it] {'loss': 1.2839, 'learning_rate': 2.287162387040365e-06, 'epoch': 2.84} + 95%|█████████▍| 16350/17285 [146:25:48<7:38:43, 29.44s/it] 95%|█████████▍| 16351/17285 [146:26:26<8:17:09, 31.94s/it] 95%|█████████▍| 16352/17285 [146:26:59<8:21:54, 32.28s/it] 95%|█████████▍| 16353/17285 [146:27:29<8:09:22, 31.50s/it][2023-08-29 02:22:34,691] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 95%|█████████▍| 16354/17285 [146:27:57<7:54:46, 30.60s/it] 95%|█████████▍| 16355/17285 [146:28:28<7:53:57, 30.58s/it] 95%|█████████▍| 16356/17285 [146:28:59<7:55:46, 30.73s/it] 95%|█████████▍| 16357/17285 [146:29:38<8:34:34, 33.27s/it] 95%|█████████▍| 16358/17285 [146:30:19<9:10:52, 35.66s/it] 95%|█████████▍| 16359/17285 [146:30:44<8:22:44, 32.57s/it] 95%|█████████▍| 16360/17285 [146:31:17<8:22:33, 32.60s/it] {'loss': 1.2637, 'learning_rate': 2.2506900662738086e-06, 'epoch': 2.84} + 95%|█████████▍| 16360/17285 [146:31:17<8:22:33, 32.60s/it] 95%|█████████▍| 16361/17285 [146:31:51<8:26:44, 32.90s/it] 95%|█████████▍| 16362/17285 [146:32:26<8:35:42, 33.52s/it] 95%|█████████▍| 16363/17285 [146:32:55<8:16:10, 32.29s/it] 95%|█████████▍| 16364/17285 [146:33:28<8:16:20, 32.34s/it] 95%|█████████▍| 16365/17285 [146:33:57<8:02:27, 31.46s/it] 95%|█████████▍| 16366/17285 [146:34:32<8:19:32, 32.61s/it] 95%|█████████▍| 16367/17285 [146:35:10<8:43:14, 34.20s/it] 95%|█████████▍| 16368/17285 [146:35:38<8:14:07, 32.33s/it] 95%|█████████▍| 16369/17285 [146:36:09<8:07:28, 31.93s/it] 95%|█████████▍| 16370/17285 [146:36:46<8:30:31, 33.48s/it] {'loss': 1.2615, 'learning_rate': 2.210505200985846e-06, 'epoch': 2.84} + 95%|█████████▍| 16370/17285 [146:36:46<8:30:31, 33.48s/it] 95%|█████████▍| 16371/17285 [146:37:18<8:23:53, 33.08s/it] 95%|█████████▍| 16372/17285 [146:37:44<7:50:06, 30.89s/it] 95%|█████████▍| 16373/17285 [146:38:18<8:01:04, 31.65s/it] 95%|█████████▍| 16374/17285 
[146:38:48<7:55:06, 31.29s/it] 95%|█████████▍| 16375/17285 [146:39:16<7:41:15, 30.41s/it] 95%|█████████▍| 16376/17285 [146:39:47<7:42:36, 30.54s/it] 95%|█████████▍| 16377/17285 [146:40:23<8:07:23, 32.21s/it] 95%|█████████▍| 16378/17285 [146:40:50<7:42:26, 30.59s/it] 95%|█████████▍| 16379/17285 [146:41:24<7:56:39, 31.57s/it] 95%|█████████▍| 16380/17285 [146:42:06<8:42:20, 34.63s/it] {'loss': 1.2743, 'learning_rate': 2.1706783047731326e-06, 'epoch': 2.84} + 95%|█████████▍| 16380/17285 [146:42:06<8:42:20, 34.63s/it] 95%|█████████▍| 16381/17285 [146:42:46<9:08:33, 36.41s/it] 95%|█████████▍| 16382/17285 [146:43:24<9:15:42, 36.92s/it] 95%|█████████▍| 16383/17285 [146:43:54<8:41:12, 34.67s/it] 95%|█████████▍| 16384/17285 [146:44:26<8:30:05, 33.97s/it] 95%|█████████▍| 16385/17285 [146:45:03<8:40:37, 34.71s/it] 95%|█████████▍| 16386/17285 [146:45:34<8:25:54, 33.76s/it] 95%|█████████▍| 16387/17285 [146:46:06<8:14:55, 33.07s/it] 95%|█████████▍| 16388/17285 [146:46:35<7:59:04, 32.05s/it] 95%|█████████▍| 16389/17285 [146:47:11<8:15:57, 33.21s/it] 95%|█████████▍| 16390/17285 [146:47:45<8:18:42, 33.43s/it] {'loss': 1.2368, 'learning_rate': 2.1312095234263807e-06, 'epoch': 2.84} + 95%|█████████▍| 16390/17285 [146:47:45<8:18:42, 33.43s/it] 95%|█████████▍| 16391/17285 [146:48:12<7:49:24, 31.50s/it] 95%|█████████▍| 16392/17285 [146:48:48<8:08:43, 32.84s/it] 95%|█████████▍| 16393/17285 [146:49:21<8:09:00, 32.89s/it] 95%|█████████▍| 16394/17285 [146:49:52<7:58:53, 32.25s/it] 95%|█████████▍| 16395/17285 [146:50:20<7:40:59, 31.08s/it] 95%|█████████▍| 16396/17285 [146:50:45<7:12:03, 29.16s/it] 95%|█████████▍| 16397/17285 [146:51:17<7:24:41, 30.05s/it] 95%|█████████▍| 16398/17285 [146:51:50<7:39:01, 31.05s/it] 95%|█████████▍| 16399/17285 [146:52:20<7:31:34, 30.58s/it] 95%|█████████▍| 16400/17285 [146:52:50<7:29:57, 30.51s/it] {'loss': 1.2542, 'learning_rate': 2.0920990014253185e-06, 'epoch': 2.85} + 95%|█████████▍| 16400/17285 [146:52:50<7:29:57, 30.51s/it] 95%|█████████▍| 16401/17285 
[146:53:23<7:37:25, 31.05s/it] 95%|█████████▍| 16402/17285 [146:53:54<7:38:11, 31.13s/it] 95%|█████████▍| 16403/17285 [146:54:29<7:53:41, 32.22s/it] 95%|█████████▍| 16404/17285 [146:55:03<8:04:27, 32.99s/it] 95%|█████████▍| 16405/17285 [146:55:40<8:18:19, 33.98s/it] 95%|█████████▍| 16406/17285 [146:56:16<8:26:19, 34.56s/it] 95%|█████████▍| 16407/17285 [146:56:54<8:40:40, 35.58s/it] 95%|█████████▍| 16408/17285 [146:57:33<8:57:04, 36.74s/it] 95%|█████████▍| 16409/17285 [146:58:02<8:21:23, 34.34s/it] 95%|█████████▍| 16410/17285 [146:58:32<8:04:04, 33.19s/it] {'loss': 1.2367, 'learning_rate': 2.0533468819382893e-06, 'epoch': 2.85} + 95%|█████████▍| 16410/17285 [146:58:32<8:04:04, 33.19s/it] 95%|█████████▍| 16411/17285 [146:59:06<8:04:07, 33.23s/it] 95%|█████████▍| 16412/17285 [146:59:45<8:28:51, 34.97s/it] 95%|█████████▍| 16413/17285 [147:00:10<7:48:15, 32.22s/it] 95%|█████████▍| 16414/17285 [147:00:45<7:57:08, 32.87s/it] 95%|█████████▍| 16415/17285 [147:01:21<8:11:16, 33.88s/it] 95%|█████████▍| 16416/17285 [147:01:59<8:26:28, 34.97s/it] 95%|█████████▍| 16417/17285 [147:02:38<8:47:13, 36.44s/it] 95%|█████████▍| 16418/17285 [147:03:05<8:02:59, 33.43s/it] 95%|█████████▍| 16419/17285 [147:03:34<7:42:12, 32.02s/it] 95%|█████████▍| 16420/17285 [147:04:04<7:32:53, 31.41s/it] {'loss': 1.2418, 'learning_rate': 2.014953306821632e-06, 'epoch': 2.85} + 95%|█████████▍| 16420/17285 [147:04:04<7:32:53, 31.41s/it] 95%|█████████▌| 16421/17285 [147:04:40<7:55:48, 33.04s/it] 95%|█████████▌| 16422/17285 [147:05:09<7:35:29, 31.67s/it] 95%|█████████▌| 16423/17285 [147:05:48<8:07:22, 33.92s/it] 95%|█████████▌| 16424/17285 [147:06:18<7:50:22, 32.78s/it] 95%|█████████▌| 16425/17285 [147:06:49<7:42:17, 32.25s/it] 95%|█████████▌| 16426/17285 [147:07:25<7:57:10, 33.33s/it] 95%|█████████▌| 16427/17285 [147:08:08<8:37:39, 36.20s/it] 95%|█████████▌| 16428/17285 [147:08:50<9:02:05, 37.95s/it] 95%|█████████▌| 16429/17285 [147:09:19<8:21:33, 35.16s/it] 95%|█████████▌| 16430/17285 [147:09:48<7:54:42, 
33.31s/it] {'loss': 1.2555, 'learning_rate': 1.976918416619211e-06, 'epoch': 2.85} + 95%|█████████▌| 16430/17285 [147:09:48<7:54:42, 33.31s/it] 95%|█████████▌| 16431/17285 [147:10:22<7:58:50, 33.64s/it] 95%|█████████▌| 16432/17285 [147:10:53<7:48:16, 32.94s/it] 95%|█████████▌| 16433/17285 [147:11:28<7:54:53, 33.44s/it] 95%|█████████▌| 16434/17285 [147:11:57<7:37:34, 32.26s/it] 95%|█████████▌| 16435/17285 [147:12:32<7:47:23, 32.99s/it] 95%|█████████▌| 16436/17285 [147:13:01<7:28:58, 31.73s/it] 95%|█████████▌| 16437/17285 [147:13:29<7:12:57, 30.63s/it] 95%|█████████▌| 16438/17285 [147:14:02<7:23:30, 31.42s/it] 95%|█████████▌| 16439/17285 [147:14:30<7:07:03, 30.29s/it] 95%|█████████▌| 16440/17285 [147:15:06<7:31:44, 32.08s/it] {'loss': 1.2583, 'learning_rate': 1.939242350561854e-06, 'epoch': 2.85} + 95%|█████████▌| 16440/17285 [147:15:06<7:31:44, 32.08s/it] 95%|█████████▌| 16441/17285 [147:15:39<7:33:07, 32.21s/it] 95%|█████████▌| 16442/17285 [147:16:08<7:18:33, 31.21s/it] 95%|█████████▌| 16443/17285 [147:16:40<7:24:51, 31.70s/it] 95%|█████████▌| 16444/17285 [147:17:14<7:32:03, 32.25s/it] 95%|█████████▌| 16445/17285 [147:17:47<7:35:07, 32.51s/it] 95%|█████████▌| 16446/17285 [147:18:18<7:29:33, 32.15s/it] 95%|█████████▌| 16447/17285 [147:18:53<7:38:59, 32.86s/it] 95%|█████████▌| 16448/17285 [147:19:25<7:37:05, 32.77s/it] 95%|█████████▌| 16449/17285 [147:20:03<7:56:30, 34.20s/it] 95%|█████████▌| 16450/17285 [147:20:35<7:48:30, 33.66s/it] {'loss': 1.2734, 'learning_rate': 1.9019252465669046e-06, 'epoch': 2.86} + 95%|█████████▌| 16450/17285 [147:20:35<7:48:30, 33.66s/it] 95%|█████████▌| 16451/17285 [147:21:10<7:53:18, 34.05s/it] 95%|█████████▌| 16452/17285 [147:21:48<8:06:10, 35.02s/it] 95%|█████████▌| 16453/17285 [147:22:23<8:07:39, 35.17s/it] 95%|█████████▌| 16454/17285 [147:22:57<8:03:23, 34.90s/it] 95%|█████████▌| 16455/17285 [147:23:27<7:39:23, 33.21s/it] 95%|█████████▌| 16456/17285 [147:24:01<7:44:14, 33.60s/it] 95%|█████████▌| 16457/17285 [147:24:32<7:32:18, 
32.78s/it] 95%|█████████▌| 16458/17285 [147:25:06<7:36:02, 33.09s/it] 95%|█████████▌| 16459/17285 [147:25:46<8:05:41, 35.28s/it] 95%|█████████▌| 16460/17285 [147:26:19<7:55:52, 34.61s/it] {'loss': 1.2125, 'learning_rate': 1.8649672412376916e-06, 'epoch': 2.86} + 95%|█████████▌| 16460/17285 [147:26:19<7:55:52, 34.61s/it] 95%|█████████▌| 16461/17285 [147:26:47<7:27:12, 32.56s/it] 95%|█████████▌| 16462/17285 [147:27:12<6:54:48, 30.24s/it] 95%|█████████▌| 16463/17285 [147:27:40<6:45:12, 29.58s/it] 95%|█████████▌| 16464/17285 [147:28:06<6:30:22, 28.53s/it] 95%|█████████▌| 16465/17285 [147:28:37<6:38:21, 29.15s/it] 95%|█████████▌| 16466/17285 [147:29:08<6:47:46, 29.87s/it] 95%|█████████▌| 16467/17285 [147:29:40<6:56:25, 30.54s/it] 95%|█████████▌| 16468/17285 [147:30:27<8:02:48, 35.46s/it] 95%|█████████▌| 16469/17285 [147:30:57<7:38:19, 33.70s/it] 95%|█████████▌| 16470/17285 [147:31:29<7:30:50, 33.19s/it] {'loss': 1.2853, 'learning_rate': 1.8283684698629843e-06, 'epoch': 2.86} + 95%|█████████▌| 16470/17285 [147:31:29<7:30:50, 33.19s/it] 95%|█████████▌| 16471/17285 [147:31:57<7:10:11, 31.71s/it] 95%|█████████▌| 16472/17285 [147:32:32<7:21:14, 32.56s/it] 95%|█████████▌| 16473/17285 [147:33:11<7:49:43, 34.71s/it] 95%|█████████▌| 16474/17285 [147:33:41<7:30:31, 33.33s/it] 95%|█████████▌| 16475/17285 [147:34:07<6:59:00, 31.04s/it] 95%|█████████▌| 16476/17285 [147:34:50<7:45:59, 34.56s/it] 95%|█████████▌| 16477/17285 [147:35:27<7:56:10, 35.36s/it] 95%|█████████▌| 16478/17285 [147:35:58<7:38:02, 34.06s/it] 95%|█████████▌| 16479/17285 [147:36:24<7:03:18, 31.51s/it] 95%|█████████▌| 16480/17285 [147:36:58<7:13:18, 32.30s/it] {'loss': 1.2621, 'learning_rate': 1.7921290664165923e-06, 'epoch': 2.86} + 95%|█████████▌| 16480/17285 [147:36:58<7:13:18, 32.30s/it] 95%|█████████▌| 16481/17285 [147:37:38<7:45:46, 34.76s/it] 95%|█████████▌| 16482/17285 [147:38:15<7:52:33, 35.31s/it] 95%|█████████▌| 16483/17285 [147:38:52<7:59:07, 35.85s/it] 95%|█████████▌| 16484/17285 [147:39:30<8:07:54, 
36.55s/it] 95%|█████████▌| 16485/17285 [147:39:58<7:31:34, 33.87s/it] 95%|█████████▌| 16486/17285 [147:40:28<7:14:18, 32.61s/it] 95%|█████████▌| 16487/17285 [147:41:06<7:37:26, 34.39s/it] 95%|█████████▌| 16488/17285 [147:41:42<7:44:24, 34.96s/it] 95%|█████████▌| 16489/17285 [147:42:14<7:30:12, 33.94s/it] 95%|█████████▌| 16490/17285 [147:42:49<7:34:37, 34.31s/it] {'loss': 1.2599, 'learning_rate': 1.756249163556778e-06, 'epoch': 2.86} + 95%|█████████▌| 16490/17285 [147:42:49<7:34:37, 34.31s/it] 95%|█████████▌| 16491/17285 [147:43:24<7:36:47, 34.52s/it] 95%|█████████▌| 16492/17285 [147:43:56<7:25:41, 33.72s/it] 95%|█████████▌| 16493/17285 [147:44:29<7:22:24, 33.52s/it] 95%|█████████▌| 16494/17285 [147:45:04<7:25:45, 33.81s/it] 95%|█████████▌| 16495/17285 [147:45:39<7:33:38, 34.45s/it] 95%|█████████▌| 16496/17285 [147:46:13<7:30:16, 34.24s/it] 95%|█████████▌| 16497/17285 [147:46:45<7:20:04, 33.51s/it] 95%|█████████▌| 16498/17285 [147:47:19<7:23:12, 33.79s/it] 95%|█████████▌| 16499/17285 [147:47:46<6:54:27, 31.64s/it] 95%|█████████▌| 16500/17285 [147:48:23<7:15:51, 33.31s/it] {'loss': 1.2865, 'learning_rate': 1.7207288926258225e-06, 'epoch': 2.86} + 95%|█████████▌| 16500/17285 [147:48:23<7:15:51, 33.31s/it] 95%|█████████▌| 16501/17285 [147:48:53<7:02:50, 32.36s/it] 95%|█████████▌| 16502/17285 [147:49:23<6:50:24, 31.45s/it] 95%|█████████▌| 16503/17285 [147:50:03<7:25:38, 34.19s/it] 95%|█████████▌| 16504/17285 [147:50:33<7:08:06, 32.89s/it] 95%|█████████▌| 16505/17285 [147:51:09<7:20:13, 33.86s/it] 95%|█████████▌| 16506/17285 [147:51:38<6:58:41, 32.25s/it] 95%|█████████▌| 16507/17285 [147:52:05<6:38:37, 30.74s/it] 96%|█████████▌| 16508/17285 [147:52:39<6:48:43, 31.56s/it] 96%|█████████▌| 16509/17285 [147:53:06<6:32:41, 30.36s/it] 96%|█████████▌| 16510/17285 [147:53:43<6:57:50, 32.35s/it] {'loss': 1.238, 'learning_rate': 1.6855683836495383e-06, 'epoch': 2.87} + 96%|█████████▌| 16510/17285 [147:53:43<6:57:50, 32.35s/it] 96%|█████████▌| 16511/17285 [147:54:23<7:27:00, 
34.65s/it] 96%|█████████▌| 16512/17285 [147:55:06<7:59:22, 37.21s/it] 96%|█████████▌| 16513/17285 [147:55:36<7:30:34, 35.02s/it] 96%|█████████▌| 16514/17285 [147:56:09<7:21:00, 34.32s/it] 96%|█████████▌| 16515/17285 [147:56:42<7:14:52, 33.89s/it] 96%|█████████▌| 16516/17285 [147:57:17<7:20:57, 34.41s/it] 96%|█████████▌| 16517/17285 [147:57:49<7:08:41, 33.49s/it] 96%|█████████▌| 16518/17285 [147:58:22<7:05:27, 33.28s/it] 96%|█████████▌| 16519/17285 [147:59:04<7:41:45, 36.17s/it] 96%|█████████▌| 16520/17285 [147:59:33<7:11:10, 33.82s/it] {'loss': 1.2989, 'learning_rate': 1.6507677653367915e-06, 'epoch': 2.87} + 96%|█████████▌| 16520/17285 [147:59:33<7:11:10, 33.82s/it] 96%|█████████▌| 16521/17285 [148:00:02<6:53:22, 32.46s/it] 96%|█████████▌| 16522/17285 [148:00:35<6:53:34, 32.52s/it] 96%|█████████▌| 16523/17285 [148:00:59<6:22:02, 30.08s/it] 96%|█████████▌| 16524/17285 [148:01:39<6:56:58, 32.88s/it] 96%|█████████▌| 16525/17285 [148:02:17<7:16:57, 34.50s/it] 96%|█████████▌| 16526/17285 [148:02:50<7:10:53, 34.06s/it] 96%|█████████▌| 16527/17285 [148:03:27<7:21:50, 34.97s/it] 96%|█████████▌| 16528/17285 [148:03:59<7:08:47, 33.99s/it] 96%|█████████▌| 16529/17285 [148:04:33<7:09:50, 34.11s/it] 96%|█████████▌| 16530/17285 [148:04:59<6:37:47, 31.61s/it] {'loss': 1.2784, 'learning_rate': 1.6163271650790456e-06, 'epoch': 2.87} + 96%|█████████▌| 16530/17285 [148:04:59<6:37:47, 31.61s/it] 96%|█████████▌| 16531/17285 [148:05:30<6:36:51, 31.58s/it] 96%|█████████▌| 16532/17285 [148:05:56<6:12:22, 29.67s/it] 96%|█████████▌| 16533/17285 [148:06:30<6:30:39, 31.17s/it] 96%|█████████▌| 16534/17285 [148:06:57<6:13:16, 29.82s/it] 96%|█████████▌| 16535/17285 [148:07:30<6:25:05, 30.81s/it] 96%|█████████▌| 16536/17285 [148:08:05<6:39:41, 32.02s/it] 96%|█████████▌| 16537/17285 [148:08:32<6:22:24, 30.67s/it] 96%|█████████▌| 16538/17285 [148:09:02<6:17:47, 30.35s/it] 96%|█████████▌| 16539/17285 [148:09:29<6:06:35, 29.48s/it] 96%|█████████▌| 16540/17285 [148:09:56<5:53:56, 28.51s/it] {'loss': 
1.2912, 'learning_rate': 1.5822467089498304e-06, 'epoch': 2.87} + 96%|█████████▌| 16540/17285 [148:09:56<5:53:56, 28.51s/it] 96%|█████████▌| 16541/17285 [148:10:30<6:16:57, 30.40s/it] 96%|█████████▌| 16542/17285 [148:11:02<6:19:16, 30.63s/it] 96%|█████████▌| 16543/17285 [148:11:31<6:13:50, 30.23s/it] 96%|█████████▌| 16544/17285 [148:12:01<6:12:00, 30.12s/it] 96%|█████████▌| 16545/17285 [148:12:31<6:13:26, 30.28s/it] 96%|█████████▌| 16546/17285 [148:13:08<6:37:58, 32.31s/it] 96%|█████████▌| 16547/17285 [148:13:46<6:56:53, 33.89s/it] 96%|█████████▌| 16548/17285 [148:14:21<6:58:42, 34.09s/it] 96%|█████████▌| 16549/17285 [148:14:57<7:05:17, 34.67s/it] 96%|█████████▌| 16550/17285 [148:15:27<6:48:24, 33.34s/it] {'loss': 1.2584, 'learning_rate': 1.5485265217043854e-06, 'epoch': 2.87} + 96%|█████████▌| 16550/17285 [148:15:27<6:48:24, 33.34s/it] 96%|█████████▌| 16551/17285 [148:15:58<6:39:08, 32.63s/it] 96%|█████████▌| 16552/17285 [148:16:25<6:19:59, 31.10s/it] 96%|█████████▌| 16553/17285 [148:17:03<6:43:10, 33.05s/it] 96%|█████████▌| 16554/17285 [148:17:32<6:28:15, 31.87s/it] 96%|█████████▌| 16555/17285 [148:18:08<6:43:50, 33.19s/it][2023-08-29 04:13:13,148] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 + 96%|█████████▌| 16556/17285 [148:18:35<6:21:01, 31.36s/it] 96%|█████████▌| 16557/17285 [148:19:06<6:16:31, 31.03s/it] 96%|█████████▌| 16558/17285 [148:19:32<5:57:32, 29.51s/it] 96%|█████████▌| 16559/17285 [148:20:08<6:22:26, 31.61s/it] 96%|█████████▌| 16560/17285 [148:20:36<6:06:23, 30.32s/it] {'loss': 1.2535, 'learning_rate': 1.5184864851265469e-06, 'epoch': 2.87} + 96%|█████████▌| 16560/17285 [148:20:36<6:06:23, 30.32s/it] 96%|█████████▌| 16561/17285 [148:21:08<6:13:39, 30.97s/it] 96%|█████████▌| 16562/17285 [148:21:45<6:36:35, 32.91s/it] 96%|█████████▌| 16563/17285 [148:22:16<6:27:09, 32.17s/it] 96%|█████████▌| 16564/17285 [148:22:46<6:17:51, 31.45s/it] 96%|█████████▌| 16565/17285 [148:23:17<6:17:04, 31.42s/it] 96%|█████████▌| 16566/17285 [148:23:45<6:03:34, 30.34s/it] 96%|█████████▌| 16567/17285 [148:24:20<6:20:00, 31.76s/it] 96%|█████████▌| 16568/17285 [148:24:53<6:23:15, 32.07s/it] 96%|█████████▌| 16569/17285 [148:25:19<6:02:49, 30.40s/it] 96%|█████████▌| 16570/17285 [148:25:45<5:44:25, 28.90s/it] {'loss': 1.3007, 'learning_rate': 1.4854511477372047e-06, 'epoch': 2.88} + 96%|█████████▌| 16570/17285 [148:25:45<5:44:25, 28.90s/it] 96%|█████████▌| 16571/17285 [148:26:23<6:16:45, 31.66s/it] 96%|█████████▌| 16572/17285 [148:26:53<6:12:59, 31.39s/it] 96%|█████████▌| 16573/17285 [148:27:25<6:13:47, 31.50s/it] 96%|█████████▌| 16574/17285 [148:27:52<5:55:28, 30.00s/it] 96%|█████████▌| 16575/17285 [148:28:26<6:09:44, 31.25s/it] 96%|█████████▌| 16576/17285 [148:28:59<6:15:54, 31.81s/it] 96%|█████████▌| 16577/17285 [148:29:33<6:23:46, 32.52s/it][2023-08-29 04:24:41,141] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 96%|█████████▌| 16578/17285 [148:30:03<6:15:19, 31.85s/it] 96%|█████████▌| 16579/17285 [148:30:33<6:06:29, 31.15s/it] 96%|█████████▌| 16580/17285 [148:31:08<6:18:20, 32.20s/it] {'loss': 1.2791, 'learning_rate': 1.456027673515925e-06, 'epoch': 2.88} + 96%|█████████▌| 16580/17285 [148:31:08<6:18:20, 32.20s/it] 96%|█████████▌| 16581/17285 [148:31:47<6:43:38, 34.40s/it] 96%|█████████▌| 16582/17285 [148:32:19<6:33:29, 33.58s/it] 96%|█████████▌| 16583/17285 [148:32:52<6:30:54, 33.41s/it] 96%|█████████▌| 16584/17285 [148:33:20<6:11:01, 31.76s/it] 96%|█████████▌| 16585/17285 [148:33:52<6:10:36, 31.77s/it] 96%|█████████▌| 16586/17285 [148:34:28<6:26:41, 33.19s/it] 96%|█████████▌| 16587/17285 [148:35:00<6:20:19, 32.69s/it] 96%|█████████▌| 16588/17285 [148:35:25<5:53:44, 30.45s/it] 96%|█████████▌| 16589/17285 [148:35:53<5:43:49, 29.64s/it] 96%|█████████▌| 16590/17285 [148:36:27<5:58:31, 30.95s/it] {'loss': 1.2686, 'learning_rate': 1.4236776225376336e-06, 'epoch': 2.88} + 96%|█████████▌| 16590/17285 [148:36:27<5:58:31, 30.95s/it] 96%|█████████▌| 16591/17285 [148:37:00<6:06:48, 31.71s/it] 96%|█████████▌| 16592/17285 [148:37:38<6:26:38, 33.48s/it] 96%|█████████▌| 16593/17285 [148:38:07<6:10:45, 32.15s/it] 96%|█████████▌| 16594/17285 [148:38:43<6:24:03, 33.35s/it] 96%|█████████▌| 16595/17285 [148:39:21<6:40:18, 34.81s/it] 96%|█████████▌| 16596/17285 [148:40:04<7:07:50, 37.26s/it] 96%|█████████▌| 16597/17285 [148:40:29<6:26:02, 33.67s/it] 96%|█████████▌| 16598/17285 [148:40:57<6:06:10, 31.98s/it] 96%|█████████▌| 16599/17285 [148:41:32<6:13:44, 32.69s/it] 96%|█████████▌| 16600/17285 [148:42:04<6:10:13, 32.43s/it] {'loss': 1.2315, 'learning_rate': 1.3916884209024705e-06, 'epoch': 2.88} + 96%|█████████▌| 16600/17285 [148:42:04<6:10:13, 32.43s/it] 96%|█████████▌| 16601/17285 [148:42:35<6:04:48, 32.00s/it] 96%|█████████▌| 16602/17285 [148:43:02<5:47:34, 30.53s/it] 96%|█████████▌| 16603/17285 [148:43:30<5:40:28, 29.95s/it] 
96%|█████████▌| 16604/17285 [148:43:58<5:31:34, 29.21s/it] 96%|█████████▌| 16605/17285 [148:44:27<5:32:01, 29.30s/it] 96%|█████████▌| 16606/17285 [148:45:00<5:43:18, 30.34s/it] 96%|█████████▌| 16607/17285 [148:45:32<5:47:24, 30.74s/it] 96%|█████████▌| 16608/17285 [148:46:06<5:59:27, 31.86s/it] 96%|█████████▌| 16609/17285 [148:46:32<5:37:19, 29.94s/it] 96%|█████████▌| 16610/17285 [148:46:58<5:23:34, 28.76s/it] {'loss': 1.2747, 'learning_rate': 1.3600601857104101e-06, 'epoch': 2.88} + 96%|█████████▌| 16610/17285 [148:46:58<5:23:34, 28.76s/it] 96%|█████████▌| 16611/17285 [148:47:33<5:45:41, 30.77s/it] 96%|█████████▌| 16612/17285 [148:48:05<5:49:16, 31.14s/it] 96%|█████████▌| 16613/17285 [148:48:38<5:56:10, 31.80s/it] 96%|█████████▌| 16614/17285 [148:49:21<6:31:29, 35.01s/it] 96%|█████████▌| 16615/17285 [148:49:47<6:00:45, 32.31s/it] 96%|█████████▌| 16616/17285 [148:50:25<6:20:09, 34.10s/it] 96%|█████████▌| 16617/17285 [148:50:55<6:03:53, 32.68s/it] 96%|█████████▌| 16618/17285 [148:51:24<5:50:59, 31.57s/it] 96%|█████████▌| 16619/17285 [148:51:59<6:02:07, 32.62s/it] 96%|█████████▌| 16620/17285 [148:52:27<5:48:59, 31.49s/it] {'loss': 1.2595, 'learning_rate': 1.3287930327400167e-06, 'epoch': 2.88} + 96%|█████████▌| 16620/17285 [148:52:27<5:48:59, 31.49s/it] 96%|█████████▌| 16621/17285 [148:52:57<5:43:07, 31.00s/it] 96%|█████████▌| 16622/17285 [148:53:26<5:34:16, 30.25s/it] 96%|█████████▌| 16623/17285 [148:53:57<5:35:58, 30.45s/it] 96%|█████████▌| 16624/17285 [148:54:29<5:40:02, 30.87s/it] 96%|█████████▌| 16625/17285 [148:55:04<5:53:55, 32.18s/it] 96%|█████████▌| 16626/17285 [148:55:35<5:49:34, 31.83s/it] 96%|█████████▌| 16627/17285 [148:56:02<5:34:10, 30.47s/it] 96%|█████████▌| 16628/17285 [148:56:28<5:18:38, 29.10s/it] 96%|█████████▌| 16629/17285 [148:57:03<5:38:25, 30.95s/it] 96%|█████████▌| 16630/17285 [148:57:45<6:12:45, 34.15s/it] {'loss': 1.2505, 'learning_rate': 1.2978870764481232e-06, 'epoch': 2.89} + 96%|█████████▌| 16630/17285 [148:57:45<6:12:45, 34.15s/it] 
96%|█████████▌| 16631/17285 [148:58:19<6:13:23, 34.26s/it] 96%|█████████▌| 16632/17285 [148:58:51<6:02:49, 33.34s/it] 96%|█████████▌| 16633/17285 [148:59:33<6:33:05, 36.17s/it] 96%|█████████▌| 16634/17285 [149:00:05<6:18:44, 34.91s/it] 96%|█████████▌| 16635/17285 [149:00:33<5:55:47, 32.84s/it] 96%|█████████▌| 16636/17285 [149:01:05<5:50:17, 32.38s/it] 96%|█████████▋| 16637/17285 [149:01:38<5:52:33, 32.64s/it] 96%|█████████▋| 16638/17285 [149:02:04<5:30:50, 30.68s/it] 96%|█████████▋| 16639/17285 [149:02:39<5:43:40, 31.92s/it] 96%|█████████▋| 16640/17285 [149:03:12<5:47:42, 32.34s/it] {'loss': 1.2814, 'learning_rate': 1.2673424299693204e-06, 'epoch': 2.89} + 96%|█████████▋| 16640/17285 [149:03:12<5:47:42, 32.34s/it] 96%|█████████▋| 16641/17285 [149:03:48<5:58:31, 33.40s/it] 96%|█████████▋| 16642/17285 [149:04:20<5:54:31, 33.08s/it] 96%|█████████▋| 16643/17285 [149:04:54<5:56:00, 33.27s/it] 96%|█████████▋| 16644/17285 [149:05:28<5:56:50, 33.40s/it] 96%|█████████▋| 16645/17285 [149:06:00<5:53:56, 33.18s/it] 96%|█████████▋| 16646/17285 [149:06:26<5:28:46, 30.87s/it] 96%|█████████▋| 16647/17285 [149:06:52<5:13:52, 29.52s/it] 96%|█████████▋| 16648/17285 [149:07:21<5:10:03, 29.20s/it] 96%|█████████▋| 16649/17285 [149:08:00<5:39:56, 32.07s/it] 96%|█████████▋| 16650/17285 [149:08:35<5:49:40, 33.04s/it] {'loss': 1.2427, 'learning_rate': 1.2371592051156345e-06, 'epoch': 2.89} + 96%|█████████▋| 16650/17285 [149:08:35<5:49:40, 33.04s/it] 96%|█████████▋| 16651/17285 [149:09:05<5:41:31, 32.32s/it] 96%|█████████▋| 16652/17285 [149:09:36<5:34:31, 31.71s/it] 96%|█████████▋| 16653/17285 [149:10:14<5:54:09, 33.62s/it] 96%|█████████▋| 16654/17285 [149:10:46<5:48:47, 33.16s/it] 96%|█████████▋| 16655/17285 [149:11:13<5:29:04, 31.34s/it] 96%|█████████▋| 16656/17285 [149:11:49<5:41:36, 32.59s/it] 96%|█████████▋| 16657/17285 [149:12:22<5:42:45, 32.75s/it] 96%|█████████▋| 16658/17285 [149:12:52<5:35:30, 32.11s/it] 96%|█████████▋| 16659/17285 [149:13:26<5:40:25, 32.63s/it] 96%|█████████▋| 
16660/17285 [149:14:02<5:50:40, 33.66s/it] {'loss': 1.2477, 'learning_rate': 1.2073375123760168e-06, 'epoch': 2.89} + 96%|█████████▋| 16660/17285 [149:14:02<5:50:40, 33.66s/it] 96%|█████████▋| 16661/17285 [149:14:29<5:27:17, 31.47s/it] 96%|█████████▋| 16662/17285 [149:15:00<5:25:57, 31.39s/it] 96%|█████████▋| 16663/17285 [149:15:27<5:13:58, 30.29s/it] 96%|█████████▋| 16664/17285 [149:16:00<5:18:55, 30.81s/it] 96%|█████████▋| 16665/17285 [149:16:30<5:16:13, 30.60s/it] 96%|█████████▋| 16666/17285 [149:17:06<5:32:48, 32.26s/it] 96%|█████████▋| 16667/17285 [149:17:34<5:20:19, 31.10s/it] 96%|█████████▋| 16668/17285 [149:18:18<5:58:35, 34.87s/it] 96%|█████████▋| 16669/17285 [149:18:58<6:13:18, 36.36s/it] 96%|█████████▋| 16670/17285 [149:19:31<6:01:56, 35.31s/it] {'loss': 1.2516, 'learning_rate': 1.1778774609160436e-06, 'epoch': 2.89} + 96%|█████████▋| 16670/17285 [149:19:31<6:01:56, 35.31s/it] 96%|█████████▋| 16671/17285 [149:20:04<5:55:03, 34.70s/it] 96%|█████████▋| 16672/17285 [149:20:33<5:36:41, 32.96s/it] 96%|█████████▋| 16673/17285 [149:21:11<5:51:56, 34.50s/it] 96%|█████████▋| 16674/17285 [149:21:39<5:31:08, 32.52s/it] 96%|█████████▋| 16675/17285 [149:22:13<5:36:57, 33.14s/it] 96%|█████████▋| 16676/17285 [149:22:40<5:18:13, 31.35s/it] 96%|█████████▋| 16677/17285 [149:23:15<5:26:15, 32.20s/it] 96%|█████████▋| 16678/17285 [149:23:51<5:38:22, 33.45s/it] 96%|█████████▋| 16679/17285 [149:24:25<5:40:12, 33.68s/it] 96%|█████████▋| 16680/17285 [149:24:54<5:26:04, 32.34s/it] {'loss': 1.2804, 'learning_rate': 1.1487791585774176e-06, 'epoch': 2.89} + 96%|█████████▋| 16680/17285 [149:24:54<5:26:04, 32.34s/it] 97%|█████████▋| 16681/17285 [149:25:29<5:33:38, 33.14s/it] 97%|█████████▋| 16682/17285 [149:26:06<5:42:41, 34.10s/it] 97%|█████████▋| 16683/17285 [149:26:44<5:55:05, 35.39s/it] 97%|█████████▋| 16684/17285 [149:27:16<5:42:21, 34.18s/it] 97%|█████████▋| 16685/17285 [149:27:58<6:07:16, 36.73s/it] 97%|█████████▋| 16686/17285 [149:28:24<5:33:52, 33.44s/it] 97%|█████████▋| 
16687/17285 [149:28:49<5:08:25, 30.95s/it] 97%|█████████▋| 16688/17285 [149:29:27<5:27:18, 32.90s/it] 97%|█████████▋| 16689/17285 [149:29:54<5:09:18, 31.14s/it] 97%|█████████▋| 16690/17285 [149:30:23<5:03:25, 30.60s/it] {'loss': 1.2826, 'learning_rate': 1.1200427118776224e-06, 'epoch': 2.9} + 97%|█████████▋| 16690/17285 [149:30:23<5:03:25, 30.60s/it] 97%|█████████▋| 16691/17285 [149:30:49<4:50:37, 29.36s/it] 97%|█████████▋| 16692/17285 [149:31:21<4:55:29, 29.90s/it] 97%|█████████▋| 16693/17285 [149:31:55<5:09:12, 31.34s/it] 97%|█████████▋| 16694/17285 [149:32:27<5:10:02, 31.48s/it] 97%|█████████▋| 16695/17285 [149:33:01<5:18:11, 32.36s/it] 97%|█████████▋| 16696/17285 [149:33:36<5:25:08, 33.12s/it] 97%|█████████▋| 16697/17285 [149:34:03<5:05:31, 31.18s/it] 97%|█████████▋| 16698/17285 [149:34:37<5:12:53, 31.98s/it] 97%|█████████▋| 16699/17285 [149:35:04<4:58:04, 30.52s/it] 97%|█████████▋| 16700/17285 [149:35:46<5:32:44, 34.13s/it] {'loss': 1.2703, 'learning_rate': 1.0916682260095789e-06, 'epoch': 2.9} + 97%|█████████▋| 16700/17285 [149:35:47<5:32:44, 34.13s/it] 97%|█████████▋| 16701/17285 [149:36:15<5:16:42, 32.54s/it] 97%|█████████▋| 16702/17285 [149:36:50<5:21:19, 33.07s/it] 97%|█████████▋| 16703/17285 [149:37:22<5:17:27, 32.73s/it] 97%|█████████▋| 16704/17285 [149:37:51<5:08:11, 31.83s/it] 97%|█████████▋| 16705/17285 [149:38:22<5:05:33, 31.61s/it] 97%|█████████▋| 16706/17285 [149:38:48<4:47:03, 29.75s/it] 97%|█████████▋| 16707/17285 [149:39:21<4:56:28, 30.78s/it] 97%|█████████▋| 16708/17285 [149:39:56<5:07:44, 32.00s/it] 97%|█████████▋| 16709/17285 [149:40:24<4:56:24, 30.88s/it] 97%|█████████▋| 16710/17285 [149:40:53<4:51:20, 30.40s/it] {'loss': 1.2501, 'learning_rate': 1.063655804841146e-06, 'epoch': 2.9} + 97%|█████████▋| 16710/17285 [149:40:53<4:51:20, 30.40s/it] 97%|█████████▋| 16711/17285 [149:41:24<4:50:08, 30.33s/it] 97%|█████████▋| 16712/17285 [149:41:56<4:55:07, 30.90s/it] 97%|█████████▋| 16713/17285 [149:42:36<5:20:09, 33.58s/it] 97%|█████████▋| 
16714/17285 [149:43:07<5:12:35, 32.85s/it] 97%|█████████▋| 16715/17285 [149:43:39<5:09:21, 32.56s/it] 97%|█████████▋| 16716/17285 [149:44:07<4:56:14, 31.24s/it] 97%|█████████▋| 16717/17285 [149:44:39<4:59:48, 31.67s/it] 97%|█████████▋| 16718/17285 [149:45:10<4:55:41, 31.29s/it] 97%|█████████▋| 16719/17285 [149:45:43<5:00:03, 31.81s/it] 97%|█████████▋| 16720/17285 [149:46:15<5:01:42, 32.04s/it] {'loss': 1.2323, 'learning_rate': 1.0360055509148535e-06, 'epoch': 2.9} + 97%|█████████▋| 16720/17285 [149:46:15<5:01:42, 32.04s/it] 97%|█████████▋| 16721/17285 [149:46:41<4:42:08, 30.02s/it] 97%|█████████▋| 16722/17285 [149:47:15<4:53:07, 31.24s/it] 97%|█████████▋| 16723/17285 [149:47:44<4:45:57, 30.53s/it] 97%|█████████▋| 16724/17285 [149:48:08<4:28:38, 28.73s/it] 97%|█████████▋| 16725/17285 [149:48:39<4:33:31, 29.31s/it] 97%|█████████▋| 16726/17285 [149:49:16<4:53:47, 31.53s/it] 97%|█████████▋| 16727/17285 [149:50:02<5:35:29, 36.07s/it] 97%|█████████▋| 16728/17285 [149:50:34<5:23:46, 34.88s/it] 97%|█████████▋| 16729/17285 [149:51:06<5:12:45, 33.75s/it] 97%|█████████▋| 16730/17285 [149:51:47<5:33:57, 36.10s/it] {'loss': 1.2773, 'learning_rate': 1.008717565447448e-06, 'epoch': 2.9} + 97%|█████████▋| 16730/17285 [149:51:47<5:33:57, 36.10s/it] 97%|█████████▋| 16731/17285 [149:52:21<5:27:46, 35.50s/it] 97%|█████████▋| 16732/17285 [149:52:52<5:13:13, 33.98s/it] 97%|█████████▋| 16733/17285 [149:53:26<5:14:57, 34.23s/it] 97%|█████████▋| 16734/17285 [149:53:53<4:53:19, 31.94s/it] 97%|█████████▋| 16735/17285 [149:54:21<4:42:23, 30.81s/it][2023-08-29 05:49:32,216] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 97%|█████████▋| 16736/17285 [149:54:55<4:48:44, 31.56s/it][2023-08-29 05:50:03,666] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 + 97%|█████████▋| 16737/17285 [149:55:26<4:47:55, 31.52s/it] 97%|█████████▋| 16738/17285 [149:56:01<4:56:47, 32.56s/it] 97%|█████████▋| 16739/17285 [149:56:35<5:00:39, 33.04s/it] 97%|█████████▋| 16740/17285 [149:57:00<4:38:49, 30.70s/it] {'loss': 1.3079, 'learning_rate': 9.871480775350161e-07, 'epoch': 2.91} + 97%|█████████▋| 16740/17285 [149:57:00<4:38:49, 30.70s/it] 97%|█████████▋| 16741/17285 [149:57:30<4:36:26, 30.49s/it] 97%|█████████▋| 16742/17285 [149:57:59<4:30:11, 29.86s/it] 97%|█████████▋| 16743/17285 [149:58:26<4:22:24, 29.05s/it] 97%|█████████▋| 16744/17285 [149:59:08<4:56:07, 32.84s/it] 97%|█████████▋| 16745/17285 [149:59:41<4:57:41, 33.08s/it] 97%|█████████▋| 16746/17285 [150:00:08<4:40:21, 31.21s/it] 97%|█████████▋| 16747/17285 [150:00:40<4:41:04, 31.35s/it] 97%|█████████▋| 16748/17285 [150:01:15<4:52:17, 32.66s/it] 97%|█████████▋| 16749/17285 [150:01:46<4:46:17, 32.05s/it] 97%|█████████▋| 16750/17285 [150:02:18<4:44:25, 31.90s/it] {'loss': 1.2767, 'learning_rate': 9.605124261266474e-07, 'epoch': 2.91} + 97%|█████████▋| 16750/17285 [150:02:18<4:44:25, 31.90s/it] 97%|█████████▋| 16751/17285 [150:02:47<4:38:11, 31.26s/it] 97%|█████████▋| 16752/17285 [150:03:22<4:46:11, 32.22s/it] 97%|█████████▋| 16753/17285 [150:03:48<4:29:23, 30.38s/it] 97%|█████████▋| 16754/17285 [150:04:21<4:36:00, 31.19s/it] 97%|█████████▋| 16755/17285 [150:04:51<4:31:18, 30.71s/it] 97%|█████████▋| 16756/17285 [150:05:25<4:39:33, 31.71s/it] 97%|█████████▋| 16757/17285 [150:05:55<4:35:07, 31.26s/it] 97%|█████████▋| 16758/17285 [150:06:32<4:50:55, 33.12s/it] 97%|█████████▋| 16759/17285 [150:07:02<4:40:05, 31.95s/it] 97%|█████████▋| 16760/17285 [150:07:36<4:45:48, 32.66s/it] {'loss': 1.2475, 'learning_rate': 9.34239319527963e-07, 'epoch': 2.91} + 97%|█████████▋| 16760/17285 [150:07:36<4:45:48, 32.66s/it] 97%|█████████▋| 16761/17285 [150:08:05<4:37:09, 31.74s/it] 97%|█████████▋| 16762/17285 [150:08:50<5:11:07, 35.69s/it] 97%|█████████▋| 
16763/17285 [150:09:21<4:56:26, 34.07s/it] 97%|█████████▋| 16764/17285 [150:09:55<4:56:45, 34.18s/it] 97%|█████████▋| 16765/17285 [150:10:23<4:39:58, 32.31s/it] 97%|█████████▋| 16766/17285 [150:10:53<4:32:36, 31.52s/it] 97%|█████████▋| 16767/17285 [150:11:27<4:40:27, 32.49s/it] 97%|█████████▋| 16768/17285 [150:11:52<4:19:27, 30.11s/it] 97%|█████████▋| 16769/17285 [150:12:23<4:21:47, 30.44s/it] 97%|█████████▋| 16770/17285 [150:12:54<4:21:07, 30.42s/it] {'loss': 1.2586, 'learning_rate': 9.083288539145196e-07, 'epoch': 2.91} + 97%|█████████▋| 16770/17285 [150:12:54<4:21:07, 30.42s/it] 97%|█████████▋| 16771/17285 [150:13:23<4:16:53, 29.99s/it] 97%|█████████▋| 16772/17285 [150:13:50<4:08:34, 29.07s/it] 97%|█████████▋| 16773/17285 [150:14:26<4:26:21, 31.21s/it] 97%|█████████▋| 16774/17285 [150:14:57<4:24:50, 31.10s/it] 97%|█████████▋| 16775/17285 [150:15:24<4:15:01, 30.00s/it] 97%|█████████▋| 16776/17285 [150:15:56<4:20:43, 30.73s/it] 97%|█████████▋| 16777/17285 [150:16:30<4:27:28, 31.59s/it] 97%|█████████▋| 16778/17285 [150:16:57<4:15:39, 30.26s/it] 97%|█████████▋| 16779/17285 [150:17:24<4:06:08, 29.19s/it] 97%|█████████▋| 16780/17285 [150:18:01<4:26:35, 31.67s/it] {'loss': 1.2465, 'learning_rate': 8.827811241344131e-07, 'epoch': 2.91} + 97%|█████████▋| 16780/17285 [150:18:01<4:26:35, 31.67s/it] 97%|█████████▋| 16781/17285 [150:18:34<4:27:43, 31.87s/it] 97%|█████████▋| 16782/17285 [150:19:14<4:49:36, 34.55s/it] 97%|█████████▋| 16783/17285 [150:19:46<4:40:35, 33.54s/it] 97%|█████████▋| 16784/17285 [150:20:14<4:27:40, 32.06s/it] 97%|█████████▋| 16785/17285 [150:20:41<4:14:24, 30.53s/it] 97%|█████████▋| 16786/17285 [150:21:12<4:13:56, 30.53s/it] 97%|█████████▋| 16787/17285 [150:21:40<4:08:38, 29.96s/it] 97%|█████████▋| 16788/17285 [150:22:09<4:05:30, 29.64s/it] 97%|█████████▋| 16789/17285 [150:22:44<4:18:29, 31.27s/it] 97%|█████████▋| 16790/17285 [150:23:15<4:16:50, 31.13s/it] {'loss': 1.2841, 'learning_rate': 8.575962237078572e-07, 'epoch': 2.91} + 97%|█████████▋| 
16790/17285 [150:23:15<4:16:50, 31.13s/it] 97%|█████████▋| 16791/17285 [150:23:52<4:31:26, 32.97s/it] 97%|█████████▋| 16792/17285 [150:24:22<4:21:42, 31.85s/it] 97%|█████████▋| 16793/17285 [150:24:49<4:10:54, 30.60s/it] 97%|█████████▋| 16794/17285 [150:25:22<4:15:20, 31.20s/it] 97%|█████████▋| 16795/17285 [150:25:53<4:13:40, 31.06s/it] 97%|█████████▋| 16796/17285 [150:26:20<4:04:58, 30.06s/it] 97%|█████████▋| 16797/17285 [150:26:54<4:13:48, 31.21s/it] 97%|█████████▋| 16798/17285 [150:27:23<4:08:01, 30.56s/it] 97%|█████████▋| 16799/17285 [150:27:54<4:07:50, 30.60s/it] 97%|█████████▋| 16800/17285 [150:28:19<3:52:50, 28.80s/it] {'loss': 1.2984, 'learning_rate': 8.327742448269394e-07, 'epoch': 2.92} + 97%|█████████▋| 16800/17285 [150:28:19<3:52:50, 28.80s/it] 97%|█████████▋| 16801/17285 [150:28:48<3:53:12, 28.91s/it] 97%|█████████▋| 16802/17285 [150:29:18<3:56:49, 29.42s/it] 97%|█████████▋| 16803/17285 [150:29:58<4:20:01, 32.37s/it] 97%|█████████▋| 16804/17285 [150:30:23<4:01:54, 30.18s/it] 97%|█████████▋| 16805/17285 [150:30:52<3:59:12, 29.90s/it] 97%|█████████▋| 16806/17285 [150:31:18<3:48:42, 28.65s/it] 97%|█████████▋| 16807/17285 [150:31:53<4:04:23, 30.68s/it] 97%|█████████▋| 16808/17285 [150:32:31<4:21:29, 32.89s/it] 97%|█████████▋| 16809/17285 [150:33:06<4:26:46, 33.63s/it] 97%|█████████▋| 16810/17285 [150:33:39<4:24:02, 33.35s/it] {'loss': 1.2587, 'learning_rate': 8.083152783552095e-07, 'epoch': 2.92} + 97%|█████████▋| 16810/17285 [150:33:39<4:24:02, 33.35s/it] 97%|█████████▋| 16811/17285 [150:34:22<4:47:00, 36.33s/it] 97%|█████████▋| 16812/17285 [150:34:58<4:44:37, 36.10s/it] 97%|█████████▋| 16813/17285 [150:35:24<4:20:54, 33.17s/it] 97%|█████████▋| 16814/17285 [150:35:56<4:17:54, 32.85s/it] 97%|█████████▋| 16815/17285 [150:36:28<4:14:33, 32.50s/it] 97%|█████████▋| 16816/17285 [150:36:58<4:08:55, 31.85s/it] 97%|█████████▋| 16817/17285 [150:37:29<4:04:47, 31.38s/it] 97%|█████████▋| 16818/17285 [150:37:57<3:56:21, 30.37s/it] 97%|█████████▋| 16819/17285 
[150:38:25<3:50:54, 29.73s/it] 97%|█████████▋| 16820/17285 [150:39:04<4:12:09, 32.54s/it] {'loss': 1.2659, 'learning_rate': 7.842194138273584e-07, 'epoch': 2.92} + 97%|█████████▋| 16820/17285 [150:39:04<4:12:09, 32.54s/it] 97%|█████████▋| 16821/17285 [150:39:40<4:18:42, 33.45s/it] 97%|█████████▋| 16822/17285 [150:40:12<4:16:19, 33.22s/it][2023-08-29 06:35:16,215] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768 + 97%|█████████▋| 16823/17285 [150:40:39<3:59:29, 31.10s/it] 97%|█████████▋| 16824/17285 [150:41:17<4:16:34, 33.39s/it] 97%|█████████▋| 16825/17285 [150:41:49<4:11:21, 32.78s/it] 97%|█████████▋| 16826/17285 [150:42:23<4:14:43, 33.30s/it] 97%|█████████▋| 16827/17285 [150:42:57<4:15:29, 33.47s/it] 97%|█████████▋| 16828/17285 [150:43:32<4:18:07, 33.89s/it] 97%|█████████▋| 16829/17285 [150:43:57<3:57:29, 31.25s/it] 97%|█████████▋| 16830/17285 [150:44:26<3:51:09, 30.48s/it] {'loss': 1.2654, 'learning_rate': 7.628436608436595e-07, 'epoch': 2.92} + 97%|█████████▋| 16830/17285 [150:44:26<3:51:09, 30.48s/it] 97%|█████████▋| 16831/17285 [150:44:55<3:47:51, 30.11s/it] 97%|█████████▋| 16832/17285 [150:45:29<3:57:01, 31.39s/it] 97%|█████████▋| 16833/17285 [150:46:06<4:07:54, 32.91s/it] 97%|█████████▋| 16834/17285 [150:46:35<4:00:02, 31.93s/it] 97%|█████████▋| 16835/17285 [150:47:09<4:03:26, 32.46s/it] 97%|█████████▋| 16836/17285 [150:47:41<4:00:53, 32.19s/it] 97%|█████████▋| 16837/17285 [150:48:13<4:01:38, 32.36s/it] 97%|█████████▋| 16838/17285 [150:48:48<4:05:08, 32.90s/it] 97%|█████████▋| 16839/17285 [150:49:19<4:00:27, 32.35s/it] 97%|█████████▋| 16840/17285 [150:49:48<3:53:10, 31.44s/it] {'loss': 1.2415, 'learning_rate': 7.3943793191662e-07, 'epoch': 2.92} + 97%|█████████▋| 16840/17285 [150:49:48<3:53:10, 31.44s/it] 97%|█████████▋| 16841/17285 [150:50:26<4:07:53, 33.50s/it] 97%|█████████▋| 16842/17285 [150:50:59<4:04:46, 33.15s/it] 97%|█████████▋| 16843/17285 [150:51:37<4:15:11, 
34.64s/it] 97%|█████████▋| 16844/17285 [150:52:07<4:05:21, 33.38s/it] 97%|█████████▋| 16845/17285 [150:52:32<3:46:06, 30.83s/it] 97%|█████████▋| 16846/17285 [150:53:13<4:08:03, 33.90s/it] 97%|█████████▋| 16847/17285 [150:53:50<4:13:12, 34.69s/it] 97%|█████████▋| 16848/17285 [150:54:20<4:03:38, 33.45s/it] 97%|█████████▋| 16849/17285 [150:54:49<3:53:34, 32.14s/it] 97%|█████████▋| 16850/17285 [150:55:16<3:42:06, 30.64s/it] {'loss': 1.2256, 'learning_rate': 7.163955570664738e-07, 'epoch': 2.92} + 97%|█████████▋| 16850/17285 [150:55:16<3:42:06, 30.64s/it] 97%|█████████▋| 16851/17285 [150:55:49<3:46:21, 31.29s/it] 97%|█████████▋| 16852/17285 [150:56:18<3:41:11, 30.65s/it] 98%|█████████▊| 16853/17285 [150:56:49<3:40:45, 30.66s/it] 98%|█████████▊| 16854/17285 [150:57:30<4:01:57, 33.68s/it] 98%|█████████▊| 16855/17285 [150:57:55<3:44:06, 31.27s/it] 98%|█████████▊| 16856/17285 [150:58:28<3:47:24, 31.80s/it] 98%|█████████▊| 16857/17285 [150:58:56<3:36:38, 30.37s/it] 98%|█████████▊| 16858/17285 [150:59:36<3:57:12, 33.33s/it] 98%|█████████▊| 16859/17285 [151:00:16<4:11:13, 35.38s/it] 98%|█████████▊| 16860/17285 [151:00:45<3:56:40, 33.41s/it] {'loss': 1.2541, 'learning_rate': 6.937166206423485e-07, 'epoch': 2.93} + 98%|█████████▊| 16860/17285 [151:00:45<3:56:40, 33.41s/it] 98%|█████████▊| 16861/17285 [151:01:15<3:49:25, 32.47s/it] 98%|█████████▊| 16862/17285 [151:01:42<3:37:54, 30.91s/it] 98%|█████████▊| 16863/17285 [151:02:09<3:29:20, 29.76s/it] 98%|█████████▊| 16864/17285 [151:02:42<3:34:06, 30.51s/it] 98%|█████████▊| 16865/17285 [151:03:18<3:45:58, 32.28s/it] 98%|█████████▊| 16866/17285 [151:03:52<3:48:18, 32.69s/it] 98%|█████████▊| 16867/17285 [151:04:25<3:49:17, 32.91s/it] 98%|█████████▊| 16868/17285 [151:05:00<3:52:19, 33.43s/it] 98%|█████████▊| 16869/17285 [151:05:33<3:51:26, 33.38s/it] 98%|█████████▊| 16870/17285 [151:06:01<3:40:30, 31.88s/it] {'loss': 1.2869, 'learning_rate': 6.714012056629693e-07, 'epoch': 2.93} + 98%|█████████▊| 16870/17285 [151:06:01<3:40:30, 
31.88s/it] 98%|█████████▊| 16871/17285 [151:06:32<3:37:48, 31.57s/it] 98%|█████████▊| 16872/17285 [151:07:11<3:52:51, 33.83s/it] 98%|█████████▊| 16873/17285 [151:07:44<3:49:09, 33.37s/it] 98%|█████████▊| 16874/17285 [151:08:09<3:32:26, 31.01s/it] 98%|█████████▊| 16875/17285 [151:08:39<3:29:11, 30.61s/it] 98%|█████████▊| 16876/17285 [151:09:10<3:29:04, 30.67s/it] 98%|█████████▊| 16877/17285 [151:09:40<3:27:24, 30.50s/it] 98%|█████████▊| 16878/17285 [151:10:11<3:28:59, 30.81s/it] 98%|█████████▊| 16879/17285 [151:10:41<3:26:20, 30.49s/it] 98%|█████████▊| 16880/17285 [151:11:11<3:25:42, 30.48s/it] {'loss': 1.2867, 'learning_rate': 6.494493938163038e-07, 'epoch': 2.93} + 98%|█████████▊| 16880/17285 [151:11:11<3:25:42, 30.48s/it] 98%|█████████▊| 16881/17285 [151:11:48<3:37:29, 32.30s/it] 98%|█████████▊| 16882/17285 [151:12:18<3:32:24, 31.62s/it] 98%|█████████▊| 16883/17285 [151:12:52<3:36:24, 32.30s/it] 98%|█████████▊| 16884/17285 [151:13:25<3:37:56, 32.61s/it] 98%|█████████▊| 16885/17285 [151:13:57<3:36:17, 32.44s/it] 98%|█████████▊| 16886/17285 [151:14:35<3:46:48, 34.11s/it] 98%|█████████▊| 16887/17285 [151:15:05<3:37:42, 32.82s/it] 98%|█████████▊| 16888/17285 [151:15:35<3:31:24, 31.95s/it] 98%|█████████▊| 16889/17285 [151:16:07<3:30:55, 31.96s/it] 98%|█████████▊| 16890/17285 [151:16:40<3:33:20, 32.41s/it] {'loss': 1.242, 'learning_rate': 6.278612654593729e-07, 'epoch': 2.93} + 98%|█████████▊| 16890/17285 [151:16:40<3:33:20, 32.41s/it] 98%|█████████▊| 16891/17285 [151:17:10<3:27:30, 31.60s/it] 98%|█████████▊| 16892/17285 [151:17:37<3:17:05, 30.09s/it] 98%|█████████▊| 16893/17285 [151:18:08<3:18:54, 30.44s/it] 98%|█████████▊| 16894/17285 [151:18:43<3:27:19, 31.81s/it] 98%|█████████▊| 16895/17285 [151:19:11<3:19:32, 30.70s/it] 98%|█████████▊| 16896/17285 [151:19:38<3:11:28, 29.53s/it] 98%|█████████▊| 16897/17285 [151:20:20<3:35:40, 33.35s/it] 98%|█████████▊| 16898/17285 [151:20:47<3:21:49, 31.29s/it] 98%|█████████▊| 16899/17285 [151:21:16<3:17:38, 30.72s/it] 
98%|█████████▊| 16900/17285 [151:21:41<3:06:37, 29.08s/it] {'loss': 1.2183, 'learning_rate': 6.066368996178517e-07, 'epoch': 2.93} + 98%|█████████▊| 16900/17285 [151:21:41<3:06:37, 29.08s/it] 98%|█████████▊| 16901/17285 [151:22:07<2:59:21, 28.02s/it] 98%|█████████▊| 16902/17285 [151:22:38<3:03:52, 28.81s/it] 98%|█████████▊| 16903/17285 [151:23:03<2:57:22, 27.86s/it] 98%|█████████▊| 16904/17285 [151:23:30<2:54:48, 27.53s/it] 98%|█████████▊| 16905/17285 [151:24:04<3:06:33, 29.46s/it] 98%|█████████▊| 16906/17285 [151:24:36<3:10:59, 30.24s/it] 98%|█████████▊| 16907/17285 [151:25:20<3:36:54, 34.43s/it] 98%|█████████▊| 16908/17285 [151:25:50<3:28:23, 33.17s/it] 98%|█████████▊| 16909/17285 [151:26:30<3:40:35, 35.20s/it] 98%|█████████▊| 16910/17285 [151:26:59<3:28:03, 33.29s/it] {'loss': 1.2517, 'learning_rate': 5.85776373985858e-07, 'epoch': 2.93} + 98%|█████████▊| 16910/17285 [151:26:59<3:28:03, 33.29s/it] 98%|█████████▊| 16911/17285 [151:27:30<3:23:14, 32.61s/it] 98%|█████████▊| 16912/17285 [151:28:01<3:19:17, 32.06s/it] 98%|█████████▊| 16913/17285 [151:28:36<3:25:05, 33.08s/it] 98%|█████████▊| 16914/17285 [151:29:08<3:21:12, 32.54s/it] 98%|█████████▊| 16915/17285 [151:29:42<3:24:28, 33.16s/it] 98%|█████████▊| 16916/17285 [151:30:16<3:24:31, 33.26s/it] 98%|█████████▊| 16917/17285 [151:30:50<3:25:29, 33.50s/it] 98%|█████████▊| 16918/17285 [151:31:25<3:27:53, 33.99s/it] 98%|█████████▊| 16919/17285 [151:32:00<3:28:37, 34.20s/it] 98%|█████████▊| 16920/17285 [151:32:35<3:29:55, 34.51s/it] {'loss': 1.2363, 'learning_rate': 5.652797649255969e-07, 'epoch': 2.94} + 98%|█████████▊| 16920/17285 [151:32:35<3:29:55, 34.51s/it] 98%|█████████▊| 16921/17285 [151:33:13<3:34:58, 35.44s/it] 98%|█████████▊| 16922/17285 [151:33:44<3:27:49, 34.35s/it] 98%|█████████▊| 16923/17285 [151:34:18<3:26:35, 34.24s/it] 98%|█████████▊| 16924/17285 [151:34:49<3:19:31, 33.16s/it] 98%|█████████▊| 16925/17285 [151:35:31<3:35:39, 35.94s/it] 98%|█████████▊| 16926/17285 [151:36:02<3:25:58, 34.43s/it] 
98%|█████████▊| 16927/17285 [151:36:34<3:20:36, 33.62s/it] 98%|█████████▊| 16928/17285 [151:37:02<3:10:27, 32.01s/it] 98%|█████████▊| 16929/17285 [151:37:31<3:04:20, 31.07s/it] 98%|█████████▊| 16930/17285 [151:37:57<2:54:50, 29.55s/it] {'loss': 1.2635, 'learning_rate': 5.4514714746714e-07, 'epoch': 2.94} + 98%|█████████▊| 16930/17285 [151:37:57<2:54:50, 29.55s/it] 98%|█████████▊| 16931/17285 [151:38:28<2:55:52, 29.81s/it] 98%|█████████▊| 16932/17285 [151:39:00<3:00:23, 30.66s/it] 98%|█████████▊| 16933/17285 [151:39:35<3:06:31, 31.79s/it] 98%|█████████▊| 16934/17285 [151:40:09<3:10:39, 32.59s/it] 98%|█████████▊| 16935/17285 [151:40:42<3:10:54, 32.73s/it] 98%|█████████▊| 16936/17285 [151:41:17<3:14:53, 33.51s/it] 98%|█████████▊| 16937/17285 [151:41:49<3:10:30, 32.85s/it] 98%|█████████▊| 16938/17285 [151:42:17<3:01:25, 31.37s/it] 98%|█████████▊| 16939/17285 [151:42:49<3:02:26, 31.64s/it] 98%|█████████▊| 16940/17285 [151:43:20<3:00:30, 31.39s/it] {'loss': 1.2782, 'learning_rate': 5.253785953081125e-07, 'epoch': 2.94} + 98%|█████████▊| 16940/17285 [151:43:20<3:00:30, 31.39s/it] 98%|█████████▊| 16941/17285 [151:43:55<3:06:40, 32.56s/it] 98%|█████████▊| 16942/17285 [151:44:29<3:07:46, 32.85s/it] 98%|█████████▊| 16943/17285 [151:45:03<3:09:08, 33.18s/it] 98%|█████████▊| 16944/17285 [151:45:32<3:02:32, 32.12s/it] 98%|█████████▊| 16945/17285 [151:46:06<3:04:45, 32.61s/it] 98%|█████████▊| 16946/17285 [151:46:44<3:12:45, 34.12s/it] 98%|█████████▊| 16947/17285 [151:47:15<3:07:10, 33.23s/it] 98%|█████████▊| 16948/17285 [151:47:46<3:03:33, 32.68s/it] 98%|█████████▊| 16949/17285 [151:48:15<2:57:26, 31.69s/it] 98%|█████████▊| 16950/17285 [151:48:49<2:59:52, 32.22s/it] {'loss': 1.3006, 'learning_rate': 5.059741808134621e-07, 'epoch': 2.94} + 98%|█████████▊| 16950/17285 [151:48:49<2:59:52, 32.22s/it] 98%|█████████▊| 16951/17285 [151:49:20<2:57:06, 31.82s/it] 98%|█████████▊| 16952/17285 [151:49:52<2:56:51, 31.87s/it] 98%|█████████▊| 16953/17285 [151:50:23<2:54:38, 31.56s/it] 
98%|█████████▊| 16954/17285 [151:51:02<3:07:34, 34.00s/it] 98%|█████████▊| 16955/17285 [151:51:36<3:06:55, 33.99s/it] 98%|█████████▊| 16956/17285 [151:52:06<2:59:14, 32.69s/it] 98%|█████████▊| 16957/17285 [151:52:43<3:05:49, 33.99s/it] 98%|█████████▊| 16958/17285 [151:53:16<3:03:30, 33.67s/it] 98%|█████████▊| 16959/17285 [151:53:54<3:10:01, 34.97s/it] 98%|█████████▊| 16960/17285 [151:54:31<3:12:40, 35.57s/it] {'loss': 1.2425, 'learning_rate': 4.869339750151469e-07, 'epoch': 2.94} + 98%|█████████▊| 16960/17285 [151:54:31<3:12:40, 35.57s/it] 98%|█████████▊| 16961/17285 [151:54:58<2:58:32, 33.06s/it] 98%|█████████▊| 16962/17285 [151:55:29<2:55:10, 32.54s/it] 98%|█████████▊| 16963/17285 [151:55:57<2:46:58, 31.11s/it] 98%|█████████▊| 16964/17285 [151:56:25<2:40:19, 29.97s/it] 98%|█████████▊| 16965/17285 [151:57:00<2:48:09, 31.53s/it] 98%|█████████▊| 16966/17285 [151:57:26<2:39:44, 30.04s/it] 98%|█████████▊| 16967/17285 [151:57:58<2:42:27, 30.65s/it] 98%|█████████▊| 16968/17285 [151:58:29<2:41:38, 30.59s/it] 98%|█████████▊| 16969/17285 [151:59:04<2:48:05, 31.92s/it] 98%|█████████▊| 16970/17285 [151:59:31<2:39:40, 30.42s/it] {'loss': 1.276, 'learning_rate': 4.682580476119247e-07, 'epoch': 2.95} + 98%|█████████▊| 16970/17285 [151:59:31<2:39:40, 30.42s/it] 98%|█████████▊| 16971/17285 [152:00:07<2:47:44, 32.05s/it] 98%|█████████▊| 16972/17285 [152:00:40<2:49:23, 32.47s/it] 98%|█████████▊| 16973/17285 [152:01:14<2:51:06, 32.90s/it] 98%|█████████▊| 16974/17285 [152:01:52<2:58:36, 34.46s/it] 98%|█████████▊| 16975/17285 [152:02:32<3:06:24, 36.08s/it] 98%|█████████▊| 16976/17285 [152:03:00<2:52:51, 33.56s/it] 98%|█████████▊| 16977/17285 [152:03:30<2:47:39, 32.66s/it] 98%|█████████▊| 16978/17285 [152:04:16<3:07:48, 36.70s/it] 98%|█████████▊| 16979/17285 [152:04:50<3:03:05, 35.90s/it] 98%|█████████▊| 16980/17285 [152:05:23<2:58:03, 35.03s/it] {'loss': 1.2827, 'learning_rate': 4.499464669690423e-07, 'epoch': 2.95} + 98%|█████████▊| 16980/17285 [152:05:23<2:58:03, 35.03s/it] 
98%|█████████▊| 16981/17285 [152:05:51<2:47:05, 32.98s/it] 98%|█████████▊| 16982/17285 [152:06:24<2:45:55, 32.86s/it] 98%|█████████▊| 16983/17285 [152:06:50<2:34:37, 30.72s/it] 98%|█████████▊| 16984/17285 [152:07:22<2:36:37, 31.22s/it] 98%|█████████▊| 16985/17285 [152:08:03<2:51:13, 34.25s/it] 98%|█████████▊| 16986/17285 [152:08:43<2:59:04, 35.93s/it] 98%|█████████▊| 16987/17285 [152:09:13<2:48:51, 34.00s/it] 98%|█████████▊| 16988/17285 [152:09:48<2:49:43, 34.29s/it] 98%|█████████▊| 16989/17285 [152:10:22<2:48:46, 34.21s/it] 98%|█████████▊| 16990/17285 [152:10:47<2:34:51, 31.50s/it] {'loss': 1.3223, 'learning_rate': 4.3199930011802446e-07, 'epoch': 2.95} + 98%|█████████▊| 16990/17285 [152:10:47<2:34:51, 31.50s/it] 98%|█████████▊| 16991/17285 [152:11:15<2:28:54, 30.39s/it] 98%|█████████▊| 16992/17285 [152:11:45<2:28:48, 30.47s/it] 98%|█████████▊| 16993/17285 [152:12:11<2:21:48, 29.14s/it] 98%|█████████▊| 16994/17285 [152:12:50<2:34:42, 31.90s/it] 98%|█████████▊| 16995/17285 [152:13:23<2:35:18, 32.13s/it] 98%|█████████▊| 16996/17285 [152:13:49<2:26:07, 30.34s/it] 98%|█████████▊| 16997/17285 [152:14:26<2:35:17, 32.35s/it] 98%|█████████▊| 16998/17285 [152:15:01<2:38:52, 33.21s/it] 98%|█████████▊| 16999/17285 [152:15:33<2:36:52, 32.91s/it] 98%|█████████▊| 17000/17285 [152:16:05<2:35:11, 32.67s/it] {'loss': 1.2453, 'learning_rate': 4.1441661275645195e-07, 'epoch': 2.95} + 98%|█████████▊| 17000/17285 [152:16:05<2:35:11, 32.67s/it][INFO|trainer.py:3081] 2023-08-29 08:10:42,982 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-29 08:10:42,983 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-29 08:10:42,983 >> Batch size = 2 + + 0%| | 0/33 [00:00> Deleting older checkpoint [20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-14000] due to args.save_total_limit +Save Peft Config at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-17000 
+[INFO|tokenization_utils_base.py:2210] 2023-08-29 08:12:08,047 >> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-17000/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-29 08:12:08,055 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-17000/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-17000 +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/checkpoint-17000 + 98%|█████████▊| 17001/17285 [152:18:12<4:47:49, 60.81s/it] 98%|█████████▊| 17002/17285 [152:18:47<4:10:30, 53.11s/it] 98%|█████████▊| 17003/17285 [152:19:11<3:28:53, 44.44s/it] 98%|█████████▊| 17004/17285 [152:19:42<3:09:02, 40.37s/it] 98%|█████████▊| 17005/17285 [152:20:08<2:48:45, 36.16s/it] 98%|█████████▊| 17006/17285 [152:20:41<2:43:22, 35.14s/it] 98%|█████████▊| 17007/17285 [152:21:06<2:28:47, 32.11s/it] 98%|█████████▊| 17008/17285 [152:21:36<2:24:46, 31.36s/it] 98%|█████████▊| 17009/17285 [152:22:05<2:22:04, 30.89s/it] 98%|█████████▊| 17010/17285 [152:22:31<2:13:56, 29.22s/it] {'loss': 1.2683, 'learning_rate': 3.971984692476394e-07, 'epoch': 2.95} + 98%|█████████▊| 17010/17285 [152:22:31<2:13:56, 29.22s/it] 98%|█████████▊| 17011/17285 [152:23:01<2:14:38, 29.48s/it] 98%|█████████▊| 17012/17285 [152:23:30<2:13:09, 29.27s/it][2023-08-29 08:18:38,459] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. 
Reducing hysteresis to 1 + 98%|█████████▊| 17013/17285 [152:24:01<2:15:10, 29.82s/it] 98%|█████████▊| 17014/17285 [152:24:32<2:16:51, 30.30s/it] 98%|█████████▊| 17015/17285 [152:25:11<2:28:23, 32.98s/it] 98%|█████████▊| 17016/17285 [152:25:41<2:22:50, 31.86s/it] 98%|█████████▊| 17017/17285 [152:26:09<2:16:59, 30.67s/it] 98%|█████████▊| 17018/17285 [152:26:40<2:17:34, 30.91s/it] 98%|█████████▊| 17019/17285 [152:27:21<2:29:51, 33.80s/it] 98%|█████████▊| 17020/17285 [152:27:52<2:25:46, 33.01s/it] {'loss': 1.2824, 'learning_rate': 3.820138772047788e-07, 'epoch': 2.95} + 98%|█████████▊| 17020/17285 [152:27:52<2:25:46, 33.01s/it] 98%|█████████▊| 17021/17285 [152:28:19<2:17:26, 31.24s/it] 98%|█████████▊| 17022/17285 [152:28:46<2:11:43, 30.05s/it] 98%|█████████▊| 17023/17285 [152:29:23<2:20:45, 32.23s/it] 98%|█████████▊| 17024/17285 [152:29:56<2:19:57, 32.18s/it] 98%|█████████▊| 17025/17285 [152:30:21<2:11:15, 30.29s/it] 99%|█████████▊| 17026/17285 [152:30:51<2:09:56, 30.10s/it] 99%|█████████▊| 17027/17285 [152:31:25<2:14:52, 31.37s/it] 99%|█████████▊| 17028/17285 [152:31:54<2:10:32, 30.48s/it] 99%|█████████▊| 17029/17285 [152:32:23<2:08:51, 30.20s/it] 99%|█████████▊| 17030/17285 [152:33:00<2:16:11, 32.05s/it] {'loss': 1.271, 'learning_rate': 3.6548853955771235e-07, 'epoch': 2.96} + 99%|█████████▊| 17030/17285 [152:33:00<2:16:11, 32.05s/it] 99%|█████████▊| 17031/17285 [152:33:30<2:13:10, 31.46s/it] 99%|█████████▊| 17032/17285 [152:34:07<2:19:19, 33.04s/it] 99%|█████████▊| 17033/17285 [152:34:38<2:17:20, 32.70s/it] 99%|█████████▊| 17034/17285 [152:35:13<2:19:04, 33.25s/it] 99%|█████████▊| 17035/17285 [152:35:43<2:14:46, 32.35s/it] 99%|█████████▊| 17036/17285 [152:36:20<2:19:13, 33.55s/it] 99%|█████████▊| 17037/17285 [152:36:47<2:11:23, 31.79s/it] 99%|█████████▊| 17038/17285 [152:37:26<2:19:31, 33.89s/it] 99%|█████████▊| 17039/17285 [152:37:52<2:08:51, 31.43s/it] 99%|█████████▊| 17040/17285 [152:38:17<2:00:38, 29.54s/it] {'loss': 1.298, 'learning_rate': 
3.493279248699355e-07, 'epoch': 2.96} + 99%|█████████▊| 17040/17285 [152:38:17<2:00:38, 29.54s/it] 99%|█████████▊| 17041/17285 [152:38:50<2:04:27, 30.61s/it] 99%|█████████▊| 17042/17285 [152:39:27<2:11:57, 32.58s/it] 99%|█████████▊| 17043/17285 [152:40:04<2:16:21, 33.81s/it] 99%|█████████▊| 17044/17285 [152:40:34<2:12:02, 32.87s/it] 99%|█████████▊| 17045/17285 [152:41:08<2:12:22, 33.09s/it] 99%|█████████▊| 17046/17285 [152:41:39<2:09:44, 32.57s/it] 99%|█████████▊| 17047/17285 [152:42:09<2:05:46, 31.71s/it] 99%|█████████▊| 17048/17285 [152:42:43<2:07:37, 32.31s/it] 99%|█████████▊| 17049/17285 [152:43:12<2:03:32, 31.41s/it] 99%|█████████▊| 17050/17285 [152:43:48<2:08:40, 32.85s/it] {'loss': 1.2695, 'learning_rate': 3.3353209229913806e-07, 'epoch': 2.96} + 99%|█████████▊| 17050/17285 [152:43:48<2:08:40, 32.85s/it] 99%|█████████▊| 17051/17285 [152:44:24<2:10:59, 33.59s/it] 99%|█████████▊| 17052/17285 [152:44:56<2:08:23, 33.06s/it] 99%|█████████▊| 17053/17285 [152:45:26<2:05:06, 32.36s/it] 99%|█████████▊| 17054/17285 [152:45:56<2:01:25, 31.54s/it] 99%|█████████▊| 17055/17285 [152:46:20<1:52:48, 29.43s/it] 99%|█████████▊| 17056/17285 [152:46:50<1:52:16, 29.42s/it] 99%|█████████▊| 17057/17285 [152:47:24<1:57:16, 30.86s/it] 99%|█████████▊| 17058/17285 [152:47:59<2:01:19, 32.07s/it] 99%|█████████▊| 17059/17285 [152:48:35<2:05:43, 33.38s/it] 99%|█████████▊| 17060/17285 [152:49:12<2:08:38, 34.30s/it] {'loss': 1.2698, 'learning_rate': 3.181010996677003e-07, 'epoch': 2.96} + 99%|█████████▊| 17060/17285 [152:49:12<2:08:38, 34.30s/it] 99%|█████████▊| 17061/17285 [152:49:36<1:56:25, 31.19s/it] 99%|█████████▊| 17062/17285 [152:50:06<1:54:28, 30.80s/it] 99%|█████████▊| 17063/17285 [152:50:38<1:55:24, 31.19s/it] 99%|█████████▊| 17064/17285 [152:51:18<2:04:31, 33.81s/it] 99%|█████████▊| 17065/17285 [152:51:49<2:01:21, 33.10s/it] 99%|█████████▊| 17066/17285 [152:52:16<1:54:10, 31.28s/it] 99%|█████████▊| 17067/17285 [152:52:49<1:55:02, 31.66s/it] 99%|█████████▊| 17068/17285 
[152:53:15<1:49:15, 30.21s/it] 99%|█████████▉| 17069/17285 [152:53:51<1:54:57, 31.93s/it] 99%|█████████▉| 17070/17285 [152:54:33<2:04:51, 34.84s/it] {'loss': 1.266, 'learning_rate': 3.030350034624374e-07, 'epoch': 2.96} + 99%|█████████▉| 17070/17285 [152:54:33<2:04:51, 34.84s/it] 99%|█████████▉| 17071/17285 [152:54:58<1:53:54, 31.94s/it] 99%|█████████▉| 17072/17285 [152:55:31<1:54:34, 32.28s/it] 99%|█████████▉| 17073/17285 [152:56:02<1:51:53, 31.67s/it] 99%|█████████▉| 17074/17285 [152:56:27<1:45:07, 29.89s/it] 99%|█████████▉| 17075/17285 [152:56:54<1:41:15, 28.93s/it] 99%|█████████▉| 17076/17285 [152:57:33<1:51:23, 31.98s/it] 99%|█████████▉| 17077/17285 [152:58:05<1:50:27, 31.86s/it] 99%|█████████▉| 17078/17285 [152:58:51<2:04:32, 36.10s/it] 99%|█████████▉| 17079/17285 [152:59:28<2:04:46, 36.34s/it] 99%|█████████▉| 17080/17285 [152:59:59<1:59:19, 34.92s/it] {'loss': 1.3003, 'learning_rate': 2.88333858834422e-07, 'epoch': 2.96} + 99%|█████████▉| 17080/17285 [152:59:59<1:59:19, 34.92s/it] 99%|█████████▉| 17081/17285 [153:00:35<1:59:46, 35.23s/it] 99%|█████████▉| 17082/17285 [153:01:06<1:54:38, 33.89s/it] 99%|█████████▉| 17083/17285 [153:01:36<1:50:34, 32.85s/it] 99%|█████████▉| 17084/17285 [153:02:12<1:52:40, 33.63s/it] 99%|█████████▉| 17085/17285 [153:02:44<1:50:21, 33.11s/it] 99%|█████████▉| 17086/17285 [153:03:14<1:47:00, 32.26s/it] 99%|█████████▉| 17087/17285 [153:03:44<1:44:46, 31.75s/it] 99%|█████████▉| 17088/17285 [153:04:13<1:40:40, 30.66s/it] 99%|█████████▉| 17089/17285 [153:04:44<1:41:21, 31.03s/it] 99%|█████████▉| 17090/17285 [153:05:12<1:37:33, 30.02s/it] {'loss': 1.2441, 'learning_rate': 2.7399771959880637e-07, 'epoch': 2.97} + 99%|█████████▉| 17090/17285 [153:05:12<1:37:33, 30.02s/it] 99%|█████████▉| 17091/17285 [153:05:36<1:30:56, 28.13s/it] 99%|█████████▉| 17092/17285 [153:06:10<1:35:58, 29.84s/it] 99%|█████████▉| 17093/17285 [153:06:34<1:30:28, 28.27s/it] 99%|█████████▉| 17094/17285 [153:07:07<1:34:18, 29.62s/it] 99%|█████████▉| 17095/17285 
[153:07:33<1:30:19, 28.52s/it] 99%|█████████▉| 17096/17285 [153:08:05<1:33:04, 29.55s/it] 99%|█████████▉| 17097/17285 [153:08:33<1:31:26, 29.19s/it] 99%|█████████▉| 17098/17285 [153:09:07<1:35:39, 30.69s/it] 99%|█████████▉| 17099/17285 [153:09:37<1:33:36, 30.20s/it] 99%|█████████▉| 17100/17285 [153:10:16<1:41:48, 33.02s/it] {'loss': 1.2973, 'learning_rate': 2.600266382345895e-07, 'epoch': 2.97} + 99%|█████████▉| 17100/17285 [153:10:16<1:41:48, 33.02s/it] 99%|█████████▉| 17101/17285 [153:10:44<1:36:33, 31.49s/it] 99%|█████████▉| 17102/17285 [153:11:14<1:35:03, 31.16s/it] 99%|█████████▉| 17103/17285 [153:11:40<1:29:45, 29.59s/it] 99%|█████████▉| 17104/17285 [153:12:12<1:31:05, 30.19s/it] 99%|█████████▉| 17105/17285 [153:12:47<1:34:35, 31.53s/it] 99%|█████████▉| 17106/17285 [153:13:21<1:36:48, 32.45s/it] 99%|█████████▉| 17107/17285 [153:13:48<1:31:28, 30.83s/it] 99%|█████████▉| 17108/17285 [153:14:20<1:31:24, 30.99s/it] 99%|█████████▉| 17109/17285 [153:14:52<1:31:53, 31.33s/it] 99%|█████████▉| 17110/17285 [153:15:24<1:32:30, 31.71s/it] {'loss': 1.2442, 'learning_rate': 2.4642066588441705e-07, 'epoch': 2.97} + 99%|█████████▉| 17110/17285 [153:15:24<1:32:30, 31.71s/it] 99%|█████████▉| 17111/17285 [153:15:56<1:31:57, 31.71s/it] 99%|█████████▉| 17112/17285 [153:16:34<1:36:46, 33.56s/it] 99%|█████████▉| 17113/17285 [153:17:07<1:35:26, 33.30s/it] 99%|█████████▉| 17114/17285 [153:17:37<1:32:02, 32.30s/it] 99%|█████████▉| 17115/17285 [153:18:03<1:26:19, 30.47s/it] 99%|█████████▉| 17116/17285 [153:18:28<1:21:39, 28.99s/it] 99%|█████████▉| 17117/17285 [153:18:59<1:22:44, 29.55s/it] 99%|█████████▉| 17118/17285 [153:19:31<1:24:02, 30.19s/it] 99%|█████████▉| 17119/17285 [153:20:06<1:27:55, 31.78s/it] 99%|█████████▉| 17120/17285 [153:20:41<1:29:47, 32.65s/it] {'loss': 1.2901, 'learning_rate': 2.3317985235443707e-07, 'epoch': 2.97} + 99%|█████████▉| 17120/17285 [153:20:41<1:29:47, 32.65s/it] 99%|█████████▉| 17121/17285 [153:21:14<1:29:13, 32.65s/it] 99%|█████████▉| 17122/17285 
[153:21:47<1:29:14, 32.85s/it] 99%|█████████▉| 17123/17285 [153:22:20<1:29:08, 33.02s/it][2023-08-29 09:17:29,216] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 + 99%|█████████▉| 17124/17285 [153:22:52<1:27:03, 32.45s/it] 99%|█████████▉| 17125/17285 [153:23:28<1:29:29, 33.56s/it] 99%|█████████▉| 17126/17285 [153:23:58<1:26:11, 32.52s/it] 99%|█████████▉| 17127/17285 [153:24:28<1:23:35, 31.74s/it] 99%|█████████▉| 17128/17285 [153:25:00<1:23:25, 31.88s/it] 99%|█████████▉| 17129/17285 [153:25:32<1:23:05, 31.96s/it] 99%|█████████▉| 17130/17285 [153:26:01<1:19:57, 30.95s/it] {'loss': 1.2668, 'learning_rate': 2.215753710563373e-07, 'epoch': 2.97} + 99%|█████████▉| 17130/17285 [153:26:01<1:19:57, 30.95s/it] 99%|█████████▉| 17131/17285 [153:26:35<1:22:06, 31.99s/it] 99%|█████████▉| 17132/17285 [153:27:04<1:19:01, 30.99s/it] 99%|█████████▉| 17133/17285 [153:27:35<1:18:38, 31.04s/it] 99%|█████████▉| 17134/17285 [153:28:05<1:17:44, 30.89s/it] 99%|█████████▉| 17135/17285 [153:28:33<1:14:28, 29.79s/it] 99%|█████████▉| 17136/17285 [153:29:02<1:13:18, 29.52s/it] 99%|█████████▉| 17137/17285 [153:29:34<1:14:48, 30.33s/it] 99%|█████████▉| 17138/17285 [153:30:12<1:19:48, 32.58s/it] 99%|█████████▉| 17139/17285 [153:30:38<1:15:03, 30.85s/it] 99%|█████████▉| 17140/17285 [153:31:06<1:12:32, 30.01s/it] {'loss': 1.2799, 'learning_rate': 2.0902849171310356e-07, 'epoch': 2.97} + 99%|█████████▉| 17140/17285 [153:31:06<1:12:32, 30.01s/it] 99%|█████████▉| 17141/17285 [153:31:41<1:15:02, 31.27s/it] 99%|█████████▉| 17142/17285 [153:32:15<1:16:46, 32.22s/it] 99%|█████████▉| 17143/17285 [153:32:47<1:16:21, 32.27s/it] 99%|█████████▉| 17144/17285 [153:33:20<1:16:14, 32.44s/it] 99%|█████████▉| 17145/17285 [153:33:55<1:17:03, 33.03s/it] 99%|█████████▉| 17146/17285 [153:34:26<1:15:07, 32.43s/it] 99%|█████████▉| 17147/17285 [153:34:56<1:13:22, 31.90s/it] 99%|█████████▉| 17148/17285 
[153:35:26<1:11:00, 31.10s/it] 99%|█████████▉| 17149/17285 [153:36:02<1:13:46, 32.55s/it] 99%|█████████▉| 17150/17285 [153:36:35<1:13:38, 32.73s/it] {'loss': 1.2652, 'learning_rate': 1.968469080681823e-07, 'epoch': 2.98} + 99%|█████████▉| 17150/17285 [153:36:35<1:13:38, 32.73s/it] 99%|█████████▉| 17151/17285 [153:37:09<1:14:07, 33.19s/it] 99%|█████████▉| 17152/17285 [153:37:39<1:11:21, 32.19s/it] 99%|█████████▉| 17153/17285 [153:38:14<1:12:45, 33.07s/it] 99%|█████████▉| 17154/17285 [153:38:44<1:10:03, 32.09s/it] 99%|█████████▉| 17155/17285 [153:39:15<1:09:13, 31.95s/it] 99%|█████████▉| 17156/17285 [153:39:48<1:09:24, 32.28s/it] 99%|█████████▉| 17157/17285 [153:40:12<1:03:11, 29.62s/it] 99%|█████████▉| 17158/17285 [153:40:47<1:06:02, 31.20s/it][2023-08-29 09:35:49,452] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 + 99%|█████████▉| 17159/17285 [153:41:12<1:01:37, 29.35s/it] 99%|█████████▉| 17160/17285 [153:41:45<1:03:45, 30.61s/it] {'loss': 1.2909, 'learning_rate': 1.8619584749273167e-07, 'epoch': 2.98} + 99%|█████████▉| 17160/17285 [153:41:45<1:03:45, 30.61s/it] 99%|█████████▉| 17161/17285 [153:42:11<1:00:14, 29.15s/it] 99%|█████████▉| 17162/17285 [153:42:38<58:37, 28.60s/it] 99%|█████████▉| 17163/17285 [153:43:13<1:01:41, 30.34s/it] 99%|█████████▉| 17164/17285 [153:43:49<1:04:30, 31.99s/it] 99%|█████████▉| 17165/17285 [153:44:23<1:05:24, 32.70s/it] 99%|█████████▉| 17166/17285 [153:44:54<1:04:06, 32.32s/it] 99%|█████████▉| 17167/17285 [153:45:25<1:02:23, 31.72s/it] 99%|█████████▉| 17168/17285 [153:45:51<58:37, 30.06s/it] 99%|█████████▉| 17169/17285 [153:46:23<59:02, 30.54s/it] 99%|█████████▉| 17170/17285 [153:46:58<1:01:05, 31.87s/it] {'loss': 1.2791, 'learning_rate': 1.747084474202576e-07, 'epoch': 2.98} + 99%|█████████▉| 17170/17285 [153:46:58<1:01:05, 31.87s/it] 99%|█████████▉| 17171/17285 [153:47:38<1:05:27, 34.45s/it] 99%|█████████▉| 17172/17285 [153:48:09<1:02:50, 
33.37s/it] 99%|█████████▉| 17173/17285 [153:48:42<1:02:16, 33.36s/it] 99%|█████████▉| 17174/17285 [153:49:11<58:59, 31.89s/it] 99%|█████████▉| 17175/17285 [153:49:40<57:15, 31.23s/it] 99%|█████████▉| 17176/17285 [153:50:11<56:15, 30.97s/it] 99%|█████████▉| 17177/17285 [153:50:54<1:02:17, 34.61s/it] 99%|█████████▉| 17178/17285 [153:51:25<59:46, 33.52s/it] 99%|█████████▉| 17179/17285 [153:51:58<59:07, 33.46s/it] 99%|█████████▉| 17180/17285 [153:52:28<56:55, 32.53s/it] {'loss': 1.269, 'learning_rate': 1.6358646867835615e-07, 'epoch': 2.98} + 99%|█████████▉| 17180/17285 [153:52:29<56:55, 32.53s/it] 99%|█████████▉| 17181/17285 [153:53:04<57:53, 33.40s/it] 99%|█████████▉| 17182/17285 [153:53:36<56:36, 32.98s/it] 99%|█████████▉| 17183/17285 [153:54:07<54:56, 32.32s/it] 99%|█████████▉| 17184/17285 [153:54:33<51:30, 30.60s/it] 99%|█████████▉| 17185/17285 [153:55:04<50:52, 30.53s/it] 99%|█████████▉| 17186/17285 [153:55:41<53:43, 32.56s/it] 99%|█████████▉| 17187/17285 [153:56:11<51:57, 31.81s/it] 99%|█████████▉| 17188/17285 [153:56:39<49:40, 30.73s/it] 99%|█████████▉| 17189/17285 [153:57:17<52:45, 32.98s/it] 99%|█████████▉| 17190/17285 [153:57:47<50:47, 32.08s/it] {'loss': 1.2189, 'learning_rate': 1.5282995198021565e-07, 'epoch': 2.98} + 99%|█████████▉| 17190/17285 [153:57:47<50:47, 32.08s/it] 99%|█████████▉| 17191/17285 [153:58:13<47:19, 30.21s/it] 99%|█████████▉| 17192/17285 [153:58:38<44:15, 28.55s/it] 99%|█████████▉| 17193/17285 [153:59:12<46:08, 30.09s/it] 99%|█████████▉| 17194/17285 [153:59:49<48:56, 32.27s/it] 99%|█████████▉| 17195/17285 [154:00:20<47:37, 31.75s/it] 99%|█████████▉| 17196/17285 [154:00:50<46:24, 31.29s/it] 99%|█████████▉| 17197/17285 [154:01:24<47:23, 32.31s/it] 99%|█████████▉| 17198/17285 [154:01:54<45:37, 31.47s/it] 100%|█████████▉| 17199/17285 [154:02:30<46:53, 32.71s/it] 100%|█████████▉| 17200/17285 [154:03:03<46:32, 32.85s/it] {'loss': 1.2991, 'learning_rate': 1.424389367012613e-07, 'epoch': 2.99} + 100%|█████████▉| 17200/17285 [154:03:03<46:32, 
32.85s/it] 100%|█████████▉| 17201/17285 [154:03:33<44:54, 32.08s/it] 100%|█████████▉| 17202/17285 [154:04:13<47:29, 34.34s/it] 100%|█████████▉| 17203/17285 [154:04:41<44:28, 32.54s/it] 100%|█████████▉| 17204/17285 [154:05:07<41:27, 30.71s/it] 100%|█████████▉| 17205/17285 [154:05:39<41:13, 30.92s/it] 100%|█████████▉| 17206/17285 [154:06:15<42:51, 32.55s/it] 100%|█████████▉| 17207/17285 [154:06:40<39:26, 30.34s/it] 100%|█████████▉| 17208/17285 [154:07:08<37:51, 29.49s/it] 100%|█████████▉| 17209/17285 [154:07:44<39:50, 31.45s/it] 100%|█████████▉| 17210/17285 [154:08:14<38:53, 31.11s/it] {'loss': 1.2893, 'learning_rate': 1.3241346087892182e-07, 'epoch': 2.99} + 100%|█████████▉| 17210/17285 [154:08:14<38:53, 31.11s/it] 100%|█████████▉| 17211/17285 [154:08:44<37:54, 30.74s/it] 100%|█████████▉| 17212/17285 [154:09:14<37:07, 30.52s/it] 100%|█████████▉| 17213/17285 [154:09:51<38:51, 32.38s/it] 100%|█████████▉| 17214/17285 [154:10:27<39:30, 33.39s/it] 100%|█████████▉| 17215/17285 [154:11:05<40:34, 34.78s/it] 100%|█████████▉| 17216/17285 [154:11:39<39:59, 34.77s/it] 100%|█████████▉| 17217/17285 [154:12:10<38:08, 33.66s/it] 100%|█████████▉| 17218/17285 [154:12:35<34:33, 30.94s/it] 100%|█████████▉| 17219/17285 [154:13:05<33:45, 30.69s/it] 100%|█████████▉| 17220/17285 [154:13:31<31:47, 29.34s/it] {'loss': 1.2596, 'learning_rate': 1.2275356121254077e-07, 'epoch': 2.99} + 100%|█████████▉| 17220/17285 [154:13:31<31:47, 29.34s/it] 100%|█████████▉| 17221/17285 [154:14:04<32:18, 30.28s/it] 100%|█████████▉| 17222/17285 [154:14:40<33:39, 32.06s/it] 100%|█████████▉| 17223/17285 [154:15:09<32:20, 31.29s/it] 100%|█████████▉| 17224/17285 [154:15:42<32:11, 31.67s/it] 100%|█████████▉| 17225/17285 [154:16:07<29:42, 29.71s/it] 100%|█████████▉| 17226/17285 [154:16:37<29:21, 29.85s/it] 100%|█████████▉| 17227/17285 [154:17:09<29:30, 30.53s/it] 100%|█████████▉| 17228/17285 [154:17:47<30:58, 32.60s/it] 100%|█████████▉| 17229/17285 [154:18:14<28:47, 30.84s/it] 100%|█████████▉| 17230/17285 
[154:18:55<31:16, 34.12s/it] {'loss': 1.2393, 'learning_rate': 1.1345927306323224e-07, 'epoch': 2.99} + 100%|█████████▉| 17230/17285 [154:18:55<31:16, 34.12s/it] 100%|█████████▉| 17231/17285 [154:19:29<30:28, 33.86s/it] 100%|█████████▉| 17232/17285 [154:20:04<30:22, 34.40s/it] 100%|█████████▉| 17233/17285 [154:20:33<28:14, 32.58s/it] 100%|█████████▉| 17234/17285 [154:21:04<27:20, 32.17s/it] 100%|█████████▉| 17235/17285 [154:21:32<25:44, 30.89s/it] 100%|█████████▉| 17236/17285 [154:21:58<24:01, 29.42s/it] 100%|█████████▉| 17237/17285 [154:22:28<23:37, 29.54s/it] 100%|█████████▉| 17238/17285 [154:23:01<24:03, 30.72s/it] 100%|█████████▉| 17239/17285 [154:23:32<23:36, 30.79s/it] 100%|█████████▉| 17240/17285 [154:24:01<22:41, 30.25s/it] {'loss': 1.2814, 'learning_rate': 1.0453063045375855e-07, 'epoch': 2.99} + 100%|█████████▉| 17240/17285 [154:24:01<22:41, 30.25s/it] 100%|█████████▉| 17241/17285 [154:24:35<22:55, 31.25s/it] 100%|█████████▉| 17242/17285 [154:25:07<22:44, 31.73s/it] 100%|█████████▉| 17243/17285 [154:25:36<21:37, 30.89s/it] 100%|█████████▉| 17244/17285 [154:26:06<20:48, 30.46s/it] 100%|█████████▉| 17245/17285 [154:26:45<22:00, 33.02s/it] 100%|█████████▉| 17246/17285 [154:27:11<20:10, 31.04s/it] 100%|█████████▉| 17247/17285 [154:27:46<20:24, 32.22s/it] 100%|█████████▉| 17248/17285 [154:28:15<19:15, 31.24s/it] 100%|█████████▉| 17249/17285 [154:28:55<20:12, 33.68s/it] 100%|█████████▉| 17250/17285 [154:29:26<19:21, 33.17s/it] {'loss': 1.2632, 'learning_rate': 9.596766606836393e-08, 'epoch': 2.99} + 100%|█████████▉| 17250/17285 [154:29:27<19:21, 33.17s/it] 100%|█████████▉| 17251/17285 [154:30:10<20:30, 36.20s/it] 100%|█████████▉| 17252/17285 [154:30:41<19:03, 34.64s/it] 100%|█████████▉| 17253/17285 [154:31:08<17:18, 32.47s/it] 100%|█████████▉| 17254/17285 [154:31:40<16:38, 32.21s/it] 100%|█████████▉| 17255/17285 [154:32:14<16:26, 32.88s/it] 100%|█████████▉| 17256/17285 [154:32:41<14:59, 31.02s/it] 100%|█████████▉| 17257/17285 [154:33:13<14:40, 31.44s/it] 
100%|█████████▉| 17258/17285 [154:33:40<13:29, 29.99s/it] 100%|█████████▉| 17259/17285 [154:34:07<12:36, 29.08s/it] 100%|█████████▉| 17260/17285 [154:34:32<11:39, 27.96s/it] {'loss': 1.2705, 'learning_rate': 8.777041125273e-08, 'epoch': 3.0} + 100%|█████████▉| 17260/17285 [154:34:32<11:39, 27.96s/it] 100%|█████████▉| 17261/17285 [154:35:02<11:22, 28.42s/it] 100%|█████████▉| 17262/17285 [154:35:33<11:13, 29.28s/it] 100%|█████████▉| 17263/17285 [154:36:13<11:56, 32.58s/it] 100%|█████████▉| 17264/17285 [154:36:45<11:20, 32.41s/it] 100%|█████████▉| 17265/17285 [154:37:19<10:58, 32.90s/it] 100%|█████████▉| 17266/17285 [154:37:52<10:24, 32.87s/it] 100%|█████████▉| 17267/17285 [154:38:32<10:26, 34.83s/it] 100%|█████████▉| 17268/17285 [154:39:05<09:43, 34.34s/it] 100%|█████████▉| 17269/17285 [154:39:35<08:48, 33.00s/it] 100%|█████████▉| 17270/17285 [154:40:07<08:13, 32.91s/it] {'loss': 1.2207, 'learning_rate': 7.993889601378701e-08, 'epoch': 3.0} + 100%|█████████▉| 17270/17285 [154:40:07<08:13, 32.91s/it] 100%|█████████▉| 17271/17285 [154:40:39<07:33, 32.40s/it] 100%|█████████▉| 17272/17285 [154:41:11<06:59, 32.28s/it] 100%|█████████▉| 17273/17285 [154:41:36<06:02, 30.22s/it] 100%|█████████▉| 17274/17285 [154:42:05<05:28, 29.85s/it] 100%|█████████▉| 17275/17285 [154:42:34<04:56, 29.68s/it] 100%|█████████▉| 17276/17285 [154:43:10<04:42, 31.43s/it] 100%|█████████▉| 17277/17285 [154:43:50<04:32, 34.06s/it] 100%|█████████▉| 17278/17285 [154:44:18<03:45, 32.26s/it] 100%|█████████▉| 17279/17285 [154:44:56<03:24, 34.04s/it] 100%|█████████▉| 17280/17285 [154:45:27<02:45, 33.03s/it] {'loss': 1.2858, 'learning_rate': 7.24731490196584e-08, 'epoch': 3.0} + 100%|█████████▉| 17280/17285 [154:45:27<02:45, 33.03s/it] 100%|█████████▉| 17281/17285 [154:46:02<02:14, 33.63s/it] 100%|█████████▉| 17282/17285 [154:46:35<01:40, 33.53s/it] 100%|█████████▉| 17283/17285 [154:47:03<01:03, 31.82s/it] 100%|█████████▉| 17284/17285 [154:47:40<00:33, 33.34s/it] 100%|██████████| 17285/17285 
[154:48:10<00:00, 32.31s/it][INFO|trainer.py:1934] 2023-08-29 10:42:47,498 >> + +Training completed. Do not forget to share your model on huggingface.co/models =) + + + {'train_runtime': 557290.3071, 'train_samples_per_second': 3.97, 'train_steps_per_second': 0.031, 'train_loss': 1.5514631569575188, 'epoch': 3.0} + 100%|██████████| 17285/17285 [154:48:10<00:00, 32.31s/it] 100%|██████████| 17285/17285 [154:48:10<00:00, 32.24s/it] +***** train metrics ***** + epoch = 3.0 + train_loss = 1.5515 + train_runtime = 6 days, 10:48:10.30 + train_samples = 737503 + train_samples_per_second = 3.97 + train_steps_per_second = 0.031 +Save Peft Config at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/last_model +[INFO|tokenization_utils_base.py:2210] 2023-08-29 10:43:10,663 >> tokenizer config file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/last_model/tokenizer_config.json +[INFO|tokenization_utils_base.py:2217] 2023-08-29 10:43:10,668 >> Special tokens file saved in 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/last_model/special_tokens_map.json +Save Tokenizer at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/last_model +Save adapter model at 20230817_Llama-2-chat-13B_Belle-alpaca-50W-LawQA-20W-CodeReviewPython_37421_737503/last_model +08/29/2023 10:43:14 - INFO - __main__ - *** Evaluate *** +[INFO|trainer.py:3081] 2023-08-29 10:43:14,684 >> ***** Running Evaluation ***** +[INFO|trainer.py:3083] 2023-08-29 10:43:14,684 >> Num examples = 524 +[INFO|trainer.py:3086] 2023-08-29 10:43:14,684 >> Batch size = 2 + 0%| | 0/33 [00:00