03/11/2022 18:06:50 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: False
03/11/2022 18:06:51 - WARNING - datasets.builder - Using custom data configuration default-4cec708a4f8db7dd
03/11/2022 18:06:51 - WARNING - datasets.builder - Reusing dataset csv (/home/splend1dchan/.cache/huggingface/datasets/csv/default-4cec708a4f8db7dd/0.0.0/9144e0a4e8435090117cea53e6c7537173ef2304525df4a077c435d8ee7828ff)
loading configuration file https://huggingface.co/microsoft/deberta-large/resolve/main/config.json from cache at /home/splend1dchan/.cache/huggingface/transformers/7c686202d9db9b0aee3e649d42a50257a76d278858dc7ad32b886f02cf8303e4.5286a902fea63d3276108ffa66a65e2b4355a7df6cfab5be091bf20f7eae85f8
Model config DebertaConfig {
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2
  },
  "layer_norm_eps": 1e-07,
  "max_position_embeddings": 512,
  "max_relative_positions": -1,
  "model_type": "deberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 0,
  "pooler_dropout": 0,
  "pooler_hidden_act": "gelu",
  "pooler_hidden_size": 1024,
  "pos_att_type": [
    "c2p",
    "p2c"
  ],
  "position_biased_input": false,
  "relative_attention": true,
  "transformers_version": "4.12.2",
  "type_vocab_size": 0,
  "vocab_size": 50265
}
loading configuration file https://huggingface.co/microsoft/deberta-large/resolve/main/config.json from cache at /home/splend1dchan/.cache/huggingface/transformers/7c686202d9db9b0aee3e649d42a50257a76d278858dc7ad32b886f02cf8303e4.5286a902fea63d3276108ffa66a65e2b4355a7df6cfab5be091bf20f7eae85f8
Model config DebertaConfig {
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-07,
  "max_position_embeddings": 512,
  "max_relative_positions": -1,
  "model_type": "deberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 0,
  "pooler_dropout": 0,
  "pooler_hidden_act": "gelu",
  "pooler_hidden_size": 1024,
  "pos_att_type": [
    "c2p",
    "p2c"
  ],
  "position_biased_input": false,
  "relative_attention": true,
  "transformers_version": "4.12.2",
  "type_vocab_size": 0,
  "vocab_size": 50265
}
loading file https://huggingface.co/microsoft/deberta-large/resolve/main/vocab.json from cache at /home/splend1dchan/.cache/huggingface/transformers/4614a858d4552a0a399dc77bafbbeb75b20fe49259f690eb561898f8975626fa.e8ad27cc324bb0dc448d4d95f63e48f72688fb318a4c4c3f623485621b0b515c
loading file https://huggingface.co/microsoft/deberta-large/resolve/main/merges.txt from cache at /home/splend1dchan/.cache/huggingface/transformers/7a87aa12b220b9a983b98dbd9ad35624b3fe2ce2e83d1ce621eddcdac1c04654.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b
loading file https://huggingface.co/microsoft/deberta-large/resolve/main/tokenizer.json from cache at None
loading file https://huggingface.co/microsoft/deberta-large/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/microsoft/deberta-large/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/microsoft/deberta-large/resolve/main/tokenizer_config.json from cache at /home/splend1dchan/.cache/huggingface/transformers/fa4e12e9e6e1a899fe94275a0e60bdc59474baa2cc8e6fa0c207c7d9caaa2598.a39abb1c6179fb264c2db685f9a056b7cb8d4bc48d729888d292a2280debf8e2
loading weights file https://huggingface.co/microsoft/deberta-large/resolve/main/pytorch_model.bin from cache at /home/splend1dchan/.cache/huggingface/transformers/236b63dfb5e690fb2e194403aebda39508d60877a8903da58f4fff7a147ec0dd.6b3bbe51288bd6f66709e0f7e78d686b6074a5673aab102bbb32999a2f02e79a
Some weights of the model checkpoint at microsoft/deberta-large were not used when initializing DebertaForSequenceClassification: ['lm_predictions.lm_head.LayerNorm.weight', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.dense.weight']
- This IS expected if you are initializing DebertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DebertaForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-large and are newly initialized: ['pooler.dense.bias', 'classifier.bias', 'classifier.weight', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
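The two warnings above are the expected behavior for this run: the microsoft/deberta-large checkpoint ships only the pretrained encoder plus its masked-LM head, so the lm_predictions.lm_head.* tensors are dropped and the pooler and 3-way classifier head are freshly initialized. A minimal sketch of the load that produces this output (a standalone reproduction, not the exact training script; assumes transformers 4.12.2 as reported in the config, and three labels matching the id2label entries in the first config dump):

```python
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

model_name = "microsoft/deberta-large"

# num_labels=3 matches the id2label/label2id entries in the first config dump.
config = AutoConfig.from_pretrained(model_name, num_labels=3)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Loads the pretrained encoder weights, discards the unused
# lm_predictions.lm_head.* tensors, and randomly initializes
# pooler.dense.* and classifier.* -- exactly what the two warnings report.
model = AutoModelForSequenceClassification.from_pretrained(model_name, config=config)
```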
03/11/2022 18:07:09 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/splend1dchan/.cache/huggingface/datasets/csv/default-4cec708a4f8db7dd/0.0.0/9144e0a4e8435090117cea53e6c7537173ef2304525df4a077c435d8ee7828ff/cache-889b3969c83c1172.arrow
03/11/2022 18:07:09 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/splend1dchan/.cache/huggingface/datasets/csv/default-4cec708a4f8db7dd/0.0.0/9144e0a4e8435090117cea53e6c7537173ef2304525df4a077c435d8ee7828ff/cache-f128cc30aa93c23a.arrow
03/11/2022 18:07:09 - INFO - __main__ - Sample 2652 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'input_ids': [1, 37460, 5, 183, 23, 10, 380, 5145, 334, 11, 784, 24639, 8, 172, 939, 74, 213, 8, 3008, 23, 363, 8, 8, 292, 2697, 10, 186, 13, 4655, 2], 'labels': 1, 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
03/11/2022 18:07:09 - INFO - __main__ - Sample 1235 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'input_ids': [1, 405, 21, 15, 4007, 1970, 172, 939, 95, 222, 24, 11, 5573, 5030, 16306, 23, 5, 37463, 2], 'labels': 1, 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
03/11/2022 18:07:09 - INFO - __main__ - Sample 3234 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'input_ids': [1, 13724, 196, 23, 69, 223, 39, 33727, 53, 51, 51, 51, 218, 75, 7, 1642, 19, 5, 2], 'labels': 1, 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
03/11/2022 18:07:14 - INFO - __main__ - ***** Running training *****
03/11/2022 18:07:14 - INFO - __main__ - Num examples = 5729
03/11/2022 18:07:14 - INFO - __main__ - Num Epochs = 50
03/11/2022 18:07:14 - INFO - __main__ - Instantaneous batch size per device = 4
03/11/2022 18:07:14 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 16
03/11/2022 18:07:14 - INFO - __main__ - Gradient Accumulation steps = 4
03/11/2022 18:07:14 - INFO - __main__ - Total optimization steps = 17950
0%| | 0/17950 [00:00
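For reference, the reported totals are mutually consistent: the total train batch size is 4 (per device) x 1 process x 4 (gradient accumulation) = 16, and the optimization-step count follows from ceiling division over the dataloader. A minimal sketch of the arithmetic (assuming a dataloader with drop_last=False, as in the run_glue_no_trainer-style scripts):

```python
import math

num_examples = 5729                # "Num examples" in the log
per_device_batch_size = 4          # "Instantaneous batch size per device"
gradient_accumulation_steps = 4    # "Gradient Accumulation steps"
num_epochs = 50                    # "Num Epochs"

# Dataloader batches per epoch (the last partial batch is kept).
batches_per_epoch = math.ceil(num_examples / per_device_batch_size)             # 1433
# Optimizer updates per epoch after gradient accumulation.
updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)  # 359
total_optimization_steps = num_epochs * updates_per_epoch                       # 17950

print(total_optimization_steps)  # 17950, matching "Total optimization steps" above
```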