WARNING:root:Dropping 0 rows
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
/baie/nfs-cluster-1/data1/raid1/homedirs/eliot.maes/multimodal-itmodels/experiments/./ES_corlec is already a clone of https://huggingface.co/maesneako/ES_corlec. Make sure you pull the latest changes with `repo.git_pull()`.
WARNING:huggingface_hub.repository:/baie/nfs-cluster-1/data1/raid1/homedirs/eliot.maes/multimodal-itmodels/experiments/./ES_corlec is already a clone of https://huggingface.co/maesneako/ES_corlec. Make sure you pull the latest changes with `repo.git_pull()`.
The following columns in the training set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: file, text_input_ids_full, start_idx, text, speaker, text_u_full, text_u, index, text_input_ids, __index_level_0__, length. If file, text_input_ids_full, start_idx, text, speaker, text_u_full, text_u, index, text_input_ids, __index_level_0__, length are not expected by `GPT2LMHeadModel.forward`, you can safely ignore this message.
/baie/nfs-cluster-1/data1/raid1/homedirs/eliot.maes/env/lib/python3.6/site-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  FutureWarning,
***** Running training *****
  Num examples = 80691
  Num Epochs = 7
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 35308
  0%|          | 0/35308 [00:00
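
Note: the "Total optimization steps" reported above is consistent with the other logged values. The short Python sketch below reproduces the arithmetic; the variable names are illustrative, not taken from the training script.

import math

# Values reported in the Trainer log above.
num_examples = 80691            # Num examples
num_epochs = 7                  # Num Epochs
per_device_batch_size = 16      # Instantaneous batch size per device
grad_accum_steps = 1            # Gradient Accumulation steps

# One optimization step consumes (batch size * accumulation steps) examples;
# a partial final batch still counts as a full step, hence the ceiling.
steps_per_epoch = math.ceil(num_examples / (per_device_batch_size * grad_accum_steps))
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 5044 35308 -- matches "Total optimization steps = 35308"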