diff --git "a/stderr.slurm" "b/stderr.slurm"
--- "a/stderr.slurm"
+++ "b/stderr.slurm"
@@ -2,60996 +2,2554 @@
 WARNING:root:Dropping 0 rows
 Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
 /baie/nfs-cluster-1/data1/raid1/homedirs/eliot.maes/multimodal-itmodels/experiments/./ES_corlec is already a clone of https://huggingface.co/maesneako/ES_corlec. Make sure you pull the latest changes with `repo.git_pull()`.
 WARNING:huggingface_hub.repository:/baie/nfs-cluster-1/data1/raid1/homedirs/eliot.maes/multimodal-itmodels/experiments/./ES_corlec is already a clone of https://huggingface.co/maesneako/ES_corlec. Make sure you pull the latest changes with `repo.git_pull()`.
-The following columns in the training set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: text_input_ids, text_u_full, index, text, text_u, start_idx, speaker, text_input_ids_full, __index_level_0__, length, file. If text_input_ids, text_u_full, index, text, text_u, start_idx, speaker, text_input_ids_full, __index_level_0__, length, file are not expected by `GPT2LMHeadModel.forward`, you can safely ignore this message.
+The following columns in the training set don't have a corresponding argument in `GPT2LMHeadModel.forward` and have been ignored: file, text_input_ids_full, start_idx, text, speaker, text_u_full, text_u, index, text_input_ids, __index_level_0__, length. If file, text_input_ids_full, start_idx, text, speaker, text_u_full, text_u, index, text_input_ids, __index_level_0__, length are not expected by `GPT2LMHeadModel.forward`, you can safely ignore this message.
 /baie/nfs-cluster-1/data1/raid1/homedirs/eliot.maes/env/lib/python3.6/site-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
   FutureWarning,
 ***** Running training *****
   Num examples = 80691
-  Num Epochs = 10
+  Num Epochs = 7
   Instantaneous batch size per device = 16
   Total train batch size (w. parallel, distributed & accumulation) = 16
   Gradient Accumulation steps = 1
-  Total optimization steps = 50440
-  0%|          | 0/50440 [00:00