Train script doesn't work

#7
by erickdp - opened

Hi, I encountered an issue while attempting to train a distillation model using the script. I'm using Colab Pro with a V100 GPU, but I received an error stating "Distributed training is not currently supported." Could you please assist me with this problem?


Owner

Hi,

It seems there is some unusual behavior in the function parser.parse_args_into_dataclasses() here that results in training_args.local_rank = 0. It should default to -1, as stated here.
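For illustration, here is a minimal stand-in for that kind of dataclass-based argument parsing (this is not the actual HfArgumentParser implementation, just a sketch of the expected behavior): local_rank is declared with a default of -1, which means single-process training unless a flag or distributed launcher overrides it.

```python
import argparse
from dataclasses import dataclass, field, fields

# Hypothetical minimal stand-in for HfArgumentParser / TrainingArguments:
# local_rank should default to -1 (single-process training).
@dataclass
class TrainingArgs:
    local_rank: int = field(default=-1)

def parse_into_dataclass(argv):
    """Build an argparse parser from the dataclass fields and parse argv."""
    parser = argparse.ArgumentParser()
    for f in fields(TrainingArgs):
        parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    ns = parser.parse_args(argv)
    return TrainingArgs(**vars(ns))

args = parse_into_dataclass([])  # no CLI flags passed
print(args.local_rank)           # -1, i.e. non-distributed training
```

With this expected behavior, running the script without --local_rank should never trip the "Distributed training is not currently supported" check.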

Also, we can't override that value by passing an argument to the script:

```shell
!python distill_classifier.py \
--data_file ./multilingual-sentiments/train_unlabeled.txt \
--class_names_file ./multilingual-sentiments/class_names.txt \
--hypothesis_template "The sentiment of this text is {}." \
--teacher_name_or_path MoritzLaurer/mDeBERTa-v3-base-mnli-xnli \
--teacher_batch_size 32 \
--student_name_or_path distilbert-base-multilingual-cased \
--output_dir ./distilbert-base-multilingual-cased-sentiments-student \
--per_device_train_batch_size 16 \
--local_rank -1 \        <-- doesn't work
--fp16
```

The simplest workaround is to hardcode that value by adding this line of code at line 268:

```python
training_args.local_rank = -1  # <-- add this line
print(f"[DEBUG]: {training_args.local_rank = }")
if training_args.local_rank != -1:
    raise ValueError("Distributed training is not currently supported.")
if training_args.tpu_num_cores is not None:
    raise ValueError("TPU acceleration is not currently supported.")

logger.info(f"Training/evaluation parameters {training_args}")
```
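A less invasive alternative, assuming the root cause is environmental: recent transformers versions can initialize training_args.local_rank from the LOCAL_RANK environment variable, so it may arrive as 0 even when you never pass --local_rank. If that assumption holds for your setup, clearing the variable before the script parses its arguments avoids editing the check itself:

```python
import os

# Simulate a launcher/notebook environment having set LOCAL_RANK to 0.
os.environ["LOCAL_RANK"] = "0"

# Assumption: the parser reads LOCAL_RANK and overrides the -1 default.
# Removing it lets the parser fall back to the default (non-distributed).
os.environ.pop("LOCAL_RANK", None)
print("LOCAL_RANK" in os.environ)  # False
```

You could run the os.environ.pop(...) line at the top of distill_classifier.py, before the arguments are parsed.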

I don't have time to dig deeper into the root cause, but the workaround should be enough for your case and fix your issue.
Please let me know if you have any other issues.

Now it works, thanks.

erickdp changed discussion status to closed
