Accuracy results seem to be wrong

#2
by ayazdan - opened

Thanks for the model. The accuracy of the model seems to be 79.42, slightly lower than reported. I am using the following script to evaluate:

python3 transformer-sparsity/examples/pytorch/text-classification/run_glue.py \
        --model_name_or_path ${ckpt} \
        --task_name "rte" \
        --do_eval \
        --max_seq_length 512 \
        --per_device_eval_batch_size 32 \
        --evaluation_strategy steps \
        --logging_steps ${eval_steps} \
        --logging_strategy steps \
        --overwrite_output_dir \
        --output_dir ${ckpt_path} 2>&1 | tee ~/${ckpt_path}/finetune_run_$(date +"%Y_%m_%d_%I_%M_%p").log

The AutoEvaluator provided by Hugging Face gives 0.791, which means that your results are actually slightly better.
The results that I posted were auto-generated by the Trainer. I assume the discrepancy should be attributed to the per_device_eval_batch_size argument: by default, the Trainer would discard examples that cannot fill a full batch.
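To see why dropping a partial final batch can shift the reported number, here is a minimal sketch. The correctness flags below are illustrative, not the actual RTE predictions; the only real figures assumed are the RTE validation size (277 examples) and the batch size of 32 from the script above.

```python
# Sketch: how discarding a partial final batch changes a reported accuracy.
# The prediction flags are made up for illustration.

def accuracy(correct_flags):
    """Fraction of examples marked correct."""
    return sum(correct_flags) / len(correct_flags)

n_examples = 277   # RTE validation set size
batch_size = 32
n_full = (n_examples // batch_size) * batch_size  # 256 examples fit in full batches

# Hypothetical outcome: the first 220 examples are correct, the rest wrong.
flags = [True] * 220 + [False] * (n_examples - 220)

full_dataset_acc = accuracy(flags)           # metric over all 277 examples
dropped_tail_acc = accuracy(flags[:n_full])  # metric if the last 21 examples are discarded

print(round(full_dataset_acc, 4), round(dropped_tail_acc, 4))
```

With these made-up flags the full-dataset accuracy is about 0.7942, while evaluating only the 256 examples in full batches gives a different value, so the two pipelines can legitimately disagree even on identical predictions.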

JeremiahZ changed discussion status to closed
