Accuracy results seem to be wrong
#2 by ayazdan - opened
Thanks for the model. The accuracy I measure is 79.42, slightly lower than the reported result. I am using the following script to evaluate:
python3 transformer-sparsity/examples/pytorch/text-classification/run_glue.py \
--model_name_or_path ${ckpt} \
--task_name "rte" \
--do_eval \
--max_seq_length 512 \
--per_device_eval_batch_size 32 \
--evaluation_strategy steps \
--logging_steps ${eval_steps} \
--logging_strategy steps \
--overwrite_output_dir \
--output_dir ${ckpt_path} 2>&1 | tee ~/${ckpt_path}/finetune_run_$(date +"%Y_%m_%d_%I_%M_%p").log
AutoEvaluator, provided by Hugging Face, gives 0.791, which means that your results are actually slightly better.
The results that I posted were auto-generated by Trainer. I am assuming the discrepancy can be attributed to the per_device_eval_batch_size argument: by default, Trainer would discard examples that cannot fill a full batch.
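A minimal sketch of how that could happen: if the evaluation loop drops the final partial batch, the examples in that batch never enter the accuracy denominator, so the metric shifts. The numbers below are illustrative only (not the actual model's predictions); the dataset size of 277 matches the GLUE RTE validation set, which does not divide evenly by a batch size of 32.

```python
# Sketch: dropping the last partial batch changes measured accuracy.
# The correctness flags here are made up for illustration.

def accuracy(correct_flags):
    return sum(correct_flags) / len(correct_flags)

def evaluated_flags(flags, batch_size, drop_last):
    # Split into batches; optionally discard a trailing partial batch,
    # mimicking a dataloader with drop_last=True.
    batches = [flags[i:i + batch_size] for i in range(0, len(flags), batch_size)]
    if drop_last and len(batches[-1]) < batch_size:
        batches = batches[:-1]
    return [f for b in batches for f in b]

# 277 examples with batch size 32: 8 full batches cover 256 examples,
# leaving a partial batch of 21 that drop_last would discard.
flags = [1] * 219 + [0] * 58  # 219/277 correct, roughly 0.79

full = accuracy(evaluated_flags(flags, 32, drop_last=False))
trunc = accuracy(evaluated_flags(flags, 32, drop_last=True))
print(round(full, 4), round(trunc, 4))  # the two runs disagree
```

With all 277 examples the accuracy is 219/277 ≈ 0.7906; after the last 21 examples are dropped it becomes 219/256 ≈ 0.8555, so the same predictions yield two different scores depending on batching.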
JeremiahZ changed discussion status to closed