No max over multiple references?

#2
by glecorve - opened

Hi,

Isn't there a problem with the "multiple references" case? I think the metric should compute the METEOR score of each hypothesis against each of its references, keep the maximum score for each hypothesis, and then return the mean over all predictions.

  • In NLTK, the meteor_score() function iterates over the multiple references and computes the max, but no mean is performed (this is up to the user).

  • In HuggingFace's wrapper, the single_meteor_score() function is directly accessed for each hypothesis and a mean is performed, but there is no loop over multiple references and no max.
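To make the expected behaviour concrete, here is a minimal sketch of the "max over references, then mean over predictions" aggregation. The names `max_then_mean` and `toy_unigram_f1` are hypothetical, and the toy scorer is only a stand-in so that the snippet runs without NLTK: it is not METEOR. A real scorer such as NLTK's single_meteor_score() could presumably be passed as `score_fn` instead, once its inputs are adapted to the format it expects.

```python
from statistics import mean
from typing import Callable, Sequence


def toy_unigram_f1(reference: str, hypothesis: str) -> float:
    """Stand-in scorer (NOT METEOR): F1 over unique whitespace-separated tokens."""
    ref_tokens = set(reference.split())
    hyp_tokens = set(hypothesis.split())
    overlap = len(ref_tokens & hyp_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def max_then_mean(
    predictions: Sequence[str],
    references: Sequence[Sequence[str]],
    score_fn: Callable[[str, str], float] = toy_unigram_f1,
) -> float:
    """Score each prediction against all of its references, keep the best
    score per prediction, then average over predictions."""
    if len(predictions) != len(references):
        raise ValueError("need one list of references per prediction")
    per_prediction = [
        max(score_fn(ref, pred) for ref in refs)  # max over references
        for pred, refs in zip(predictions, references)
    ]
    return mean(per_prediction)  # mean over predictions


predictions = ["the cat sat", "a dog"]
references = [
    ["the cat sat", "cat"],
    ["the dog barked", "a dog ran"],
]
corpus_score = max_then_mean(predictions, references)
```

With a single reference per prediction, the max is a no-op and this reduces to a plain mean of per-sample scores.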

But maybe I'm wrong :-).

Thanks,
Gwénolé.

Evaluate Metric org

Currently, multiple references are not supported but @sasha is working on adding them: https://github.com/huggingface/evaluate/pull/164

The Hugging Face wrapper only works with a single reference per sample; the mean is then taken over all samples.

Thanks for the confirmation. I patched it with another wrapper :-).

I think it would be helpful to mention this limitation in the documentation. Currently, it says: "references: a list of references for each prediction. Each reference should be a string with tokens separated by spaces.".

Thanks again.

This has been patched now. Thank you.

glecorve changed discussion status to closed
