Spaces: evaluate-metric/meteor

lvwerra (HF Staff) committed 9 days ago

Commit 4b57ab1 (1 parent: 8ce4d9c)

Update Space (evaluate main: b4d71080)

Files changed (2)

README.md

@@ -116,6 +116,9 @@ While the correlation between METEOR and human judgments was measured for Chines
 Furthermore, while the alignment and matching done in METEOR is based on unigrams, using multiple word entities (e.g. bigrams) could contribute to improving its accuracy -- this has been proposed in [more recent publications](https://www.cs.cmu.edu/~alavie/METEOR/pdf/meteor-naacl-2010.pdf) on the subject.
+Scores differ by up to **±10 points** across v1.0↔v1.5 and flag combinations (`-l`, `-norm`, `-vOut`).
+Pin the Java package and document your flags. This uses the NLTK implementation (METEOR v1.0).
+[Lübbers, 2024](https://github.com/cluebbers/Reproducibility-METEOR-NLP)
 ## Citation
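
The added README note asks users to pin the implementation and document their settings. A minimal sketch of what that could look like on the Python side follows, assuming the metric is loaded through `evaluate.load("meteor")` and scored with its `alpha`, `beta`, and `gamma` parameters. The helper names (`package_version`, `meteor_provenance`, `score_with_provenance`) and the record layout are hypothetical and are not part of `evaluate` or NLTK.

```python
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def meteor_provenance(alpha=0.9, beta=3.0, gamma=0.5):
    """Describe the METEOR setup so a reported score can be reproduced.

    Hypothetical helper: the record layout is illustrative only.
    """
    return {
        "implementation": "nltk",
        "nltk_version": package_version("nltk"),
        "evaluate_version": package_version("evaluate"),
        "params": {"alpha": alpha, "beta": beta, "gamma": gamma},
    }


def score_with_provenance(predictions, references, **params):
    """Compute METEOR and attach the settings that produced the score."""
    import evaluate  # imported lazily; needs `evaluate` and `nltk` installed

    record = meteor_provenance(**params)
    meteor = evaluate.load("meteor")
    result = meteor.compute(
        predictions=predictions, references=references, **record["params"]
    )
    return {"meteor": result["meteor"], **record}
```

Reporting the returned record next to the score makes it possible to tell later whether two numbers came from the same implementation and parameter settings.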

requirements.txt

@@ -1,2 +1,2 @@
-git+https://github.com/huggingface/evaluate@56af7abbb160fa2a5a3c0d268f2bfd3baff8015c
+git+https://github.com/huggingface/evaluate@b4d710804b601459dc9266ad4622dcbe9b056d26
 nltk