Spaces: lsy641/distinct
lsy641 committed on Jul 7, 2023

Commit 4046cce • 1 Parent(s): 9b985f5

distinct

Files changed (2)

README.md CHANGED

@@ -77,7 +77,9 @@ Example of calculating original Distinct. This will return Distinct-1,2,and 3.
 ```
 ## Limitations and Bias
-TODO
 ## Citation
 ```bibtex

 ```
 ## Limitations and Bias
+Because EAD (Expectation-Adjusted-Distinct) rests on an idealized assumption that does not take language distribution into account, we discuss this problem further and propose a practical way to apply EAD in real situations. Before applying EAD, it is necessary to explore the relationship between score and text length (Figure 1) and to check the performance of EAD on the training data.
+To our knowledge, if the training data comes from large-scale open-domain sources such as OpenSubtitles and Reddit, EAD maintains its value across different lengths. Hence, it can be used directly to evaluate models trained on these datasets. However, in our experiments on datasets such as Twitter, EAD declined on lengthier texts.
+This is probably because of input length limits on these platforms (e.g. 280 characters on Twitter), which induce users to pack as much information as possible into a shorter text. In these situations, it is unfair to use EAD to evaluate methods that tend to generate lengthier texts.
 ## Citation
 ```bibtex

distinct.py CHANGED

@@ -54,7 +54,7 @@ _DESCRIPTION = """\
 Distinct metric is to calculate corpus-level diversity of language. We provide two versions of distinct score. Expectation-Adjusted-Distinct (EAD) is the default one, which removes
 the biases of the original distinct score on lengthier sentences. Distinct is the original version.
-![Comparison between original distinct and and EAD ](https://huggingface.co/spaces/lsy641/distinct/blob/main/distinct%20compare%20pic.jpg)
 For the use of Expectation-Adjusted-Distinct, vocab_size is required.

 Distinct metric is to calculate corpus-level diversity of language. We provide two versions of distinct score. Expectation-Adjusted-Distinct (EAD) is the default one, which removes
 the biases of the original distinct score on lengthier sentences. Distinct is the original version.
+![Comparison between original distinct and and EAD ](https://huggingface.co/spaces/lsy641/distinct/resolve/main/distinct_compare_pic.jpg)
 For the use of Expectation-Adjusted-Distinct, vocab_size is required.
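
To illustrate why `vocab_size` is required and how the score-versus-length check from the README could be run, here is a minimal sketch. It is not the code in `distinct.py`: the EAD formula is assumed to be the number of distinct tokens divided by the expected number of distinct tokens under uniform sampling from the vocabulary, only unigrams are handled, and the function names `distinct_n`, `ead`, and `score_by_length` are hypothetical.

```python
from typing import List, Sequence, Tuple


def distinct_n(tokens: Sequence[str], n: int = 1) -> float:
    """Original Distinct-n: unique n-grams divided by total n-grams."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def ead(tokens: Sequence[str], vocab_size: int) -> float:
    """Unigram EAD under the assumed formula.

    The denominator is the expected number of distinct tokens when
    len(tokens) tokens are drawn uniformly from vocab_size types.
    """
    total = len(tokens)
    if total == 0:
        return 0.0
    expected_unique = vocab_size * (1.0 - ((vocab_size - 1) / vocab_size) ** total)
    return len(set(tokens)) / expected_unique


def score_by_length(
    tokens: Sequence[str], vocab_size: int, lengths: Sequence[int]
) -> List[Tuple[int, float, float]]:
    """Score growing prefixes to see how each metric varies with length."""
    return [
        (k, distinct_n(tokens[:k]), ead(tokens[:k], vocab_size))
        for k in lengths
        if k <= len(tokens)
    ]
```

Calling `score_by_length` on the tokenized training corpus with a few prefix lengths gives one row per length. If the EAD column drops as the length grows, the caveat about length-limited platforms likely applies to that dataset.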