title: distinct
datasets:
- None
tags:
- evaluate
- measurement
description: 'Distinct and Expectation-Adjusted-Distinct measurements of the diversity of generated text.'
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
Measurement Card for distinct
Measurement Description
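Distinct measures the lexical diversity of generated text, typically the outputs of a dialogue or other text-generation model, as the proportion of distinct n-grams among everything generated (Li et al. 2016). Expectation-Adjusted-Distinct (Liu and Sabour et al. 2022) refines it by dividing the number of distinct tokens by the number expected for text of that length, which reduces the original metric's bias against longer outputs.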
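As a rough illustration of the ratio behind Distinct-n, the sketch below counts unique n-grams over a list of sentences. It is not this module's implementation: the helper names `ngrams` and `distinct_n` are hypothetical, tokenization is plain whitespace splitting, and the denominator is assumed to be the total number of n-grams (Li et al. 2016 describe dividing by the total number of generated tokens, which differs for n > 1).

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(sentences: List[str], n: int = 1) -> float:
    """Unique n-grams divided by total n-grams, pooled over all sentences.

    Assumption: whitespace tokenization and an n-gram denominator; the
    real module may tokenize and normalize differently.
    """
    all_ngrams: List[Tuple[str, ...]] = []
    for sentence in sentences:
        all_ngrams.extend(ngrams(sentence.split(), n))
    if not all_ngrams:
        return 0.0  # nothing to measure
    return len(set(all_ngrams)) / len(all_ngrams)


predictions = ["Hi.", "I'm sorry to hear that", "I don't know"]
print(distinct_n(predictions, n=1))
print(distinct_n(predictions, n=2))
```

Because the ratio is pooled over the whole list, repeating the same sentence many times lowers the score even if each sentence is internally varied.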
How to Use
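Load the measurement with evaluate.load("lsy641/distinct") and call compute on a list of generated sentences. In 'Expectation-Adjusted-Distinct' mode, also pass either vocab_size or dataForVocabCal. The Examples section shows complete calls.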
Inputs
- predictions (list of strings): list of sentences whose diversity is measured. Each prediction should be a string.
- mode (string): 'Expectation-Adjusted-Distinct' or 'Distinct', selecting the diversity calculation. If the value is 'Expectation-Adjusted-Distinct', the scores of both modes are returned. Default value is 'Expectation-Adjusted-Distinct'.
- vocab_size (int): vocabulary size used to calculate 'Expectation-Adjusted-Distinct'. When calculating 'Expectation-Adjusted-Distinct', at least one of vocab_size and dataForVocabCal must be provided. Default value is None.
- dataForVocabCal (list of strings): data from which vocab_size is calculated for 'Expectation-Adjusted-Distinct'. Typically, it should be a list of the sentences that make up the task dataset. When calculating 'Expectation-Adjusted-Distinct', at least one of vocab_size and dataForVocabCal must be provided. Default value is None.
- tokenizer (string or tokenizer class): tokenizer for splitting sentences into words. Default value is "white_space". An NLTK tokenizer can also be used.
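To show how vocab_size enters the calculation, here is a minimal sketch assuming whitespace tokenization and the formula from Liu and Sabour et al. (2022) as commonly stated: the number of distinct tokens N divided by the expected number of distinct tokens V * (1 - ((V - 1) / V) ** C), where V is the vocabulary size and C the total token count. The functions `estimate_vocab_size` and `expectation_adjusted_distinct` are hypothetical names for illustration, not the module's API, and the module may derive the vocabulary from dataForVocabCal differently.

```python
from typing import Iterable, List


def estimate_vocab_size(sentences: Iterable[str]) -> int:
    """Count unique whitespace-separated tokens (assumed stand-in for dataForVocabCal handling)."""
    return len({token for sentence in sentences for token in sentence.split()})


def expectation_adjusted_distinct(predictions: List[str], vocab_size: int) -> float:
    """Distinct token count divided by its expected value for the given length."""
    tokens = [token for sentence in predictions for token in sentence.split()]
    if not tokens or vocab_size <= 0:
        return 0.0  # undefined without tokens or a vocabulary
    n_distinct = len(set(tokens))   # N
    total = len(tokens)             # C
    # Expected number of distinct tokens when drawing C tokens uniformly from V types.
    expected = vocab_size * (1 - ((vocab_size - 1) / vocab_size) ** total)
    return n_distinct / expected


dataset = ["This is my friend jack", "I'm sorry to hear that",
           "But you know I am the one who always support you", "Welcome to our family"]
predictions = ["Hi.", "I'm sorry to hear that", "I don't know"]

vocab_size = estimate_vocab_size(dataset)
print(vocab_size)
print(expectation_adjusted_distinct(predictions, vocab_size))
```

Under this formula the score is not capped at 1: a short, fully distinct output can slightly exceed the expected count, and a vocabulary estimated from a small dataset makes that more likely.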
Output Values
The measurement returns a dictionary of scores. In 'Distinct' mode it contains Distinct-1, Distinct-2 and Distinct-3. In 'Expectation-Adjusted-Distinct' mode it additionally contains the Expectation-Adjusted-Distinct score.
Higher scores indicate more diverse text.
Values from Popular Papers
The Expectation-Adjusted-Distinct paper (Liu and Sabour et al. 2022) compares the Expectation-Adjusted-Distinct scores of ten different methods with their original Distinct scores. It reports that the adjusted scores correlate better with human judgments, with the correlation rising from 0.56 to 0.65.
Examples
Example of calculating Expectation-Adjusted-Distinct by giving either vocab_size or data from which vocab_size is calculated. This also returns Distinct-1, 2, and 3.
>>> my_new_module = evaluate.load("lsy641/distinct")
>>> results = my_new_module.compute(predictions=["Hi.", "I'm sorry to hear that", "I don't know"], vocab_size=50257)
>>> print(results)
>>> dataset = ["This is my friend jack", "I'm sorry to hear that", "But you know I am the one who always support you", "Welcome to our family"]
>>> results = my_new_module.compute(predictions=["Hi.", "I'm sorry to hear that", "I don't know"], dataForVocabCal=dataset)
>>> print(results)
Example of calculating the original Distinct. This returns Distinct-1, 2, and 3.
>>> my_new_module = evaluate.load("lsy641/distinct")
>>> results = my_new_module.compute(predictions=["Hi.", "I'm sorry to hear that", "I don't know"], mode="Distinct")
>>> print(results)
Limitations and Bias
Citation
@inproceedings{liu-etal-2022-rethinking,
title = "Rethinking and Refining the Distinct Metric",
author = "Liu, Siyang and
Sabour, Sahand and
Zheng, Yinhe and
Ke, Pei and
Zhu, Xiaoyan and
Huang, Minlie",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
year = "2022",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-short.86",
doi = "10.18653/v1/2022.acl-short.86",
}
@inproceedings{li-etal-2016-diversity,
title = "A Diversity-Promoting Objective Function for Neural Conversation Models",
author = "Li, Jiwei and
Galley, Michel and
Brockett, Chris and
Gao, Jianfeng and
Dolan, Bill",
booktitle = "Proceedings of the 2016 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies",
year = "2016",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/N16-1014",
doi = "10.18653/v1/N16-1014",
}
Further References