Adding SummEdits to leaderboard?

#12
by philippelaban - opened

Hey,

First of all, great initiative!
I'm doing some self-advertising here: I'd like to propose adding a relevant benchmark, SummEdits, to the leaderboard.
SummEdits is a benchmark we introduced at EMNLP. It frames hallucination detection specifically for summarization, across ten textual domains (all in English).

Humans achieve very high performance on the benchmark, and there's still a gap between GPT-4 and humans. It's framed as a binary classification task, which makes it very easy to evaluate. In total, there are ~6,000 annotated samples, but they could be subsampled if needed.
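Since the task is binary classification and subsampling is mentioned, here is a minimal sketch of what scoring and subsampling could look like. It assumes labels are encoded as 0/1 and that balanced accuracy (the mean of per-class recall) is used as the metric; both are assumptions, as are the hypothetical helper names and the `label` field, which are not taken from the SummEdits dataset card.

```python
import random
from collections import defaultdict


def balanced_accuracy(labels, preds):
    """Mean per-class recall. Assumes binary 0/1 labels (hypothetical encoding)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, preds):
        total[y] += 1
        correct[y] += int(y == p)
    # Average recall over the classes that actually occur in `labels`.
    return sum(correct[c] / total[c] for c in total) / len(total)


def stratified_subsample(samples, n, label_key="label", seed=0):
    """Draw roughly n samples while preserving the original label ratio.

    `label_key` is a hypothetical field name; adjust it to the real column.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for s in samples:
        by_label[s[label_key]].append(s)
    out = []
    for _, group in sorted(by_label.items()):
        # Each class keeps its share of the full set (rounded).
        k = round(n * len(group) / len(samples))
        out.extend(rng.sample(group, min(k, len(group))))
    return out
```

For example, `balanced_accuracy([1, 1, 1, 0], [1, 1, 0, 0])` averages a recall of 2/3 on class 1 and 1/1 on class 0, giving 5/6. Balanced accuracy is chosen here because plain accuracy can look inflated when one label dominates.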

We've already put the data on HF here: https://huggingface.co/datasets/Salesforce/summedits

And if you guys are interested, I'm happy to help integrate if it'd be helpful (I imagine it'd be fairly easy).

Cheers,
Philippe

hallucinations-leaderboard org
edited Feb 3

@philippelaban while you're at it, could you please add it to https://github.com/EdinburghNLP/awesome-hallucination-detection with a pull request? :)

> And if you guys are interested, I'm happy to help integrate if it'd be helpful (I imagine it'd be fairly easy).

Sure, we could also do it together in 15-30 minutes! If you can add it to https://github.com/EleutherAI/lm-evaluation-harness, adding it to the leaderboard will then be immediate.
