hallucinations-leaderboard/leaderboard · Adding SummEdits to leaderboard?

Hey,

First of all, great initiative!
I'm doing some self-advertising to propose adding a relevant benchmark to the leaderboard: SummEdits.
SummEdits is a benchmark we introduced at EMNLP. It focuses on hallucination detection in summarization and covers ten textual domains (all in English).

Humans achieve very high performance on the benchmark, and there's still a gap between GPT-4 and humans. It's framed as binary classification, which makes it very easy to evaluate. In total, there are ~6,000 annotated samples, but they could be subsampled if needed.
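
For a sense of how simple the eval could be, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the 0/1 label encoding, the `subsample` and `balanced_accuracy` helper names, and the choice of balanced accuracy as the metric are all hypothetical and not taken from the dataset card or the leaderboard code.

```python
import random


def subsample(samples, n, seed=0):
    """Return up to n samples, drawn without replacement using a fixed seed."""
    if n >= len(samples):
        return list(samples)
    return random.Random(seed).sample(samples, n)


def balanced_accuracy(labels, preds):
    """Mean of per-class recall for binary labels (assumed to be 0 or 1)."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        if not idx:
            continue  # skip a class that is absent from this sample
        correct = sum(1 for i in idx if preds[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls) if recalls else 0.0


# Toy (label, prediction) pairs, made up purely for illustration.
toy = [(1, 1), (1, 0), (0, 0), (0, 0), (1, 1), (0, 1)]
labels = [y for y, _ in toy]
preds = [p for _, p in toy]
score = balanced_accuracy(labels, preds)
```

In a real integration, `labels` would come from the dataset and `preds` from parsing each model's yes/no answer; `subsample` with a fixed seed keeps the reduced set the same across models.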

We've already put the data on HF here: https://huggingface.co/datasets/Salesforce/summedits

And if you guys are interested, I'm happy to help with the integration (I imagine it'd be fairly easy).

Cheers,
Philippe