Longform QA

#8
by shehzaadzd - opened

The FActScore paper (https://arxiv.org/pdf/2305.14251.pdf) proposes an automatic method for evaluating hallucination in long-form QA. It also provides a biography-generation benchmark covering a mix of entities.
Can this be integrated into the leaderboard?

hallucinations-leaderboard org

@shehzaadzd from a quick glance, it requires access to an OpenAI key -- for example, see this snippet from https://github.com/shmsw25/FActScore:

from factscore.factscorer import FactScorer

fs = FactScorer(openai_key="...")

# topics: list of strings (human entities used to generate bios)
# generations: list of strings (model generations)
out = fs.get_score(topics, generations, gamma=10)
print(out["score"]) # FActScore
[..]

How would you implement/include it?

The FActScore Llama-7b variant only uses InstructGPT for splitting sentences into atomic facts. It may be possible to train an open-source model to do this if OpenAI models cannot be included in this benchmark.
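To make the interface concrete, here is a minimal sketch of what a drop-in open-source fact splitter would look like. This is a hypothetical illustration, not FActScore's actual splitter: it only uses regex-based sentence and clause splitting as a placeholder, whereas a real replacement would prompt or fine-tune an open model (e.g. a Llama variant) to produce decontextualized atomic facts.

```python
import re

def split_into_atomic_facts(generation: str) -> list[str]:
    """Crude stand-in for the InstructGPT-based fact splitter in FActScore.

    Hypothetical sketch: a real replacement would be an open-source model
    prompted to rewrite each sentence as a list of atomic facts. Here we
    only split on sentence boundaries and the conjunction "and" between
    coordinated clauses, to illustrate the expected input/output shape.
    """
    sentences = re.split(r"(?<=[.!?])\s+", generation.strip())
    facts = []
    for sentence in sentences:
        # Split coordinated clauses; a learned splitter would also resolve
        # pronouns and add context so each fact stands on its own.
        for clause in re.split(r",?\s+and\s+", sentence):
            clause = clause.strip().rstrip(".")
            if clause:
                facts.append(clause)
    return facts

bio = "Marie Curie was a physicist and chemist. She won two Nobel Prizes."
for fact in split_into_atomic_facts(bio):
    print(fact)
```

The returned list of facts is what the downstream retrieval-and-verification step would score against the knowledge source, so only this component needs to change to drop the OpenAI dependency.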
