How are Faithfulness and Factuality calculated?

#22
by UjjwalP - opened

I am aware that metrics like ROUGE and FactKB are used to measure faithfulness and factuality respectively. However, it is unclear how the 'Faithfulness' and 'Factuality' columns in the leaderboard were computed.

hallucinations-leaderboard org

@GWHed can you chime in please?

hallucinations-leaderboard org

Hi! We classify each task as either a Faithfulness task or a Factuality task based on its characteristics, and calculate the Faithfulness and Factuality scores by averaging the evaluation metrics of the tasks within each category. We are also planning to try normalising the score for each task before averaging. More details, including how we classified the tasks, can be found in our paper: https://arxiv.org/abs/2404.05904
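For readers who want to see the averaging step concretely, here is a minimal sketch of the idea described above. It is not the leaderboard's actual code: the task names, the task-to-category mapping, and the scores are all hypothetical placeholders, and the planned normalisation is left out because its method is not specified here. The real task classification is described in the paper.

```python
from statistics import mean

# Hypothetical task-to-category mapping (assumption, for illustration only).
# The leaderboard's real assignment of tasks to categories is in the paper.
TASK_CATEGORY = {
    "task_a": "Faithfulness",
    "task_b": "Faithfulness",
    "task_c": "Factuality",
    "task_d": "Factuality",
}


def category_scores(task_scores, task_category=TASK_CATEGORY):
    """Average the per-task metric values within each category."""
    grouped = {}
    for task, score in task_scores.items():
        grouped.setdefault(task_category[task], []).append(score)
    return {category: mean(values) for category, values in grouped.items()}


# Made-up metric values for a single model (not real leaderboard results).
example_scores = {"task_a": 0.5, "task_b": 0.75, "task_c": 0.25, "task_d": 0.75}

scores = category_scores(example_scores)
print(scores)
```

Note that a plain average like this treats every task's metric as being on a comparable scale, which is presumably why normalising each task's score before averaging is being considered.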
