some models get close to zero on some topics

#7
by gblazex - opened

especially finance and legal.

Is this normal? Doesn't look right to me

Screenshot 2024-02-07 at 12.43.45.png

Patronus AI org

Hey! Thanks for flagging this. We noticed this issue earlier, and we will be updating the FinanceBench and LegalBench evaluations over the weekend for all the models on the leaderboard. We realized that max_new_tokens was set too low for the legal confidentiality task for a few models, so their outputs were cut off.
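To illustrate the failure mode, here is a minimal sketch. It is not the leaderboard's actual evaluation code: the `truncate` and `score` helpers, the whitespace tokenization, and the example output are all assumptions made for illustration. It shows how a generation budget that is too small can cut a response off before the answer appears, which a simple scorer would then count as wrong.

```python
def truncate(text: str, max_new_tokens: int) -> str:
    """Keep only the first max_new_tokens whitespace-separated tokens.

    Hypothetical stand-in for a real tokenizer and generation limit.
    """
    return " ".join(text.split()[:max_new_tokens])


def score(output: str, expected: str) -> int:
    """Return 1 if the expected label appears in the output (case-insensitive), else 0.

    Hypothetical scorer; the real benchmark metric may differ.
    """
    return int(expected.lower() in output.lower())


# Made-up model response whose final answer comes at the very end.
full_output = (
    "The clause restricts disclosure of shared information to third parties, "
    "so the answer is: Yes"
)

short_output = truncate(full_output, 8)    # budget too small: answer is cut off
long_output = truncate(full_output, 32)    # budget large enough: answer survives

print(score(short_output, "yes"), score(long_output, "yes"))
```

A model that reasons before answering is hit hardest by a small budget, which would be consistent with only a few models dropping close to zero.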
