How was the evaluation executed?

#3
by iarbel - opened

Thanks for sharing this model and data. I've read through the article, but I couldn't find a clear reference to the benchmark. It states that "we center our analysis on the legal domain, with a specific focus on: international law, professional law, and jurisprudence. Those tasks respectively contain 120, 1500, and 110 examples."
How can I find these examples to benchmark models?

Equall.AI org

cais/mmlu? Here? We did not use this one, though.
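
For anyone trying to reproduce the counts quoted in the question, here is a minimal sketch of how the three legal subsets could be pulled from cais/mmlu with the `datasets` library. This is only an assumption about where comparable examples live; as noted above, the thread does not confirm this is the exact source used for the paper's evaluation.

```python
# Sketch: load the three MMLU legal subsets from the Hugging Face Hub.
# Assumes the `datasets` library is installed (pip install datasets).
from datasets import load_dataset

for subset in ["international_law", "professional_law", "jurisprudence"]:
    ds = load_dataset("cais/mmlu", subset, split="test")
    # Sizes should roughly match the 120 / 1500 / 110 figures quoted above.
    print(subset, len(ds))
    # Each row has: question, subject, choices (4 options),
    # and answer (index of the correct choice).
```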

PierreColombo changed discussion status to closed
