How was the evaluation executed?
#3 opened by iarbel
Thanks for sharing this model and data. I've read through the article, but I couldn't find clear references to the benchmark. It states that "we center our analysis on the legal domain, with a specific focus on: international law, professional law, and jurisprudence. Those tasks respectively contain 120, 1500, and 110 examples."
How can I find these examples to benchmark models?
cais/mmlu? Here? We did not use this one, though.
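For anyone landing here with the same question, below is a minimal sketch of loading and scoring the MMLU legal subjects. It assumes the three tasks map to the `international_law`, `professional_law`, and `jurisprudence` configs of `cais/mmlu` on the Hugging Face Hub, which this thread does not confirm was the authors' evaluation source. The prompt format and the `accuracy` helper are hypothetical illustrations, not the evaluation protocol from the article.

```python
from typing import Dict, Sequence

# Assumption: the article's three legal tasks correspond to these MMLU subject
# configs on the Hugging Face Hub (not confirmed by the model authors).
LEGAL_SUBJECTS = ["international_law", "professional_law", "jurisprudence"]
LETTERS = "ABCD"


def load_legal_subsets(split: str = "test") -> Dict[str, list]:
    """Download each assumed MMLU legal subset (requires `pip install datasets`)."""
    from datasets import load_dataset  # imported lazily so the helpers below work offline

    return {name: list(load_dataset("cais/mmlu", name, split=split)) for name in LEGAL_SUBJECTS}


def format_prompt(example: dict) -> str:
    """Hypothetical prompt: render one MMLU row (question, choices) as lettered options."""
    lines = [example["question"]]
    for letter, choice in zip(LETTERS, example["choices"]):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(predicted: Sequence[str], examples: Sequence[dict]) -> float:
    """Fraction of examples whose predicted letter matches the gold `answer` index."""
    if not examples:
        return 0.0
    correct = sum(
        pred.strip().upper()[:1] == LETTERS[ex["answer"]]
        for pred, ex in zip(predicted, examples)
    )
    return correct / len(examples)
```

Calling `load_legal_subsets()` and printing `len()` of each list lets you compare the subset sizes with the roughly 120, 1500, and 110 examples quoted in the article before running a model over `format_prompt(...)` outputs.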
PierreColombo changed discussion status to closed