Please add RULER, a long-context benchmark

#4, opened by MLDataScientist

Hi team,
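Thank you for releasing this model. Could you please run the RULER benchmark (arXiv) so that people can understand how the model performs with long contexts? RULER is a synthetic benchmark for long-context understanding that goes beyond simple needle-in-a-haystack retrieval and adds multi-hop tracing, aggregation, and question-answering tasks.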

For context, Microsoft has started using RULER to report the performance of Phi-3-mini-128k-instruct.
The Gradient team also used RULER to report results for their Llama-3 8B model with a 1M-token context.

I think RULER is currently one of the better benchmarks for evaluating long-context understanding.

Thanks!
