Is there any documentation within this leaderboard?

#1
by zhiminy - opened

I cannot find any documentation in this space. What is this leaderboard used for?

Owner

Thanks for your attention. A brief document has been added to the demo.
The leaderboard aims to evaluate tokenizer performance on different languages.

  • A lower oov_ratio means fewer out-of-vocabulary tokens.
  • A higher char/token means fewer words are segmented into subwords.
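
As a rough illustration of the two metrics, here is a minimal sketch. It assumes that char/token is the number of characters in the text divided by the number of tokens produced, and that oov_ratio is the share of tokens mapped to an unknown token. The leaderboard's exact definitions may differ, and the function names, the toy whitespace tokenizer, the vocabulary, and the `<unk>` symbol are all hypothetical.

```python
def char_per_token(text, tokens):
    """Average number of characters of input text per produced token."""
    return len(text) / len(tokens) if tokens else 0.0


def oov_ratio(tokens, unk_token="<unk>"):
    """Fraction of tokens that are the unknown (out-of-vocabulary) token."""
    if not tokens:
        return 0.0
    return sum(t == unk_token for t in tokens) / len(tokens)


# Toy whitespace tokenizer with a tiny, made-up vocabulary (assumption:
# a real tokenizer would be loaded from a library instead).
vocab = {"the", "cat", "sat"}
text = "the cat sat quietly"
tokens = [w if w in vocab else "<unk>" for w in text.split()]

cpt = char_per_token(text, tokens)
oov = oov_ratio(tokens)
print(f"char/token = {cpt:.2f}, oov_ratio = {oov:.2f}")
```

Under these assumptions, a tokenizer that splits the same text into more pieces gets a lower char/token, and one that fails to represent more of the input gets a higher oov_ratio.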
xu-song changed discussion status to closed
