Mukayese: Turkish NLP Strikes Back

Turkish Natural Language Processing has fallen behind in developing state-of-the-art systems due to a lack of organized benchmarks and baselines. We fill this gap with Mukayese (Turkish for "comparison/benchmarking"), an extensive set of datasets and benchmarks for several Turkish NLP tasks. All datasets and code are publicly available in this repository.


  • (22/03/2022) Summarization models are now available on Hugging Face!
  • (25/02/2022) Datasets have been made available through pre-release v0.0.1.

What to do with Mukayese?

With Mukayese, researchers of Turkish NLP will be able to:

  • Compare the performance of existing methods on leaderboards.
  • Access existing implementations of NLP baselines.
  • Evaluate their own methods on the relevant test datasets.
  • Submit their own work to be listed on our leaderboards.

Mukayese's Mission

The most important goal of Mukayese is to standardize the comparison and evaluation of Turkish NLP methods. Without a benchmarking platform, Turkish NLP researchers struggle to compare their models with existing ones.