Eurovoc Tagger Benchmark

#3
by scampion - opened
European Parliament org
edited Oct 24, 2023

This benchmark compares two versions of the Eurovoc Tagger.

The first version, the poc (proof of concept) tagger, represents the initial development stage. The second, denoted 23.08, is the latest iteration, which has been improved and fine-tuned on documents from the Publications Office.

The poc version is accessible through the DAS platform, while the 23.08 production version is served from our private Hugging Face endpoint.

To keep the evaluation unbiased, both models were assessed on a dataset of 641 documents published in September 2023. None of these documents was part of the training data for either version of the Eurovoc Tagger.

| Metric | poc | 23.08 | Gain |
|---|---|---|---|
| NDCG@3 | 0.5239 | 0.7071 | +35% |
| NDCG@5 | 0.4583 | 0.6353 | +38% |
| NDCG@10 | 0.4253 | 0.5863 | +37% |
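The gain column reads as the relative improvement of 23.08 over the poc. A minimal sketch of that arithmetic, assuming the gain is `(new - old) / old` and using only the four-decimal scores from the table (the `scores` dictionary and `relative_gain` helper are illustrative names, not part of the benchmark code):

```python
# Scores copied from the table above, as (poc, 23.08) pairs.
scores = {
    "NDCG@3": (0.5239, 0.7071),
    "NDCG@5": (0.4583, 0.6353),
    "NDCG@10": (0.4253, 0.5863),
}


def relative_gain(old: float, new: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Assumption: the gain column is a relative improvement, not a difference in points.
gains = {name: relative_gain(poc, new) for name, (poc, new) in scores.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain:.1f}%")
```

Because the inputs here are already rounded, the recomputed values can differ from the gain column by up to a percentage point, depending on how the original figures were rounded.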

NB: the micro F1 score cannot be computed because the poc does not return the probabilities for all labels.
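To illustrate why the full set of probabilities matters, here is a rough sketch of a micro F1 computation. It assumes each label is predicted when its probability reaches a threshold, so every label needs a probability. The `micro_f1` function, the 0.5 threshold, and the toy labels are assumptions made for illustration, not the benchmark's actual setup.

```python
def micro_f1(probabilities, gold, threshold=0.5):
    """Micro-averaged F1 over all (document, label) decisions.

    probabilities: one dict per document, mapping label -> probability.
    gold: one set of true labels per document.
    threshold: assumed cut-off above which a label counts as predicted.
    """
    tp = fp = fn = 0
    for probs, truth in zip(probabilities, gold):
        predicted = {label for label, p in probs.items() if p >= threshold}
        tp += len(predicted & truth)   # correctly predicted labels
        fp += len(predicted - truth)   # predicted but not in the gold set
        fn += len(truth - predicted)   # gold labels that were missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy data with hypothetical labels "a", "b", "c".
toy_probabilities = [
    {"a": 0.9, "b": 0.2, "c": 0.6},
    {"a": 0.1, "b": 0.8, "c": 0.4},
]
toy_gold = [{"a", "b"}, {"b"}]
score = micro_f1(toy_probabilities, toy_gold)
```

If a model only returns its top few labels, the thresholding step cannot be applied consistently across all labels, which is presumably why the metric is unavailable for the poc.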

Background information on the NDCG

NDCG stands for Normalized Discounted Cumulative Gain. It's a way to measure how good a search engine or recommendation system is at giving you the best results. Let's imagine that you search for something on the internet and get a list of results.

Normalized: This means putting the result on a scale that's easy to understand. The score of the list is divided by the score of the best possible ordering, so the final value falls between 0 and 1.

Discounted: It takes into account that the results you see at the top are more important than the ones at the bottom. If you find what you want at the top, it's worth more than something you find at the bottom of the list.

Cumulative: This means adding up all the scores as you go down the list. The better results get higher scores, and you add them up to see how good the overall list is.

Gain: This is like saying how good or helpful a result is. If a result is what you were looking for, it gets a high score. If it's not what you wanted, it gets a low score.

So, NDCG is a way to measure how well a search engine or recommendation system ranks its results. The higher the NDCG score, the better it is at showing you the most helpful stuff at the top of the list when you search for something.
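The four ingredients above can be put together in a few lines. The following is a minimal sketch, assuming binary relevance (a predicted label is either in the reference set or not) and the common log2 discount. The function names and example labels are hypothetical, and the benchmark's own implementation may differ.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain: sum the gains of the first k results,
    each divided by log2(position + 1) so lower positions count less."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted, relevant, k):
    """NDCG@k for one document.

    predicted: labels ordered from most to least confident.
    relevant: set of reference labels for the document.
    """
    # Gain: 1 for a correct label, 0 otherwise (binary relevance assumption).
    gains = [1.0 if label in relevant else 0.0 for label in predicted[:k]]
    # Ideal ordering: all relevant labels ranked first.
    ideal = [1.0] * min(len(relevant), k)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0.0:
        return 0.0
    # Normalized: divide by the best achievable score.
    return dcg_at_k(gains, k) / ideal_dcg


# Hypothetical example: two reference labels, found at ranks 1 and 3.
predicted_labels = ["agriculture", "fisheries", "trade policy", "energy"]
reference_labels = {"agriculture", "trade policy"}
example_score = ndcg_at_k(predicted_labels, reference_labels, k=3)
```

In this example the score is below 1 because the second correct label appears at rank 3 instead of rank 2. A benchmark figure such as NDCG@3 would then typically be the average of this per-document score over all evaluated documents.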
