Is metadata present in the test set?

#3
by Mithilss - opened

Is the metadata present in the test dataset? There are no metadata columns in SnakeCLEF2024-TestMetadata.csv.

Bohemian Visual Recognition Alliance org

Hi Mithilss,

Indeed, there is no metadata in the "SnakeCLEF2024-TestMetadata.csv".
What kind of metadata are you looking for?

Best,
Lukas

The same metadata that is present in the train metadata, i.e. the country and endemic columns. In my local validation experiments, they improve the score by a good margin.

Bohemian Visual Recognition Alliance org

Hello again,

Thank you for your description. Please see the explanation of why it is not present in the metadata file below :)

If a species is endemic, it means that it lives only within a specific country. The endemic flag is therefore tied to the species itself, and if we provided it, you could easily get the prediction. The information about the country of origin is removed on purpose. The test set originates from just a few countries, and if you knew which ones, you could focus on just those countries, which is something we do not want.
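To make the leak concrete, here is a minimal sketch with made-up data. The species names, country codes, and the `candidates` helper are all hypothetical and are not taken from the SnakeCLEF metadata; the sketch only shows how quickly the candidate set shrinks once country and endemicity are known.

```python
# Hypothetical toy ranges: species -> set of countries where it occurs.
# None of these names or codes come from the real dataset.
SPECIES_RANGE = {
    "species_a": {"AU"},              # occurs in one country only (endemic)
    "species_b": {"BR", "PE"},
    "species_c": {"BR"},              # occurs in one country only (endemic)
    "species_d": {"IN", "LK", "AU"},
}


def is_endemic(species):
    """Treat a species as endemic if it occurs in exactly one country."""
    return len(SPECIES_RANGE[species]) == 1


def candidates(country=None, endemic=None):
    """Return the species still possible given the optional metadata."""
    remaining = []
    for species, countries in SPECIES_RANGE.items():
        if country is not None and country not in countries:
            continue
        if endemic is not None and is_endemic(species) != endemic:
            continue
        remaining.append(species)
    return sorted(remaining)


print(candidates())                            # all four species
print(candidates(country="BR"))                # two species left
print(candidates(country="BR", endemic=True))  # a single species left
```

In this toy setup, the country alone halves the label space and adding the endemic flag pins down one species without looking at the image at all.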

See last year's overview paper, where we explain last year's scenarios. This year, it works very similarly.

Was it helpful?

LP

Thanks for the help!

Mithilss changed discussion status to closed
Bohemian Visual Recognition Alliance org

Hi @Mithilss,

I have been thinking about your request, and I changed my mind a bit. I believe we should provide you with the information for the evaluation.
It is a little bit tricky as we do not want you to overfit to those "regions".
However, we are considering multiple ways of providing information about the country in evaluation.

Best,
Lukas

Mithilss changed discussion status to open

Where did this land, ultimately? Currently the test metadata in the example submission code only has the observation ID and filename. Presumably there are no country codes, etc. this year?
