How is the data split between the public test set and the private test set?

by dhoa - opened

Could you explain how the data is divided into the public test set and the private test set? I can't find any information about this split (I'd like to know how much confidence to place in the public leaderboard). Thanks!

The private and public sets are split at random, but the public set is much smaller (1024 pairs, 32 infants). The private set has 24576 pairs and 160 infants.
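
To get a rough feel for what these sizes mean for leaderboard confidence, here is a minimal sketch that compares the sampling noise of a score on the two sets. It assumes, purely for illustration, that the score behaves like a proportion over independent pairs and that the error rate is 0.25; neither assumption comes from the organizers. Pairs that share an infant are correlated, so the real uncertainty on the 32-infant public set is likely larger than this estimate.

```python
import math


def binomial_standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


# Set sizes quoted in this thread.
PUBLIC_PAIRS = 1024
PRIVATE_PAIRS = 24576

# Hypothetical error rate, chosen only for illustration.
assumed_error_rate = 0.25

se_public = binomial_standard_error(assumed_error_rate, PUBLIC_PAIRS)
se_private = binomial_standard_error(assumed_error_rate, PRIVATE_PAIRS)

print(f"public  standard error: {se_public:.4f}")
print(f"private standard error: {se_private:.4f}")
# The ratio depends only on the set sizes: sqrt(24576 / 1024) = sqrt(24).
print(f"public / private ratio: {se_public / se_private:.2f}")
```

Under these assumptions the public score is about sqrt(24), or roughly 4.9 times, noisier than the private score, whatever error rate is plugged in.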

We will look more closely at the private set nearer the end of the competition. If the differences are too large, we may consider opening the private leaderboard for a couple of submissions so that participants can select their best model.

At this point, I can say the private/public trend is OK, but there are a few outliers.
We strongly encourage using the dev set and perhaps cross-validation.
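
One way to set up cross-validation for data like this is to split by infant rather than by pair, so that no infant appears in both the training and the validation part of a fold. The sketch below is only an illustration of that idea: the function name `infant_level_folds` and the infant IDs are hypothetical and not part of the competition code.

```python
import random
from typing import List, Tuple


def infant_level_folds(
    infant_ids: List[str], k: int, seed: int = 0
) -> List[Tuple[List[str], List[str]]]:
    """Split unique infant IDs into k folds of (train_ids, val_ids)."""
    unique_ids = sorted(set(infant_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    # Deal the shuffled IDs round-robin into k folds.
    folds = [unique_ids[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val_ids = folds[i]
        train_ids = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train_ids, val_ids))
    return splits


# Hypothetical infant IDs; replace with the IDs from the dev/train metadata.
ids = [f"infant_{i:03d}" for i in range(20)]
splits = infant_level_folds(ids, k=5)

for fold_index, (train_ids, val_ids) in enumerate(splits):
    print(fold_index, len(train_ids), len(val_ids))
```

Validation pairs would then be built only from the infants in `val_ids`, and the spread of the score across folds gives a rough idea of how much a single small test set can move.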

I believe the public set is actually quite small, even smaller than the dev set. There might be significant surprises in the final leaderboard.

@fconti Yes. This is why we decided to open the private set for a couple of days before the competition ends (see the timeline).
