Commit d8b25f9 · Parent(s): 9a87acd

Added more explainers

about.py CHANGED
@@ -42,7 +42,9 @@ Here we invite the community to submit and develop better predictors, which will
 #### 🏆 Prizes

 For each of the 5 properties in the competition, there is a prize for the model with the highest performance for that property on the private test set.
-There is also an 'open-source' prize for the best model trained on the GDPa1 dataset
+There is also an 'open-source' prize for the best reproducible model: one that is trained on the GDPa1 dataset (reporting cross-validation results) and assessed on the private test set, where the authors provide all training code and data.
+This will be judged by a panel (by default, the model with the highest average Spearman correlation across all properties will be selected, but a really good model on just one property may be better for the community).
+
 For each of these 6 prizes, participants have the choice between
 - **$10 000 in data generation credits** with [Ginkgo Datapoints](https://datapoints.ginkgo.bio/), or
 - A **$2000 cash prize**.
@@ -124,7 +126,7 @@ FAQS = {
             "No, there are no requirements to submit code / methods and submitted predictions remain private. "
             "We also have an optional field for including a short model description. "
             "Top performing participants will be requested to identify themselves at the end of the tournament. "
-            "There will be one prize for the best open-source model, which will require code / methods to be available."
+            "There will be one prize for the best open-source reproducible model, which will require code / methods to be available."
         ),
         "How exactly can I evaluate my model?": (
             "You can easily calculate the Spearman correlation coefficient on the GDPa1 dataset yourself before uploading to the leaderboard. "
@@ -172,25 +174,28 @@ SUBMIT_INSTRUCTIONS = f"""
 You do **not** need to predict all 5 properties – each property has its own leaderboard and prize.

 ## Instructions
-1. **Upload
-   - **GDPa1 Cross-Validation predictions** (using cross-validation folds)
-   - **Private Test Set predictions** (final test submission)
+1. **Upload two CSV files**: one with GDPa1 cross-validation predictions, and one with private test set predictions.
 2. Each CSV should contain `antibody_name` + one column per property you are predicting (e.g. `"antibody_name,Titer,PR_CHO"` if your model predicts Titer and Polyreactivity).
    - List of valid property names: `{', '.join(ASSAY_LIST)}`.
-
+   - Include the `"hierarchical_cluster_IgG_isotype_stratified_fold"` column if submitting cross-validation predictions.
+3. You can resubmit as often as you like; only your latest submission will count for both the leaderboard and final test set scoring.

 The GDPa1 results should appear on the leaderboard within a minute, and can also be calculated manually using average Spearman rank correlation across the 5 folds.

 ## Cross-validation

-For the GDPa1 cross-validation predictions
-
-
+For the GDPa1 cross-validation predictions:
+1. Split the dataset using the `"hierarchical_cluster_IgG_isotype_stratified_fold"` column
+2. Train on 4 folds and predict on the held-out fold
+3. Collect held-out predictions for all 5 folds into one dataframe
+4. Write this dataframe to a .csv file and submit as your GDPa1 cross-validation predictions
+
+The leaderboard will show the average Spearman rank correlation across the 5 folds. For a code example, check out our tutorial on training an antibody developability prediction model with cross-validation [here]({TUTORIAL_URL}).

 ## Test set

-The **private test set
-🗓️
+The **private test set submissions will not be scored automatically**, to avoid test set hacking. They will be evaluated after submissions close to determine the winners.
+🗓️ We will release one interim scoring of the latest private test set submissions on **October 13th**. Use this opportunity to see how your model is performing on the held-out test set and refine accordingly.

 Submissions close on **1 November 2025**.
 """
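To make the "evaluate it yourself" FAQ answer concrete, here is a minimal sketch of scoring predictions against the GDPa1 labels locally with a Spearman correlation. The file names, and the use of Titer and PR_CHO as example properties, are placeholders rather than anything mandated by the competition.

```python
# Minimal sketch: score your predictions against the GDPa1 labels before uploading.
# "gdpa1_labels.csv" and "my_predictions.csv" are placeholder file names; both are
# assumed to share an `antibody_name` column.
import pandas as pd
from scipy.stats import spearmanr

labels = pd.read_csv("gdpa1_labels.csv")    # measured assay values
preds = pd.read_csv("my_predictions.csv")   # your model's predictions

merged = labels.merge(preds, on="antibody_name", suffixes=("_true", "_pred"))

for prop in ["Titer", "PR_CHO"]:            # only the properties your model predicts
    rho, _ = spearmanr(merged[f"{prop}_true"], merged[f"{prop}_pred"], nan_policy="omit")
    print(f"{prop}: Spearman rho = {rho:.3f}")
```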
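To illustrate the column layout described in step 2 of the updated instructions, here is a minimal sketch of the two CSVs. The antibody names and prediction values are invented, and only Titer and PR_CHO are used as example property columns.

```python
# Minimal sketch of the submission layout: `antibody_name` plus one column per
# predicted property, with the fold column included only in the CV file.
# All names and values below are illustrative placeholders.
import pandas as pd

cv_submission = pd.DataFrame({
    "antibody_name": ["Ab_001", "Ab_002"],
    "hierarchical_cluster_IgG_isotype_stratified_fold": [0, 3],
    "Titer": [812.5, 640.2],
    "PR_CHO": [0.12, 0.47],
})
cv_submission.to_csv("gdpa1_cv_predictions.csv", index=False)

# The private test set file has the same property columns but no fold column,
# with one row per antibody in the private test set.
test_submission = pd.DataFrame({
    "antibody_name": ["TestAb_001", "TestAb_002"],
    "Titer": [770.0, 915.3],
    "PR_CHO": [0.08, 0.33],
})
test_submission.to_csv("private_test_predictions.csv", index=False)
```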
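The four cross-validation steps in the updated instructions translate roughly into the following sketch. The dataset path is a placeholder, and the training-set-mean predictor only stands in for whatever model you actually train; it is not the competition's baseline.

```python
# Rough sketch of the cross-validation workflow: train on 4 folds, predict the
# held-out fold, collect all held-out predictions, and write a single CSV.
# "gdpa1.csv" is a placeholder path; the mean predictor is a stand-in model.
import pandas as pd

FOLD_COL = "hierarchical_cluster_IgG_isotype_stratified_fold"
PROPERTY = "Titer"                           # repeat (or loop) for each property you predict

df = pd.read_csv("gdpa1.csv")
oof_frames = []

for fold in sorted(df[FOLD_COL].unique()):
    train = df[df[FOLD_COL] != fold]         # the other 4 folds
    held_out = df[df[FOLD_COL] == fold]      # the held-out fold

    prediction = train[PROPERTY].mean()      # placeholder model: swap in your own
    oof_frames.append(pd.DataFrame({
        "antibody_name": held_out["antibody_name"].values,
        FOLD_COL: fold,
        PROPERTY: prediction,
    }))

# One row per GDPa1 antibody with its held-out prediction; this is the file to
# submit as the GDPa1 cross-validation predictions.
pd.concat(oof_frames).to_csv("gdpa1_cv_predictions.csv", index=False)
```

Averaging the per-fold Spearman correlations of these held-out predictions against the GDPa1 labels, as in the earlier scoring sketch, should correspond to the average shown on the leaderboard.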
 
			
