djstrong committed on
Commit 705e23f · 1 Parent(s): 59e91d5
Files changed (1)
  1. src/about.py +13 -2
src/about.py CHANGED
@@ -41,9 +41,11 @@ TITLE = """<h1 align="center" id="space-title">Open PL LLM Leaderboard (0-shot a
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-_g suffix means that a model needs to generate an answer (only suitable for instructions-based models)
+The leaderboard evaluates language models on a set of Polish tasks designed to test their ability to understand and generate Polish text. It is meant to serve as a benchmark for the Polish language model community and to help researchers and practitioners understand the capabilities of different models.
 
-_mc suffix means that a model is scored against every possible class (suitable also for base models)
+Almost every task has two versions: regex and multiple choice. The regex version is scored by exact match, while the multiple-choice version is scored by accuracy.
+* the _g suffix means that a model needs to generate an answer (only suitable for instruction-based models)
+* the _mc suffix means that a model is scored against every possible class (also suitable for base models)
 """
 
 # Which evaluations are you running? how can people reproduce what you have?
@@ -54,6 +56,15 @@ Contact with me: [LinkedIn](https://www.linkedin.com/in/wrobelkrzysztof/)
 
 or join our [Discord SpeakLeash](https://discord.gg/3G9DVM39)
 
+## TODO
+
+* change metrics for DYK, PSC, CBD(?)
+* fix names of our models
+* add inference time
+* add metadata for models (e.g. #Params)
+* add more tasks
+* add baselines
+
 ## Evaluation metrics
 
 - **belebele_pol_Latn**: accuracy
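
To make the _g / _mc distinction in the new introduction text concrete, here is a minimal sketch of the two scoring modes. It assumes a hypothetical model interface: `DummyModel`, `generate`, and `loglikelihood` are illustrative names, not the leaderboard's actual evaluation code.

```python
# A minimal, hypothetical sketch of the two task variants described above.
# DummyModel, generate(), and loglikelihood() are illustrative assumptions,
# not this repository's actual evaluation API.
import re

class DummyModel:
    """Stand-in for a real LM, only so the sketch runs end to end."""
    def generate(self, prompt: str) -> str:
        return "The answer is yes."
    def loglikelihood(self, text: str) -> float:
        return -float(len(text))  # toy score: shorter strings score higher

def score_g(model, prompt: str, gold: str) -> bool:
    # _g variant: the model generates free text; the answer is extracted
    # with a regex and compared by exact match. Requires a model that
    # follows instructions well enough to produce a parseable answer.
    output = model.generate(prompt)
    match = re.search(r"\b(yes|no)\b", output.lower())
    return match is not None and match.group(1) == gold.lower()

def score_mc(model, prompt: str, classes: list[str], gold: str) -> bool:
    # _mc variant: every candidate class is scored (here by a toy
    # log-likelihood) and the best-scoring class is the prediction,
    # so even base models without instruction tuning can be evaluated.
    scores = {c: model.loglikelihood(prompt + " " + c) for c in classes}
    return max(scores, key=scores.get) == gold

model = DummyModel()
print(score_g(model, "Is Kraków in Poland? Answer yes or no.", "yes"))  # True
print(score_mc(model, "Is Kraków in Poland?", ["yes", "no"], "no"))     # True (toy score favours "no")
```

The key design point this illustrates: the _mc mode never needs parseable free-form output, which is why it also works for base models, while the _g mode depends on the model producing an answer the regex can extract.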