# Data files for the ML.ENERGY Leaderboard
This directory holds all the data for the leaderboard table.
Code that reads in the data here can be found in the constructor of `TableManager` in `app.py`.
## Parameters
There are two types of parameters: (1) those that become radio buttons on the leaderboard, and (2) those that become columns on the leaderboard table. Models are always placed in rows.
Currently, there are only two parameters that become radio buttons: GPU model (e.g., V100, A40, A100) and task (e.g., chat, chat-concise, instruct, and instruct-concise).
These are defined in the `schema.yaml` file.
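For illustration, `schema.yaml` might enumerate the radio button parameters and their possible values as in the sketch below; the key names and structure here are assumptions, so consult the actual file.

```yaml
# Hypothetical sketch of schema.yaml -- actual keys and structure may differ.
gpu:
  - V100
  - A40
  - A100
task:
  - chat
  - chat-concise
  - instruct
  - instruct-concise
```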
Each combination of radio button parameters has its own CSV file in this directory.
For instance, benchmark results for the chat task run on an A100 GPU live in `A100_chat_benchmark.csv`. This file name is dynamically constructed by the leaderboard Gradio application by looking at `schema.yaml`, and the file is read in as a Pandas DataFrame.
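As a minimal sketch of this behavior (not the actual `TableManager` code), the loading step amounts to something like:

```python
import pandas as pd

# Minimal sketch, assuming the user selected "A100" and "chat" with the
# radio buttons; the real logic lives in TableManager in app.py.
gpu_model = "A100"
task = "chat"
df = pd.read_csv(f"{gpu_model}_{task}_benchmark.csv")  # -> Pandas DataFrame
```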
Parameters that become columns in the table are put directly in the benchmark CSV files, e.g., `batch_size` and `datatype`.
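As an illustration, such a benchmark CSV might begin with a header row like the one below; apart from `batch_size` and `datatype`, the column names are placeholders rather than the file's actual schema, and further result columns are elided.

```csv
model,batch_size,datatype,...
```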
## Adding new models
1. Add your model to `models.json`. A hypothetical entry is sketched below.
   - The model's JSON key should be its unique codename, e.g., its Hugging Face Hub model name. It's usually not that readable.
   - `url` should point to a page where people can obtain the model's weights, e.g., Hugging Face Hub.
   - `nickname` should be a short human-readable string that identifies the model.
   - `params` should be the model's number of parameters in billions, rounded to an integer.
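   As a purely illustrative sketch (this particular entry is made up, but the keys follow the description above):

   ```json
   {
     "meta-llama/Llama-2-13b-chat-hf": {
       "url": "https://huggingface.co/meta-llama/Llama-2-13b-chat-hf",
       "nickname": "Llama 2 (13B) Chat",
       "params": 13
     }
   }
   ```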
2. Add NLP dataset evaluation scores to `score.csv`. A format-only sketch follows the bullets below.
   - `model` is the model's JSON key in `models.json`.
   - `arc` is the accuracy on the ARC Challenge dataset.
   - `hellaswag` is the accuracy on the HellaSwag dataset.
   - `truthfulqa` is the accuracy on the TruthfulQA MC2 dataset.
   - We obtain these metrics using lm-evaluation-harness. See here for specific instructions.
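   As a format-only sketch of `score.csv` (the second row uses placeholders, not real scores):

   ```csv
   model,arc,hellaswag,truthfulqa
   <model JSON key>,<ARC accuracy>,<HellaSwag accuracy>,<TruthfulQA MC2 accuracy>
   ```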
3. Add benchmarking results in CSV files, e.g., `A100_chat_benchmark.csv`. It should be evident from the name of each CSV file which setting the file corresponds to.