---
title: "ML.ENERGY Leaderboard"
emoji: "⚡"
python_version: "3.9"
app_file: "app.py"
sdk: "gradio"
sdk_version: "3.39.0"
pinned: true
tags: ["energy", "leaderboard"]
colorFrom: "black"
colorTo: "black"
---
# ML.ENERGY Leaderboard
[![Leaderboard](https://custom-icon-badges.herokuapp.com/badge/ML.ENERGY-Leaderboard-blue.svg?logo=ml-energy-2)](https://ml.energy/leaderboard)
[![Deploy](https://github.com/ml-energy/leaderboard/actions/workflows/push_spaces.yaml/badge.svg?branch=web)](https://github.com/ml-energy/leaderboard/actions/workflows/push_spaces.yaml)
[![Apache-2.0 License](https://custom-icon-badges.herokuapp.com/github/license/ml-energy/leaderboard?logo=law)](/LICENSE)
How much energy do LLMs consume?
This README explains how to run the benchmark yourself.
The leaderboard itself is at https://ml.energy/leaderboard.
## Colosseum
We instrumented [Hugging Face TGI](https://github.com/huggingface/text-generation-inference) so that it measures and returns GPU energy consumption.
Then, our [controller](/spitfight/colosseum/controller) server receives user prompts from the [Gradio app](/app.py), selects two models randomly, and streams model responses back with energy consumption.
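
The random pairing step can be pictured with a minimal sketch. This is not the controller's actual code: the model names and the `pick_two_models` helper are hypothetical and only illustrate choosing two distinct models uniformly at random.

```python
import random

# Hypothetical model identifiers; the real controller has its own deployment list.
AVAILABLE_MODELS = ["model-a", "model-b", "model-c", "model-d"]


def pick_two_models(models, rng=random):
    """Return two distinct models chosen uniformly at random."""
    if len(models) < 2:
        raise ValueError("need at least two models to compare")
    # random.sample draws without replacement, so the two picks never coincide.
    return tuple(rng.sample(models, 2))


# A seeded generator keeps the illustration reproducible.
pair = pick_two_models(AVAILABLE_MODELS, random.Random(0))
print(pair)
```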
## Setup for benchmarking
### Model weights
- Models that are directly accessible on Hugging Face Hub need no preparation.
- For other models, convert them to Hugging Face format and put them in `/data/leaderboard/weights/lmsys/vicuna-13B`, for example. The last two path components (e.g., `lmsys/vicuna-13B`) are taken as the name of the model.
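
As an illustration of the naming rule above, here is a minimal sketch of deriving a model name from the last two path components. The helper `model_name_from_path` is hypothetical and may differ from how the benchmark scripts actually parse paths.

```python
from pathlib import PurePosixPath


def model_name_from_path(model_path: str) -> str:
    """Join the last two path components, e.g. 'lmsys/vicuna-13B'."""
    # Drop the root marker so an absolute path like '/foo' is not miscounted.
    parts = [p for p in PurePosixPath(model_path).parts if p != "/"]
    if len(parts) < 2:
        raise ValueError(f"expected at least two path components: {model_path!r}")
    return "/".join(parts[-2:])


local_name = model_name_from_path("/data/leaderboard/weights/lmsys/vicuna-13B")
hub_name = model_name_from_path("databricks/dolly-v2-12b")
print(local_name)  # lmsys/vicuna-13B
print(hub_name)    # databricks/dolly-v2-12b
```

A Hub identifier such as `databricks/dolly-v2-12b` already has exactly two components, so it passes through unchanged.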
### Docker container
A pre-built Docker image is published with the tag `mlenergy/leaderboard:latest` ([Dockerfile](/Dockerfile)).
```console
$ docker run -it \
--name leaderboard0 \
--gpus '"device=0"' \
-v /path/to/your/data/dir:/data/leaderboard \
-v $(pwd):/workspace/leaderboard \
mlenergy/leaderboard:latest bash
```
The container internally expects weights to be inside `/data/leaderboard/weights` (e.g., `/data/leaderboard/weights/lmsys/vicuna-7B`), and sets the Hugging Face cache directory to `/data/leaderboard/hfcache`.
If needed, mount the repository to `/workspace/leaderboard` to override the copy of the repository inside the container.
## Running the benchmark
We run benchmarks across multiple nodes and GPUs with [Pegasus](https://github.com/jaywonchung/pegasus). Take a look at [`pegasus/`](/pegasus) for details.
You can also run benchmarks without Pegasus like this:
```console
$ docker exec leaderboard0 python scripts/benchmark.py --model-path /data/leaderboard/weights/lmsys/vicuna-13B --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled_sorted.json
$ docker exec leaderboard0 python scripts/benchmark.py --model-path databricks/dolly-v2-12b --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled_sorted.json
```
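
To benchmark several models one after another without Pegasus, the commands above can be generated from a list. The sketch below assumes the container name `leaderboard0` and the input file shown above; the `build_command` helper is hypothetical, and the script only prints the commands as a dry run.

```python
import shlex

CONTAINER = "leaderboard0"
INPUT_FILE = "sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled_sorted.json"
MODEL_PATHS = [
    "/data/leaderboard/weights/lmsys/vicuna-13B",  # local weights
    "databricks/dolly-v2-12b",                     # fetched from Hugging Face Hub
]


def build_command(model_path: str) -> list:
    """Build the docker exec argument list for one benchmark run."""
    return [
        "docker", "exec", CONTAINER,
        "python", "scripts/benchmark.py",
        "--model-path", model_path,
        "--input-file", INPUT_FILE,
    ]


commands = [build_command(path) for path in MODEL_PATHS]
for cmd in commands:
    # Dry run: print the shell-quoted command instead of executing it.
    print(shlex.join(cmd))
```

To actually launch the runs, each `cmd` can be passed to `subprocess.run(cmd, check=True)` so that a failing benchmark stops the loop.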