---
title: "ML.ENERGY Leaderboard"
emoji: "⚡"
python_version: "3.9"
app_file: "app.py"
sdk: "gradio"
sdk_version: "3.39.0"
pinned: true
tags: ["energy", "leaderboard"]
---

# ML.ENERGY Leaderboard

[![Leaderboard](https://custom-icon-badges.herokuapp.com/badge/ML.ENERGY-Leaderboard-blue.svg?logo=ml-energy-2)](https://ml.energy/leaderboard)
[![Deploy](https://github.com/ml-energy/leaderboard/actions/workflows/push_spaces.yaml/badge.svg?branch=web)](https://github.com/ml-energy/leaderboard/actions/workflows/push_spaces.yaml)
[![Apache-2.0 License](https://custom-icon-badges.herokuapp.com/github/license/ml-energy/leaderboard?logo=law)](/LICENSE)

How much energy do LLMs consume?

This README focuses on how to run the benchmark yourself.
The leaderboard itself is at https://ml.energy/leaderboard.

## Colosseum

We instrumented [Hugging Face TGI](https://github.com/huggingface/text-generation-inference) so that it measures and returns GPU energy consumption.
Our [controller](/spitfight/colosseum/controller) server then receives user prompts from the [Gradio app](/app.py), selects two models at random, and streams their responses back along with energy consumption measurements.
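
To make the flow concrete, here is a minimal client sketch. The endpoint URL and JSON field names below are hypothetical, made up purely for illustration; the actual API is defined by the controller code linked above.

```python
import json

import requests

# Hypothetical endpoint; the real controller API lives in
# /spitfight/colosseum/controller.
CONTROLLER_URL = "http://localhost:8000/prompt"

# Stream a response back chunk by chunk. The "token" and "energy" field
# names are illustrative, not the controller's actual schema.
with requests.post(
    CONTROLLER_URL,
    json={"prompt": "How much energy do LLMs consume?"},
    stream=True,
) as resp:
    resp.raise_for_status()
    total_energy = 0.0
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk["token"], end="", flush=True)
        total_energy += chunk.get("energy", 0.0)  # Joules, measured by TGI
    print(f"\nGPU energy consumed: {total_energy:.1f} J")
```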

## Setup for benchmarking

### Model weights

- For models that are directly accessible on the Hugging Face Hub, you don't need to do anything.
- For other models, convert them to Hugging Face format and place them under `/data/leaderboard/weights/lmsys/vicuna-13B`, for example. The last two path components (e.g., `lmsys/vicuna-13B`) are used as the model's name (see the sketch after this list).
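
Saving a converted checkpoint under that path might look like the following. This is a minimal sketch assuming you already have a Hugging Face-format checkpoint on local disk; the source path is illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative paths: load an already-converted checkpoint from local disk and
# save it where the benchmark expects it, /data/leaderboard/weights/<org>/<model>.
src = "/path/to/converted/vicuna-13B"  # hypothetical source directory
dst = "/data/leaderboard/weights/lmsys/vicuna-13B"

model = AutoModelForCausalLM.from_pretrained(src)
tokenizer = AutoTokenizer.from_pretrained(src)
model.save_pretrained(dst)
tokenizer.save_pretrained(dst)
```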

### Docker container

We have our pre-built Docker image published with the tag `mlenergy/leaderboard:latest` ([Dockerfile](/Dockerfile)).

```console
$ docker run -it \
    --name leaderboard0 \
    --gpus '"device=0"' \
    -v /path/to/your/data/dir:/data/leaderboard \
    -v $(pwd):/workspace/leaderboard \
    mlenergy/leaderboard:latest bash
```

The container expects model weights to be inside `/data/leaderboard/weights` (e.g., `/data/leaderboard/weights/lmsys/vicuna-7B`) and sets the Hugging Face cache directory to `/data/leaderboard/hfcache`.
Mounting the repository at `/workspace/leaderboard` is optional; doing so overrides the copy of the repository baked into the container image.

## Running the benchmark

We run benchmarks across multiple nodes and GPUs with [Pegasus](https://github.com/jaywonchung/pegasus). See [`pegasus/`](/pegasus) for details.

You can also run benchmarks directly, without Pegasus:

```console
$ docker exec leaderboard0 python scripts/benchmark.py --model-path /data/leaderboard/weights/lmsys/vicuna-13B --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled_sorted.json
$ docker exec leaderboard0 python scripts/benchmark.py --model-path databricks/dolly-v2-12b --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled_sorted.json
```

## Citation

Please refer to our BibTeX file: [`citation.bib`](/docs/citation.bib).