Jae-Won Chung committed
Commit aa739dd
1 Parent(s): b9c6dec

Push to Docker automatically

.github/workflows/push_docker.yaml ADDED
@@ -0,0 +1,51 @@
+ name: Push Docker image
+
+ on:
+   push:
+     branches:
+       - master
+     paths:
+       - '.github/workflows/push_docker.yaml'
+       - 'pegasus/**'
+       - 'scripts/**'
+       - 'sharegpt/**'
+       - 'Dockerfile'
+       - 'LICENSE'
+       - 'requirements-benchmark.txt'
+       - '.gitignore'
+
+ concurrency:
+   group: ${{ github.ref }}-dhpush
+   cancel-in-progress: true
+
+ jobs:
+   build_and_push:
+     runs-on: ubuntu-latest
+     steps:
+       - name: Checkout repository
+         uses: actions/checkout@v3
+       - name: Docker Hub login
+         uses: docker/login-action@v2
+         with:
+           username: ${{ secrets.DOCKER_HUB_USERNAME }}
+           password: ${{ secrets.DOCKER_HUB_TOKEN }}
+       - name: Generate image metadata
+         id: meta
+         uses: docker/metadata-action@v4
+         with:
+           images: mlenergy/leaderboard
+           tags: latest
+       - name: Setup Docker Buildx
+         id: buildx  # needed so that `steps.buildx.outputs.name` below resolves
+         uses: docker/setup-buildx-action@v2
+       - name: Build and push to Docker Hub
+         uses: docker/build-push-action@v3
+         with:
+           context: .
+           file: Dockerfile
+           builder: ${{ steps.buildx.outputs.name }}
+           push: true
+           tags: ${{ steps.meta.outputs.tags }}
+           labels: ${{ steps.meta.outputs.labels }}
+           cache-from: type=registry,ref=mlenergy/leaderboard:buildcache
+           cache-to: type=registry,ref=mlenergy/leaderboard:buildcache,mode=max
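
Once this workflow has run on `master`, the published image can be pulled directly; a minimal usage sketch (the image name and tag follow the metadata step above):

```console
$ docker pull mlenergy/leaderboard:latest
```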
.github/workflows/push_spaces.yaml CHANGED
@@ -12,7 +12,7 @@ on:
        - 'requirements.txt'
 
  concurrency:
-   group: ${{ github.ref }}-hfdeploy
+   group: ${{ github.ref }}-hfpush
    cancel-in-progress: true
 
  jobs:
Dockerfile CHANGED
@@ -1,7 +1,5 @@
  FROM nvidia/cuda:11.7.1-devel-ubuntu20.04
 
- WORKDIR /workspace
-
  # Basic installs
  ARG DEBIAN_FRONTEND=noninteractive
  ENV TZ='America/Detroit'
@@ -21,14 +19,15 @@ RUN mkdir -p /root/.local \
    && ln -sf /root/.local/miniconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh
 
  # Install PyTorch and Zeus
- RUN pip install torch==2.0.1 zeus-ml==0.4.0
+ RUN pip install torch==2.0.1
 
  # Install requirements for benchmarking
  ADD . /workspace/leaderboard
- RUN cd leaderboard \
-   && pip install -r requirements-benchmark.txt \
-   && cd ..
+ RUN cd /workspace/leaderboard \
+   && pip install -r requirements-benchmark.txt
 
+ # Where all model weights downloaded from Hugging Face Hub will go
  ENV TRANSFORMERS_CACHE=/data/leaderboard/hfcache
 
+ # So that `docker exec <container> python scripts/benchmark.py` will work
  WORKDIR /workspace/leaderboard
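
For local testing outside CI, the same image can be built from the repository root; a minimal sketch (CI instead builds it through the workflow above, with registry-backed layer caching):

```console
$ docker build -t mlenergy/leaderboard:latest .
```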
LEADERBOARD.md CHANGED
@@ -2,8 +2,9 @@ The goal of the ML.ENERGY Leaderboard is to give people a sense of how much **en
 
  ## How is energy different?
 
- Even between models with the exact same architecture and size, the average energy consumption per prompt is different because they have **different verbosity**.
- That is, when asked the same thing, they answer in different lengths.
+ The energy consumption of running inference on a model depends on factors such as its architecture, its size, and the GPU model it runs on.
+ However, even if we run models with the exact same architecture and size on the same GPU, the average energy consumption **per prompt** differs because different models have **different verbosity**.
+ That is, when asked the same thing, different models answer in different lengths.
 
  ## Metrics
 
@@ -62,11 +63,10 @@ A chat between a human user (prompter) and an artificial intelligence (AI) assis
 
  ## Upcoming
 
- - Compare against more optimized inference runtimes, like TensorRT.
- - Other GPUs
- - Other model/sampling parameters
+ - Compare energy numbers against more optimized inference runtimes, like TensorRT.
+ - More GPU types
  - More models
+ - Other model/sampling parameters
 
  # License
 
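As a toy illustration of the verbosity effect above, with made-up numbers (illustrative only, not leaderboard measurements): if a model spends a roughly constant amount of energy per generated token, energy per prompt scales with response length.

$$
E_{\text{prompt}} \approx E_{\text{token}} \times N_{\text{tokens}},
\qquad 0.5\,\mathrm{J} \times 100 = 50\,\mathrm{J}
\quad \text{vs.} \quad 0.5\,\mathrm{J} \times 300 = 150\,\mathrm{J}.
$$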
README.md CHANGED
@@ -28,19 +28,20 @@ The actual leaderboard is here: https://ml.energy/leaderboard.
 
  ### Docker container
 
+ Our pre-built Docker image is published with the tag `mlenergy/leaderboard:latest` ([Dockerfile](/Dockerfile)).
+
  ```console
- $ git clone https://github.com/ml-energy/leaderboard.git
- $ cd leaderboard
- $ docker build -t ml-energy:latest .
- # Replace /data/leaderboard with your data directory.
  $ docker run -it \
-     --name leaderboard \
-     --gpus all \
-     -v /data/leaderboard:/data/leaderboard \
+     --name leaderboard0 \
+     --gpus '"device=0"' \
+     -v /path/to/your/data/dir:/data/leaderboard \
      -v $(pwd):/workspace/leaderboard \
-     ml-energy:latest bash
+     mlenergy/leaderboard:latest bash
  ```
 
+ The container internally expects weights to be inside `/data/leaderboard/weights` (e.g., `/data/leaderboard/weights/lmsys/vicuna-7B`), and sets the Hugging Face cache directory to `/data/leaderboard/hfcache`.
+ If needed, mount your local clone to `/workspace/leaderboard` to override the copy of the repository inside the container.
+
  ## Running the benchmark
 
  We run benchmarks using multiple nodes and GPUs using [Pegasus](https://github.com/jaywonchung/pegasus). Take a look at [`pegasus/`](/pegasus) for details.
@@ -48,8 +49,6 @@ We run benchmarks using multiple nodes and GPUs using [Pegasus](https://github.c
  You can still run benchmarks without Pegasus like this:
 
  ```console
- # Inside the container
- $ cd /workspace/leaderboard
- $ python scripts/benchmark.py --model-path /data/leaderboard/weights/lmsys/vicuna-13B --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled.json
- $ python scripts/benchmark.py --model-path databricks/dolly-v2-12b --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled.json
+ $ docker exec leaderboard0 python scripts/benchmark.py --model-path /data/leaderboard/weights/lmsys/vicuna-13B --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled.json
+ $ docker exec leaderboard0 python scripts/benchmark.py --model-path databricks/dolly-v2-12b --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled.json
  ```
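
On a multi-GPU node, the same pattern gives one container per GPU; a sketch with an illustrative container name and device index (only `leaderboard0` on device 0 appears in the README itself):

```console
$ docker run -it \
    --name leaderboard1 \
    --gpus '"device=1"' \
    -v /path/to/your/data/dir:/data/leaderboard \
    -v $(pwd):/workspace/leaderboard \
    mlenergy/leaderboard:latest bash
```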
requirements-benchmark.txt CHANGED
@@ -1,4 +1,4 @@
- zeus-ml
+ zeus-ml==0.4.0
  fschat==0.2.14
  rwkv==0.7.5
  einops