Jae-Won Chung committed on
Commit
4e9ddf9
1 Parent(s): ce6d832

Benchmarking with Pegasus (#7)

README.md CHANGED
@@ -33,6 +33,10 @@ $ docker run -it \
 
 ## Running the benchmark
 
+We run benchmarks on multiple nodes and GPUs using [Pegasus](https://github.com/jaywonchung/pegasus). Take a look at [`pegasus/`](/pegasus) for details.
+
+You can still run benchmarks without Pegasus like this:
+
 ```console
 # Inside the container
 $ cd /workspace/leaderboard
models.txt DELETED
@@ -1,20 +0,0 @@
-/data/leaderboard/weights/metaai/llama-7B
-/data/leaderboard/weights/metaai/llama-13B
-/data/leaderboard/weights/lmsys/vicuna-7B
-/data/leaderboard/weights/lmsys/vicuna-13B
-/data/leaderboard/weights/tatsu-lab/alpaca-7B
-/data/leaderboard/weights/BAIR/koala-7b
-/data/leaderboard/weights/BAIR/koala-13b
-/data/leaderboard/weights/BlinkDL/RWKV-4-Raven-7B-v12-Eng98%-Other2%-20230521-ctx8192.pth
-camel-ai/CAMEL-13B-Combined-Data
-databricks/dolly-v2-12b
-FreedomIntelligence/phoenix-inst-chat-7b
-h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2
-lmsys/fastchat-t5-3b-v1.0
-Neutralzz/BiLLa-7B-SFT
-nomic-ai/gpt4all-13b-snoozy
-openaccess-ai-collective/manticore-13b-chat-pyg
-OpenAssistant/oasst-sft-1-pythia-12b
-project-baize/baize-v2-7B
-StabilityAI/stablelm-tuned-alpha-7b
-togethercomputer/RedPajama-INCITE-7B-Chat
pegasus/README.md ADDED
@@ -0,0 +1,60 @@
+# Running benchmarks on multiple GPU nodes with Pegasus
+
+[Pegasus](https://github.com/jaywonchung/pegasus) is an SSH-based multi-node command runner.
+Different models have different verbosity, and benchmarking takes vastly different amounts of time.
+Therefore, we want an automated piece of software that drains a queue of benchmarking jobs (one job per model) on a set of GPUs.
+
+## Setup
+
+### Install Pegasus
+
+Pegasus needs to keep SSH connections open to all the nodes in order to queue up and run jobs over SSH.
+So you should install and run Pegasus on a computer that you can keep awake.
+
+If you already have Rust set up:
+
+```console
+$ cargo install pegasus-ssh
+```
+
+Otherwise, you can set up Rust [here](https://www.rust-lang.org/tools/install), or just download Pegasus release binaries [here](https://github.com/jaywonchung/pegasus/releases/latest).
+
+### Necessary setup for each node
+
+Every node must have two things:
+
+1. This repository cloned under `~/workspace/leaderboard`.
+   - If you want a different path, search and replace in `setup-nodes.yaml`.
+2. Model weights under `/data/leaderboard/weights`.
+   - If you want a different path, search and replace in `setup-nodes.yaml` and `benchmark.yaml`.
+
+### Specify node names for Pegasus
+
+List your nodes in `hosts.yaml`. See the file for an example.
+
+- `hostname`: List the hostnames you would use in order to `ssh` into each node, e.g. `jaywonchung@gpunode01`.
+- `gpu`: We want to create one Docker container per GPU. List the indices of the GPUs you would like to use on those hosts.
+
+### Set up Docker containers on your nodes with Pegasus
+
+This builds our Docker image and spawns one container per GPU (named `leaderboard%d`) on every node.
+
+```console
+$ cd pegasus
+$ cp setup-nodes.yaml queue.yaml
+$ pegasus b
+```
+
+`b` stands for broadcast: every command is run once on each (`hostname`, `gpu`) combination.
+
+## Benchmark
+
+Now use Pegasus to run benchmarks for all the models across all nodes.
+
+```console
+$ cd pegasus
+$ cp benchmark.yaml queue.yaml
+$ pegasus q
+```
+
+`q` stands for queue: each command is run once, on the next available (`hostname`, `gpu`) combination.
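The difference between `pegasus b` and `pegasus q` can be sketched as follows. This is only an illustrative simulation of the scheduling semantics described above, not Pegasus's actual implementation; the slot list and command names are made up:

```python
from itertools import product

# Hypothetical (hostname, gpu) slots and a made-up two-command queue.
slots = [("node01", 0), ("node01", 1), ("node02", 0)]
commands = ["command-a", "command-b"]

# `pegasus b` (broadcast): every command runs once on ALL slots.
broadcast_runs = [(cmd, slot) for cmd, slot in product(commands, slots)]

# `pegasus q` (queue): each command runs ONCE, on whichever slot frees up next
# (modeled here as simple round-robin assignment).
queue_runs = [(cmd, slots[i % len(slots)]) for i, cmd in enumerate(commands)]

assert len(broadcast_runs) == len(commands) * len(slots)  # 6 runs in total
assert len(queue_runs) == len(commands)                   # 2 runs in total
```

Broadcast suits per-machine setup work; queue suits draining a long job list across however many slots are free.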
pegasus/benchmark.yaml ADDED
@@ -0,0 +1,32 @@
+# This YAML dictionary will expand into 20 (models) x 4 (tasks) = 80 job commands,
+# where {{ model }} and {{ task }} are filled in with all possible combinations.
+# {{ gpu }} is defined in `hosts.yaml`, and will be filled in when Pegasus
+# determines the specific node and GPU the generated job command will run on.
+- command:
+    - docker exec leaderboard{{ gpu }} python scripts/benchmark.py --input-file sharegpt/sg_90k_part1_html_cleaned_lang_first_sampled.json --model-path {{ model }} --task {{ task }}
+  model:
+    - /data/leaderboard/weights/metaai/llama-7B
+    - /data/leaderboard/weights/metaai/llama-13B
+    - /data/leaderboard/weights/lmsys/vicuna-7B
+    - /data/leaderboard/weights/lmsys/vicuna-13B
+    - /data/leaderboard/weights/tatsu-lab/alpaca-7B
+    - /data/leaderboard/weights/BAIR/koala-7b
+    - /data/leaderboard/weights/BAIR/koala-13b
+    - /data/leaderboard/weights/BlinkDL/RWKV-4-Raven-7B-v12-Eng98%-Other2%-20230521-ctx8192.pth
+    - camel-ai/CAMEL-13B-Combined-Data
+    - databricks/dolly-v2-12b
+    - FreedomIntelligence/phoenix-inst-chat-7b
+    - h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2
+    - lmsys/fastchat-t5-3b-v1.0
+    - Neutralzz/BiLLa-7B-SFT
+    - nomic-ai/gpt4all-13b-snoozy
+    - openaccess-ai-collective/manticore-13b-chat-pyg
+    - OpenAssistant/oasst-sft-1-pythia-12b
+    - project-baize/baize-v2-7B
+    - StabilityAI/stablelm-tuned-alpha-7b
+    - togethercomputer/RedPajama-INCITE-7B-Chat
+  task:
+    - chat
+    - chat-concise
+    - instruct
+    - instruct-concise
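The matrix above expands as a Cartesian product. A minimal sketch of that expansion, with simple string templating standing in for Pegasus's own substitution and `model-N` names standing in for the 20 model entries:

```python
from itertools import product

# Stand-ins for the 20 model entries above; the 4 tasks are taken verbatim.
models = [f"model-{i}" for i in range(20)]
tasks = ["chat", "chat-concise", "instruct", "instruct-concise"]
template = (
    "docker exec leaderboard{gpu} python scripts/benchmark.py "
    "--model-path {model} --task {task}"
)

# One job command per (model, task) pair; {gpu} stays a placeholder until
# Pegasus assigns the job to a concrete (hostname, gpu) slot.
jobs = [
    template.format(gpu="{gpu}", model=model, task=task)
    for model, task in product(models, tasks)
]
assert len(jobs) == 20 * 4  # 80 job commands, matching the comment in the YAML
```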
pegasus/hosts.yaml ADDED
@@ -0,0 +1,19 @@
+# Example:
+# Four nodes (node01 to node04), one container per GPU.
+# node01 and node02 have four GPUs, and hence four containers.
+# node03 and node04 have just two GPUs, and hence two containers.
+# With this configuration, 2 * 4 + 2 * 2 = 12 jobs will run in parallel.
+- hostname:
+    - node01
+    - node02
+  gpu:
+    - 0
+    - 1
+    - 2
+    - 3
+- hostname:
+    - node03
+    - node04
+  gpu:
+    - 0
+    - 1
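Each (`hostname`, `gpu`) pair is one container and hence one parallel job slot, so the arithmetic in the comment above can be checked directly (the host list mirrors the example):

```python
# The example host list above, as data.
hosts = [
    {"hostname": ["node01", "node02"], "gpu": [0, 1, 2, 3]},
    {"hostname": ["node03", "node04"], "gpu": [0, 1]},
]

# Each (hostname, gpu) combination gets one Docker container, i.e. one job slot.
slots = [
    (host, gpu)
    for entry in hosts
    for host in entry["hostname"]
    for gpu in entry["gpu"]
]
assert len(slots) == 2 * 4 + 2 * 2  # 12 jobs run in parallel
```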
pegasus/setup-nodes.yaml ADDED
@@ -0,0 +1,7 @@
+# The first item builds our Docker image on each node once.
+# The second item spawns one Docker container per GPU.
+# {{ gpu }} is defined in `hosts.yaml`, and will be filled in when Pegasus
+# determines the specific node and GPU the generated job command will run on.
+# We check {{ gpu }} = 0 to ensure that the image is only built once on each node.
+- if [ {{ gpu }} = 0 ]; then cd workspace/leaderboard && docker build -t ml-energy:latest .; fi
+- docker run -dit --name leaderboard{{ gpu }} --gpus '"device={{ gpu }}"' -v /data/leaderboard:/data/leaderboard -v $HOME/workspace/leaderboard:/workspace/leaderboard ml-energy:latest bash
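Since `setup-nodes.yaml` is run with `pegasus b`, both commands are broadcast to every (`hostname`, `gpu`) pair, and the `{{ gpu }} = 0` guard is what keeps the image build from repeating. A sketch of what expands on a hypothetical 4-GPU node, with plain string substitution standing in for Pegasus's templating and the commands abbreviated (volume and `--gpus` flags omitted):

```python
# Abbreviated versions of the two commands above.
build = "if [ {{ gpu }} = 0 ]; then docker build -t ml-energy:latest .; fi"
spawn = "docker run -dit --name leaderboard{{ gpu }} ml-energy:latest bash"

gpus = [0, 1, 2, 3]  # a hypothetical 4-GPU node
rendered = [
    cmd.replace("{{ gpu }}", str(gpu)) for gpu in gpus for cmd in (build, spawn)
]

# The shell guard `[ 0 = 0 ]` is true only for GPU 0, so the image builds once...
builds_that_fire = [cmd for cmd in rendered if cmd.startswith("if [ 0 = 0 ]")]
assert len(builds_that_fire) == 1

# ...while one container is spawned for every GPU on the node.
containers = [cmd for cmd in rendered if cmd.startswith("docker run")]
assert len(containers) == len(gpus)
```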
scripts/benchmark.py CHANGED
@@ -19,21 +19,21 @@ from zeus.monitor import ZeusMonitor
 SYSTEM_PROMPTS = {
     "chat": (
         "A chat between a human user (prompter) and an artificial intelligence (AI) assistant. "
-        "The assistant gives helpful, detailed, and polite answers to the user's questions."
+        "The assistant gives helpful, detailed, and polite answers to the user's questions. "
     ),
     "chat-concise": (
         "A chat between a human user (prompter) and an artificial intelligence (AI) assistant. "
         "The assistant gives helpful, detailed, and polite answers to the user's questions. "
-        "The assistnat's answers are concise but high-quality."
+        "The assistant's answers are very concise. "
     ),
     "instruct": (
         "Below is an instruction that describes a task. "
-        "Write a response that appropriately completes the request."
+        "Write a response that appropriately completes the request. "
     ),
     "instruct-concise": (
         "Below is an instruction that describes a task. "
-        "Write a response that appropriately completes the request."
-        "The response should be concise but high-quality."
+        "Write a response that appropriately completes the request. "
+        "The response should be very concise. "
     ),
 }
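The diff above adds a trailing space to every fragment so that Python's implicit concatenation of adjacent string literals doesn't fuse words together (e.g. "request.The response"), and fixes the "assistnat" typo. A quick check of the resulting dictionary:

```python
# The post-change SYSTEM_PROMPTS, reproduced from the diff above.
SYSTEM_PROMPTS = {
    "chat": (
        "A chat between a human user (prompter) and an artificial intelligence (AI) assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    ),
    "chat-concise": (
        "A chat between a human user (prompter) and an artificial intelligence (AI) assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's questions. "
        "The assistant's answers are very concise. "
    ),
    "instruct": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request. "
    ),
    "instruct-concise": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request. "
        "The response should be very concise. "
    ),
}

# Every prompt now ends with a space, so appending user text is safe,
# and the typo is gone.
assert all(prompt.endswith(" ") for prompt in SYSTEM_PROMPTS.values())
assert "assistnat" not in SYSTEM_PROMPTS["chat-concise"]
```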