Commit 8e30a31
Parent(s): f45c3f0
add t4 to leaderboard (#30)

- add t4 to leaderboard (1b7fb055871b54c99cf75616570506f50c7e9322)
- fix readme (38a9948acfac829033f4aa926a80abb5fab74cc8)
- .gitignore +2 -1
- README.md +59 -1
- app.py +1 -0
- src/llm_perf.py +8 -3
.gitignore CHANGED
@@ -4,4 +4,5 @@ __pycache__/
 *ipynb
 .vscode/
 
-dataset/
+dataset/
+.venv
README.md CHANGED
@@ -11,4 +11,62 @@ license: apache-2.0
 tags: [llm perf leaderboard, llm performance leaderboard, llm, performance, leaderboard]
 ---
 
-
+# LLM-perf leaderboard
+
+## 📝 About
+The 🤗 LLM-Perf Leaderboard 🏋️ is a leaderboard at the intersection of quality and performance.
+Its aim is to benchmark the performance (latency, throughput, memory & energy)
+of Large Language Models (LLMs) on different hardware, backends, and optimizations
+using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark).
+
+Anyone from the community can request a new base model or hardware/backend/optimization
+configuration for automated benchmarking:
+
+- Model evaluation requests should be made in the
+[🤗 Open LLM Leaderboard 🏅](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard);
+we scrape the [list of canonical base models](https://github.com/huggingface/optimum-benchmark/blob/main/llm_perf/utils.py) from there.
+- Hardware/Backend/Optimization configuration requests should be made in the
+[🤗 LLM-Perf Leaderboard 🏋️](https://huggingface.co/spaces/optimum/llm-perf-leaderboard) or
+[Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) repository (where the code is hosted).
+
+## ⚖️ Details
+
+- To avoid communication-dependent results, only one GPU is used.
+- Score is the average evaluation score obtained from the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
+- LLMs run on a singleton batch (batch size of 1) with a prompt size of 256, generating 64 tokens for at least 10 iterations and 10 seconds.
+- Energy consumption is measured in kWh using CodeCarbon, taking into account the GPU, CPU, RAM, and location of the machine.
+- We measure three types of memory: Max Allocated Memory, Max Reserved Memory, and Max Used Memory. The first two are reported by PyTorch; the last one is observed using PyNVML.
+
+All of our benchmarks are run by a single script,
+[benchmark_cuda_pytorch.py](https://github.com/huggingface/optimum-benchmark/blob/llm-perf/llm-perf/benchmark_cuda_pytorch.py),
+using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) to guarantee reproducibility and consistency.
+
+## 🏃 How to run locally
+
+To run the LLM-Perf Leaderboard locally on your machine, follow these steps:
+
+### 1. Clone the Repository
+
+First, clone the repository to your local machine:
+
+```bash
+git clone https://huggingface.co/spaces/optimum/llm-perf-leaderboard
+cd llm-perf-leaderboard
+```
+
+### 2. Install the Required Dependencies
+
+Install the necessary Python packages listed in the requirements.txt file:
+
+`pip install -r requirements.txt`
+
+### 3. Run the Application
+
+You can run the Gradio application in one of the following ways:
+- Option 1: Using Python
+`python app.py`
+- Option 2: Using the Gradio CLI (includes hot-reload)
+`gradio app.py`
+
+### 4. Access the Application
+
+Once the application is running, you can access it locally in your web browser at http://127.0.0.1:7860/
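The memory bullet in the README diff above distinguishes two PyTorch counters from a PyNVML reading. Below is a minimal sketch of that distinction, assuming a CUDA machine with `torch` and `pynvml` installed; the leaderboard's actual measurement code lives in Optimum-Benchmark, not here.

```python
# Sketch only: illustrates the three memory readings the README describes.
import torch
import pynvml

device = torch.device("cuda:0")
x = torch.randn(4096, 4096, device=device)  # stand-in for a real workload
y = x @ x

# Reported by PyTorch: peaks tracked by its caching allocator.
max_allocated = torch.cuda.max_memory_allocated(device)  # peak tensor allocations
max_reserved = torch.cuda.max_memory_reserved(device)    # peak blocks reserved from the driver

# Observed via PyNVML: what the driver sees as used on the whole device,
# including CUDA context overhead outside PyTorch's allocator. Note this is
# an instantaneous reading; the leaderboard tracks the maximum over a run.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
used = pynvml.nvmlDeviceGetMemoryInfo(handle).used
pynvml.nvmlShutdown()

print(f"Max Allocated: {max_allocated / 1e6:.0f} MB")
print(f"Max Reserved:  {max_reserved / 1e6:.0f} MB")
print(f"Used (NVML):   {used / 1e6:.0f} MB")
```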
app.py CHANGED
@@ -18,6 +18,7 @@ from src.panel import (
 MACHINE_TO_HARDWARE = {
     "1xA10": "A10-24GB-150W 🖥️",
     "1xA100": "A100-80GB-275W 🖥️",
+    "1xT4": "T4-16GB-70W 🖥️",
     # "1xH100": "H100-80GB-700W 🖥️",
 }
 
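A hypothetical sketch of how the new entry flows through the app (the loop below is illustrative, not the actual app.py code): each machine key is assumed to select a leaderboard table via `get_llm_perf_df` from src/llm_perf.py (diffed next), while the value is the hardware label shown in the UI.

```python
# Hypothetical usage, not the actual app.py logic.
from src.llm_perf import get_llm_perf_df

MACHINE_TO_HARDWARE = {
    "1xA10": "A10-24GB-150W 🖥️",
    "1xA100": "A100-80GB-275W 🖥️",
    "1xT4": "T4-16GB-70W 🖥️",
}

for machine, hardware in MACHINE_TO_HARDWARE.items():
    df = get_llm_perf_df(machine=machine)  # one leaderboard table per machine
    print(f"{hardware}: {len(df)} benchmark rows")
```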
src/llm_perf.py CHANGED
@@ -4,6 +4,8 @@ import pandas as pd
 
 from .utils import process_kernels, process_quantizations
 
+DATASET_DIRECTORY = "dataset"
+
 COLUMNS_MAPPING = {
     "config.name": "Experiment 🧪",
     "config.backend.model": "Model 🤗",
@@ -109,11 +111,14 @@ def processed_llm_perf_df(llm_perf_df):
 
 
 def get_llm_perf_df(machine: str = "1xA10"):
-    if os.path.exists(f"llm-perf-leaderboard-{machine}.csv"):
-        llm_perf_df = pd.read_csv(f"llm-perf-leaderboard-{machine}.csv")
+    if not os.path.exists(DATASET_DIRECTORY):
+        os.makedirs(DATASET_DIRECTORY)
+
+    if os.path.exists(f"{DATASET_DIRECTORY}/llm-perf-leaderboard-{machine}.csv"):
+        llm_perf_df = pd.read_csv(f"{DATASET_DIRECTORY}/llm-perf-leaderboard-{machine}.csv")
     else:
         llm_perf_df = get_raw_llm_perf_df(machine)
         llm_perf_df = processed_llm_perf_df(llm_perf_df)
+        llm_perf_df.to_csv(f"{DATASET_DIRECTORY}/llm-perf-leaderboard-{machine}.csv", index=False)
 
     return llm_perf_df
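A short usage sketch of the caching behavior this diff introduces (file names follow the pattern in the diff; the first call is assumed to have network access to fetch the raw results):

```python
from src.llm_perf import get_llm_perf_df

# First call: fetches and processes the raw benchmark results, then
# writes dataset/llm-perf-leaderboard-1xT4.csv as a cache.
df = get_llm_perf_df(machine="1xT4")

# Subsequent calls for the same machine: served from the cached CSV.
df_cached = get_llm_perf_df(machine="1xT4")
```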