add intel CPU to leaderboard #31
opened by baptistecolle (HF staff)
- .gitignore +1 -4
- README.md +1 -59
- app.py +19 -47
- hardware.yaml +0 -50
- requirements.txt +2 -3
- src/content.py +6 -6
- src/dependency.py +0 -3
- src/hardware.py +0 -26
- src/kernels.py +1 -8
- src/llm_perf.py +15 -41
- src/panel.py +46 -91
- src/utils.py +0 -5
.gitignore
CHANGED
@@ -4,7 +4,4 @@ __pycache__/
 *ipynb
 .vscode/
 
-
-
-dataset/
-.venv
+dataset/
README.md
CHANGED
@@ -11,62 +11,4 @@ license: apache-2.0
 tags: [llm perf leaderboard, llm performance leaderboard, llm, performance, leaderboard]
 ---
 
-
-
-## 📖 About
-The 🤗 LLM-Perf Leaderboard 🏋️ is a leaderboard at the intersection of quality and performance.
-Its aim is to benchmark the performance (latency, throughput, memory & energy)
-of Large Language Models (LLMs) with different hardwares, backends and optimizations
-using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark).
-
-Anyone from the community can request a new base model or hardware/backend/optimization
-configuration for automated benchmarking:
-
-- Model evaluation requests should be made in the
-[🤗 Open LLM Leaderboard 🏅](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) ;
-we scrape the [list of canonical base models](https://github.com/huggingface/optimum-benchmark/blob/main/llm_perf/utils.py) from there.
-- Hardware/Backend/Optimization configuration requests should be made in the
-[🤗 LLM-Perf Leaderboard 🏋️](https://huggingface.co/spaces/optimum/llm-perf-leaderboard) or
-[Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) repository (where the code is hosted).
-
-## ⚙️ Details
-
-- To avoid communication-dependent results, only one GPU is used.
-- Score is the average evaluation score obtained from the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
-- LLMs are run on a singleton batch with a prompt size of 256, generating 64 tokens for at least 10 iterations and 10 seconds.
-- Energy consumption is measured in kWh using CodeCarbon, taking into consideration the GPU, CPU, RAM and location of the machine.
-- We measure three types of memory: Max Allocated Memory, Max Reserved Memory and Max Used Memory. The first two are reported by PyTorch and the last one is observed using PyNVML.
-
-All of our benchmarks are run by this single script,
-[benchmark_cuda_pytorch.py](https://github.com/huggingface/optimum-benchmark/blob/llm-perf/llm-perf/benchmark_cuda_pytorch.py),
-using the power of [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) to guarantee reproducibility and consistency.
-
-## 🚀 How to run locally
-
-To run the LLM-Perf Leaderboard locally on your machine, follow these steps:
-
-### 1. Clone the Repository
-
-First, clone the repository to your local machine:
-
-```bash
-git clone https://huggingface.co/spaces/optimum/llm-perf-leaderboard
-cd llm-perf-leaderboard
-```
-
-### 2. Install the Required Dependencies
-
-Install the necessary Python packages listed in the requirements.txt file:
-`pip install -r requirements.txt`
-
-### 3. Run the Application
-
-You can run the Gradio application in one of the following ways:
-- Option 1: Using Python
-`python app.py`
-- Option 2: Using Gradio CLI (includes hot-reload)
-`gradio app.py`
-
-### 4. Access the Application
-
-Once the application is running, you can access it locally in your web browser at http://127.0.0.1:7860/
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py
CHANGED
@@ -1,11 +1,9 @@
 import gradio as gr
 
-import src.dependency  # noqa
 from src.assets import custom_css
 
 # from src.attention import create_attn_plots
 from src.content import ABOUT, CITATION_BUTTON, CITATION_BUTTON_LABEL, LOGO, TITLE
-from src.hardware import load_hardware_configs
 from src.leaderboard import create_leaderboard_table
 from src.llm_perf import get_llm_perf_df
 from src.map import create_lat_score_mem_plot
@@ -15,31 +13,27 @@ from src.panel import (
     create_select_callback,
 )
 
-configs = load_hardware_configs("hardware.yaml")
+# from custom_kernels import create_quant_krnl_plots
 
+MACHINE_TO_HARDWARE = {
+    "1xA10": "A10-24GB-150W 🖥️",
+    "1xA100": "A100-80GB-275W 🖥️",
+    # "1xH100": "H100-80GB-700W 🖥️",
+}
 
-demo = gr.Blocks(
-    css=custom_css,
-    theme=gr.themes.Default(primary_hue="indigo", secondary_hue="indigo"),
-)
+
+demo = gr.Blocks(css=custom_css)
 with demo:
     gr.HTML(LOGO, elem_classes="logo")
     gr.HTML(TITLE, elem_classes="title")
    ####################### HARDWARE TABS #######################
    with gr.Tabs(elem_classes="tabs"):
-        for id, config in enumerate(configs):
-            with gr.TabItem(config.description, id=id):
-                #######################
-                if config.detail:
-                    gr.Markdown(config.detail, elem_classes="descriptive-text")
-
-                # ####################### CONTROL PANEL #######################
+        for id, (machine, hardware) in enumerate(MACHINE_TO_HARDWARE.items()):
+            with gr.TabItem(hardware, id=id):
+                ####################### CONTROL PANEL #######################
                 (
                     filter_button,
-                    machine_value,
-                    subsets_value,
-                    backends_value,
-                    hardware_type_value,
+                    machine_textbox,
                     score_slider,
                     memory_slider,
                     backend_checkboxes,
@@ -47,33 +41,17 @@ with demo:
                     optimization_checkboxes,
                     quantization_checkboxes,
                     kernels_checkboxes,
-                ) = create_control_panel(
-                    machine=config.machine,
-                    subsets=config.subsets,
-                    backends=config.backends,
-                    hardware_type=config.hardware_type,
-                    hardware_provider=config.hardware_provider,
-                )
+                ) = create_control_panel(machine=machine)
                 ####################### HARDWARE SUBTABS #######################
                 with gr.Tabs(elem_classes="subtabs"):
-                    open_llm_perf_df = get_llm_perf_df(
-                        machine=config.machine,
-                        subsets=config.subsets,
-                        backends=config.backends,
-                        hardware_type=config.hardware_type,
-                    )
+                    open_llm_perf_df = get_llm_perf_df(machine=machine)
                    ####################### LEADERBOARD TAB #######################
                    with gr.TabItem("Leaderboard 🏅", id=0):
                        search_bar, columns_checkboxes, leaderboard_table = (
                            create_leaderboard_table(open_llm_perf_df)
                        )
-
-                    if (config.hardware_provider != "intel"
-                    ):  # TODO intel CPU does not measure the memory requirements correctly, so disable the graph feature until we fix the underlying issue
-                        with gr.TabItem("Find Your Best Model 🧭", id=1):
-                            lat_score_mem_plot = create_lat_score_mem_plot(
-                                open_llm_perf_df
-                            )
+                    with gr.TabItem("Find Your Best Model 🧭", id=1):
+                        lat_score_mem_plot = create_lat_score_mem_plot(open_llm_perf_df)
                    ###################### ATTENTIONS SPEEDUP TAB #######################
                    # with gr.TabItem("Attention 📈", id=2):
                    #     attn_prefill_plot, attn_decode_plot = create_attn_plots(
@@ -89,10 +67,7 @@ with demo:
        create_control_callback(
            filter_button,
            # inputs
-            machine_value,
-            subsets_value,
-            backends_value,
-            hardware_type_value,
+            machine_textbox,
            score_slider,
            memory_slider,
            backend_checkboxes,
@@ -114,10 +89,7 @@ with demo:
 
        create_select_callback(
            # inputs
-            machine_value,
-            subsets_value,
-            backends_value,
-            hardware_type_value,
+            machine_textbox,
            # interactive
            columns_checkboxes,
            search_bar,
@@ -126,7 +98,7 @@ with demo:
        )
 
    ####################### ABOUT TAB #######################
-    with gr.TabItem("About 📖", id=
+    with gr.TabItem("About 📖", id=3):
        gr.Markdown(ABOUT, elem_classes="descriptive-text")
    ####################### CITATION
    with gr.Row():
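Note that the new app.py threads the machine name into the Gradio callbacks through a hidden gr.Textbox instead of the old gr.State values. A minimal sketch of that pattern, with illustrative component names rather than the app's full wiring:

```python
import gradio as gr

def filter_rows(machine: str) -> str:
    # stand-in for filter_rows_fn: the hidden value arrives like any other input
    return f"Filtering results for {machine}"

with gr.Blocks() as demo:
    # fixed, invisible input carrying the per-tab machine name
    machine_textbox = gr.Textbox(value="1xA10", visible=False)
    filter_button = gr.Button("Filter")
    output = gr.Textbox(label="Result")
    # the hidden textbox is passed to the callback like any visible component
    filter_button.click(fn=filter_rows, inputs=[machine_textbox], outputs=[output])

# demo.launch()
```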
hardware.yaml
DELETED
@@ -1,50 +0,0 @@
-- machine: 1xA10
-  description: A10-24GB-150W 🖥️
-  hardware_provider: nvidia
-  hardware_type: cuda
-  subsets:
-    - unquantized
-    - awq
-    - bnb
-    - gptq
-  backends:
-    - pytorch
-
-- machine: 1xA100
-  description: A100-80GB-275W 🖥️
-  hardware_provider: nvidia
-  hardware_type: cuda
-  subsets:
-    - unquantized
-    - awq
-    - bnb
-    - gptq
-    - torchao
-  backends:
-    - pytorch
-
-- machine: 1xT4
-  description: T4-16GB-70W 🖥️
-  hardware_provider: nvidia
-  hardware_type: cuda
-  subsets:
-    - unquantized
-    - awq
-    - bnb
-    - gptq
-    - torchao
-  backends:
-    - pytorch
-
-- machine: 32vCPU-C7i
-  description: Intel-Xeon-SPR-385W 🖥️
-  detail: |
-    We tested the [32vCPU AWS C7i](https://aws.amazon.com/ec2/instance-types/c7i/) instance for the benchmark.
-  hardware_provider: intel
-  hardware_type: cpu
-  subsets:
-    - unquantized
-  backends:
-    - pytorch
-    - openvino
-    - onnxruntime
requirements.txt
CHANGED
@@ -1,6 +1,5 @@
 huggingface_hub
 transformers
-gradio
+gradio
 plotly
-pandas
-ruff
+pandas
src/content.py
CHANGED
@@ -5,18 +5,18 @@ TITLE = """<h1 align="center" id="space-title">🤗 LLM-Perf Leaderboard 🏋️
 ABOUT = """
 ## 📖 About
 The 🤗 LLM-Perf Leaderboard 🏋️ is a leaderboard at the intersection of quality and performance.
-Its aim is to benchmark the performance (latency, throughput, memory & energy)
-of Large Language Models (LLMs) with different hardwares, backends and optimizations
+Its aim is to benchmark the performance (latency, throughput, memory & energy)
+of Large Language Models (LLMs) with different hardwares, backends and optimizations
 using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark).
 
-Anyone from the community can request a new base model or hardware/backend/optimization
+Anyone from the community can request a new base model or hardware/backend/optimization
 configuration for automated benchmarking:
 
-- Model evaluation requests should be made in the
+- Model evaluation requests should be made in the
 [🤗 Open LLM Leaderboard 🏅](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) ;
 we scrape the [list of canonical base models](https://github.com/huggingface/optimum-benchmark/blob/main/llm_perf/utils.py) from there.
-- Hardware/Backend/Optimization configuration requests should be made in the
-[🤗 LLM-Perf Leaderboard 🏋️](https://huggingface.co/spaces/optimum/llm-perf-leaderboard) or
+- Hardware/Backend/Optimization configuration requests should be made in the
+[🤗 LLM-Perf Leaderboard 🏋️](https://huggingface.co/spaces/optimum/llm-perf-leaderboard) or
 [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) repository (where the code is hosted).
 
 ## ⚙️ Details
src/dependency.py
DELETED
@@ -1,3 +0,0 @@
-import os
-
-os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"
src/hardware.py
DELETED
@@ -1,26 +0,0 @@
-from typing import Any, Dict, List, Optional
-
-import yaml
-
-
-class HardwareConfig:
-    def __init__(self, data: Dict[str, Any]):
-        self.machine: str = data["machine"]
-        self.description: str = data["description"]
-        self.hardware_provider: str = data["hardware_provider"]
-        self.hardware_type: str = data["hardware_type"]
-        self.subsets: List[str] = data["subsets"]
-        self.backends: List[str] = data["backends"]
-        self.detail: Optional[str] = data.get("detail", None)
-
-    def __repr__(self) -> str:
-        return (
-            f"HardwareConfig(machine='{self.machine}', description='{self.description}', "
-            f"hardware_provider={self.hardware_provider}, hardware_type={self.hardware_type}, subsets={self.subsets}, backends={self.backends})"
-        )
-
-
-def load_hardware_configs(file_path: str) -> List[HardwareConfig]:
-    with open(file_path, "r") as file:
-        data = yaml.safe_load(file)
-    return [HardwareConfig(config) for config in data]
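For context, the deleted hardware.yaml above and this loader worked together. A minimal sketch of how app.py consumed them before this change (reconstructed from the deleted code, not part of the diff):

```python
from src.hardware import load_hardware_configs

# parse hardware.yaml into one HardwareConfig per machine entry
configs = load_hardware_configs("hardware.yaml")
for config in configs:
    # each entry carries machine, description, hardware_provider,
    # hardware_type, subsets, backends and an optional detail blurb
    print(config)  # uses HardwareConfig.__repr__
```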
src/kernels.py
CHANGED
@@ -38,7 +38,6 @@ def get_quant_df(llm_perf_df):
    exllamav2_df = copy_df[(copy_df["Quantization 🗜️"] == "GPTQ.4bit+ExllamaV2")]
    gemm_df = copy_df[(copy_df["Quantization 🗜️"] == "AWQ.4bit+GEMM")]
    gemv_df = copy_df[(copy_df["Quantization 🗜️"] == "AWQ.4bit+GEMV")]
-    torchao_df = copy_df[(copy_df["Quantization 🗜️"] == "torchao.4bit")]
    # merge the three dataframes
    exllamav1_df = pd.merge(
        vanilla_df,
@@ -64,14 +63,8 @@ def get_quant_df(llm_perf_df):
        on=["Model 🤗"],
        suffixes=["", " Custom Kernel"],
    )
-    torchao_df = pd.merge(
-        vanilla_df,
-        torchao_df,
-        on=["Model 🤗"],
-        suffixes=["", " Custom Kernel"],
-    )
    # concat the two dataframes row-wise
-    quant_df = pd.concat([exllamav1_df, exllamav2_df, gemm_df, gemv_df, torchao_df])
+    quant_df = pd.concat([exllamav1_df, exllamav2_df, gemm_df, gemv_df])
    # compute speedups
    quant_df["Prefill Speedup (%)"] = (
        (quant_df["Prefill (s)"] / quant_df["Prefill (s) Custom Kernel"]) * 100
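The merge-with-suffixes pattern that get_quant_df relies on pairs each model's vanilla run with its custom-kernel run, so speedups can be computed column against column. A toy sketch with made-up numbers:

```python
import pandas as pd

vanilla_df = pd.DataFrame({"Model 🤗": ["m1"], "Prefill (s)": [2.0]})
gemm_df = pd.DataFrame({"Model 🤗": ["m1"], "Prefill (s)": [1.0]})

# suffixes=["", " Custom Kernel"] keeps the vanilla column name unchanged
# and tags the kernel run's column
quant_df = pd.merge(vanilla_df, gemm_df, on=["Model 🤗"], suffixes=["", " Custom Kernel"])
quant_df["Prefill Speedup (%)"] = (
    quant_df["Prefill (s)"] / quant_df["Prefill (s) Custom Kernel"]
) * 100  # -> 200.0, i.e. the kernel prefills twice as fast
```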
src/llm_perf.py
CHANGED
@@ -1,12 +1,9 @@
 import os
-from typing import List
 
 import pandas as pd
 
 from .utils import process_kernels, process_quantizations
 
-DATASET_DIRECTORY = "dataset"
-
 COLUMNS_MAPPING = {
    "config.name": "Experiment 🧪",
    "config.backend.model": "Model 🤗",
@@ -29,34 +26,21 @@ COLUMNS_MAPPING = {
    "#Params (B)": "Params (B)",
 }
 SORTING_COLUMNS = ["Open LLM Score (%)", "Decode (tokens/s)", "Prefill (s)"]
+SUBSETS = ["unquantized", "awq", "bnb", "gptq"]
 SORTING_ASCENDING = [False, True, False]
 
 
-def get_raw_llm_perf_df(
-    machine: str, subsets: List[str], backends: List[str], hardware_type: str
-):
+def get_raw_llm_perf_df(machine: str = "1xA10"):
    dfs = []
-    for subset in subsets:
-        for backend in backends:
-            try:
-                dfs.append(
-                    pd.read_csv(
-                        f"hf://datasets/optimum-benchmark/llm-perf-leaderboard/perf-df-{backend}-{hardware_type}-{subset}-{machine}.csv"
-                    )
+    for subset in SUBSETS:
+        try:
+            dfs.append(
+                pd.read_csv(
+                    f"hf://datasets/optimum-benchmark/llm-perf-leaderboard/perf-df-{subset}-{machine}.csv"
                )
-            except Exception:
-                print("Dataset not found:")
-                print(f"  • Backend: {backend}")
-                print(f"  • Subset: {subset}")
-                print(f"  • Machine: {machine}")
-                print(f"  • Hardware Type: {hardware_type}")
-                url = f"https://huggingface.co/datasets/optimum-benchmark/llm-perf-leaderboard/blob/main/perf-df-{backend}-{hardware_type}-{subset}-{machine}.csv"
-                print(f"  • URL: {url}")
-
-    if len(dfs) == 0:
-        raise ValueError(
-            f"No datasets found for machine {machine}, check your hardware.yaml config file or your dataset on huggingface"
-        )
+            )
+        except Exception:
+            print(f"Subset {subset} for machine {machine} not found")
 
    perf_df = pd.concat(dfs)
    llm_df = pd.read_csv(
@@ -124,22 +108,12 @@ def processed_llm_perf_df(llm_perf_df):
    return llm_perf_df
 
 
-def get_llm_perf_df(
-    machine: str, subsets: List[str], backends: List[str], hardware_type: str
-):
-    if not os.path.exists(DATASET_DIRECTORY):
-        os.makedirs(DATASET_DIRECTORY)
-
-    if os.path.exists(f"{DATASET_DIRECTORY}/llm-perf-leaderboard-{machine}.csv"):
-        llm_perf_df = pd.read_csv(
-            f"{DATASET_DIRECTORY}/llm-perf-leaderboard-{machine}.csv"
-        )
+def get_llm_perf_df(machine: str = "1xA10"):
+    if os.path.exists(f"llm-perf-leaderboard-{machine}.csv"):
+        llm_perf_df = pd.read_csv(f"llm-perf-leaderboard-{machine}.csv")
    else:
-
-        llm_perf_df = get_raw_llm_perf_df(machine, subsets, backends, hardware_type)
+        llm_perf_df = get_raw_llm_perf_df(machine)
        llm_perf_df = processed_llm_perf_df(llm_perf_df)
-        llm_perf_df.to_csv(
-            f"{DATASET_DIRECTORY}/llm-perf-leaderboard-{machine}.csv", index=False
-        )
+        llm_perf_df.to_csv(f"llm-perf-leaderboard-{machine}.csv", index=False)
 
    return llm_perf_df
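The reworked get_llm_perf_df keeps the fetch-then-cache flow while dropping the dataset/ directory and the subsets/backends/hardware_type parameters. A condensed sketch of that flow under the new code's assumptions (the function name here is illustrative; the real function also concatenates all SUBSETS and post-processes columns, and pandas resolves hf:// paths through huggingface_hub's fsspec integration):

```python
import os
import pandas as pd

def get_cached_perf_df(machine: str = "1xA10") -> pd.DataFrame:
    cache = f"llm-perf-leaderboard-{machine}.csv"
    if os.path.exists(cache):
        return pd.read_csv(cache)  # reuse the local copy
    df = pd.read_csv(              # otherwise stream one subset from the Hub
        f"hf://datasets/optimum-benchmark/llm-perf-leaderboard/perf-df-unquantized-{machine}.csv"
    )
    df.to_csv(cache, index=False)  # write the cache for next time
    return df
```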
src/panel.py
CHANGED
@@ -1,5 +1,3 @@
-from typing import List
-
 import gradio as gr
 
 from src.leaderboard import get_leaderboard_df
@@ -10,38 +8,9 @@ from src.llm_perf import get_llm_perf_df
 from src.map import get_lat_score_mem_fig
 
 
-def create_control_panel(
-    machine: str,
-    subsets: List[str],
-    backends: List[str],
-    hardware_provider: str,
-    hardware_type: str,
-):
+def create_control_panel(machine: str):
    # controls
-    machine_value = gr.State(value=machine)
-    subsets_value = gr.State(value=subsets)
-    backends_value = gr.State(value=backends)
-    hardware_type_value = gr.State(value=hardware_type)
-
-    if hardware_provider == "nvidia":
-        backends = ["pytorch"]
-        attention_implementations = ["Eager", "SDPA", "FAv2"]
-        quantizations = ["Unquantized", "BnB.4bit", "BnB.8bit", "AWQ.4bit", "GPTQ.4bit", "torchao.4bit"]
-        kernels = [
-            "No Kernel",
-            "GPTQ.ExllamaV1",
-            "GPTQ.ExllamaV2",
-            "AWQ.GEMM",
-            "AWQ.GEMV",
-        ]
-    elif hardware_provider == "intel":
-        backends = ["pytorch", "onnxruntime", "openvino"]
-        attention_implementations = ["Eager"]
-        quantizations = ["Unquantized"]
-        kernels = ["No Kernel"]
-    else:
-        raise ValueError(f"Unknown hardware provider: {hardware_provider}")
-
+    machine_textbox = gr.Textbox(value=machine, visible=False)
    with gr.Accordion("Control Panel 🎛️", open=False, elem_id="control-panel"):
        with gr.Row():
            with gr.Column(scale=2, variant="panel"):
@@ -63,8 +32,8 @@ def create_control_panel(
            with gr.Column(scale=1, variant="panel"):
                backend_checkboxes = gr.CheckboxGroup(
                    label="Backends 🏭",
-                    choices=backends,
-                    value=backends,
+                    choices=["pytorch"],
+                    value=["pytorch"],
                    info="☑️ Select the backends",
                    elem_id="backend-checkboxes",
                )
@@ -80,8 +49,8 @@ def create_control_panel(
            with gr.Column(scale=1, variant="panel"):
                optimization_checkboxes = gr.CheckboxGroup(
                    label="Attentions 👁️",
-                    choices=attention_implementations,
-                    value=attention_implementations,
+                    choices=["Eager", "SDPA", "FAv2"],
+                    value=["Eager", "SDPA", "FAv2"],
                    info="☑️ Select the optimization",
                    elem_id="optimization-checkboxes",
                )
@@ -89,8 +58,20 @@ def create_control_panel(
            with gr.Column(scale=1, variant="panel"):
                quantization_checkboxes = gr.CheckboxGroup(
                    label="Quantizations 🗜️",
-                    choices=quantizations,
-                    value=quantizations,
+                    choices=[
+                        "Unquantized",
+                        "BnB.4bit",
+                        "BnB.8bit",
+                        "AWQ.4bit",
+                        "GPTQ.4bit",
+                    ],
+                    value=[
+                        "Unquantized",
+                        "BnB.4bit",
+                        "BnB.8bit",
+                        "AWQ.4bit",
+                        "GPTQ.4bit",
+                    ],
                    info="☑️ Select the quantization schemes",
                    elem_id="quantization-checkboxes",
                    elem_classes="boxed-option",
@@ -98,8 +79,20 @@ def create_control_panel(
            with gr.Column(scale=1, variant="panel"):
                kernels_checkboxes = gr.CheckboxGroup(
                    label="Kernels ⚛️",
-                    choices=kernels,
-                    value=kernels,
+                    choices=[
+                        "No Kernel",
+                        "GPTQ.ExllamaV1",
+                        "GPTQ.ExllamaV2",
+                        "AWQ.GEMM",
+                        "AWQ.GEMV",
+                    ],
+                    value=[
+                        "No Kernel",
+                        "GPTQ.ExllamaV1",
+                        "GPTQ.ExllamaV2",
+                        "AWQ.GEMM",
+                        "AWQ.GEMV",
+                    ],
                    info="☑️ Select the custom kernels",
                    elem_id="kernel-checkboxes",
                    elem_classes="boxed-option",
@@ -113,10 +106,7 @@ def create_control_panel(
 
    return (
        filter_button,
-        machine_value,
-        backends_value,
-        hardware_type_value,
-        subsets_value,
+        machine_textbox,
        score_slider,
        memory_slider,
        backend_checkboxes,
@@ -129,13 +119,10 @@ def create_control_panel(
 
 def filter_rows_fn(
    machine,
-    subsets,
-    backends,
-    hardware_type,
    # inputs
    score,
    memory,
-
+    backends,
    precisions,
    attentions,
    quantizations,
@@ -144,14 +131,12 @@ def filter_rows_fn(
    columns,
    search,
 ):
-    llm_perf_df = get_llm_perf_df(
-        machine=machine, subsets=subsets, backends=backends, hardware_type=hardware_type
-    )
+    llm_perf_df = get_llm_perf_df(machine=machine)
    # print(attentions)
    # print(llm_perf_df["Attention 👁️"].unique())
    filtered_llm_perf_df = llm_perf_df[
        llm_perf_df["Model 🤗"].str.contains(search, case=False)
-        & llm_perf_df["Backend 🏭"].isin(backends)
+        & llm_perf_df["Backend 🏭"].isin(backends)
        & llm_perf_df["Precision 📥"].isin(precisions)
        & llm_perf_df["Attention 👁️"].isin(attentions)
        & llm_perf_df["Quantization 🗜️"].isin(quantizations)
@@ -160,7 +145,7 @@ def filter_rows_fn(
        & (llm_perf_df["Memory (MB)"] <= memory)
    ]
    selected_filtered_llm_perf_df = select_columns_fn(
-        machine, subsets, backends, hardware_type, columns, search, filtered_llm_perf_df
+        machine, columns, search, filtered_llm_perf_df
    )
    selected_filtered_lat_score_mem_fig = get_lat_score_mem_fig(filtered_llm_perf_df)
    # filtered_bt_prefill_fig = get_bt_prefill_fig(filtered_df)
@@ -186,10 +171,7 @@ def create_control_callback(
    # button
    filter_button,
    # fixed
-    machine_value,
-    subsets_value,
-    backends_value,
-    hardware_type_value,
+    machine_textbox,
    # inputs
    score_slider,
    memory_slider,
@@ -215,10 +197,7 @@ def create_control_callback(
        fn=filter_rows_fn,
        inputs=[
            # fixed
-            machine_value,
-            subsets_value,
-            backends_value,
-            hardware_type_value,
+            machine_textbox,
            # inputs
            score_slider,
            memory_slider,
@@ -244,16 +223,9 @@ def create_control_callback(
    )
 
 
-def select_columns_fn(
-    machine, subsets, backends, hardware_type, columns, search, llm_perf_df=None
-):
+def select_columns_fn(machine, columns, search, llm_perf_df=None):
    if llm_perf_df is None:
-        llm_perf_df = get_llm_perf_df(
-            machine=machine,
-            subsets=subsets,
-            backends=backends,
-            hardware_type=hardware_type,
-        )
+        llm_perf_df = get_llm_perf_df(machine=machine)
 
    selected_leaderboard_df = get_leaderboard_df(llm_perf_df)
    selected_leaderboard_df = selected_leaderboard_df[
@@ -266,10 +238,7 @@ def select_columns_fn(
 
 def create_select_callback(
    # fixed
-    machine_value,
-    subsets_value,
-    backends_value,
-    hardware_type_value,
+    machine_textbox,
    # interactive
    columns_checkboxes,
    search_bar,
@@ -278,25 +247,11 @@ def create_select_callback(
 ):
    columns_checkboxes.change(
        fn=select_columns_fn,
-        inputs=[
-            machine_value,
-            subsets_value,
-            backends_value,
-            hardware_type_value,
-            columns_checkboxes,
-            search_bar,
-        ],
+        inputs=[machine_textbox, columns_checkboxes, search_bar],
        outputs=[leaderboard_table],
    )
    search_bar.change(
        fn=select_columns_fn,
-        inputs=[
-            machine_value,
-            subsets_value,
-            backends_value,
-            hardware_type_value,
-            columns_checkboxes,
-            search_bar,
-        ],
+        inputs=[machine_textbox, columns_checkboxes, search_bar],
        outputs=[leaderboard_table],
    )
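filter_rows_fn builds one boolean mask by AND-ing a per-column condition for each control before indexing the dataframe. A self-contained sketch of that chaining on toy rows:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Model 🤗": ["llama-7b", "mistral-7b"],
        "Backend 🏭": ["pytorch", "openvino"],
        "Memory (MB)": [8000.0, 4000.0],
    }
)
mask = (
    df["Model 🤗"].str.contains("llama", case=False)  # search bar
    & df["Backend 🏭"].isin(["pytorch"])              # backend checkboxes
    & (df["Memory (MB)"] <= 16000)                    # memory slider
)
print(df[mask])  # only rows passing every condition survive
```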
src/utils.py
CHANGED
@@ -70,11 +70,6 @@ def process_quantizations(x):
        and x["config.backend.quantization_config.bits"] == 4
    ):
        return "AWQ.4bit"
-    elif (
-        x["config.backend.quantization_scheme"] == "torchao"
-        and x["config.backend.quantization_config.quant_type"] == "int4_weight_only"
-    ):
-        return "torchao.4bit"
    else:
        return "Unquantized"
 
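process_quantizations maps each raw benchmark row to a display label, and the import in src/llm_perf.py suggests it is applied row-wise to derive the quantization column. A trimmed-down sketch under that assumption (only the AWQ branch and the fallback are shown; the wiring line is illustrative, not taken from the diff):

```python
import pandas as pd

def process_quantizations(x: pd.Series) -> str:
    # reduced version of the helper after this PR: the torchao branch is gone
    if (
        x["config.backend.quantization_scheme"] == "awq"
        and x["config.backend.quantization_config.bits"] == 4
    ):
        return "AWQ.4bit"
    else:
        return "Unquantized"

raw_df = pd.DataFrame(
    [{"config.backend.quantization_scheme": "awq",
      "config.backend.quantization_config.bits": 4}]
)
# row-wise application to build the display column
raw_df["Quantization 🗜️"] = raw_df.apply(process_quantizations, axis=1)
print(raw_df["Quantization 🗜️"].tolist())  # ['AWQ.4bit']
```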