TITLE = """<h1 align="center" id="space-title">🤗 LLM-Perf Leaderboard 🏎️</h1>"""
INTRODUCTION_TEXT = f"""
The 🤗 LLM-Perf Leaderboard 🏎️ aims to benchmark the performance (latency, throughput & memory) of Large Language Models (LLMs) across different hardware, backends and optimizations using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) and [Optimum](https://github.com/huggingface/optimum) flavors.

Anyone from the community can request a model or a hardware/backend/optimization configuration for automated benchmarking:
- Model evaluation requests should be made in the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and will be added to the 🤗 LLM-Perf Leaderboard 🏎️ automatically.
- Hardware/backend/optimization performance requests should be made in the [community discussions](https://huggingface.co/spaces/optimum/llm-perf-leaderboard/discussions), where their relevance and feasibility can be assessed.
"""
ABOUT_TEXT = """<h3>About the 🤗 LLM-Perf Leaderboard 🏎️</h3>
<ul>
<li>To avoid communication-dependent results, only one GPU is used.</li>
<li>Score is the average evaluation score obtained from the <a href="https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard">🤗 Open LLM Leaderboard</a>.</li>
<li>LLMs are run with a batch size of 1 and a prompt size of 512, generating 1000 new tokens.</li>
<li>Peak memory is measured in MB during the generate pass using Py3NVML, while ensuring that the GPU is isolated.</li>
<li>Energy consumption is measured in kWh using CodeCarbon, taking into account the GPU, CPU, RAM and location of the machine.</li>
<li>Each (Model Type, Weight Class) pair is represented by its best-scoring model. That LLM is the one used for all the hardware/backend/optimization experiments.</li>
</ul>
"""
EXAMPLE_CONFIG_TEXT = """
Here's an example of the configuration file used to benchmark the models with Optimum-Benchmark:
```yaml
defaults:
  - backend: pytorch # default backend
  - benchmark: inference # default benchmark
  - experiment # inheriting from experiment config
  - _self_ # for hydra 1.1 compatibility
  - override hydra/job_logging: colorlog # colorful logging
  - override hydra/hydra_logging: colorlog # colorful logging

hydra:
  run:
    dir: llm-experiments/{experiment_name}
  job:
    chdir: true

experiment_name: {experiment_name}

model: {model}

device: cuda

backend:
  no_weights: true
  delete_cache: true
  torch_dtype: float16
  quantization_strategy: gptq
  bettertransformer: true

benchmark:
  memory: true

  input_shapes:
    batch_size: 1
    sequence_length: 512

  new_tokens: 1000
```
"""
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results."
CITATION_BUTTON_TEXT = r"""@misc{llm-perf-leaderboard,
  author = {Ilyas Moutawwakil and Régis Pierrard},
  title = {LLM-Perf Leaderboard},
  year = {2023},
  publisher = {Hugging Face},
  howpublished = "\url{https://huggingface.co/spaces/optimum/llm-perf-leaderboard}",
}
@software{optimum-benchmark,
  author = {Ilyas Moutawwakil and Régis Pierrard},
  publisher = {Hugging Face},
  title = {Optimum-Benchmark: A framework for benchmarking the performance of Transformers models with different hardwares, backends and optimizations.},
}
"""