TITLE = """<h1 align="center" id="space-title">🤗 LLM-Perf Leaderboard 🏎️</h1>"""
INTRODUCTION_TEXT = f"""
The 🤗 LLM-Perf Leaderboard 🏎️ aims to benchmark the performance (latency, throughput & memory) of Large Language Models (LLMs) across different hardware, backends and optimizations using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) and [Optimum](https://github.com/huggingface/optimum) flavors.
Anyone from the community can request a model or a hardware/backend/optimization configuration for automated benchmarking:
- Model evaluation requests should be made in the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and will be added to the 🤗 LLM-Perf Leaderboard 🏎️ automatically.
- Hardware/Backend/Optimization performance requests should be made in the [community discussions](https://huggingface.co/spaces/optimum/llm-perf-leaderboard/discussions) to assess their relevance and feasibility.
"""
ABOUT_TEXT = """<h3>About the 🤗 LLM-Perf Leaderboard 🏎️</h3>
<ul>
<li>To avoid communication-dependent results, only one GPU is used.</li>
<li>Score is the average evaluation score obtained from the <a href="https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard">🤗 Open LLM Leaderboard</a>.</li>
<li>LLMs are run with a batch size of 1 and a prompt size of 512, generating 1000 tokens.</li>
<li>Peak memory is measured in MB during the generate pass using Py3NVML while ensuring the GPU's isolation.</li>
<li>Energy consumption is measured in kWh using CodeCarbon, taking into consideration the GPU, CPU, RAM and location of the machine.</li>
<li>Each pair of (Model Type, Weight Class) is represented by its best-scoring model. This LLM is the one used for all the hardware/backend/optimization experiments.</li>
</ul>
"""
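# Illustrative sketch, not part of the leaderboard code: with the settings
# described in ABOUT_TEXT (batch size 1, 1000 generated tokens), decoding
# throughput can be estimated as generated tokens divided by generation time.
# This definition is an assumption; the leaderboard's own computation may
# differ. The name `tokens_per_second` and its parameters are hypothetical.
def tokens_per_second(batch_size: int, new_tokens: int, generate_latency_s: float) -> float:
    """Return generated tokens per second for one generate pass."""
    if generate_latency_s <= 0:
        raise ValueError("generate_latency_s must be positive")
    return batch_size * new_tokens / generate_latency_s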
EXAMPLE_CONFIG_TEXT = """
Here's an example of the configuration file used to benchmark the models with Optimum-Benchmark:
```yaml
defaults:
  - backend: pytorch # default backend
  - benchmark: inference # default benchmark
  - experiment # inheriting from experiment config
  - _self_ # for hydra 1.1 compatibility
  - override hydra/job_logging: colorlog # colorful logging
  - override hydra/hydra_logging: colorlog # colorful logging
hydra:
  run:
    dir: llm-experiments/{experiment_name}
  job:
    chdir: true
experiment_name: {experiment_name}
model: {model}
device: cuda
backend:
  no_weights: true
  delete_cache: true
  torch_dtype: float16
  quantization_strategy: gptq
  bettertransformer: true
benchmark:
  memory: true
  input_shapes:
    batch_size: 1
    sequence_length: 512
  new_tokens: 1000
```
"""
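# Illustrative sketch, not part of the leaderboard code: one way to fill the
# {experiment_name} and {model} placeholders of a config template such as
# EXAMPLE_CONFIG_TEXT, assuming they are meant to be substituted with
# str.format. This only works while the template contains no other literal
# braces. The helper name `render_config` is hypothetical.
def render_config(template: str, experiment_name: str, model: str) -> str:
    """Return `template` with both placeholders substituted."""
    return template.format(experiment_name=experiment_name, model=model)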
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results."
CITATION_BUTTON_TEXT = r"""@misc{llm-perf-leaderboard,
  author = {Ilyas Moutawwakil and Régis Pierrard},
  title = {LLM-Perf Leaderboard},
  year = {2023},
  publisher = {Hugging Face},
  howpublished = "\url{https://huggingface.co/spaces/optimum/llm-perf-leaderboard}",
}
@software{optimum-benchmark,
  author = {Ilyas Moutawwakil and Régis Pierrard},
  publisher = {Hugging Face},
  title = {Optimum-Benchmark: A framework for benchmarking the performance of Transformers models with different hardwares, backends and optimizations.},
}
"""