Spaces:
Paused
Paused
| <!--Copyright 2020 The HuggingFace Team. All rights reserved. | |
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | |
| the License. You may obtain a copy of the License at | |
| http://www.apache.org/licenses/LICENSE-2.0 | |
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | |
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | |
| specific language governing permissions and limitations under the License. | |
| ⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be | |
| rendered properly in your Markdown viewer. | |
| --> | |
| # Benchmarks | |
| <Tip warning={true}> | |
| Hugging Face's Benchmarking tools are deprecated and it is advised to use external Benchmarking libraries to measure the speed | |
| and memory complexity of Transformer models. | |
| </Tip> | |
| [[open-in-colab]] | |
| Let's take a look at how 🤗 Transformers models can be benchmarked, best practices, and already available benchmarks. | |
| A notebook explaining in more detail how to benchmark 🤗 Transformers models can be found [here](https://github.com/huggingface/notebooks/tree/main/examples/benchmark.ipynb). | |
| ## How to benchmark 🤗 Transformers models | |
| The classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] allow to flexibly benchmark 🤗 Transformers models. The benchmark classes allow us to measure the _peak memory usage_ and _required time_ for both _inference_ and _training_. | |
| <Tip> | |
| Hereby, _inference_ is defined by a single forward pass, and _training_ is defined by a single forward pass and | |
| backward pass. | |
| </Tip> | |
| The benchmark classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] expect an object of type [`PyTorchBenchmarkArguments`] and | |
| [`TensorFlowBenchmarkArguments`], respectively, for instantiation. [`PyTorchBenchmarkArguments`] and [`TensorFlowBenchmarkArguments`] are data classes and contain all relevant configurations for their corresponding benchmark class. In the following example, it is shown how a BERT model of type _bert-base-cased_ can be benchmarked. | |
| <frameworkcontent> | |
| <pt> | |
| ```py | |
| >>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments | |
| >>> args = PyTorchBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]) | |
| >>> benchmark = PyTorchBenchmark(args) | |
| ``` | |
| </pt> | |
| <tf> | |
| ```py | |
| >>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments | |
| >>> args = TensorFlowBenchmarkArguments( | |
| ... models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512] | |
| ... ) | |
| >>> benchmark = TensorFlowBenchmark(args) | |
| ``` | |
| </tf> | |
| </frameworkcontent> | |
| Here, three arguments are given to the benchmark argument data classes, namely `models`, `batch_sizes`, and | |
| `sequence_lengths`. The argument `models` is required and expects a `list` of model identifiers from the | |
| [model hub](https://huggingface.co/models) The `list` arguments `batch_sizes` and `sequence_lengths` define | |
| the size of the `input_ids` on which the model is benchmarked. There are many more parameters that can be configured | |
| via the benchmark argument data classes. For more detail on these one can either directly consult the files | |
| `src/transformers/benchmark/benchmark_args_utils.py`, `src/transformers/benchmark/benchmark_args.py` (for PyTorch) | |
| and `src/transformers/benchmark/benchmark_args_tf.py` (for Tensorflow). Alternatively, running the following shell | |
| commands from root will print out a descriptive list of all configurable parameters for PyTorch and Tensorflow | |
| respectively. | |
| <frameworkcontent> | |
| <pt> | |
| ```bash | |
| python examples/pytorch/benchmarking/run_benchmark.py --help | |
| ``` | |
| An instantiated benchmark object can then simply be run by calling `benchmark.run()`. | |
| ```py | |
| >>> results = benchmark.run() | |
| >>> print(results) | |
| ==================== INFERENCE - SPEED - RESULT ==================== | |
| -------------------------------------------------------------------------------- | |
| Model Name Batch Size Seq Length Time in s | |
| -------------------------------------------------------------------------------- | |
| bert-base-uncased 8 8 0.006 | |
| bert-base-uncased 8 32 0.006 | |
| bert-base-uncased 8 128 0.018 | |
| bert-base-uncased 8 512 0.088 | |
| -------------------------------------------------------------------------------- | |
| ==================== INFERENCE - MEMORY - RESULT ==================== | |
| -------------------------------------------------------------------------------- | |
| Model Name Batch Size Seq Length Memory in MB | |
| -------------------------------------------------------------------------------- | |
| bert-base-uncased 8 8 1227 | |
| bert-base-uncased 8 32 1281 | |
| bert-base-uncased 8 128 1307 | |
| bert-base-uncased 8 512 1539 | |
| -------------------------------------------------------------------------------- | |
| ==================== ENVIRONMENT INFORMATION ==================== | |
| - transformers_version: 2.11.0 | |
| - framework: PyTorch | |
| - use_torchscript: False | |
| - framework_version: 1.4.0 | |
| - python_version: 3.6.10 | |
| - system: Linux | |
| - cpu: x86_64 | |
| - architecture: 64bit | |
| - date: 2020-06-29 | |
| - time: 08:58:43.371351 | |
| - fp16: False | |
| - use_multiprocessing: True | |
| - only_pretrain_model: False | |
| - cpu_ram_mb: 32088 | |
| - use_gpu: True | |
| - num_gpus: 1 | |
| - gpu: TITAN RTX | |
| - gpu_ram_mb: 24217 | |
| - gpu_power_watts: 280.0 | |
| - gpu_performance_state: 2 | |
| - use_tpu: False | |
| ``` | |
| </pt> | |
| <tf> | |
| ```bash | |
| python examples/tensorflow/benchmarking/run_benchmark_tf.py --help | |
| ``` | |
| An instantiated benchmark object can then simply be run by calling `benchmark.run()`. | |
| ```py | |
| >>> results = benchmark.run() | |
| >>> print(results) | |
| >>> results = benchmark.run() | |
| >>> print(results) | |
| ==================== INFERENCE - SPEED - RESULT ==================== | |
| -------------------------------------------------------------------------------- | |
| Model Name Batch Size Seq Length Time in s | |
| -------------------------------------------------------------------------------- | |
| bert-base-uncased 8 8 0.005 | |
| bert-base-uncased 8 32 0.008 | |
| bert-base-uncased 8 128 0.022 | |
| bert-base-uncased 8 512 0.105 | |
| -------------------------------------------------------------------------------- | |
| ==================== INFERENCE - MEMORY - RESULT ==================== | |
| -------------------------------------------------------------------------------- | |
| Model Name Batch Size Seq Length Memory in MB | |
| -------------------------------------------------------------------------------- | |
| bert-base-uncased 8 8 1330 | |
| bert-base-uncased 8 32 1330 | |
| bert-base-uncased 8 128 1330 | |
| bert-base-uncased 8 512 1770 | |
| -------------------------------------------------------------------------------- | |
| ==================== ENVIRONMENT INFORMATION ==================== | |
| - transformers_version: 2.11.0 | |
| - framework: Tensorflow | |
| - use_xla: False | |
| - framework_version: 2.2.0 | |
| - python_version: 3.6.10 | |
| - system: Linux | |
| - cpu: x86_64 | |
| - architecture: 64bit | |
| - date: 2020-06-29 | |
| - time: 09:26:35.617317 | |
| - fp16: False | |
| - use_multiprocessing: True | |
| - only_pretrain_model: False | |
| - cpu_ram_mb: 32088 | |
| - use_gpu: True | |
| - num_gpus: 1 | |
| - gpu: TITAN RTX | |
| - gpu_ram_mb: 24217 | |
| - gpu_power_watts: 280.0 | |
| - gpu_performance_state: 2 | |
| - use_tpu: False | |
| ``` | |
| </tf> | |
| </frameworkcontent> | |
| By default, the _time_ and the _required memory_ for _inference_ are benchmarked. In the example output above the first | |
| two sections show the result corresponding to _inference time_ and _inference memory_. In addition, all relevant | |
| information about the computing environment, _e.g._ the GPU type, the system, the library versions, etc... are printed | |
| out in the third section under _ENVIRONMENT INFORMATION_. This information can optionally be saved in a _.csv_ file | |
| when adding the argument `save_to_csv=True` to [`PyTorchBenchmarkArguments`] and | |
| [`TensorFlowBenchmarkArguments`] respectively. In this case, every section is saved in a separate | |
| _.csv_ file. The path to each _.csv_ file can optionally be defined via the argument data classes. | |
| Instead of benchmarking pre-trained models via their model identifier, _e.g._ `bert-base-uncased`, the user can | |
| alternatively benchmark an arbitrary configuration of any available model class. In this case, a `list` of | |
| configurations must be inserted with the benchmark args as follows. | |
| <frameworkcontent> | |
| <pt> | |
| ```py | |
| >>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig | |
| >>> args = PyTorchBenchmarkArguments( | |
| ... models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512] | |
| ... ) | |
| >>> config_base = BertConfig() | |
| >>> config_384_hid = BertConfig(hidden_size=384) | |
| >>> config_6_lay = BertConfig(num_hidden_layers=6) | |
| >>> benchmark = PyTorchBenchmark(args, configs=[config_base, config_384_hid, config_6_lay]) | |
| >>> benchmark.run() | |
| ==================== INFERENCE - SPEED - RESULT ==================== | |
| -------------------------------------------------------------------------------- | |
| Model Name Batch Size Seq Length Time in s | |
| -------------------------------------------------------------------------------- | |
| bert-base 8 128 0.006 | |
| bert-base 8 512 0.006 | |
| bert-base 8 128 0.018 | |
| bert-base 8 512 0.088 | |
| bert-384-hid 8 8 0.006 | |
| bert-384-hid 8 32 0.006 | |
| bert-384-hid 8 128 0.011 | |
| bert-384-hid 8 512 0.054 | |
| bert-6-lay 8 8 0.003 | |
| bert-6-lay 8 32 0.004 | |
| bert-6-lay 8 128 0.009 | |
| bert-6-lay 8 512 0.044 | |
| -------------------------------------------------------------------------------- | |
| ==================== INFERENCE - MEMORY - RESULT ==================== | |
| -------------------------------------------------------------------------------- | |
| Model Name Batch Size Seq Length Memory in MB | |
| -------------------------------------------------------------------------------- | |
| bert-base 8 8 1277 | |
| bert-base 8 32 1281 | |
| bert-base 8 128 1307 | |
| bert-base 8 512 1539 | |
| bert-384-hid 8 8 1005 | |
| bert-384-hid 8 32 1027 | |
| bert-384-hid 8 128 1035 | |
| bert-384-hid 8 512 1255 | |
| bert-6-lay 8 8 1097 | |
| bert-6-lay 8 32 1101 | |
| bert-6-lay 8 128 1127 | |
| bert-6-lay 8 512 1359 | |
| -------------------------------------------------------------------------------- | |
| ==================== ENVIRONMENT INFORMATION ==================== | |
| - transformers_version: 2.11.0 | |
| - framework: PyTorch | |
| - use_torchscript: False | |
| - framework_version: 1.4.0 | |
| - python_version: 3.6.10 | |
| - system: Linux | |
| - cpu: x86_64 | |
| - architecture: 64bit | |
| - date: 2020-06-29 | |
| - time: 09:35:25.143267 | |
| - fp16: False | |
| - use_multiprocessing: True | |
| - only_pretrain_model: False | |
| - cpu_ram_mb: 32088 | |
| - use_gpu: True | |
| - num_gpus: 1 | |
| - gpu: TITAN RTX | |
| - gpu_ram_mb: 24217 | |
| - gpu_power_watts: 280.0 | |
| - gpu_performance_state: 2 | |
| - use_tpu: False | |
| ``` | |
| </pt> | |
| <tf> | |
| ```py | |
| >>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments, BertConfig | |
| >>> args = TensorFlowBenchmarkArguments( | |
| ... models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512] | |
| ... ) | |
| >>> config_base = BertConfig() | |
| >>> config_384_hid = BertConfig(hidden_size=384) | |
| >>> config_6_lay = BertConfig(num_hidden_layers=6) | |
| >>> benchmark = TensorFlowBenchmark(args, configs=[config_base, config_384_hid, config_6_lay]) | |
| >>> benchmark.run() | |
| ==================== INFERENCE - SPEED - RESULT ==================== | |
| -------------------------------------------------------------------------------- | |
| Model Name Batch Size Seq Length Time in s | |
| -------------------------------------------------------------------------------- | |
| bert-base 8 8 0.005 | |
| bert-base 8 32 0.008 | |
| bert-base 8 128 0.022 | |
| bert-base 8 512 0.106 | |
| bert-384-hid 8 8 0.005 | |
| bert-384-hid 8 32 0.007 | |
| bert-384-hid 8 128 0.018 | |
| bert-384-hid 8 512 0.064 | |
| bert-6-lay 8 8 0.002 | |
| bert-6-lay 8 32 0.003 | |
| bert-6-lay 8 128 0.0011 | |
| bert-6-lay 8 512 0.074 | |
| -------------------------------------------------------------------------------- | |
| ==================== INFERENCE - MEMORY - RESULT ==================== | |
| -------------------------------------------------------------------------------- | |
| Model Name Batch Size Seq Length Memory in MB | |
| -------------------------------------------------------------------------------- | |
| bert-base 8 8 1330 | |
| bert-base 8 32 1330 | |
| bert-base 8 128 1330 | |
| bert-base 8 512 1770 | |
| bert-384-hid 8 8 1330 | |
| bert-384-hid 8 32 1330 | |
| bert-384-hid 8 128 1330 | |
| bert-384-hid 8 512 1540 | |
| bert-6-lay 8 8 1330 | |
| bert-6-lay 8 32 1330 | |
| bert-6-lay 8 128 1330 | |
| bert-6-lay 8 512 1540 | |
| -------------------------------------------------------------------------------- | |
| ==================== ENVIRONMENT INFORMATION ==================== | |
| - transformers_version: 2.11.0 | |
| - framework: Tensorflow | |
| - use_xla: False | |
| - framework_version: 2.2.0 | |
| - python_version: 3.6.10 | |
| - system: Linux | |
| - cpu: x86_64 | |
| - architecture: 64bit | |
| - date: 2020-06-29 | |
| - time: 09:38:15.487125 | |
| - fp16: False | |
| - use_multiprocessing: True | |
| - only_pretrain_model: False | |
| - cpu_ram_mb: 32088 | |
| - use_gpu: True | |
| - num_gpus: 1 | |
| - gpu: TITAN RTX | |
| - gpu_ram_mb: 24217 | |
| - gpu_power_watts: 280.0 | |
| - gpu_performance_state: 2 | |
| - use_tpu: False | |
| ``` | |
| </tf> | |
| </frameworkcontent> | |
| Again, _inference time_ and _required memory_ for _inference_ are measured, but this time for customized configurations | |
| of the `BertModel` class. This feature can especially be helpful when deciding for which configuration the model | |
| should be trained. | |
| ## Benchmark best practices | |
| This section lists a couple of best practices one should be aware of when benchmarking a model. | |
| - Currently, only single device benchmarking is supported. When benchmarking on GPU, it is recommended that the user | |
| specifies on which device the code should be run by setting the `CUDA_VISIBLE_DEVICES` environment variable in the | |
| shell, _e.g._ `export CUDA_VISIBLE_DEVICES=0` before running the code. | |
| - The option `no_multi_processing` should only be set to `True` for testing and debugging. To ensure accurate | |
| memory measurement it is recommended to run each memory benchmark in a separate process by making sure | |
| `no_multi_processing` is set to `True`. | |
| - One should always state the environment information when sharing the results of a model benchmark. Results can vary | |
| heavily between different GPU devices, library versions, etc., so that benchmark results on their own are not very | |
| useful for the community. | |
| ## Sharing your benchmark | |
| Previously all available core models (10 at the time) have been benchmarked for _inference time_, across many different | |
| settings: using PyTorch, with and without TorchScript, using TensorFlow, with and without XLA. All of those tests were | |
| done across CPUs (except for TensorFlow XLA) and GPUs. | |
| The approach is detailed in the [following blogpost](https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2) and the results are | |
| available [here](https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing). | |
| With the new _benchmark_ tools, it is easier than ever to share your benchmark results with the community | |
| - [PyTorch Benchmarking Results](https://github.com/huggingface/transformers/tree/main/examples/pytorch/benchmarking/README.md). | |
| - [TensorFlow Benchmarking Results](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/benchmarking/README.md). | |