Spaces:
Runtime error
Runtime error
File size: 41,815 Bytes
455a40f |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 |
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Testing
Let's take a look at how 🤗 Transformers models are tested and how you can write new tests and improve the existing ones.
There are 2 test suites in the repository:
1. `tests` -- tests for the general API
2. `examples` -- tests primarily for various applications that aren't part of the API
## How transformers are tested
1. Once a PR is submitted it gets tested with 9 CircleCi jobs. Every new commit to that PR gets retested. These jobs
are defined in this [config file](https://github.com/huggingface/transformers/tree/main/.circleci/config.yml), so that if needed you can reproduce the same
environment on your machine.
These CI jobs don't run `@slow` tests.
2. There are 3 jobs run by [github actions](https://github.com/huggingface/transformers/actions):
- [torch hub integration](https://github.com/huggingface/transformers/tree/main/.github/workflows/github-torch-hub.yml): checks whether torch hub
integration works.
- [self-hosted (push)](https://github.com/huggingface/transformers/tree/main/.github/workflows/self-push.yml): runs fast tests on GPU only on commits on
`main`. It only runs if a commit on `main` has updated the code in one of the following folders: `src`,
`tests`, `.github` (to prevent running on added model cards, notebooks, etc.)
- [self-hosted runner](https://github.com/huggingface/transformers/tree/main/.github/workflows/self-scheduled.yml): runs normal and slow tests on GPU in
`tests` and `examples`:
```bash
RUN_SLOW=1 pytest tests/
RUN_SLOW=1 pytest examples/
```
The results can be observed [here](https://github.com/huggingface/transformers/actions).
## Running tests
### Choosing which tests to run
This document goes into many details of how tests can be run. If after reading everything, you need even more details
you will find them [here](https://docs.pytest.org/en/latest/usage.html).
Here are some most useful ways of running tests.
Run all:
```console
pytest
```
or:
```bash
make test
```
Note that the latter is defined as:
```bash
python -m pytest -n auto --dist=loadfile -s -v ./tests/
```
which tells pytest to:
- run as many test processes as they are CPU cores (which could be too many if you don't have a ton of RAM!)
- ensure that all tests from the same file will be run by the same test process
- do not capture output
- run in verbose mode
### Getting the list of all tests
All tests of the test suite:
```bash
pytest --collect-only -q
```
All tests of a given test file:
```bash
pytest tests/test_optimization.py --collect-only -q
```
### Run a specific test module
To run an individual test module:
```bash
pytest tests/test_logging.py
```
### Run specific tests
Since unittest is used inside most of the tests, to run specific subtests you need to know the name of the unittest
class containing those tests. For example, it could be:
```bash
pytest tests/test_optimization.py::OptimizationTest::test_adam_w
```
Here:
- `tests/test_optimization.py` - the file with tests
- `OptimizationTest` - the name of the class
- `test_adam_w` - the name of the specific test function
If the file contains multiple classes, you can choose to run only tests of a given class. For example:
```bash
pytest tests/test_optimization.py::OptimizationTest
```
will run all the tests inside that class.
As mentioned earlier you can see what tests are contained inside the `OptimizationTest` class by running:
```bash
pytest tests/test_optimization.py::OptimizationTest --collect-only -q
```
You can run tests by keyword expressions.
To run only tests whose name contains `adam`:
```bash
pytest -k adam tests/test_optimization.py
```
Logical `and` and `or` can be used to indicate whether all keywords should match or either. `not` can be used to
negate.
To run all tests except those whose name contains `adam`:
```bash
pytest -k "not adam" tests/test_optimization.py
```
And you can combine the two patterns in one:
```bash
pytest -k "ada and not adam" tests/test_optimization.py
```
For example to run both `test_adafactor` and `test_adam_w` you can use:
```bash
pytest -k "test_adam_w or test_adam_w" tests/test_optimization.py
```
Note that we use `or` here, since we want either of the keywords to match to include both.
If you want to include only tests that include both patterns, `and` is to be used:
```bash
pytest -k "test and ada" tests/test_optimization.py
```
### Run `accelerate` tests
Sometimes you need to run `accelerate` tests on your models. For that you can just add `-m accelerate_tests` to your command, if let's say you want to run these tests on `OPT` run:
```bash
RUN_SLOW=1 pytest -m accelerate_tests tests/models/opt/test_modeling_opt.py
```
### Run documentation tests
In order to test whether the documentation examples are correct, you should check that the `doctests` are passing.
As an example, let's use [`WhisperModel.forward`'s docstring](https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py#L1017-L1035):
```python
r"""
Returns:
Example:
```python
>>> import torch
>>> from transformers import WhisperModel, WhisperFeatureExtractor
>>> from datasets import load_dataset
>>> model = WhisperModel.from_pretrained("openai/whisper-base")
>>> feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")
>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> inputs = feature_extractor(ds[0]["audio"]["array"], return_tensors="pt")
>>> input_features = inputs.input_features
>>> decoder_input_ids = torch.tensor([[1, 1]]) * model.config.decoder_start_token_id
>>> last_hidden_state = model(input_features, decoder_input_ids=decoder_input_ids).last_hidden_state
>>> list(last_hidden_state.shape)
[1, 2, 512]
```"""
```
3 steps are required to debug the docstring examples:
1. In order to properly run the test, **an extra line has to be added** at the end of the docstring. This can be automatically done on any file using:
```bash
python utils/prepare_for_doc_test.py <path_to_file_or_dir>
```
2. Then, you can use the following line to automatically test every docstring example in the desired file:
```bash
pytest --doctest-modules <path_to_file_or_dir>
```
3. Once you are done debugging, you need to remove the extra line added in step **1.** by running the following:
```bash
python utils/prepare_for_doc_test.py <path_to_file_or_dir> --remove_new_line
```
### Run only modified tests
You can run the tests related to the unstaged files or the current branch (according to Git) by using [pytest-picked](https://github.com/anapaulagomes/pytest-picked). This is a great way of quickly testing your changes didn't break
anything, since it won't run the tests related to files you didn't touch.
```bash
pip install pytest-picked
```
```bash
pytest --picked
```
All tests will be run from files and folders which are modified, but not yet committed.
### Automatically rerun failed tests on source modification
[pytest-xdist](https://github.com/pytest-dev/pytest-xdist) provides a very useful feature of detecting all failed
tests, and then waiting for you to modify files and continuously re-rerun those failing tests until they pass while you
fix them. So that you don't need to re start pytest after you made the fix. This is repeated until all tests pass after
which again a full run is performed.
```bash
pip install pytest-xdist
```
To enter the mode: `pytest -f` or `pytest --looponfail`
File changes are detected by looking at `looponfailroots` root directories and all of their contents (recursively).
If the default for this value does not work for you, you can change it in your project by setting a configuration
option in `setup.cfg`:
```ini
[tool:pytest]
looponfailroots = transformers tests
```
or `pytest.ini`/``tox.ini`` files:
```ini
[pytest]
looponfailroots = transformers tests
```
This would lead to only looking for file changes in the respective directories, specified relatively to the ini-file’s
directory.
[pytest-watch](https://github.com/joeyespo/pytest-watch) is an alternative implementation of this functionality.
### Skip a test module
If you want to run all test modules, except a few you can exclude them by giving an explicit list of tests to run. For
example, to run all except `test_modeling_*.py` tests:
```bash
pytest *ls -1 tests/*py | grep -v test_modeling*
```
### Clearing state
CI builds and when isolation is important (against speed), cache should be cleared:
```bash
pytest --cache-clear tests
```
### Running tests in parallel
As mentioned earlier `make test` runs tests in parallel via `pytest-xdist` plugin (`-n X` argument, e.g. `-n 2`
to run 2 parallel jobs).
`pytest-xdist`'s `--dist=` option allows one to control how the tests are grouped. `--dist=loadfile` puts the
tests located in one file onto the same process.
Since the order of executed tests is different and unpredictable, if running the test suite with `pytest-xdist`
produces failures (meaning we have some undetected coupled tests), use [pytest-replay](https://github.com/ESSS/pytest-replay) to replay the tests in the same order, which should help with then somehow
reducing that failing sequence to a minimum.
### Test order and repetition
It's good to repeat the tests several times, in sequence, randomly, or in sets, to detect any potential
inter-dependency and state-related bugs (tear down). And the straightforward multiple repetition is just good to detect
some problems that get uncovered by randomness of DL.
#### Repeat tests
- [pytest-flakefinder](https://github.com/dropbox/pytest-flakefinder):
```bash
pip install pytest-flakefinder
```
And then run every test multiple times (50 by default):
```bash
pytest --flake-finder --flake-runs=5 tests/test_failing_test.py
```
<Tip>
This plugin doesn't work with `-n` flag from `pytest-xdist`.
</Tip>
<Tip>
There is another plugin `pytest-repeat`, but it doesn't work with `unittest`.
</Tip>
#### Run tests in a random order
```bash
pip install pytest-random-order
```
Important: the presence of `pytest-random-order` will automatically randomize tests, no configuration change or
command line options is required.
As explained earlier this allows detection of coupled tests - where one test's state affects the state of another. When
`pytest-random-order` is installed it will print the random seed it used for that session, e.g:
```bash
pytest tests
[...]
Using --random-order-bucket=module
Using --random-order-seed=573663
```
So that if the given particular sequence fails, you can reproduce it by adding that exact seed, e.g.:
```bash
pytest --random-order-seed=573663
[...]
Using --random-order-bucket=module
Using --random-order-seed=573663
```
It will only reproduce the exact order if you use the exact same list of tests (or no list at all). Once you start to
manually narrowing down the list you can no longer rely on the seed, but have to list them manually in the exact order
they failed and tell pytest to not randomize them instead using `--random-order-bucket=none`, e.g.:
```bash
pytest --random-order-bucket=none tests/test_a.py tests/test_c.py tests/test_b.py
```
To disable the shuffling for all tests:
```bash
pytest --random-order-bucket=none
```
By default `--random-order-bucket=module` is implied, which will shuffle the files on the module levels. It can also
shuffle on `class`, `package`, `global` and `none` levels. For the complete details please see its
[documentation](https://github.com/jbasko/pytest-random-order).
Another randomization alternative is: [`pytest-randomly`](https://github.com/pytest-dev/pytest-randomly). This
module has a very similar functionality/interface, but it doesn't have the bucket modes available in
`pytest-random-order`. It has the same problem of imposing itself once installed.
### Look and feel variations
#### pytest-sugar
[pytest-sugar](https://github.com/Frozenball/pytest-sugar) is a plugin that improves the look-n-feel, adds a
progressbar, and show tests that fail and the assert instantly. It gets activated automatically upon installation.
```bash
pip install pytest-sugar
```
To run tests without it, run:
```bash
pytest -p no:sugar
```
or uninstall it.
#### Report each sub-test name and its progress
For a single or a group of tests via `pytest` (after `pip install pytest-pspec`):
```bash
pytest --pspec tests/test_optimization.py
```
#### Instantly shows failed tests
[pytest-instafail](https://github.com/pytest-dev/pytest-instafail) shows failures and errors instantly instead of
waiting until the end of test session.
```bash
pip install pytest-instafail
```
```bash
pytest --instafail
```
### To GPU or not to GPU
On a GPU-enabled setup, to test in CPU-only mode add `CUDA_VISIBLE_DEVICES=""`:
```bash
CUDA_VISIBLE_DEVICES="" pytest tests/test_logging.py
```
or if you have multiple gpus, you can specify which one is to be used by `pytest`. For example, to use only the
second gpu if you have gpus `0` and `1`, you can run:
```bash
CUDA_VISIBLE_DEVICES="1" pytest tests/test_logging.py
```
This is handy when you want to run different tasks on different GPUs.
Some tests must be run on CPU-only, others on either CPU or GPU or TPU, yet others on multiple-GPUs. The following skip
decorators are used to set the requirements of tests CPU/GPU/TPU-wise:
- `require_torch` - this test will run only under torch
- `require_torch_gpu` - as `require_torch` plus requires at least 1 GPU
- `require_torch_multi_gpu` - as `require_torch` plus requires at least 2 GPUs
- `require_torch_non_multi_gpu` - as `require_torch` plus requires 0 or 1 GPUs
- `require_torch_up_to_2_gpus` - as `require_torch` plus requires 0 or 1 or 2 GPUs
- `require_torch_tpu` - as `require_torch` plus requires at least 1 TPU
Let's depict the GPU requirements in the following table:
| n gpus | decorator |
|--------+--------------------------------|
| `>= 0` | `@require_torch` |
| `>= 1` | `@require_torch_gpu` |
| `>= 2` | `@require_torch_multi_gpu` |
| `< 2` | `@require_torch_non_multi_gpu` |
| `< 3` | `@require_torch_up_to_2_gpus` |
For example, here is a test that must be run only when there are 2 or more GPUs available and pytorch is installed:
```python no-style
@require_torch_multi_gpu
def test_example_with_multi_gpu():
```
If a test requires `tensorflow` use the `require_tf` decorator. For example:
```python no-style
@require_tf
def test_tf_thing_with_tensorflow():
```
These decorators can be stacked. For example, if a test is slow and requires at least one GPU under pytorch, here is
how to set it up:
```python no-style
@require_torch_gpu
@slow
def test_example_slow_on_gpu():
```
Some decorators like `@parametrized` rewrite test names, therefore `@require_*` skip decorators have to be listed
last for them to work correctly. Here is an example of the correct usage:
```python no-style
@parameterized.expand(...)
@require_torch_multi_gpu
def test_integration_foo():
```
This order problem doesn't exist with `@pytest.mark.parametrize`, you can put it first or last and it will still
work. But it only works with non-unittests.
Inside tests:
- How many GPUs are available:
```python
from transformers.testing_utils import get_gpu_count
n_gpu = get_gpu_count() # works with torch and tf
```
### Distributed training
`pytest` can't deal with distributed training directly. If this is attempted - the sub-processes don't do the right
thing and end up thinking they are `pytest` and start running the test suite in loops. It works, however, if one
spawns a normal process that then spawns off multiple workers and manages the IO pipes.
Here are some tests that use it:
- [test_trainer_distributed.py](https://github.com/huggingface/transformers/tree/main/tests/trainer/test_trainer_distributed.py)
- [test_deepspeed.py](https://github.com/huggingface/transformers/tree/main/tests/deepspeed/test_deepspeed.py)
To jump right into the execution point, search for the `execute_subprocess_async` call in those tests.
You will need at least 2 GPUs to see these tests in action:
```bash
CUDA_VISIBLE_DEVICES=0,1 RUN_SLOW=1 pytest -sv tests/test_trainer_distributed.py
```
### Output capture
During test execution any output sent to `stdout` and `stderr` is captured. If a test or a setup method fails, its
according captured output will usually be shown along with the failure traceback.
To disable output capturing and to get the `stdout` and `stderr` normally, use `-s` or `--capture=no`:
```bash
pytest -s tests/test_logging.py
```
To send test results to JUnit format output:
```bash
py.test tests --junitxml=result.xml
```
### Color control
To have no color (e.g., yellow on white background is not readable):
```bash
pytest --color=no tests/test_logging.py
```
### Sending test report to online pastebin service
Creating a URL for each test failure:
```bash
pytest --pastebin=failed tests/test_logging.py
```
This will submit test run information to a remote Paste service and provide a URL for each failure. You may select
tests as usual or add for example -x if you only want to send one particular failure.
Creating a URL for a whole test session log:
```bash
pytest --pastebin=all tests/test_logging.py
```
## Writing tests
🤗 transformers tests are based on `unittest`, but run by `pytest`, so most of the time features from both systems
can be used.
You can read [here](https://docs.pytest.org/en/stable/unittest.html) which features are supported, but the important
thing to remember is that most `pytest` fixtures don't work. Neither parametrization, but we use the module
`parameterized` that works in a similar way.
### Parametrization
Often, there is a need to run the same test multiple times, but with different arguments. It could be done from within
the test, but then there is no way of running that test for just one set of arguments.
```python
# test_this1.py
import unittest
from parameterized import parameterized
class TestMathUnitTest(unittest.TestCase):
@parameterized.expand(
[
("negative", -1.5, -2.0),
("integer", 1, 1.0),
("large fraction", 1.6, 1),
]
)
def test_floor(self, name, input, expected):
assert_equal(math.floor(input), expected)
```
Now, by default this test will be run 3 times, each time with the last 3 arguments of `test_floor` being assigned the
corresponding arguments in the parameter list.
and you could run just the `negative` and `integer` sets of params with:
```bash
pytest -k "negative and integer" tests/test_mytest.py
```
or all but `negative` sub-tests, with:
```bash
pytest -k "not negative" tests/test_mytest.py
```
Besides using the `-k` filter that was just mentioned, you can find out the exact name of each sub-test and run any
or all of them using their exact names.
```bash
pytest test_this1.py --collect-only -q
```
and it will list:
```bash
test_this1.py::TestMathUnitTest::test_floor_0_negative
test_this1.py::TestMathUnitTest::test_floor_1_integer
test_this1.py::TestMathUnitTest::test_floor_2_large_fraction
```
So now you can run just 2 specific sub-tests:
```bash
pytest test_this1.py::TestMathUnitTest::test_floor_0_negative test_this1.py::TestMathUnitTest::test_floor_1_integer
```
The module [parameterized](https://pypi.org/project/parameterized/) which is already in the developer dependencies
of `transformers` works for both: `unittests` and `pytest` tests.
If, however, the test is not a `unittest`, you may use `pytest.mark.parametrize` (or you may see it being used in
some existing tests, mostly under `examples`).
Here is the same example, this time using `pytest`'s `parametrize` marker:
```python
# test_this2.py
import pytest
@pytest.mark.parametrize(
"name, input, expected",
[
("negative", -1.5, -2.0),
("integer", 1, 1.0),
("large fraction", 1.6, 1),
],
)
def test_floor(name, input, expected):
assert_equal(math.floor(input), expected)
```
Same as with `parameterized`, with `pytest.mark.parametrize` you can have a fine control over which sub-tests are
run, if the `-k` filter doesn't do the job. Except, this parametrization function creates a slightly different set of
names for the sub-tests. Here is what they look like:
```bash
pytest test_this2.py --collect-only -q
```
and it will list:
```bash
test_this2.py::test_floor[integer-1-1.0]
test_this2.py::test_floor[negative--1.5--2.0]
test_this2.py::test_floor[large fraction-1.6-1]
```
So now you can run just the specific test:
```bash
pytest test_this2.py::test_floor[negative--1.5--2.0] test_this2.py::test_floor[integer-1-1.0]
```
as in the previous example.
### Files and directories
In tests often we need to know where things are relative to the current test file, and it's not trivial since the test
could be invoked from more than one directory or could reside in sub-directories with different depths. A helper class
`transformers.test_utils.TestCasePlus` solves this problem by sorting out all the basic paths and provides easy
accessors to them:
- `pathlib` objects (all fully resolved):
- `test_file_path` - the current test file path, i.e. `__file__`
- `test_file_dir` - the directory containing the current test file
- `tests_dir` - the directory of the `tests` test suite
- `examples_dir` - the directory of the `examples` test suite
- `repo_root_dir` - the directory of the repository
- `src_dir` - the directory of `src` (i.e. where the `transformers` sub-dir resides)
- stringified paths---same as above but these return paths as strings, rather than `pathlib` objects:
- `test_file_path_str`
- `test_file_dir_str`
- `tests_dir_str`
- `examples_dir_str`
- `repo_root_dir_str`
- `src_dir_str`
To start using those all you need is to make sure that the test resides in a subclass of
`transformers.test_utils.TestCasePlus`. For example:
```python
from transformers.testing_utils import TestCasePlus
class PathExampleTest(TestCasePlus):
def test_something_involving_local_locations(self):
data_dir = self.tests_dir / "fixtures/tests_samples/wmt_en_ro"
```
If you don't need to manipulate paths via `pathlib` or you just need a path as a string, you can always invoked
`str()` on the `pathlib` object or use the accessors ending with `_str`. For example:
```python
from transformers.testing_utils import TestCasePlus
class PathExampleTest(TestCasePlus):
def test_something_involving_stringified_locations(self):
examples_dir = self.examples_dir_str
```
### Temporary files and directories
Using unique temporary files and directories are essential for parallel test running, so that the tests won't overwrite
each other's data. Also we want to get the temporary files and directories removed at the end of each test that created
them. Therefore, using packages like `tempfile`, which address these needs is essential.
However, when debugging tests, you need to be able to see what goes into the temporary file or directory and you want
to know it's exact path and not having it randomized on every test re-run.
A helper class `transformers.test_utils.TestCasePlus` is best used for such purposes. It's a sub-class of
`unittest.TestCase`, so we can easily inherit from it in the test modules.
Here is an example of its usage:
```python
from transformers.testing_utils import TestCasePlus
class ExamplesTests(TestCasePlus):
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir()
```
This code creates a unique temporary directory, and sets `tmp_dir` to its location.
- Create a unique temporary dir:
```python
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir()
```
`tmp_dir` will contain the path to the created temporary dir. It will be automatically removed at the end of the
test.
- Create a temporary dir of my choice, ensure it's empty before the test starts and don't empty it after the test.
```python
def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir("./xxx")
```
This is useful for debug when you want to monitor a specific directory and want to make sure the previous tests didn't
leave any data in there.
- You can override the default behavior by directly overriding the `before` and `after` args, leading to one of the
following behaviors:
- `before=True`: the temporary dir will always be cleared at the beginning of the test.
- `before=False`: if the temporary dir already existed, any existing files will remain there.
- `after=True`: the temporary dir will always be deleted at the end of the test.
- `after=False`: the temporary dir will always be left intact at the end of the test.
<Tip>
In order to run the equivalent of `rm -r` safely, only subdirs of the project repository checkout are allowed if
an explicit `tmp_dir` is used, so that by mistake no `/tmp` or similar important part of the filesystem will
get nuked. i.e. please always pass paths that start with `./`.
</Tip>
<Tip>
Each test can register multiple temporary directories and they all will get auto-removed, unless requested
otherwise.
</Tip>
### Temporary sys.path override
If you need to temporary override `sys.path` to import from another test for example, you can use the
`ExtendSysPath` context manager. Example:
```python
import os
from transformers.testing_utils import ExtendSysPath
bindir = os.path.abspath(os.path.dirname(__file__))
with ExtendSysPath(f"{bindir}/.."):
from test_trainer import TrainerIntegrationCommon # noqa
```
### Skipping tests
This is useful when a bug is found and a new test is written, yet the bug is not fixed yet. In order to be able to
commit it to the main repository we need make sure it's skipped during `make test`.
Methods:
- A **skip** means that you expect your test to pass only if some conditions are met, otherwise pytest should skip
running the test altogether. Common examples are skipping windows-only tests on non-windows platforms, or skipping
tests that depend on an external resource which is not available at the moment (for example a database).
- A **xfail** means that you expect a test to fail for some reason. A common example is a test for a feature not yet
implemented, or a bug not yet fixed. When a test passes despite being expected to fail (marked with
pytest.mark.xfail), it’s an xpass and will be reported in the test summary.
One of the important differences between the two is that `skip` doesn't run the test, and `xfail` does. So if the
code that's buggy causes some bad state that will affect other tests, do not use `xfail`.
#### Implementation
- Here is how to skip whole test unconditionally:
```python no-style
@unittest.skip("this bug needs to be fixed")
def test_feature_x():
```
or via pytest:
```python no-style
@pytest.mark.skip(reason="this bug needs to be fixed")
```
or the `xfail` way:
```python no-style
@pytest.mark.xfail
def test_feature_x():
```
- Here is how to skip a test based on some internal check inside the test:
```python
def test_feature_x():
if not has_something():
pytest.skip("unsupported configuration")
```
or the whole module:
```python
import pytest
if not pytest.config.getoption("--custom-flag"):
pytest.skip("--custom-flag is missing, skipping tests", allow_module_level=True)
```
or the `xfail` way:
```python
def test_feature_x():
pytest.xfail("expected to fail until bug XYZ is fixed")
```
- Here is how to skip all tests in a module if some import is missing:
```python
docutils = pytest.importorskip("docutils", minversion="0.3")
```
- Skip a test based on a condition:
```python no-style
@pytest.mark.skipif(sys.version_info < (3,6), reason="requires python3.6 or higher")
def test_feature_x():
```
or:
```python no-style
@unittest.skipIf(torch_device == "cpu", "Can't do half precision")
def test_feature_x():
```
or skip the whole module:
```python no-style
@pytest.mark.skipif(sys.platform == 'win32', reason="does not run on windows")
class TestClass():
def test_feature_x(self):
```
More details, example and ways are [here](https://docs.pytest.org/en/latest/skipping.html).
### Slow tests
The library of tests is ever-growing, and some of the tests take minutes to run, therefore we can't afford waiting for
an hour for the test suite to complete on CI. Therefore, with some exceptions for essential tests, slow tests should be
marked as in the example below:
```python no-style
from transformers.testing_utils import slow
@slow
def test_integration_foo():
```
Once a test is marked as `@slow`, to run such tests set `RUN_SLOW=1` env var, e.g.:
```bash
RUN_SLOW=1 pytest tests
```
Some decorators like `@parameterized` rewrite test names, therefore `@slow` and the rest of the skip decorators
`@require_*` have to be listed last for them to work correctly. Here is an example of the correct usage:
```python no-style
@parameteriz ed.expand(...)
@slow
def test_integration_foo():
```
As explained at the beginning of this document, slow tests get to run on a scheduled basis, rather than in PRs CI
checks. So it's possible that some problems will be missed during a PR submission and get merged. Such problems will
get caught during the next scheduled CI job. But it also means that it's important to run the slow tests on your
machine before submitting the PR.
Here is a rough decision making mechanism for choosing which tests should be marked as slow:
If the test is focused on one of the library's internal components (e.g., modeling files, tokenization files,
pipelines), then we should run that test in the non-slow test suite. If it's focused on an other aspect of the library,
such as the documentation or the examples, then we should run these tests in the slow test suite. And then, to refine
this approach we should have exceptions:
- All tests that need to download a heavy set of weights or a dataset that is larger than ~50MB (e.g., model or
tokenizer integration tests, pipeline integration tests) should be set to slow. If you're adding a new model, you
should create and upload to the hub a tiny version of it (with random weights) for integration tests. This is
discussed in the following paragraphs.
- All tests that need to do a training not specifically optimized to be fast should be set to slow.
- We can introduce exceptions if some of these should-be-non-slow tests are excruciatingly slow, and set them to
`@slow`. Auto-modeling tests, which save and load large files to disk, are a good example of tests that are marked
as `@slow`.
- If a test completes under 1 second on CI (including downloads if any) then it should be a normal test regardless.
Collectively, all the non-slow tests need to cover entirely the different internals, while remaining fast. For example,
a significant coverage can be achieved by testing with specially created tiny models with random weights. Such models
have the very minimal number of layers (e.g., 2), vocab size (e.g., 1000), etc. Then the `@slow` tests can use large
slow models to do qualitative testing. To see the use of these simply look for *tiny* models with:
```bash
grep tiny tests examples
```
Here is a an example of a [script](https://github.com/huggingface/transformers/tree/main/scripts/fsmt/fsmt-make-tiny-model.py) that created the tiny model
[stas/tiny-wmt19-en-de](https://huggingface.co/stas/tiny-wmt19-en-de). You can easily adjust it to your specific
model's architecture.
It's easy to measure the run-time incorrectly if for example there is an overheard of downloading a huge model, but if
you test it locally the downloaded files would be cached and thus the download time not measured. Hence check the
execution speed report in CI logs instead (the output of `pytest --durations=0 tests`).
That report is also useful to find slow outliers that aren't marked as such, or which need to be re-written to be fast.
If you notice that the test suite starts getting slow on CI, the top listing of this report will show the slowest
tests.
### Testing the stdout/stderr output
In order to test functions that write to `stdout` and/or `stderr`, the test can access those streams using the
`pytest`'s [capsys system](https://docs.pytest.org/en/latest/capture.html). Here is how this is accomplished:
```python
import sys
def print_to_stdout(s):
print(s)
def print_to_stderr(s):
sys.stderr.write(s)
def test_result_and_stdout(capsys):
msg = "Hello"
print_to_stdout(msg)
print_to_stderr(msg)
out, err = capsys.readouterr() # consume the captured output streams
# optional: if you want to replay the consumed streams:
sys.stdout.write(out)
sys.stderr.write(err)
# test:
assert msg in out
assert msg in err
```
And, of course, most of the time, `stderr` will come as a part of an exception, so try/except has to be used in such
a case:
```python
def raise_exception(msg):
raise ValueError(msg)
def test_something_exception():
msg = "Not a good value"
error = ""
try:
raise_exception(msg)
except Exception as e:
error = str(e)
assert msg in error, f"{msg} is in the exception:\n{error}"
```
Another approach to capturing stdout is via `contextlib.redirect_stdout`:
```python
from io import StringIO
from contextlib import redirect_stdout
def print_to_stdout(s):
print(s)
def test_result_and_stdout():
msg = "Hello"
buffer = StringIO()
with redirect_stdout(buffer):
print_to_stdout(msg)
out = buffer.getvalue()
# optional: if you want to replay the consumed streams:
sys.stdout.write(out)
# test:
assert msg in out
```
An important potential issue with capturing stdout is that it may contain `\r` characters that in normal `print`
reset everything that has been printed so far. There is no problem with `pytest`, but with `pytest -s` these
characters get included in the buffer, so to be able to have the test run with and without `-s`, you have to make an
extra cleanup to the captured output, using `re.sub(r'~.*\r', '', buf, 0, re.M)`.
But, then we have a helper context manager wrapper to automatically take care of it all, regardless of whether it has
some `\r`'s in it or not, so it's a simple:
```python
from transformers.testing_utils import CaptureStdout
with CaptureStdout() as cs:
function_that_writes_to_stdout()
print(cs.out)
```
Here is a full test example:
```python
from transformers.testing_utils import CaptureStdout
msg = "Secret message\r"
final = "Hello World"
with CaptureStdout() as cs:
print(msg + final)
assert cs.out == final + "\n", f"captured: {cs.out}, expecting {final}"
```
If you'd like to capture `stderr` use the `CaptureStderr` class instead:
```python
from transformers.testing_utils import CaptureStderr
with CaptureStderr() as cs:
function_that_writes_to_stderr()
print(cs.err)
```
If you need to capture both streams at once, use the parent `CaptureStd` class:
```python
from transformers.testing_utils import CaptureStd
with CaptureStd() as cs:
function_that_writes_to_stdout_and_stderr()
print(cs.err, cs.out)
```
Also, to aid debugging test issues, by default these context managers automatically replay the captured streams on exit
from the context.
### Capturing logger stream
If you need to validate the output of a logger, you can use `CaptureLogger`:
```python
from transformers import logging
from transformers.testing_utils import CaptureLogger
msg = "Testing 1, 2, 3"
logging.set_verbosity_info()
logger = logging.get_logger("transformers.models.bart.tokenization_bart")
with CaptureLogger(logger) as cl:
logger.info(msg)
assert cl.out, msg + "\n"
```
### Testing with environment variables
If you want to test the impact of environment variables for a specific test you can use a helper decorator
`transformers.testing_utils.mockenv`
```python
from transformers.testing_utils import mockenv
class HfArgumentParserTest(unittest.TestCase):
@mockenv(TRANSFORMERS_VERBOSITY="error")
def test_env_override(self):
env_level_str = os.getenv("TRANSFORMERS_VERBOSITY", None)
```
At times an external program needs to be called, which requires setting `PYTHONPATH` in `os.environ` to include
multiple local paths. A helper class `transformers.test_utils.TestCasePlus` comes to help:
```python
from transformers.testing_utils import TestCasePlus
class EnvExampleTest(TestCasePlus):
def test_external_prog(self):
env = self.get_env()
# now call the external program, passing `env` to it
```
Depending on whether the test file was under the `tests` test suite or `examples` it'll correctly set up
`env[PYTHONPATH]` to include one of these two directories, and also the `src` directory to ensure the testing is
done against the current repo, and finally with whatever `env[PYTHONPATH]` was already set to before the test was
called if anything.
This helper method creates a copy of the `os.environ` object, so the original remains intact.
### Getting reproducible results
In some situations you may want to remove randomness for your tests. To get identical reproducible results set, you
will need to fix the seed:
```python
seed = 42
# python RNG
import random
random.seed(seed)
# pytorch RNGs
import torch
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True
if torch.cuda.is_available():
torch.cuda.manual_seed_all(seed)
# numpy RNG
import numpy as np
np.random.seed(seed)
# tf RNG
tf.random.set_seed(seed)
```
### Debugging tests
To start a debugger at the point of the warning, do this:
```bash
pytest tests/test_logging.py -W error::UserWarning --pdb
```
## Working with github actions workflows
To trigger a self-push workflow CI job, you must:
1. Create a new branch on `transformers` origin (not a fork!).
2. The branch name has to start with either `ci_` or `ci-` (`main` triggers it too, but we can't do PRs on
`main`). It also gets triggered only for specific paths - you can find the up-to-date definition in case it
changed since this document has been written [here](https://github.com/huggingface/transformers/blob/main/.github/workflows/self-push.yml) under *push:*
3. Create a PR from this branch.
4. Then you can see the job appear [here](https://github.com/huggingface/transformers/actions/workflows/self-push.yml). It may not run right away if there
is a backlog.
## Testing Experimental CI Features
Testing CI features can be potentially problematic as it can interfere with the normal CI functioning. Therefore if a
new CI feature is to be added, it should be done as following.
1. Create a new dedicated job that tests what needs to be tested
2. The new job must always succeed so that it gives us a green ✓ (details below).
3. Let it run for some days to see that a variety of different PR types get to run on it (user fork branches,
non-forked branches, branches originating from github.com UI direct file edit, various forced pushes, etc. - there
are so many) while monitoring the experimental job's logs (not the overall job green as it's purposefully always
green)
4. When it's clear that everything is solid, then merge the new changes into existing jobs.
That way experiments on CI functionality itself won't interfere with the normal workflow.
Now how can we make the job always succeed while the new CI feature is being developed?
Some CIs, like TravisCI support ignore-step-failure and will report the overall job as successful, but CircleCI and
Github Actions as of this writing don't support that.
So the following workaround can be used:
1. `set +euo pipefail` at the beginning of the run command to suppress most potential failures in the bash script.
2. the last command must be a success: `echo "done"` or just `true` will do
Here is an example:
```yaml
- run:
name: run CI experiment
command: |
set +euo pipefail
echo "setting run-all-despite-any-errors-mode"
this_command_will_fail
echo "but bash continues to run"
# emulate another failure
false
# but the last command must be a success
echo "during experiment do not remove: reporting success to CI, even if there were failures"
```
For simple commands you could also do:
```bash
cmd_that_may_fail || true
```
Of course, once satisfied with the results, integrate the experimental step or job with the rest of the normal jobs,
while removing `set +euo pipefail` or any other things you may have added to ensure that the experimental job doesn't
interfere with the normal CI functioning.
This whole process would have been much easier if we only could set something like `allow-failure` for the
experimental step, and let it fail without impacting the overall status of PRs. But as mentioned earlier CircleCI and
Github Actions don't support it at the moment.
You can vote for this feature and see where it is at these CI-specific threads:
- [Github Actions:](https://github.com/actions/toolkit/issues/399)
- [CircleCI:](https://ideas.circleci.com/ideas/CCI-I-344)
|