Utilities for Trainer¶
This page lists the utility classes and functions used by Trainer. Most of them are only useful if you are studying the Trainer code in the library.
Utilities¶
class transformers.EvalPrediction(predictions: Union[numpy.ndarray, Tuple[numpy.ndarray]], label_ids: numpy.ndarray)[source]¶

Evaluation output (always contains labels), to be used to compute metrics.

Parameters
- predictions (np.ndarray) – Predictions of the model.
- label_ids (np.ndarray) – Targets to be matched.
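As an illustration, here is a minimal sketch of a metrics function of the kind you would pass to Trainer via its compute_metrics argument; the accuracy metric and the toy logits are invented for the example:

import numpy as np
from transformers import EvalPrediction

# A hypothetical compute_metrics function: it receives an EvalPrediction
# and returns a dict mapping metric names to values.
def compute_metrics(p: EvalPrediction) -> dict:
    preds = np.argmax(p.predictions, axis=-1)  # logits -> predicted class ids
    return {"accuracy": float((preds == p.label_ids).mean())}

logits = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])  # toy model outputs
labels = np.array([1, 0, 0])
print(compute_metrics(EvalPrediction(predictions=logits, label_ids=labels)))
# {'accuracy': 0.6666666666666666} -- 2 of the 3 predictions match the labels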
Callbacks internals¶
Distributed Evaluation¶
class transformers.trainer_pt_utils.DistributedTensorGatherer(world_size, num_samples, make_multiple_of=None, padding_index=-100)[source]¶

A class responsible for properly gathering tensors (or nested lists/tuples of tensors) on the CPU by chunks.
If our dataset has 16 samples with a batch size of 2 on 3 processes, and we gather and then transfer to CPU at every step, our sampler will generate the following indices:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1]

to get a total length that is a multiple of 3 (so that each process gets the same number of samples). Processes 0, 1 and 2 will then be responsible for making predictions on the following samples:
P0: [0, 1, 2, 3, 4, 5]
P1: [6, 7, 8, 9, 10, 11]
P2: [12, 13, 14, 15, 0, 1]
The first batch processed on each process will then be:

P0: [0, 1]
P1: [6, 7]
P2: [12, 13]
So if we gather at the end of the first batch, we will get a tensor (or nested list/tuple of tensors) corresponding to the following indices:

[0, 1, 6, 7, 12, 13]

If we directly concatenate our results without taking any precautions, the user will end up with the predictions for the indices in this order at the end of the prediction loop:

[0, 1, 6, 7, 12, 13, 2, 3, 8, 9, 14, 15, 4, 5, 10, 11, 0, 1]

Predictions in that scrambled order are of no use to anyone. This class is there to solve that problem: it writes each gathered chunk back to the right position, so that the final result comes out in the original dataset order.
Parameters
- world_size (int) – The number of processes used in the distributed training.
- num_samples (int) – The number of samples in our dataset.
- make_multiple_of (int, optional) – If passed, the class assumes the datasets passed to each process are made to be a multiple of this argument (by adding samples).
- padding_index (int, optional, defaults to -100) – The padding index to use if the arrays don’t all have the same sequence length.
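To make the mechanics concrete, here is a minimal single-process sketch that replays the 16-sample walkthrough above with plain NumPy arrays. It relies on the class’s add_arrays() and finalize() methods (the same ones the Trainer prediction loop uses); in real distributed evaluation, the input to add_arrays() would come from an actual cross-process gather rather than a manual concatenation:

import numpy as np
from transformers.trainer_pt_utils import DistributedTensorGatherer

world_size, num_samples, batch_size = 3, 16, 2
gatherer = DistributedTensorGatherer(world_size, num_samples)

# The padded index stream from the walkthrough, split into one contiguous shard per process.
indices = list(range(num_samples)) + [0, 1]   # 18 indices, a multiple of 3
per_proc = len(indices) // world_size         # 6 samples per process
shards = [indices[i * per_proc:(i + 1) * per_proc] for i in range(world_size)]

for step in range(per_proc // batch_size):    # 3 evaluation steps
    # Simulate what gathering across processes yields at each step:
    # the concatenation of every process's current batch.
    gathered = np.concatenate(
        [np.asarray(s[step * batch_size:(step + 1) * batch_size]) for s in shards]
    )
    gatherer.add_arrays(gathered)

print(gatherer.finalize())  # [ 0  1  2 ... 15] -- original order, padding samples truncated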
Argument Parsing¶
class transformers.HfArgumentParser(dataclass_types: Union[DataClassType, Iterable[DataClassType]], **kwargs)[source]¶

This subclass of argparse.ArgumentParser uses type hints on dataclasses to generate arguments.
The class is designed to play well with the native argparse. In particular, you can add more (non-dataclass-backed) arguments to the parser after initialization, and you’ll get them back after parsing as an additional namespace.
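Here is a minimal sketch of that workflow; the DemoArguments dataclass and its fields are invented for the example, and parsing uses the class’s parse_args_into_dataclasses() method:

from dataclasses import dataclass, field
from transformers import HfArgumentParser

@dataclass
class DemoArguments:
    # Hypothetical fields, purely for illustration.
    model_name: str = field(metadata={"help": "Identifier of the model to load."})
    learning_rate: float = field(default=5e-5, metadata={"help": "Initial learning rate."})
    fp16: bool = field(default=False, metadata={"help": "Whether to use mixed precision."})

parser = HfArgumentParser(DemoArguments)
# Extra, non-dataclass-backed arguments still work and come back in a separate namespace.
parser.add_argument("--output_dir", type=str, default="out")

demo_args, extra = parser.parse_args_into_dataclasses(
    args=["--model_name", "bert-base-uncased", "--fp16", "--output_dir", "runs/demo"]
)
print(demo_args.model_name, demo_args.learning_rate, demo_args.fp16)  # bert-base-uncased 5e-05 True
print(extra.output_dir)                                               # runs/demo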