Utilities for Trainer
This page lists all the utility functions used by Trainer.
Most of those are only useful if you are studying the code of the Trainer in the library.
Utilities

class transformers.EvalPrediction(predictions: Union[numpy.ndarray, Tuple[numpy.ndarray]], label_ids: numpy.ndarray)
Evaluation output (always contains labels), to be used to compute metrics.
Parameters
predictions (np.ndarray) – Predictions of the model.
label_ids (np.ndarray) – Targets to be matched.
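As an illustration of how such an evaluation output can be turned into a metric, here is a minimal sketch. It assumes a classification task where each prediction row holds per-class scores. The names `EvalPredictionLike` and `compute_accuracy` are hypothetical: the stand-in class only mirrors the two documented fields with plain lists so that the sketch runs without numpy or transformers installed.

```python
from typing import List, NamedTuple


# Hypothetical stand-in for transformers.EvalPrediction; the real class
# holds np.ndarray values, plain lists are used here to stay dependency-free.
class EvalPredictionLike(NamedTuple):
    predictions: List[List[float]]  # one row of class scores per sample
    label_ids: List[int]            # one target class per sample


def compute_accuracy(eval_pred: EvalPredictionLike) -> dict:
    """Return accuracy, taking the highest-scoring class as the prediction."""
    predicted = [row.index(max(row)) for row in eval_pred.predictions]
    correct = sum(p == t for p, t in zip(predicted, eval_pred.label_ids))
    return {"accuracy": correct / len(eval_pred.label_ids)}


# Made-up scores and labels, for illustration only.
example = EvalPredictionLike(
    predictions=[[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]],
    label_ids=[1, 0, 0, 0],
)
metrics = compute_accuracy(example)
```

With real arrays, the same logic would typically use an argmax over the last axis of `predictions` and compare the result with `label_ids`.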
Callbacks internals
Distributed Evaluation

class transformers.trainer_pt_utils.DistributedTensorGatherer(world_size, num_samples, make_multiple_of=None, padding_index=-100)
A class responsible for properly gathering tensors (or nested list/tuple of tensors) on the CPU by chunks.
If our dataset has 16 samples with a batch size of 2 on 3 processes, and we gather and then transfer to the CPU at every step, our sampler will generate the following indices:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1]
to get something whose size is a multiple of 3 (so that each process gets the same dataset length). Processes 0, 1 and 2 will then be responsible for making predictions for the following samples:
P0: [0, 1, 2, 3, 4, 5]
P1: [6, 7, 8, 9, 10, 11]
P2: [12, 13, 14, 15, 0, 1]
The first batch treated on each process will be:
P0: [0, 1]
P1: [6, 7]
P2: [12, 13]
So if we gather at the end of the first batch, we will get a tensor (or nested list/tuple of tensors) corresponding to the following indices:
[0, 1, 6, 7, 12, 13]
If we directly concatenate our results without taking any precautions, the user will then get the predictions for the indices in this order at the end of the prediction loop:
[0, 1, 6, 7, 12, 13, 2, 3, 8, 9, 14, 15, 4, 5, 10, 11, 0, 1]
For some reason, that's not going to float their boat. This class is there to solve that problem.
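The ordering problem described above can be reproduced in a few lines of plain Python. This is only a sketch of the idea, not the class's actual implementation: it works on index lists instead of tensors, and the variable names (`shards`, `naive`, `ordered`, `restored`) are illustrative. The reordering step assumes each gathered chunk can be written into the slot owned by the process that produced it.

```python
import math

world_size, batch_size, num_samples = 3, 2, 16

# Wrap around so the total length is a multiple of world_size (16 -> 18).
total = math.ceil(num_samples / world_size) * world_size
indices = list(range(num_samples))
indices += indices[: total - num_samples]

# Each process gets one contiguous shard of the padded index list.
per_process = total // world_size
shards = [indices[r * per_process:(r + 1) * per_process] for r in range(world_size)]

# Gathering after every step concatenates one batch from each process.
naive = []
for start in range(0, per_process, batch_size):
    for shard in shards:
        naive.extend(shard[start:start + batch_size])

# Reordering: place each process's chunk at its own offset in the output.
ordered = [None] * total
step_size = world_size * batch_size
for step, start in enumerate(range(0, per_process, batch_size)):
    gathered = naive[step * step_size:(step + 1) * step_size]
    for rank in range(world_size):
        chunk = gathered[rank * batch_size:(rank + 1) * batch_size]
        offset = rank * per_process + start
        ordered[offset:offset + batch_size] = chunk

# Drop the wrapped-around samples that were added only for padding.
restored = ordered[:num_samples]
```

Here `naive` reproduces the interleaved order shown above, while `restored` is back in dataset order with the two duplicated samples truncated.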
Parameters
world_size (int) – The number of processes used in the distributed training.
num_samples (int) – The number of samples in our dataset.
make_multiple_of (int, optional) – If passed, the class assumes the datasets passed to each process are made to be a multiple of this argument (by adding samples).
padding_index (int, optional, defaults to -100) – The padding index to use if the arrays don't all have the same sequence length.
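To make the role of the padding index concrete, here is a small sketch of padding rows of unequal sequence length to a common length before they are stacked. It is an assumption-laden illustration rather than the class's code: `pad_rows` is a hypothetical helper, padding on the right is an assumption, the data is made up, and the padding value is passed explicitly.

```python
from typing import List


def pad_rows(rows: List[List[int]], padding_index: int) -> List[List[int]]:
    """Right-pad every row with padding_index up to the longest row's length."""
    max_len = max(len(row) for row in rows)
    return [row + [padding_index] * (max_len - len(row)) for row in rows]


# Two processes returned sequences of different lengths (made-up data).
from_process_0 = [[5, 2, 9]]
from_process_1 = [[7, 1, 4, 3, 8]]

# The padding value here is illustrative; pass whatever index your
# downstream metric code treats as "ignore".
padded = pad_rows(from_process_0 + from_process_1, padding_index=-100)
```

After padding, every row has the same length, so the rows can be stored in a single rectangular array.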