I. Maximum softmax probability (MSP): $g(x) = \max_{y' \in \mathcal{Y}} f_{y'}(x)$

II. Maximum logit: $g(x) = \max_{y' \in \mathcal{Y}} z_{y'}(x)$, with logits $z \in \mathbb{R}^K$

III. Negative entropy: $g(x) = \sum_{y' \in \mathcal{Y}} f_{y'}(x) \log f_{y'}(x)$, i.e., the negated Shannon entropy of the predictive distribution

IV. Margin: $g(x) = \max_{y' \in \mathcal{Y}} f_{y'}(x) - \max_{y'' \in \mathcal{Y} \setminus \{y'\}} f_{y''}(x)$, i.e., the difference between the largest and second-largest softmax probabilities

V. Distance-based measures

• kNN distance: a 1D outlier score derived from the average distance of the feature representation of $x$ to its $k$ nearest neighbors in the training distribution

• Mahalanobis distance [390]: the minimum distance of the feature map (e.g., penultimate-layer activations) of a test input to class-conditional Gaussian distributions fitted to the training data

VI. Bayesian uncertainty estimation

Chapter 3 used MSP and negative entropy as CSFs, alongside various PUQ methods for Bayesian uncertainty estimation. The other chapters used MSP, as it is the most common CSF in practice, requiring only the model's logits as input (illustrative sketches of several of these scores are given at the end of this section). The use of CSFs in turn creates the need to evaluate their statistical quality alongside task-specific predictive performance metrics, which is discussed next.

2.2.3 Evaluation Metrics

In an ideal world, the evaluation metric of interest would be the same as the loss function used for training, yet this is rarely the case in practice: gradient-based optimization requires a continuously differentiable function, while the metric of interest is often non-differentiable, e.g., accuracy versus cross-entropy in classification. Throughout our works, we have used (or extended) multiple predictive performance, calibration, and robustness metrics, the most noteworthy of which are outlined below.

Average Normalized Levenshtein Similarity (ANLS) is a metric introduced in [39] for the evaluation of VQA, which was later extended [449] to support lists of answers and to be invariant to the order in which answers are provided. We adapted the underlying Levenshtein Distance (LD) metric [251] to support not-answerable questions, $\mathrm{NA}(G) = \mathbb{I}[\mathrm{type}(G) = \text{not-answerable}]$ (see Equation (2.7)).
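To make the metric concrete, the sketch below implements normalized Levenshtein similarity with the conventional threshold $\tau = 0.5$ of [39] and a simple not-answerable rule. This is an illustration only: the function names, the `tau` parameter, and the convention that a not-answerable ground truth (empty gold list) is matched exactly by an empty prediction are assumptions of this sketch, not a reproduction of Equation (2.7) or of the code of [449].

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between strings a and b (standard dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def nls(pred: str, gold: str, tau: float = 0.5) -> float:
    """Normalized Levenshtein similarity, thresholded at tau as in [39]."""
    if not pred and not gold:
        return 1.0
    nl = levenshtein(pred, gold) / max(len(pred), len(gold))
    return 1.0 - nl if nl < tau else 0.0


def anls(preds: list, golds: list, tau: float = 0.5) -> float:
    """Average over questions of the best NLS against any gold answer.

    A not-answerable question (empty gold list) scores 1.0 only for an
    empty prediction; an illustrative convention for NA(G) = 1.
    """
    scores = []
    for pred, gold_answers in zip(preds, golds):
        if not gold_answers:   # NA(G) = 1: the question is not answerable
            scores.append(1.0 if pred == "" else 0.0)
        else:
            scores.append(max(nls(pred, g, tau) for g in gold_answers))
    return sum(scores) / len(scores)
```

For example, `anls(["budpest"], [["budapest"]])` yields 0.875, since a single-character edit over an eight-character answer gives $\mathrm{NL} = 0.125 < \tau$; benchmark implementations typically also lowercase and strip answers before comparison.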
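Returning to the confidence scoring functions listed at the start of this section, items I–IV can all be computed directly from a single logit vector. The sketch below is a minimal illustration of those four definitions, not code from any of our chapters; the small epsilon guarding the logarithm is a numerical convenience.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def csf_scores(z: np.ndarray) -> dict:
    """Logit-based CSFs g(x) for one logit vector z(x) in R^K.

    For all four scores, higher values indicate higher confidence.
    """
    f = softmax(z)
    top2 = np.sort(f)[::-1][:2]
    return {
        "msp": float(f.max()),                                 # I.   max softmax probability
        "max_logit": float(z.max()),                           # II.  maximum logit
        "neg_entropy": float((f * np.log(f + 1e-12)).sum()),   # III. negated Shannon entropy
        "margin": float(top2[0] - top2[1]),                    # IV.  top-1 minus top-2 probability
    }


# A peaked logit vector scores higher on every CSF than a near-flat one.
print(csf_scores(np.array([4.0, 0.5, -1.0])))
print(csf_scores(np.array([1.0, 0.9, 0.8])))
```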
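The distance-based measures under item V can be sketched in the same spirit, operating on (e.g., penultimate-layer) feature vectors. The `fit_gaussians`/`mahalanobis_distance` structure and the covariance tied across classes are assumptions of this sketch, following a common formulation of the Mahalanobis detector; both functions return raw distances, so the corresponding CSF is the negated distance.

```python
import numpy as np


def knn_distance(x_feat: np.ndarray, train_feats: np.ndarray, k: int = 10) -> float:
    """Average Euclidean distance from x's features to the k nearest
    training features; a 1D outlier score (CSF: its negation)."""
    d = np.linalg.norm(train_feats - x_feat, axis=1)
    return float(np.sort(d)[:k].mean())


def fit_gaussians(train_feats: np.ndarray, train_labels: np.ndarray):
    """Class-conditional means with a shared (tied) covariance estimate."""
    classes = np.unique(train_labels)
    means = {c: train_feats[train_labels == c].mean(axis=0) for c in classes}
    centered = np.concatenate(
        [train_feats[train_labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(train_feats.shape[1])
    return means, np.linalg.inv(cov)


def mahalanobis_distance(x_feat: np.ndarray, means: dict, cov_inv: np.ndarray) -> float:
    """Minimum Mahalanobis distance of x's features to any of the
    class-conditional Gaussians (CSF: its negation)."""
    return min(float(np.sqrt((x_feat - mu) @ cov_inv @ (x_feat - mu)))
               for mu in means.values())
```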