I. Maximum softmax probability (MSP): $g(x) = \max_{y' \in \mathcal{Y}} f_{y'}(x)$

II. Maximum logit: $g(x) = \max_{y' \in \mathcal{Y}} z_{y'}(x)$, with logits $z \in \mathbb{R}^K$

III. Negative entropy: $g(x) = \sum_{y' \in \mathcal{Y}} f_{y'}(x) \log f_{y'}(x)$, i.e., the negated Shannon entropy of the predictive distribution

IV. Margin: $g(x) = \max_{y' \in \mathcal{Y}} f_{y'}(x) - \max_{y'' \in \mathcal{Y} \setminus \{y'\}} f_{y''}(x)$, i.e., the difference between the largest and second-largest softmax probabilities

V. Distance-based measures

• kNN distance: a 1D outlier score derived from the average distance of the feature representation of $x$ to its $k$ nearest neighbors in the training distribution

• Mahalanobis distance [390]: the minimum distance of the feature map (e.g., penultimate-layer activations) of a test input to class-conditional Gaussian distributions fitted to the training data

VI. Bayesian uncertainty estimation

Chapter 3 used MSP and negative entropy as CSFs, alongside various PUQ methods for Bayesian uncertainty estimation. The other chapters used MSP, as it is the most common CSF in practice, requiring only the model's logits as input (illustrative sketches of several of these scores are given at the end of this section). The use of CSFs in turn creates the need to evaluate their statistical quality alongside task-specific predictive performance metrics, which is discussed next.

2.2.3 Evaluation Metrics

In an ideal world, the evaluation metric of interest would be the same as the loss function used for training, yet this is rarely the case in practice: gradient-based optimization requires a continuously differentiable function, while the metric of interest is often non-differentiable, e.g., accuracy versus cross-entropy in classification. Throughout our works, we have used (or extended) multiple predictive performance, calibration, and robustness metrics, the most noteworthy of which are outlined below.

Average Normalized Levenshtein Similarity (ANLS) is a metric introduced in [39] for the evaluation of VQA, which was later extended [449] to support lists of answers and to be invariant to the order in which answers are provided. We adapted the underlying Levenshtein Distance (LD) metric [251] to support not-answerable questions, $\mathrm{NA}(G) = \mathbb{I}[\mathrm{type}(G) = \text{not-answerable}]$ (see Equation (2.7)).
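To make the metric concrete, the sketch below implements normalized Levenshtein similarity with the conventional threshold $\tau = 0.5$ of [39] and a simple not-answerable rule. This is an illustration only: the function names, the `tau` parameter, and the convention that a not-answerable ground truth (empty gold list) is matched exactly by an empty prediction are assumptions of this sketch, not a reproduction of Equation (2.7) or of the code of [449].

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between strings a and b (standard dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def nls(pred: str, gold: str, tau: float = 0.5) -> float:
    """Normalized Levenshtein similarity, thresholded at tau as in [39]."""
    if not pred and not gold:
        return 1.0
    nl = levenshtein(pred, gold) / max(len(pred), len(gold))
    return 1.0 - nl if nl < tau else 0.0


def anls(preds: list, golds: list, tau: float = 0.5) -> float:
    """Average over questions of the best NLS against any gold answer.

    A not-answerable question (empty gold list) scores 1.0 only for an
    empty prediction; an illustrative convention for NA(G) = 1.
    """
    scores = []
    for pred, gold_answers in zip(preds, golds):
        if not gold_answers:   # NA(G) = 1: the question is not answerable
            scores.append(1.0 if pred == "" else 0.0)
        else:
            scores.append(max(nls(pred, g, tau) for g in gold_answers))
    return sum(scores) / len(scores)
```

For example, `anls(["budpest"], [["budapest"]])` yields 0.875, since a single-character edit over an eight-character answer gives $\mathrm{NL} = 0.125 < \tau$; benchmark implementations typically also lowercase and strip answers before comparison.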
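Returning to the confidence scoring functions listed at the start of this section, items I–IV can all be computed directly from a single logit vector. The sketch below is a minimal illustration of those four definitions, not code from any of our chapters; the small epsilon guarding the logarithm is a numerical convenience.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def csf_scores(z: np.ndarray) -> dict:
    """Logit-based CSFs g(x) for one logit vector z(x) in R^K.

    For all four scores, higher values indicate higher confidence.
    """
    f = softmax(z)
    top2 = np.sort(f)[::-1][:2]
    return {
        "msp": float(f.max()),                                 # I.   max softmax probability
        "max_logit": float(z.max()),                           # II.  maximum logit
        "neg_entropy": float((f * np.log(f + 1e-12)).sum()),   # III. negated Shannon entropy
        "margin": float(top2[0] - top2[1]),                    # IV.  top-1 minus top-2 probability
    }


# A peaked logit vector scores higher on every CSF than a near-flat one.
print(csf_scores(np.array([4.0, 0.5, -1.0])))
print(csf_scores(np.array([1.0, 0.9, 0.8])))
```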
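The distance-based measures under item V can be sketched in the same spirit, operating on (e.g., penultimate-layer) feature vectors. The `fit_gaussians`/`mahalanobis_distance` structure and the covariance tied across classes are assumptions of this sketch, following a common formulation of the Mahalanobis detector; both functions return raw distances, so the corresponding CSF is the negated distance.

```python
import numpy as np


def knn_distance(x_feat: np.ndarray, train_feats: np.ndarray, k: int = 10) -> float:
    """Average Euclidean distance from x's features to the k nearest
    training features; a 1D outlier score (CSF: its negation)."""
    d = np.linalg.norm(train_feats - x_feat, axis=1)
    return float(np.sort(d)[:k].mean())


def fit_gaussians(train_feats: np.ndarray, train_labels: np.ndarray):
    """Class-conditional means with a shared (tied) covariance estimate."""
    classes = np.unique(train_labels)
    means = {c: train_feats[train_labels == c].mean(axis=0) for c in classes}
    centered = np.concatenate(
        [train_feats[train_labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(train_feats.shape[1])
    return means, np.linalg.inv(cov)


def mahalanobis_distance(x_feat: np.ndarray, means: dict, cov_inv: np.ndarray) -> float:
    """Minimum Mahalanobis distance of x's features to any of the
    class-conditional Gaussians (CSF: its negation)."""
    return min(float(np.sqrt((x_feat - mu) @ cov_inv @ (x_feat - mu)))
               for mu in means.values())
```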