	FUNDAMENTALS

	the true class. This measure penalizes sharp probabilities more heavily when
	over- or under-confidence places them close to the wrong edge or class.
	\[
	\ell_{\mathrm{NLL}}(f) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} \mathbb{I}[y_i = k] \cdot \log\left(f_k(x_i)\right) \tag{2.10}
	\]

	• Brier Score [50] is a scoring rule that measures the accuracy of a
	probabilistic classifier and is related to the mean-squared error (MSE) loss
	function. Brier score is more commonly used in industrial practice since it
	is an ℓ2 metric (score between 0 and 1), yet it penalizes tail probabilities
	less severely than NLL.
	\[
	\ell_{\mathrm{BS}}(f) = \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} \left(\mathbb{I}[y_i = k] - f_k(x_i)\right)^2 \tag{2.11}
	\]
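
As an illustration of the two scoring rules in Eqs. (2.10) and (2.11), the following is a minimal NumPy sketch, not the implementation used in this work. It assumes that `probs` is an N × K array holding the predicted probabilities f_k(x_i) and that `labels` holds the integer class indices y_i. The function names and the `eps` clipping constant are assumptions introduced here for illustration.

```python
import numpy as np


def nll(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> float:
    """Eq. (2.10): mean negative log-probability assigned to the true class."""
    n = probs.shape[0]
    # The indicator I[y_i = k] keeps only the true-class probability per sample.
    true_class_probs = probs[np.arange(n), labels]
    # Assumption: clip at eps to avoid log(0); this is not part of Eq. (2.10).
    return float(-np.mean(np.log(np.clip(true_class_probs, eps, 1.0))))


def brier_score(probs: np.ndarray, labels: np.ndarray) -> float:
    """Eq. (2.11): mean over samples of the squared error summed over classes."""
    k = probs.shape[1]
    # One-hot rows play the role of the indicator I[y_i = k].
    one_hot = np.eye(k)[labels]
    return float(np.mean(np.sum((one_hot - probs) ** 2, axis=1)))


if __name__ == "__main__":
    # Two predictions for true class 0: mildly wrong vs. confidently wrong.
    mild = np.array([[0.4, 0.6]])
    sharp = np.array([[0.01, 0.99]])
    y = np.array([0])
    print("NLL  :", nll(mild, y), nll(sharp, y))
    print("Brier:", brier_score(mild, y), brier_score(sharp, y))
```

For a single confidently wrong prediction, the NLL term grows without bound as the true-class probability approaches 0, whereas the squared-error term stays bounded, which is the milder tail penalty described above.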

	All of the following metrics require a CSF g(x) to be defined and can pertain to
	specific evaluation settings [389], which are tested in Section 3.4.5.
	Expected Calibration Error (ECE) [156, 332] is a default metric to evaluate
	top-1 prediction miscalibration. A calibration estimator (Definition 7) measures
	the Lp norm difference between a model’s posterior and the true likelihood of
	being correct.
	Definition 7 (Lp Calibration Error). [231, 463]
	The Lp calibration error of f : X → ∆Y over the joint distribution (X × Y )
	with the Lp norm p ∈ [1, ∞) is given by:


	\[
	\mathrm{CE}_p(f)^p = \mathbb{E}_{(X,Y)}\left[ \left\| \mathbb{E}[Y \mid f(X)] - f(X) \right\|_p^p \right] \tag{2.12}
	\]
	The popular ECE metric [332] with condition I[Y = ŷ] is a special case of the
	above with p = 1, where the expectation is approximated using a histogram.
	MaxCE defines the worst-case risk version with p = ∞, effectively reporting on
	the bin with the highest error. As part of Chapter 5, we contributed a novel
	empirical estimator of top-1 calibration for the task of VQA, where the exact
	accuracy condition I[Y = ŷ] in ECE is replaced by I[ANLS(y, ŷ) > τ]. Prior
	work [329] used a similar strategy of thresholding continuous quality scores to
	be able to estimate ECE.
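
The histogram approximation of the p = 1 case, its worst-bin counterpart, and the thresholded correctness condition can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimator contributed in Chapter 5: it assumes equal-width bins over [0, 1], the commonly used bin-size-weighted average of per-bin gaps, and a hypothetical threshold value `tau`. The function names `calibration_errors` and `thresholded_correctness` are likewise introduced here.

```python
import numpy as np


def thresholded_correctness(quality_scores, tau: float = 0.5) -> np.ndarray:
    """Turn continuous quality scores (e.g. ANLS) into 0/1 correctness.

    Assumption: tau = 0.5 is a placeholder; the text does not fix a value.
    """
    return (np.asarray(quality_scores, dtype=float) > tau).astype(float)


def calibration_errors(confidences, correct, n_bins: int = 10):
    """Histogram estimate of ECE (p = 1) and MaxCE (worst bin).

    confidences: top-1 confidence per sample, in [0, 1]
    correct:     1.0 if the top-1 prediction counts as correct, else 0.0
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Assumption: equal-width bins; only interior edges are passed to digitize
    # so that bin ids fall in {0, ..., n_bins - 1} with right-closed bins.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece, max_ce = 0.0, 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        # Gap between observed accuracy and mean confidence in this bin.
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap      # weight by the fraction of samples
        max_ce = max(max_ce, gap)     # worst-case bin
    return ece, max_ce


if __name__ == "__main__":
    conf = [0.95, 0.85, 0.55, 0.45]
    scores = [0.9, 0.8, 0.2, 0.7]     # hypothetical quality scores
    print(calibration_errors(conf, thresholded_correctness(scores), n_bins=2))
```

Passing exact-match indicators as `correct` gives the usual ECE, while passing thresholded quality scores gives the relaxed variant described above.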
	In practice, ECE is implemented as a histogram binning estimator that
	discretizes predicted probabilities into ranges of possible values for which
	conditional expectation can be estimated. Concretely, the probability space
	is partitioned into B bins b_i with i ∈ {1, ..., B}, where for each bin b_i the gap
	between observed accuracy and bin confidence P̄_b is measured, with a final