	RELIABILITY AND ROBUSTNESS


	For simplicity, consider the evaluation of a single non-list ground truth answer
	G and prediction P̂, with string lengths |G| and |P̂|, respectively.
	
	1 if NA(G) ∧ \|P̂ \| > 0,
	
	
	
	
	
	0 if NA(G) ∧ \|P̂ \| = 0,
	
	
	
	
	 \|G\| if \|P̂ \| = 0,
	LD(G, P̂ ) =
	LD(tail(G), tail(P̂ )) if G[0] = P̂ [0],
	
	
	
	
	if G[0] 6= P̂ [0] (deletion),
	 LD(tail(G), P̂ )
	
	
	
	
	1 + min
	LD(G, tail(P̂ ))
	if G[0] 6= P̂ [0] (insertion),
	
	
	
	
	LD(tail(G), tail(P̂ )) if G[0] 6= P̂ [0] (substitution)
	(2.7)
	Each of the conditions is tested in turn, and the first one that is true is executed.
	The normalized similarity metric is then defined as
\[
\mathrm{NLS}(G, \hat{P}) = 1 - \frac{\mathrm{LD}(G, \hat{P})}{\max(1, |G|, |\hat{P}|)}.
\]
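The recursion in Eq. (2.7) and the normalization above can be sketched in Python as follows. This is a minimal illustration, not the reference implementation. The names `ld` and `nls` are hypothetical, NA(G) is represented here by passing `None` as the ground truth, and the base case for an empty G (returning |P̂|) is an assumption borrowed from the standard Levenshtein definition, since Eq. (2.7) does not list it. Memoization is added only to keep the recursion tractable.

```python
from functools import lru_cache
from typing import Optional


@lru_cache(maxsize=None)
def _ld(g: str, p: str) -> int:
    """Levenshtein distance following the case order of Eq. (2.7)."""
    if len(p) == 0:
        return len(g)
    if len(g) == 0:
        # Assumption: symmetric base case, not listed in Eq. (2.7).
        return len(p)
    if g[0] == p[0]:
        return _ld(g[1:], p[1:])
    return 1 + min(
        _ld(g[1:], p),      # deletion
        _ld(g, p[1:]),      # insertion
        _ld(g[1:], p[1:]),  # substitution
    )


def ld(g: Optional[str], p: str) -> int:
    """LD with the not-answerable cases; g=None stands in for NA(G)."""
    if g is None:
        return 1 if len(p) > 0 else 0
    return _ld(g, p)


def nls(g: Optional[str], p: str) -> float:
    """Normalized Levenshtein similarity: 1 - LD / max(1, |G|, |P|)."""
    g_len = 0 if g is None else len(g)
    return 1.0 - ld(g, p) / max(1, g_len, len(p))
```

For instance, `ld("kitten", "sitting")` evaluates to 3, so `nls("kitten", "sitting")` is 1 − 3/7 ≈ 0.571.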

	Given multiple ground truth answer variants G_i = {a_1, a_2, ...} and a predicted
	answer P̂_{Q_i} for each question Q_i in the test set of size N, we define the
	complete metric as follows:
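\begin{equation}
\mathrm{ANLS} = \frac{1}{N} \sum_{i=1}^{N} \max_{a \in G_i} s\left(a, \hat{P}_{Q_i}\right)
\tag{2.8}
\end{equation}
\begin{equation}
s\left(a, \hat{P}_{Q_i}\right) =
\begin{cases}
\mathrm{NLS}\left(a, \hat{P}_{Q_i}\right) & \text{if } \mathrm{NLS}\left(a, \hat{P}_{Q_i}\right) > \tau \\
0 & \text{if } \mathrm{NLS}\left(a, \hat{P}_{Q_i}\right) < \tau
\end{cases},
\tag{2.9}
\end{equation}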

	where we follow prior literature [39, 449] in setting the threshold τ = 0.5.
	In the case of a list-type question, Hungarian matching is performed following
	[449] according to NLS between each ground truth answer part and each
	prediction answer part.
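As a rough illustration of Eqs. (2.8) and (2.9) for non-list answers, the sketch below averages the best thresholded similarity per question. It is self-contained and uses a plain iterative Levenshtein routine on strings only (no not-answerable handling). The function names are hypothetical, and mapping a score exactly equal to τ to 0 is an assumption, since Eq. (2.9) leaves that case open. Hungarian matching for list-type questions is not covered.

```python
from typing import Sequence

TAU = 0.5  # threshold value stated in the text


def nls(g: str, p: str) -> float:
    """Normalized Levenshtein similarity via row-by-row dynamic programming."""
    prev = list(range(len(p) + 1))
    for i, gc in enumerate(g, start=1):
        curr = [i]
        for j, pc in enumerate(p, start=1):
            cost = 0 if gc == pc else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return 1.0 - prev[-1] / max(1, len(g), len(p))


def s(a: str, p: str, tau: float = TAU) -> float:
    """Thresholded similarity of Eq. (2.9)."""
    score = nls(a, p)
    # Assumption: a score exactly equal to tau is treated as 0.
    return score if score > tau else 0.0


def anls(ground_truths: Sequence[Sequence[str]],
         predictions: Sequence[str],
         tau: float = TAU) -> float:
    """Eq. (2.8): mean over questions of the best score among answer variants."""
    if len(ground_truths) != len(predictions) or not predictions:
        raise ValueError("need one non-empty variant set per prediction")
    total = sum(max(s(a, p, tau) for a in variants)
                for variants, p in zip(ground_truths, predictions))
    return total / len(predictions)
```

With ground truths `[["kitten", "cat"], ["2019"]]` and predictions `["sitting", "2018"]`, the per-question scores are 4/7 and 0.75, giving an ANLS of about 0.661.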
	Proper scoring rules [330] are used for generic evaluation of predictive
	performance. They compute scores at the instance level and measure both
	the quality of the predictive function and that of the predicted probability
	distribution (as they are not compatible with an arbitrary CSF):
	• Negative Log Likelihood (NLL) [378] is both a popular loss function
	(cross-entropy) and scoring rule which only penalizes (wrong) log
	probabilities qi given to the true class, with I an indicator function defining