nbest-optimize

NAME

nbest-optimize - optimize score combination for N-best word error minimization

SYNOPSIS

nbest-optimize [-help] option ... [ scoredir ... ]

DESCRIPTION

nbest-optimize reads a set of N-best lists, additional score files, and corresponding reference transcripts and optimizes the score combination weights so as to minimize the word error of a classifier that performs word-level posterior probability maximization. The optimized weights are meant to be used with nbest-lattice(1) and the -use-mesh option, or the nbest-rover script (see nbest-scripts(1)). nbest-optimize determines both the best relative weighting of knowledge source scores and the optimal -posterior-scale parameter that controls the peakedness of the posterior distribution.
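For illustration, a typical invocation might look as follows (all file and directory names here are hypothetical):

     nbest-optimize \
          -nbest-files nbest-list.txt \
          -refs refs.txt \
          -rescore-lmw 8 -rescore-wtw 0 \
          scores/pronunciation scores/duration

The optimized weights would then be used with nbest-rover or nbest-lattice(1) -use-mesh as described above.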

The optimization is performed by gradient descent on a smoothed (sigmoidal) approximation of the true 0/1 word error function (Katagiri et al. 1990). Therefore, the result can only be expected to be a local minimum of the error surface. (A more global search can be attempted by specifying different starting points.) Another approximation is that the error function is computed assuming a fixed multiple alignment of all N-best hypotheses and the reference string, which tends to slightly overestimate the true pairwise error between any single hypothesis and the reference.
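As a rough sketch (the exact objective follows Katagiri et al. 1990; the form shown here is an illustrative assumption), each 0/1 error term is replaced by a sigmoid

     sigma(d) = 1 / (1 + exp(-alpha * d))

where d is a misclassification measure derived from the weighted scores and alpha is the slope parameter set by -alpha. Larger values of alpha approximate the true 0/1 error more closely, at the cost of a less smooth gradient.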

An alternative search strategy uses a simplex-based "Amoeba" search on the (non-smoothed) word error function (Press et al. 1988). The search is restarted multiple times to avoid local minima.

Alternatively, nbest-optimize can also optimize weights for a standard, 1-best hypothesis rescoring that selects entire (sentence) hypotheses (-1best option). In this mode sentence-level error counts may be read from external files, or computed on the fly from the reference strings. The weights obtained are meant to be used for N-best list rescoring with rescore-reweight (see nbest-scripts(1)).
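For example, a hypothetical 1-best mode run that uses precomputed sentence-level error counts might be invoked as

     nbest-optimize -1best \
          -nbest-files nbest-list.txt \
          -refs refs.txt \
          -errors errs \
          scores/pronunciation

(again, all names are hypothetical).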

Each filename argument can be an ASCII file, or a compressed file (name ending in .Z or .gz), or ``-'' to indicate stdin/stdout.

OPTIONS

-help
Print option summary.
-debug level
Controls the amount of output (the higher the level, the more). At level 1, error statistics are printed at each iteration. At level 2, word alignments are printed. At level 3, the full score matrix is printed. At level 4, detailed information about word hypothesis ranking is printed for each training iteration and sample.
-nbest-files file-list
Specifies the set of N-best files as a list of filenames. Three sets of standard scores are extracted from the N-best files: the acoustic model score, the language model score, and the number of words (for insertion penalty computation). See nbest-format(5) for details.
-refs references
Specifies the reference transcripts. Each line in references must contain the sentence ID (the last component in the N-best filename path, minus any suffixes) followed by zero or more reference words.
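For example, a references file might contain lines such as the following (sentence IDs are hypothetical):

     sw2001-A-0001 yes that is right
     sw2001-A-0002 i believe so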
-1best
Select optimization for standard sentence-level hypothesis selection.
-errors dir
In 1-best mode, optimize for error counts that are stored in separate files in directory dir. Each N-best list must have a matching error counts file of the same basename in dir. Each file contains 7 columns of numbers in the format
wcr wer nerr nsub ndel nins nw
Only the 3rd column (number of errors) and the last column (number of reference words) are used by the program.
If this option is omitted, errors will be computed from the N-best hypotheses and the reference transcripts.
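For example, assuming wcr and wer denote the word correct and word error rates, a hypothetical error counts file for a 20-word reference sentence with one substitution, one deletion, and no insertions would contain the line

     0.90 0.10 2 1 1 0 20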
-max-nbest n
Limits the number of hypotheses read from each N-best list to the first n.
-rescore-lmw lmw
Sets the language model weight used in combining the language model log probabilities with the acoustic log probabilities. This is used to compute initial aggregate hypothesis scores.
-rescore-wtw wtw
Sets the word transition weight used to weight the number of words relative to the acoustic log probabilities. This is used to compute initial aggregate hypothesis scores.
-posterior-scale scale
Initial value for scaling log posteriors. The total weighted log score is divided by scale when computing normalized posterior probabilities. This controls the peakedness of the posterior distribution. The default value is whatever was chosen for -rescore-lmw, so that language model scores are scaled to have weight 1, and acoustic scores have weight 1/lmw.
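As a sketch in the notation of this page (with s(i,k) the k-th score of hypothesis i and lambda(k) the corresponding weight), the normalized posteriors are

     P(i) = exp( sum_k lambda(k) s(i,k) / scale ) / sum_j exp( sum_k lambda(k) s(j,k) / scale )

Larger values of scale flatten the distribution; smaller values make it more peaked.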
-vocab file
Read the N-best list vocabulary from file. This option is mostly redundant since words found in the N-best input are implicitly added to the vocabulary.
-tolower
Map vocabulary to lowercase, eliminating case distinctions.
-multiwords
Split multiwords (words joined by '_') into their components when reading N-best lists.
-no-reorder
Do not reorder the hypotheses for alignment, and start the alignment with the reference words. The default is to first align hypotheses in order of decreasing score (according to the initial score weighting) and then the reference, which is more compatible with how nbest-lattice(1) operates.
-noise noise-tag
Designate noise-tag as a vocabulary item that is to be ignored in aligning hypotheses with each other (the same as the -pau- word). This is typically used to identify a noise marker.
-noise-vocab file
Read several noise tags from file, instead of, or in addition to, the single noise tag specified by -noise.
-init-lambdas 'w1 w2 ...'
Initialize the score weights to the values specified (zeros are filled in for missing values). The default is to set the initial acoustic model weight to 1, the language model weight from -rescore-lmw, the word transition weight from -rescore-wtw, and all remaining weights to zero initially. Prefixing a value with an equal sign (`=') holds the value constant during optimization. (All values should be enclosed in quotes to form a single command-line argument.)
Hypotheses are aligned using the initial weights; thus, it makes sense to reoptimize with initial weights from a previous optimization in order to obtain alignments closer to the optimum.
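For example, the hypothetical setting

     -init-lambdas '=1 8.0 0'

holds the acoustic model weight fixed at 1 throughout the optimization and starts the language model and word transition weights at 8.0 and 0, respectively.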
-alpha a
Controls the error function smoothness; the sigmoid slope parameter is set to a.
-epsilon e
The step-size used in gradient descent (the multiple of the gradient vector).
-min-loss x
Sets the loss function for a sample effectively to zero when its value falls below x.
-max-delta d
Ignores the contribution of a sample to the gradient if the derivative exceeds d. This helps avoid numerical problems.
-maxiters m
Stops optimization after m iterations. In Amoeba search, this limits the total number of points in the parameter space that are evaluated.
-max-bad-iters n
Stops optimization after n iterations during which the actual (non-smoothed) error has not decreased.
-max-amoeba-restarts r
Perform only up to r repeated Amoeba searches. The default is to search until D searches give the same results, where D is the dimensionality of the problem.
-epsilon-stepdown s
-min-epsilon m
If s is greater than zero, the learning rate is multiplied by s each time the error fails to decrease within the number of iterations given by -max-bad-iters. Training stops when the learning rate falls below m in this manner.
-converge x
Stops optimization when the (smoothed) loss function changes relatively by less than x from one iteration to the next.
-quickprop
Use the approximate second-order method known as "QuickProp" (Fahlman 1989).
-init-amoeba-simplex 's1 s2 ...'
Defines the step size for the initial Amoeba simplex. One value for each non-fixed search dimension should be specified, plus optionally a value for the posterior scaling parameter (which is searched as an added dimension).
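For example, for a search over three free weights plus the posterior scale, one might specify illustrative step sizes as

     -init-amoeba-simplex '0.5 0.5 0.5 0.1'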
-print-hyps file
Write the best word hypotheses to file after optimization.
--
Signals the end of options, such that the following command-line arguments are interpreted as additional score directories even if they start with `-'.
scoredir...
Any additional arguments name directories containing further score files. In each directory, there must be one file for each N-best list, named after the corresponding sentence ID (the file may also have a ``.gz'' suffix and contain compressed data). The total number of score dimensions is thus 3 (for the standard scores from the N-best list) plus the number of additional score directories specified.
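For example, with two additional score directories and hypothetical sentence IDs s001 and s002, the layout would be

     scores/pronunciation/s001      scores/duration/s001
     scores/pronunciation/s002.gz   scores/duration/s002.gz

for a total of 3 + 2 = 5 score dimensions.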

SEE ALSO

nbest-lattice(1), nbest-scripts(1), nbest-format(5).
S. Katagiri, C.H. Lee, & B.-H. Juang, "A Generalized Probabilistic Descent Method", in Proceedings of the Acoustical Society of Japan, Fall Meeting, pp. 141-142, 1990.
S. E. Fahlman, "Faster-Learning Variations on Back-Propagation: An Empirical Study", in D. Touretzky, G. Hinton, & T. Sejnowski (eds.), Proceedings of the 1988 Connectionist Models Summer School, pp. 38-51, Morgan Kaufmann, 1989.
W. H. Press, B. P. Flannery, S. A. Teukolsky, & W. T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, 1988.

BUGS

Likely. Gradient-based optimization is not supported (yet) in 1-best mode; use simplex-search optimization instead.

AUTHORS

Andreas Stolcke <stolcke@speech.sri.com>
Dimitra Vergyri <dverg@speech.sri.com>
Copyright 2000, 2001 SRI International