nbest-optimize
NAME
nbest-optimize - optimize score combination for N-best word error minimization
SYNOPSIS
nbest-optimize [ -help ] option ... [ scoredir ... ]
DESCRIPTION
nbest-optimize
reads a set of N-best lists, additional score files, and corresponding
reference transcripts and optimizes the score combination weights
so as to minimize the word error of a classifier that performs
word-level posterior probability maximization.
The optimized weights are meant to be used with
nbest-lattice(1)
and the
-use-mesh
option.
nbest-optimize
determines both the best relative weighting of knowledge source scores
and the optimal
-posterior-scale
parameter that controls the peakedness of the posterior distribution.
The optimization is performed by gradient descent on a smoothed (sigmoidal)
approximation of the true 0/1 word error function (Katagiri et al. 1990).
Therefore, the result can only be expected to be a
local
minimum of the error surface.
(A more global search can be attempted by specifying different starting
points.)
Another approximation is that the error function is computed assuming a fixed
multiple alignment of all N-best hypotheses and the reference string,
which tends to slightly overestimate the true pairwise error between any
single hypothesis and the reference.
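The shape of the objective being minimized can be summarized as follows.
Below is a minimal Python sketch (illustrative only, not SRILM's actual
implementation; all function and variable names, and the data layout for
the alignment, are assumptions) of the smoothed word error for one sample,
given a precomputed multiple alignment:

import numpy as np

def smoothed_word_error(lambdas, scores, alignment, ref_words, scale, alpha):
    # scores: (num_hyps, num_scores) matrix of log scores per hypothesis
    # alignment: one dict per position, mapping word -> indices of the
    #            hypotheses containing that word
    # ref_words: the reference word at each alignment position
    total = (scores @ lambdas) / scale                 # combined, scaled log scores
    post = np.exp(total - np.logaddexp.reduce(total))  # hypothesis posteriors

    loss = 0.0
    for pos, ref in zip(alignment, ref_words):
        # word posterior = sum of posteriors of hypotheses containing it
        word_post = {w: post[list(h)].sum() for w, h in pos.items()}
        correct = word_post.get(ref, 0.0)
        wrong = max((p for w, p in word_post.items() if w != ref), default=0.0)
        # sigmoid-smoothed 0/1 error; alpha is the -alpha slope parameter
        loss += 1.0 / (1.0 + np.exp(-alpha * (wrong - correct)))
    return loss

In this sketch, the true (non-smoothed) 0/1 word error is recovered as
alpha grows large.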
Each filename argument can be an ASCII file, a compressed file (name ending
in .Z or .gz), or ``-'' to indicate stdin/stdout.
OPTIONS
- -help
Print option summary.
- -debug level
Controls the amount of output (the higher the level, the more).
At level 1, error statistics at each iteration are printed.
At level 2, word alignments are printed.
At level 3, the full score matrix is printed.
At level 4, detailed information about word hypothesis ranking is printed
for each training iteration and sample.
- -nbest-files file-list
Specifies the set of N-best files as a list of filenames.
Three sets of standard scores are extracted from the N-best files:
the acoustic model score, the language model score, and the number of
words (for insertion penalty computation).
See nbest-format(5) for details.
- -refs references
Specifies the reference transcripts.
Each line in
references
must contain the sentence ID (the last component in the N-best filename
path, minus any suffixes) followed by zero or more reference words.
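For example, an N-best list stored in a file named /nbest/swb001.score.gz
(a hypothetical name) would be matched by the reference line
swb001 yes i think so
where "yes i think so" is the reference transcript for that sentence.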
- -max-nbest n
Limits the number of hypotheses read from each N-best list to the first
n.
- -rescore-lmw lmw
Sets the language model weight used in combining the language model log
probabilities with acoustic log probabilities.
This is used to compute initial aggregate hypothesis scores.
- -rescore-wtw wtw
Sets the word transition weight used to weight the number of words relative to
the acoustic log probabilities.
This is used to compute initial aggregate hypothesis scores.
- -posterior-scale scale
Initial value for scaling log posteriors.
The total weighted log score is divided by
scale
when computing normalized posterior probabilities.
This controls the peakedness of the posterior distribution.
The default value is whatever was chosen for
-rescore-lmw,
so that language model scores are scaled to have weight 1,
and acoustic scores have weight 1/lmw.
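In other words, if S(h) is the combined weighted log score of hypothesis h,
posteriors are proportional to exp(S(h)/scale), normalized over all
hypotheses in the list; a larger scale flattens the distribution, while a
smaller scale makes it more peaked.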
- -vocab file
Read the N-best list vocabulary from
file.
This option is mostly redundant since words found in the N-best input
are implicitly added to the vocabulary.
- -tolower
Map vocabulary to lowercase, eliminating case distinctions.
- -multiwords
Split multiwords (words joined by '_') into their components when reading
N-best lists.
- -noise noise-tag
Designate
noise-tag
as a vocabulary item that is to be ignored in aligning hypotheses with
each other (the same as the -pau- word).
This is typically used to identify a noise marker.
- -noise-vocab file
Read several noise tags from
file,
instead of, or in addition to, the single noise tag specified by
-noise.
- -init-lambdas 'w1 w2 ...'
Initialize the score weights to the values specified
(zeros are filled in for missing values).
The default is to set the initial acoustic model weight to 1,
the language model weight from -rescore-lmw,
the word transition weight from -rescore-wtw,
and all remaining weights to zero.
Prefixing a value with an equal sign (`=')
holds that value constant during optimization.
(All values should be enclosed in quotes to form a single command-line
argument.)
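For example (a hypothetical setting), -init-lambdas '1 =8.0 0' starts the
acoustic weight at 1 and the word transition weight at 0, while holding the
language model weight constant at 8.0; the values follow the same order as
the defaults described above (acoustic, language model, word transition,
then any additional scores).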
- -alpha a
Controls the error function smoothness;
the sigmoid slope parameter is set to
a.
- -epsilon e
Sets the step size used in gradient descent (the multiple of the gradient
vector).
- -min-loss x
Sets the loss function for a sample effectively to zero when its value falls
below
x.
- -max-delta d
Ignores the contribution of a sample to the gradient if the derivative
exceeds
d.
This helps avoid numerical problems.
- -max-bad-iters n
Stops optimization after
n
iterations during which the actual (non-smoothed) error has not decreased.
- -epsilon-stepdown s
- -min-epsilon m
If s is a value greater than zero, the learning rate is multiplied by s
every time the error does not decrease after the number of iterations
specified by -max-bad-iters.
Training stops when the learning rate falls below m in this manner.
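A rough sketch of the resulting learning-rate schedule (illustrative
Python; run_iteration and the bookkeeping names are assumptions, not
SRILM internals):

def stepdown_schedule(run_iteration, initial_epsilon, s, m, max_bad_iters):
    # Illustrative -epsilon-stepdown / -min-epsilon control logic.
    epsilon, bad_iters, best_error = initial_epsilon, 0, float("inf")
    while True:
        error = run_iteration(epsilon)   # one gradient-descent iteration
        if error < best_error:
            best_error, bad_iters = error, 0
            continue
        bad_iters += 1
        if bad_iters >= max_bad_iters:   # error stagnant for too long
            epsilon *= s                 # step the learning rate down
            bad_iters = 0
            if epsilon < m:              # learning rate too small: stop
                return best_error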
- -converge x
Stops optimization when the relative change in the smoothed loss function
from one iteration to the next is less than x.
- -quickprop
Use the approximate second-order method known as "QuickProp" (Fahlman 1989).
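For reference, a minimal single-weight sketch of the QuickProp update
(an illustration under assumed names, not SRILM's implementation; mu is
the usual growth-limit constant from Fahlman's paper):

def quickprop_step(g, g_prev, dw_prev, epsilon, mu=1.75):
    # Fit a parabola through the two most recent gradients and jump
    # toward its minimum; fall back to gradient descent when undefined.
    if dw_prev == 0.0 or g == g_prev:
        return -epsilon * g
    dw = dw_prev * g / (g_prev - g)
    limit = mu * abs(dw_prev)            # cap the step growth factor
    return max(-limit, min(limit, dw))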
- -print-hyps file
Write the best word hypotheses to
file
after optimization.
- --
Signals the end of options, so that the following command-line arguments are
interpreted as additional score directories even if they start with `-'.
- scoredir...
Any additional arguments name directories containing further score files.
Each directory must contain one score file per sentence, named after the
corresponding sentence ID (the file may also have a ``.gz'' suffix and
contain compressed data).
The total number of score dimensions is thus 3 (for the standard scores from
the N-best list) plus the number of additional score directories specified.
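For example (all file and directory names here are hypothetical), with
N-best filenames listed in nbest.list, references in refs.txt, and one
additional score dimension stored in a directory named durations, four
score weights could be optimized with

nbest-optimize -nbest-files nbest.list -refs refs.txt \
    -rescore-lmw 8 -rescore-wtw 0 -print-hyps hyps.out durations

where durations contains one file per sentence ID
(e.g., durations/swb001 or durations/swb001.gz).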
SEE ALSO
nbest-lattice(1), nbest-scripts(1), nbest-format(5).
S. Katagiri, C.-H. Lee, & B.-H. Juang, "A Generalized Probabilistic Descent
Method", in
Proceedings of the Acoustical Society of Japan, Fall Meeting,
pp. 141-142, 1990.
S. E. Fahlman, "Faster-Learning Variations on Back-Propagation: An
Empirical Study", in D. Touretzky, G. Hinton, & T. Sejnowski (eds.),
Proceedings of the 1988 Connectionist Models Summer School, pp. 38-51,
Morgan Kaufmann, 1989.
BUGS
Likely.
AUTHOR
Andreas Stolcke <stolcke@speech.sri.com>.
Copyright 2000 SRI International