lattice-tool

NAME

lattice-tool - manipulate word lattices

SYNOPSIS

lattice-tool [-help] option ...

DESCRIPTION

lattice-tool performs operations on word lattices in pfsg-format(5), including size reduction, pruning, null-node removal, weight assignment from language models, and lattice word error computation.

OPTIONS

Each filename argument can be an ASCII file, a compressed file (name ending in .Z or .gz), or "-" to indicate stdin/stdout.
-help
Print option summary.
-debug level
Set the debugging output level (0 means no debugging output). Debugging messages are sent to stderr.
-in-lattice file
Read input lattice from file.
-in-lattice2 file
Read additional input lattice (for binary lattice operations) from file.
-in-lattice-list file
Read list of input lattices from file. Lattice operations are applied to each filename listed in file.
-out-lattice file
Write result lattice to file.
-out-lattice-dir dir
Write result lattices from processing of -in-lattice-list to directory dir.
-write-internal
Write output lattices with internal node numbering instead of compact, consecutive numbering.
-overwrite
Overwrite existing output lattice files.
-vocab file
Initialize the vocabulary to words listed in file. This is useful in conjunction with the -limit-vocab option.
-limit-vocab
Discard LM parameters on reading that do not pertain to the words specified in the vocabulary. The default is that words used in the LM are automatically added to the vocabulary. This option can be used to reduce the memory requirements for large LMs; to this end, the -vocab option typically specifies the set of words used in the lattices to be processed.
-tolower
Map all vocabulary to lowercase.
-max-time T
Limit processing time per lattice to T seconds.
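
The batch-processing options above can be combined into a single invocation. The following Python sketch only assembles the argument list from options documented in this page; the helper name batch_command and the file names (lattices.list, out, words.txt) are hypothetical placeholders, not part of lattice-tool.

```python
import shlex

def batch_command(list_file, out_dir, vocab=None, max_time=None, overwrite=False):
    """Assemble a lattice-tool argument list for batch processing.

    Hypothetical helper: it uses only options documented above and
    does not run the tool itself.
    """
    argv = ["lattice-tool",
            "-in-lattice-list", list_file,   # file listing input lattices
            "-out-lattice-dir", out_dir]     # directory for result lattices
    if vocab is not None:
        argv += ["-vocab", vocab]
    if max_time is not None:
        argv += ["-max-time", str(max_time)]  # seconds per lattice
    if overwrite:
        argv.append("-overwrite")
    return argv

# Placeholder file names, chosen for illustration only.
argv = batch_command("lattices.list", "out", vocab="words.txt",
                     max_time=60, overwrite=True)
print(shlex.join(argv))
```

Assuming lattice-tool is installed and on the PATH, the resulting list can be passed to subprocess.run(argv, check=True).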

Options controlling lattice operations:

-operation O
Perform a lattice algebra operation O on the two lattices specified by -in-lattice and -in-lattice2, before any other processing steps. The operations currently supported are "concatenate" and "or", for serial and parallel lattice combination, respectively. This option does not apply when multiple input lattices are processed.
-write-posteriors file
Compute the posteriors of lattice nodes and transitions (using the forward-backward algorithm) and write out a word posterior lattice in wlat-format(5). This and other options based on posterior probabilities make most sense if the input lattice contains combined acoustic-language model weights.
-posterior-prune P
Prune lattice nodes with posterior less than P times the posterior of the highest-posterior path.
-posterior-scale S
Scale the transition weights by dividing by S for the purpose of posterior probability computation. If the input weights represent combined acoustic-language model scores then this should be approximately the language model weight of the recognizer in order to avoid overly peaked posteriors (the default value is 8).
-reduce
Reduce lattice size by a single forward node merging pass.
-reduce-iterate I
Reduce lattice size by up to I forward-backward node merging passes.
-overlap-ratio R
Perform approximate lattice reduction by merging nodes that share more than a fraction R of their incoming or outgoing nodes. The default is 0, i.e., only exact lattice reduction is performed.
-overlap-base B
If B is 0 (the default), then the overlap ratio R is taken relative to the smaller set of transitions being compared. If the value is 1, the ratio is relative to the larger of the two sets.
-reduce-before-pruning
Perform lattice reduction before posterior-based pruning. The default order is to first prune, then reduce.
-pre-reduce-iterate I
Perform iterative reduction prior to lattice expansion, but after pause elimination.
-post-reduce-iterate I
Perform iterative reduction after lattice expansion and pause node recovery. Note: this is not recommended as it changes the weights assigned from the specified language model.
-no-pause
Do not recover pauses after lattice expansion.
-compact-pause
Use a compact encoding of pause nodes that saves nodes but allows optional pauses where they might not have been included in the original lattice.
-loop-pause
Add self-loops on pause nodes.
-collapse-same-words
Perform an operation on the final lattices that collapses all nodes with the same words, except null nodes, pause nodes, or nodes with noise words. This can reduce the lattice size dramatically, but also introduces new paths.
-connectivity
Check the connectedness of lattices.
-compute-node-entropy
Compute the node entropy of lattices.
-ref-list file
Read reference word strings from file. Each line starts with a sentence ID (the basename of the lattice file name), followed by the words.
-ref-file file
Read reference word strings from file. Lines must contain reference words only, and must be matched to input lattices in the order processed. This option and -ref-list trigger computation of lattice word errors (minimum word error counts of any path through a lattice).
-noise-vocab file
Read a list of "noise" words from file. These words are ignored when computing lattice word errors, or when collapsing nodes with -collapse-same-words.
-split-multiwords
Split lattice nodes with multiwords into a sequence of non-multiword nodes. This option is necessary to compute lattice error of multiword lattices against non-multiword references, but may be useful in its own right.
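
To illustrate how -posterior-scale and -posterior-prune interact, here is a minimal Python sketch of forward-backward node posteriors on a toy lattice. It is not the tool's implementation: it assumes transition weights are natural-log scores, that every node lies on some start-to-end path, and it follows one reading of the pruning rule (keep nodes whose posterior is at least P times the posterior of the best path). All function and node names are hypothetical.

```python
import math
from collections import defaultdict

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def posteriors(nodes, edges, scale=8.0):
    """Return (node posteriors, posterior of the best path).

    nodes: topologically ordered; nodes[0] is the start, nodes[-1] the end.
    edges: (src, dst, log_weight) triples.
    """
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for src, dst, weight in edges:
        w = weight / scale              # the -posterior-scale division
        incoming[dst].append((src, w))
        outgoing[src].append((dst, w))

    alpha = {nodes[0]: 0.0}             # forward log scores (sum over paths)
    best = {nodes[0]: 0.0}              # Viterbi log scores (best path)
    for n in nodes[1:]:
        alpha[n] = logsumexp([alpha[s] + w for s, w in incoming[n]])
        best[n] = max(best[s] + w for s, w in incoming[n])

    beta = {nodes[-1]: 0.0}             # backward log scores
    for n in reversed(nodes[:-1]):
        beta[n] = logsumexp([beta[d] + w for d, w in outgoing[n]])

    total = alpha[nodes[-1]]            # log of the sum over all paths
    post = {n: math.exp(alpha[n] + beta[n] - total) for n in nodes}
    return post, math.exp(best[nodes[-1]] - total)

def surviving_nodes(nodes, edges, threshold, scale=8.0):
    """Nodes whose posterior is at least threshold times the best path's."""
    post, best_path = posteriors(nodes, edges, scale)
    return [n for n in nodes if post[n] >= threshold * best_path]

# Toy lattice with two parallel paths (weights invented for illustration).
nodes = ["start", "a", "b", "end"]
edges = [("start", "a", -8.0), ("start", "b", -24.0),
         ("a", "end", 0.0), ("b", "end", 0.0)]

scaled, _ = posteriors(nodes, edges, scale=8.0)
unscaled, _ = posteriors(nodes, edges, scale=1.0)
kept = surviving_nodes(nodes, edges, threshold=0.2)
print(scaled["b"], unscaled["b"], kept)
```

With a scale of 1 the weaker path "b" receives a far smaller posterior than with the default scale of 8, which is the "overly peaked" behavior that the scaling is meant to avoid.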

The following options control transition weight assignment:

-order n
Set the maximal N-gram order to be used for transition weight assignment (the default is 3). This also selects the lattice expansion algorithm to be used. For unigrams, the original lattice structure is preserved (unless modified by other operations). For bigram weights, NULL and pause nodes are eliminated, transition weights are reassigned, and then pause nodes are restored. For trigram weights, additional nodes are inserted to create unique trigram histories (lattice expansion).
-lm file
Read N-gram language model from file. This option also triggers weight reassignment and lattice expansion.
-multiwords
Resolve multiwords in the lattice without splitting nodes. This is useful in rescoring lattices containing multiwords with an LM that does not use multiwords.
-classes file
Interpret the LM as an N-gram over word classes. The expansions of the classes are given in file in classes-format(5). Tokens in the LM that are not defined as classes in file are assumed to be plain words, so that the LM can contain mixed N-grams over both words and word classes.
-simple-classes
Assume a "simple" class model: each word is a member of at most one word class, and class expansions are exactly one word long.
-mix-lm file
Read a second N-gram model for interpolation purposes. The second and any additional interpolated models can also be class N-grams (using the same -classes definitions).
-lambda weight
Set the weight of the main model when interpolating with -mix-lm. Default value is 0.5.
-mix-lm2 file
-mix-lm3 file
-mix-lm4 file
-mix-lm5 file
-mix-lm6 file
-mix-lm7 file
-mix-lm8 file
-mix-lm9 file
Up to 8 further N-gram models, in addition to the one given by -mix-lm, can be specified for interpolation.
-mix-lambda2 weight
-mix-lambda3 weight
-mix-lambda4 weight
-mix-lambda5 weight
-mix-lambda6 weight
-mix-lambda7 weight
-mix-lambda8 weight
-mix-lambda9 weight
These are the weights for the additional mixture components, corresponding to -mix-lm2 through -mix-lm9. The weight for the -mix-lm model is 1 minus the sum of -lambda and -mix-lambda2 through -mix-lambda9.
-compact-expansion
Use the compact trigram expansion algorithm that uses backoff nodes (see paper reference below). This algorithm only applies to simple word-based N-gram models, and cannot deal with interpolated or class-based models, or interact with the -multiwords option.
-max-nodes M
Abort lattice expansion when the number of nodes exceeds M. This is another mechanism to avoid spending too much time on very large lattices.
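
The weight bookkeeping for -lambda and -mix-lambda2 through -mix-lambda9 can be sketched in Python as follows. The helper names are hypothetical, and the interpolate function assumes plain linear interpolation of word probabilities, which this page does not spell out.

```python
def mixture_weights(lam=0.5, mix_lambdas=()):
    """Return weights in the order: main LM, -mix-lm, -mix-lm2, -mix-lm3, ...

    lam:         value of -lambda (weight of the main model)
    mix_lambdas: values of -mix-lambda2, -mix-lambda3, ... in order
    """
    if len(mix_lambdas) > 8:
        raise ValueError("only -mix-lambda2 through -mix-lambda9 exist")
    # The -mix-lm model receives whatever weight is left over.
    mix_lm = 1.0 - (lam + sum(mix_lambdas))
    if mix_lm < -1e-12:
        raise ValueError("weights sum to more than 1")
    return [lam, mix_lm, *mix_lambdas]

def interpolate(probs, weights):
    """Linearly interpolate per-model probabilities of the same word."""
    return sum(w * p for w, p in zip(weights, probs))

# Main model at 0.5, -mix-lambda2 at 0.2, -mix-lambda3 at 0.1 (invented values).
weights = mixture_weights(0.5, (0.2, 0.1))
# Invented per-model probabilities for one word, in the same order.
p = interpolate([0.10, 0.30, 0.20, 0.40], weights)
print(weights, p)
```

With only -mix-lm given and the default -lambda of 0.5, both models end up with weight 0.5.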

SEE ALSO

ngram(1), pfsg-scripts(1), pfsg-format(5), ngram-format(5), classes-format(5).
F. Weng, A. Stolcke, and A. Sankar, "Efficient Lattice Representation and Generation." Proc. Intl. Conf. on Spoken Language Processing, vol. 6, pp. 2531-2534, Sydney, 1998.

BUGS

Not all LM types supported by ngram(1) are handled by lattice-tool.

AUTHOR

Fuliang Weng <fuliang@speech.sri.com>
Andreas Stolcke <stolcke@speech.sri.com>
Copyright 1997-2002 SRI International