lattice-tool
performs operations on word lattices in
pfsg-format(5),
including size reduction, pruning, null-node removal, weight assignment from
language models, and lattice word error computation.
OPTIONS
Each filename argument can be an ASCII file, or a
compressed file (name ending in .Z or .gz), or "-" to indicate
stdin/stdout.
-help
Print option summary.
-debug level
Set the debugging output level (0 means no debugging output).
Debugging messages are sent to stderr.
-in-lattice file
Read input lattice from
file.
-in-lattice2 file
Read additional input lattice (for binary lattice operations) from
file.
-in-lattice-list file
Read list of input lattices from
file.
Lattice operations are applied to each filename listed in
file.
-out-lattice file
Write result lattice to
file.
-out-lattice-dir dir
Write result lattices from processing of
-in-lattice-list
to directory
dir.
-write-internal
Write output lattices with internal node numbering instead of compact,
consecutive numbering.
-overwrite
Overwrite existing output lattice files.
-vocab file
Initialize the vocabulary to words listed in
file.
This is useful in conjunction with
-limit-vocab.
-limit-vocab
Discard LM parameters on reading that do not pertain to the words
specified in the vocabulary.
The default is that words used in the LM are automatically added to the
vocabulary.
This option can be used to reduce the memory requirements for large LMs;
to this end, the
-vocab
option typically specifies the set of words used in the lattices to be
processed.
-tolower
Map all vocabulary to lowercase.
-max-time T
Limit processing time per lattice to
T
seconds.
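As a rough illustration of how the general options above combine for batch processing, the following sketch assembles an argument list. The helper name build_batch_command and the file names are hypothetical; only the flags are taken from the descriptions above, and the command is printed rather than executed.

```python
def build_batch_command(lattice_list, out_dir, vocab=None, max_time=None,
                        overwrite=False):
    """Assemble a lattice-tool argument list (hypothetical helper; nothing is run)."""
    cmd = ["lattice-tool",
           "-in-lattice-list", lattice_list,
           "-out-lattice-dir", out_dir]
    if vocab is not None:
        # Load the word list and discard LM parameters for other words.
        cmd += ["-vocab", vocab, "-limit-vocab"]
    if max_time is not None:
        # Per-lattice processing time limit, in seconds.
        cmd += ["-max-time", str(max_time)]
    if overwrite:
        cmd.append("-overwrite")
    return cmd


# File names below are placeholders.
print(" ".join(build_batch_command("lattices.list", "out",
                                   vocab="words.txt", max_time=60)))
```

The resulting list could be passed to subprocess.run on a system where lattice-tool is installed.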
Options controlling lattice operations:
-operation O
Perform a lattice algebra operation
O
on the two lattices specified by
-in-lattice
and
-in-lattice2,
before any other processing steps.
Operations currently supported are
concatenate
and
or,
for serial and parallel lattice combination, respectively.
This option does not apply when multiple input lattices are processed.
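The effect of the two operations can be pictured by treating each lattice simply as the set of word sequences it accepts. The sketch below does exactly that and ignores weights and node structure, so it is a conceptual model only, not how the tool represents lattices; the function names are made up for this example.

```python
from itertools import product


def concatenate(paths_a, paths_b):
    """Serial combination: every path of A followed by every path of B."""
    return {a + b for a, b in product(paths_a, paths_b)}


def parallel_or(paths_a, paths_b):
    """Parallel combination: any path of A or any path of B."""
    return set(paths_a) | set(paths_b)


# Two toy lattices, each given as a set of word tuples.
lat1 = {("good", "morning"), ("good", "evening")}
lat2 = {("everyone",), ("all",)}

print(sorted(concatenate(lat1, lat2)))
print(sorted(parallel_or(lat1, lat2)))
```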
-write-posteriors file
Compute the posteriors of lattice nodes and transitions (using the
forward-backward algorithm) and write out a word posterior lattice
in
wlat-format(5).
This and other options based on posterior probabilities make most sense
if the input lattice contains combined acoustic-language model weights.
-posterior-prune P
Prune lattice nodes with posterior less than
P
times the highest posterior path.
-posterior-scale S
Scale the transition weights by dividing by
S
for the purpose of posterior probability computation.
If the input weights represent combined acoustic-language model scores
then this should be approximately the language model weight of the
recognizer in order to avoid overly peaked posteriors (the default value is 8).
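To see why the scale matters, the sketch below computes posteriors over a handful of complete paths rather than running forward-backward over nodes, which is a simplification. It assumes the scores are natural-log path scores; the function names and example scores are illustrative only. The prune helper mimics the relative threshold used by -posterior-prune, applied to paths instead of nodes.

```python
import math


def path_posteriors(log_scores, scale=8.0):
    """Normalize path scores into posteriors after dividing each by `scale`."""
    scaled = [s / scale for s in log_scores]
    top = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def prune(posteriors, threshold):
    """Return indices whose posterior is at least `threshold` times the best one."""
    best = max(posteriors)
    return [i for i, p in enumerate(posteriors) if p >= threshold * best]


scores = [-100.0, -104.0, -108.0]          # made-up combined path scores
sharp = path_posteriors(scores, scale=1.0)
smooth = path_posteriors(scores, scale=8.0)
print(sharp, prune(sharp, 0.01))
print(smooth, prune(smooth, 0.01))
```

With no scaling, almost all probability mass falls on the best path and the same pruning threshold removes more alternatives; dividing by the scale flattens the distribution.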
-reduce
Reduce lattice size by a single forward node merging pass.
-reduce-iterate I
Reduce lattice size by up to
I
forward-backward node merging passes.
-overlap-ratio R
Perform approximate lattice reduction by merging nodes that share
more than a fraction
R
of their incoming or outgoing nodes.
The default is 0, i.e., only exact lattice reduction is performed.
-overlap-base B
If
B
is 0 (the default), then the overlap ratio
R
is taken relative to the smaller set of transitions being compared.
If the value is 1, the ratio is relative to the larger of the two sets.
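The interaction between the overlap ratio and the overlap base can be sketched as follows. This is one plausible reading of the description above, not the tool's actual code: the function name is hypothetical, and the sets stand for the incoming (or outgoing) neighbors of the two nodes being compared. The special case of a ratio of 0 (exact reduction only) is not modeled.

```python
def overlap_ratio(set_a, set_b, base=0):
    """Fraction of shared elements, relative to the smaller (base=0)
    or the larger (base=1) of the two sets."""
    a, b = set(set_a), set(set_b)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    denom = min(len(a), len(b)) if base == 0 else max(len(a), len(b))
    return shared / denom


preds_x = {1, 2, 3}
preds_y = {2, 3, 4, 5}
print(overlap_ratio(preds_x, preds_y, base=0))  # relative to the smaller set
print(overlap_ratio(preds_x, preds_y, base=1))  # relative to the larger set
```

Under this reading, a threshold of 0.6 would allow the two nodes to merge with base 0 but not with base 1, so base 1 is the more conservative setting.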
-reduce-before-pruning
Perform lattice reduction before posterior-based pruning.
The default order is to first prune, then reduce.
-pre-reduce-iterate I
Perform iterative reduction prior to lattice expansion, but after
pause elimination.
-post-reduce-iterate I
Perform iterative reduction after lattice expansion and pause node recovery.
Note: this is not recommended as it changes the weights assigned from
the specified language model.
-no-pause
Do not recover pauses after lattice expansion.
-compact-pause
Use compact encoding of pause nodes that saves nodes but allows
optional pauses where they might not have been included in the original
lattice.
-loop-pause
Add self-loops on pause nodes.
-collapse-same-words
Perform an operation on the final lattices that collapses all nodes
with the same words, except null nodes, pause nodes, or nodes with
noise words.
This can reduce the lattice size dramatically, but also introduces new
paths.
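The following toy example illustrates how collapsing same-word nodes can introduce new paths. The data layout (a node-to-word map plus an edge set) and both function names are assumptions made for this sketch; the lattice is assumed to be acyclic and weights are ignored.

```python
def collapse_same_words(words, edges, keep_separate=()):
    """Merge all nodes carrying the same word.
    words: node -> word (None for null nodes); edges: set of (src, dst)."""
    representative = {}   # word -> node that survives
    mapping = {}          # old node -> new node
    for node in sorted(words):
        w = words[node]
        if w is None or w in keep_separate:
            mapping[node] = node          # null/pause/noise nodes are left alone
        else:
            mapping[node] = representative.setdefault(w, node)
    new_words = {mapping[n]: words[n] for n in words}
    new_edges = {(mapping[a], mapping[b]) for a, b in edges}
    return new_words, new_edges


def all_paths(words, edges, start, end):
    """Enumerate word sequences from start to end (acyclic lattices only)."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    found = set()

    def walk(node, prefix):
        if words[node] is not None:
            prefix = prefix + (words[node],)
        if node == end:
            found.add(prefix)
            return
        for nxt in succ.get(node, []):
            walk(nxt, prefix)

    walk(start, ())
    return found


# Two separate paths, "a x c" and "b x d", each with its own "x" node.
words = {0: None, 1: "a", 2: "b", 3: "x", 4: "x", 5: "c", 6: "d", 7: None}
edges = {(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7), (6, 7)}
before = all_paths(words, edges, 0, 7)
after = all_paths(*collapse_same_words(words, edges), 0, 7)
print(len(words), sorted(before))
print(sorted(after))
```

After merging the two "x" nodes the lattice has one node fewer, but it now also accepts "a x d" and "b x c".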
-connectivity
Check the connectedness of lattices.
-compute-node-entropy
Compute the node entropy of lattices.
-ref-list file
Read reference word strings from
file.
Each line starts with a sentence ID (the basename of the lattice file name),
followed by the words.
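A minimal sketch of reading such a file, assuming whitespace-separated fields; the function name and the sentence IDs are invented for the example.

```python
def parse_ref_list(lines):
    """Map each sentence ID (first field) to its list of reference words."""
    refs = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        refs[fields[0]] = fields[1:]
    return refs


example = ["utt001 hello world", "", "utt002 good morning everyone"]
print(parse_ref_list(example))
```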
-ref-file file
Read reference word strings from
file.
Lines must contain reference words only, and must be matched to input
lattices in the order processed.
This option and
-ref-list
each trigger computation of lattice word errors
(minimum word error counts of any path through a lattice).
-noise-vocab file
Read a list of "noise" words from
file.
These words are ignored when computing lattice word errors,
or when collapsing nodes with
-collapse-same-words.
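The lattice word error described above is the smallest word-level edit distance between the reference and any path through the lattice. The sketch below computes it by brute force over an explicit list of paths, which is only practical for toy lattices; an actual implementation would presumably use dynamic programming over the lattice itself. Function names and the noise token are hypothetical, and noise words are dropped from the hypothesis paths only.

```python
def word_errors(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1]


def lattice_word_errors(ref, paths, noise=()):
    """Minimum word errors over all given paths, ignoring noise words."""
    noise = set(noise)
    return min(word_errors(ref, [w for w in p if w not in noise]) for p in paths)


ref = "the cat sat".split()
paths = [["the", "cat", "[noise]", "sat"],
         ["a", "cat", "sat", "down"]]
print(lattice_word_errors(ref, paths))
print(lattice_word_errors(ref, paths, noise={"[noise]"}))
```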
-split-multiwords
Split lattice nodes with multiwords into a sequence of non-multiword
nodes.
This option is necessary to compute lattice error of multiword lattices
against non-multiword references, but may be useful in its own right.
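On a single path, splitting multiwords amounts to the following. The underscore separator is an assumption for this sketch (the text above does not name the separator), and the function name is hypothetical; in a real lattice each part would become its own node.

```python
def split_multiwords(path, separator="_"):
    """Replace each multiword by the sequence of its component words."""
    out = []
    for word in path:
        parts = [p for p in word.split(separator) if p]
        out.extend(parts or [word])   # keep tokens that consist only of separators
    return out


print(split_multiwords(["i", "live", "in", "new_york"]))
```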
The following options control transition weight assignment:
-order n
Set the maximal N-gram order to be used for transition weight assignment
(the default is 3).
This also selects the lattice expansion algorithm to be used.
For unigrams, the original lattice structure is preserved (unless modified
by other operations).
For bigram weights, NULL and pause nodes are eliminated, transition weights
reassigned, and then pause nodes are restored.
For trigram weights, additional nodes are inserted to create unique
trigram histories (lattice expansion).
-lm file
Read N-gram language model from
file.
This option also triggers weight reassignment and lattice expansion.
-multiwords
Resolve multiwords in the lattice without splitting nodes.
This is useful in rescoring lattices containing multiwords with an
LM that does not use multiwords.
-classes file
Interpret the LM as an N-gram over word classes.
The expansions of the classes are given in
file
in
classes-format(5).
Tokens in the LM that are not defined as classes in
file
are assumed to be plain words, so that the LM can contain mixed N-grams over
both words and word classes.
-simple-classes
Assume a "simple" class model: each word is a member of at most one word class,
and class expansions are exactly one word long.
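The two conditions for a "simple" class model can be checked as sketched below. The in-memory layout (class name mapped to a list of expansions, each a list of words) is an assumption made for the example and is not the on-disk classes-format(5); the function name is hypothetical.

```python
def is_simple_class_model(expansions):
    """True if every expansion is one word long and no word
    belongs to more than one class."""
    owner = {}   # word -> class that contains it
    for cls, word_lists in expansions.items():
        for words in word_lists:
            if len(words) != 1:
                return False              # multi-word (or empty) expansion
            if owner.setdefault(words[0], cls) != cls:
                return False              # word already claimed by another class
    return True


simple = {"DAY": [["monday"], ["tuesday"]], "CITY": [["paris"]]}
not_simple = {"CITY": [["new", "york"]], "DAY": [["monday"]]}
print(is_simple_class_model(simple), is_simple_class_model(not_simple))
```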
-mix-lm file
Read a second N-gram model for interpolation purposes.
The second and any additional interpolated models can also be class N-grams
(using the same
-classes
definitions).
-lambda weight
Set the weight of the main model when interpolating with
-mix-lm.
Default value is 0.5.
-mix-lm2 file
-mix-lm3 file
-mix-lm4 file
-mix-lm5 file
-mix-lm6 file
-mix-lm7 file
-mix-lm8 file
-mix-lm9 file
Up to 8 more N-gram models can be specified for interpolation.
-mix-lambda2 weight
-mix-lambda3 weight
-mix-lambda4 weight
-mix-lambda5 weight
-mix-lambda6 weight
-mix-lambda7 weight
-mix-lambda8 weight
-mix-lambda9 weight
These are the weights for the additional mixture components, corresponding
to
-mix-lm2
through
-mix-lm9.
The weight for the
-mix-lm
model is 1 minus the sum of
-lambda
and
-mix-lambda2
through
-mix-lambda9.
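The weight bookkeeping described above can be sketched as follows. The function names are hypothetical, and combining the component probabilities by a weighted sum (linear interpolation) is an assumption of this example, not something spelled out above.

```python
def mixture_weights(lam=0.5, extra=()):
    """Return weights in the order [main model, -mix-lm, -mix-lm2, ...].
    lam: value of -lambda; extra: values of -mix-lambda2 onward."""
    extra = list(extra)
    if len(extra) > 8:
        raise ValueError("only -mix-lambda2 through -mix-lambda9 exist")
    mix = 1.0 - (lam + sum(extra))        # implied weight of the -mix-lm model
    if mix < -1e-12:
        raise ValueError("weights sum to more than 1")
    return [lam, max(mix, 0.0)] + extra


def interpolate(probs, weights):
    """Weighted sum of per-model probabilities for one word."""
    return sum(w * p for w, p in zip(weights, probs))


weights = mixture_weights(lam=0.5, extra=[0.2, 0.1])
print(weights)
print(interpolate([0.10, 0.30, 0.05, 0.20], weights))
```

Here the -mix-lm model receives whatever weight is left over after -lambda and the additional mixture weights have been subtracted from 1.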
-compact-expansion
Use the compact trigram expansion algorithm that uses backoff nodes
(see paper reference below).
This algorithm only applies to simple word-based N-gram models, and
cannot deal with interpolated or class-based models, or interact with
the
-multiwords
option.
-max-nodes M
Abort lattice expansion when the number of nodes exceeds
M.
This is another mechanism to avoid spending too much time on very large
lattices.