The first format consists of the header
NBestList1.0
followed by one or more lines of the form
(score) w1 w2 w3 ...
where score is a composite acoustic/language model score
from the recognizer, on the bytelog scale.
(A bytelog is a logarithm to base 1.0001, divided by 1024 and
rounded to an integer.)
This format is output by the SRI Decipher(TM) recognizer, as well as
by the ngram(1) option -nbest.
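
The following Python sketch illustrates the bytelog definition and the
layout of the first format. The conversion constant is derived only
from the definition given above (base 1.0001, divided by 1024); the
function names are hypothetical, and the parser assumes that every
score is an integer enclosed in parentheses at the start of the line.

```python
import math
import re

# log10 value of one bytelog unit, from the definition in the text:
# bytelog = round(log_1.0001(p) / 1024)
LOG10_PER_BYTELOG = 1024 * math.log10(1.0001)


def bytelog_to_log10(bytelog):
    """Convert a bytelog score to a base-10 log probability."""
    return bytelog * LOG10_PER_BYTELOG


def log10_to_bytelog(logp):
    """Convert a base-10 log probability to the nearest bytelog integer."""
    return round(logp / LOG10_PER_BYTELOG)


# Assumed line layout: "(score) w1 w2 w3 ..."
_HYP = re.compile(r"\(\s*([-+]?\d+)\s*\)\s*(.*)$")


def parse_nbest1(lines):
    """Parse an NBestList1.0 list into (bytelog_score, words) tuples."""
    it = iter(lines)
    header = next(it, "").strip()
    if header != "NBestList1.0":
        raise ValueError("missing NBestList1.0 header")
    hyps = []
    for line in it:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        m = _HYP.match(line)
        if m is None:
            raise ValueError("malformed hypothesis: " + line)
        hyps.append((int(m.group(1)), m.group(2).split()))
    return hyps
```

For example, parse_nbest1(["NBestList1.0", "(-2500) hello world"])
yields one hypothesis whose score can be passed to bytelog_to_log10()
to obtain a base-10 log probability.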
The second Decipher(TM) format is an extension of the first format
that encodes word-level scores and time alignments.
It is marked by a header of the form
NBestList2.0
The hypotheses are in the format
(score) w1 ( st: st1 et: et1 g: g1 a: a1 ) w2 ...
where each word is followed by its start time (st), end time (et),
language model score (g), and acoustic score (a);
both scores are bytelog-scaled.
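
As an illustration of the word-level annotation, the Python sketch
below splits one NBestList2.0 hypothesis line into its total score and
per-word fields. The names WordInfo and parse_nbest2_hyp are
hypothetical, and the sketch assumes that the fields are separated by
whitespace exactly as in the template above and that all numeric
fields can be read as floating-point numbers.

```python
import re
from typing import NamedTuple


class WordInfo(NamedTuple):
    word: str
    start: float      # st: start time
    end: float        # et: end time
    lm_score: float   # g: language model score (bytelog)
    ac_score: float   # a: acoustic score (bytelog)


# Leading "(score)" of the hypothesis.
_SCORE = re.compile(r"\(\s*(\S+?)\s*\)")
# One "word ( st: .. et: .. g: .. a: .. )" group.
_WORD = re.compile(
    r"(\S+)\s+\(\s*st:\s*(\S+)\s+et:\s*(\S+)"
    r"\s+g:\s*(\S+)\s+a:\s*(\S+)\s*\)"
)


def parse_nbest2_hyp(line):
    """Return (total_score, [WordInfo, ...]) for one hypothesis line."""
    line = line.strip()
    m = _SCORE.match(line)
    if m is None:
        raise ValueError("missing (score) prefix: " + line)
    total = float(m.group(1))
    rest = line[m.end():]
    words = [
        WordInfo(w, float(st), float(et), float(g), float(a))
        for w, st, et, g, a in _WORD.findall(rest)
    ]
    return total, words
```

The header line NBestList2.0 is not handled here; a caller would
check and skip it before passing the remaining lines to
parse_nbest2_hyp().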
The third format understood by SRILM lists
hypotheses in the format
ascore lscore nwords w1 w2 w3 ...
where the first three columns contain the
acoustic model log probability, the language model log probability,
and the number of words in the hypothesis string, respectively.
Both scores are logarithms to base 10.
(This format must not be preceded by an ``NBestList'' header.)
This format is output by the ngram(1) option -rescore.
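
The Python sketch below reads hypotheses in the third format and picks
the one with the highest combined score. The combination used here
(acoustic score plus a weighted language model score plus a per-word
term) is an assumption made for illustration, and the names
parse_srilm_hyp, combined_score, lm_weight and word_weight are
hypothetical rather than part of SRILM.

```python
def parse_srilm_hyp(line):
    """Parse "ascore lscore nwords w1 w2 ..." into its four parts."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected at least three columns: " + line)
    ascore = float(fields[0])   # acoustic log10 probability
    lscore = float(fields[1])   # language model log10 probability
    nwords = int(fields[2])     # word count as stated in the file
    return ascore, lscore, nwords, fields[3:]


def combined_score(ascore, lscore, nwords, lm_weight=1.0, word_weight=0.0):
    """Illustrative weighted sum of the three numeric columns."""
    return ascore + lm_weight * lscore + word_weight * nwords


def best_hypothesis(lines, lm_weight=1.0, word_weight=0.0):
    """Return the words of the highest-scoring hypothesis."""
    best_words, best_score = None, float("-inf")
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        ascore, lscore, nwords, words = parse_srilm_hyp(line)
        score = combined_score(ascore, lscore, nwords, lm_weight, word_weight)
        if score > best_score:
            best_words, best_score = words, score
    return best_words
```

Because this format carries no header, the first line of the input is
already a hypothesis.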