| .TH SPHINX_FE 1 "2007-08-27" |
| .SH NAME |
| sphinx_fe \- Convert audio files to acoustic feature files |
| .SH SYNOPSIS |
| .B sphinx_fe |
| [\fI options \fR]... |
| .SH DESCRIPTION |
| .PP |
| This program converts audio files (in Microsoft WAV, NIST |
| Sphere, or raw format) to acoustic feature files for input to |
| batch-mode speech recognition. The resulting files are also useful |
| for various other things. A list of options follows: |
| .TP |
| .B \-alpha |
| Preemphasis parameter |
| .TP |
| .B \-argfile |
| File (e.g. feat.params from an acoustic model) to read parameters from. Parameters read from this file override anything set in other command-line arguments. |
| .TP |
| .B \-blocksize |
| Number of samples to read at a time. |
| .TP |
| .B \-build_outdirs |
| Create missing subdirectories in output directory |
| .TP |
| .B \-c |
| Control file for batch processing |
| .TP |
| .B \-cep2spec |
| Input is cepstral files, output is log spectral files |
| .TP |
| .B \-di |
| Input directory; input file names are relative to this, if defined |
| .TP |
| .B \-dither |
| Add 1/2-bit noise |
| .TP |
| .B \-do |
| Output directory; output file names are relative to this |
| .TP |
| .B \-doublebw |
| Use double bandwidth filters (same center freq) |
| .TP |
| .B \-ei |
| extension to be applied to all input files |
| .TP |
| .B \-eo |
| extension to be applied to all output files |
| .TP |
| .B \-example |
| Shows example of how to use the tool |
| .TP |
| .B \-frate |
| Frame rate |
| .TP |
| .B \-help |
| Shows the usage of the tool |
| .TP |
| .B \-i |
| audio input file |
| .TP |
| .B \-input_endian |
| Endianness of input data, big or little, ignored if NIST or MS Wav |
| .TP |
| .B \-lifter |
| Length of sine curve for liftering, or 0 for no liftering. |
| .TP |
| .B \-logspec |
| Write out logspectral files instead of cepstra |
| .TP |
| .B \-lowerf |
| Lower edge of filters |
| .TP |
| .B \-mach_endian |
| Endianness of machine, big or little |
| .TP |
| .B \-mswav |
| Defines input format as Microsoft Wav (RIFF) |
| .TP |
| .B \-ncep |
| Number of cep coefficients |
| .TP |
| .B \-nchans |
| Number of channels of data (interleaved samples assumed) |
| .TP |
| .B \-nfft |
| Size of FFT |
| .TP |
| .B \-nfilt |
| Number of filter banks |
| .TP |
| .B \-nist |
| Defines input format as NIST sphere |
| .TP |
| .B \-npart |
| Number of parts to run in (supersedes \fB\-nskip\fR and \fB\-runlen\fR if non-zero) |
| .TP |
| .B \-nskip |
| If a control file was specified, the number of utterances to skip at the head of the file |
| .TP |
| .B \-o |
| cepstral output file |
| .TP |
| .B \-ofmt |
| Format of output files - one of sphinx, htk, text. |
| .TP |
| .B \-part |
| Index of the part to run (supersedes \fB\-nskip\fR and \fB\-runlen\fR if non-zero) |
| .TP |
| .B \-raw |
| Defines input format as raw binary data |
| .TP |
| .B \-remove_dc |
| Remove DC offset from each frame |
| .TP |
| .B \-remove_noise |
| Remove noise with spectral subtraction in mel-energies |
| .TP |
| .B \-round_filters |
| Round mel filter frequencies to DFT points |
| .TP |
| .B \-runlen |
| If a control file was specified, the number of utterances to process, or \fB\-1\fR for all |
| .TP |
| .B \-samprate |
| Sampling rate |
| .TP |
| .B \-seed |
| Seed for random number generator; if less than zero, a seed is chosen automatically |
| .TP |
| .B \-smoothspec |
| Write out cepstral-smoothed logspectral files |
| .TP |
| .B \-spec2cep |
| Input is log spectral files, output is cepstral files |
| .TP |
| .B \-sph2pipe |
| Input is NIST sphere (possibly with Shorten), use sph2pipe to convert |
| .TP |
| .B \-transform |
| Which type of transform to use to calculate cepstra (legacy, dct, or htk) |
| .TP |
| .B \-unit_area |
| Normalize mel filters to unit area |
| .TP |
| .B \-upperf |
| Upper edge of filters |
| .TP |
| .B \-verbose |
| Show input filenames |
| .TP |
| .B \-warp_params |
| Parameters defining the warping function |
| .TP |
| .B \-warp_type |
| Warping function type (or shape) |
| .TP |
| .B \-whichchan |
| Channel to process (numbered from 1), or 0 to mix all channels |
| .TP |
| .B \-wlen |
| Hamming window length |
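.PP
The interaction of \fB\-nskip\fR and \fB\-runlen\fR with a control file can be illustrated with a short Python sketch. It is not taken from the sphinx_fe source; the function name and the sample utterance IDs are hypothetical, and treating every negative \fB\-runlen\fR like \fB\-1\fR is an assumption.

```python
def select_utterances(ctl_lines, nskip=0, runlen=-1):
    """Return the control-file entries left after skipping and truncating.

    nskip  : number of entries to skip at the head of the control file
    runlen : number of entries to process; -1 means "all remaining"
             (any negative value is treated the same way here, which is
             an assumption of this sketch)
    """
    remaining = ctl_lines[nskip:]
    if runlen < 0:
        return remaining
    return remaining[:runlen]


# Hypothetical control file with five utterance IDs.
ctl = ["utt01", "utt02", "utt03", "utt04", "utt05"]

# Skip the first entry, then process the next three.
print(select_utterances(ctl, nskip=1, runlen=3))
```

With these inputs the sketch selects the second through fourth entries. The same slice would be processed by one job when a large control file is split manually across several runs.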
| .PP |
| Currently the only kind of feature supported is MFCCs (mel-frequency |
| cepstral coefficients). There are numerous options which control the |
| properties of the output features. It is \fBVERY\fR important that |
| you document the specific set of flags used to create any given set of |
| feature files, since this information is \fBNOT\fR recorded in the |
| files themselves, and any mismatch between the parameters used to |
| extract features for recognition and those used to extract features |
| for training will cause recognition to fail. |
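.PP
One way to keep such a record is to generate the command line from a single parameter table and save that table next to the feature files. The Python sketch below is only an illustration: the helper names, the JSON sidecar file, the file names, and the numeric values are hypothetical, and writing the boolean flag as "yes" is an assumption about how switches are passed.

```python
import json
import os
import shlex
import tempfile


def build_command(params, binary="sphinx_fe"):
    """Turn a {flag: value} mapping into an argv list such as ['-i', 'x']."""
    argv = [binary]
    for name, value in params.items():
        argv += ["-" + name, str(value)]
    return argv


def save_params(params, path):
    """Write the parameter table to a JSON file so it can be reused later."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(params, fh, indent=2, sort_keys=True)


def load_params(path):
    """Read back a parameter table written by save_params."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


# Illustrative values only; use the settings that match your acoustic model.
params = {
    "i": "input.wav",
    "o": "output.mfc",
    "mswav": "yes",      # assumed syntax for a boolean switch
    "samprate": 16000,
    "nfilt": 40,
}

# Print the command that would be run (the sketch does not execute it).
print(shlex.join(build_command(params)))

# Store the record alongside the (hypothetical) output file.
record_path = os.path.join(tempfile.mkdtemp(), "output.mfc.params.json")
save_params(params, record_path)
```

Reloading the saved table with load_params and passing it to build_command again reproduces the same flags, so features for training and for recognition can be extracted from one record.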
| .SH AUTHOR |
| Written by numerous people at CMU from 1994 onwards. This manual page |
| by David Huggins-Daines <dhdaines@gmail.com> |
| .SH COPYRIGHT |
| Copyright \(co 1994-2007 Carnegie Mellon University. See the file |
| \fICOPYING\fR included with this package for more information. |
| .br |