steveheh committed
Commit 4d5eb2e
Parent: 50910fb

Update README.md

Files changed (1): README.md (+1 -1)
README.md CHANGED
@@ -50,7 +50,7 @@ This model performs joint intent classification and slot filling, directly from
 
 ## Model Architecture
 
-The model has an encoder-decoder architecture, where the encoder is a Conformer-Large model [2], and the decoder is a three-layer Transformer Decoder [3]. We use the Conformer encoder pretrained on NeMo ASR-Set (details [here](https://ngc.nvidia.com/models/nvidia:nemo:stt_en_conformer_ctc_large)), while the decoder is trained from scratch. Start-of-sentence (BOS) and end-of-sentence (EOS) tokens are added to each sentence. The model is trained end-to-end by minimizing the negative log-likelihood loss with label smoothing and teacher forcing. During inference, the prediction is generated by beam search, where a BOS token is used to trigger the generation process.
+The model has an encoder-decoder architecture, where the encoder is a Conformer-Large model [2], and the decoder is a three-layer Transformer Decoder [3]. We use the Conformer encoder pretrained on NeMo ASR-Set (details [here](https://ngc.nvidia.com/models/nvidia:nemo:stt_en_conformer_ctc_large)), while the decoder is trained from scratch. Start-of-sentence (BOS) and end-of-sentence (EOS) tokens are added to each sentence. The model is trained end-to-end by minimizing the negative log-likelihood loss with teacher forcing. During inference, the prediction is generated by beam search, where a BOS token is used to trigger the generation process.
 
 ## Training
 
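The changed paragraph describes an encoder-decoder model trained with negative log-likelihood loss under teacher forcing, with generation triggered by a BOS token at inference. The sketch below illustrates that training/decoding loop in plain PyTorch. It is a minimal sketch, not the NeMo implementation: `TinySeq2Seq`, the token ids, and all dimensions are illustrative assumptions, a small GRU stands in for the pretrained Conformer-Large encoder, beam search is reduced to greedy (beam size 1) decoding, and positional encodings are omitted for brevity.

```python
# Minimal sketch of NLL training with teacher forcing and BOS-triggered decoding.
# All names, token ids, and sizes here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

PAD_ID, BOS_ID, EOS_ID, VOCAB = 0, 1, 2, 1000

class TinySeq2Seq(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        # Stand-in for the pretrained Conformer encoder: acoustic features -> memory.
        self.encoder = nn.GRU(80, d_model, batch_first=True)
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        # Three decoder layers, matching the README's description.
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.proj = nn.Linear(d_model, VOCAB)

    def forward(self, feats, tgt_in):
        memory, _ = self.encoder(feats)
        # Causal mask so each position only attends to earlier tokens.
        L = tgt_in.size(1)
        mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        hidden = self.decoder(self.embed(tgt_in), memory, tgt_mask=mask)
        return self.proj(hidden)

def training_step(model, feats, tgt):
    # Teacher forcing: the decoder is fed BOS + gold tokens and must predict
    # the same sequence shifted left, ending in EOS.
    tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]
    logits = model(feats, tgt_in)
    # Negative log-likelihood over the vocabulary (cross-entropy on logits).
    return F.cross_entropy(logits.reshape(-1, VOCAB), tgt_out.reshape(-1),
                           ignore_index=PAD_ID)

@torch.no_grad()
def greedy_decode(model, feats, max_len=32):
    # Generation is triggered by a BOS token; beam search is reduced to
    # beam size 1 (greedy) to keep the sketch short.
    ys = torch.full((feats.size(0), 1), BOS_ID, dtype=torch.long)
    for _ in range(max_len):
        next_tok = model(feats, ys)[:, -1].argmax(-1, keepdim=True)
        ys = torch.cat([ys, next_tok], dim=1)
        if (next_tok == EOS_ID).all():
            break
    return ys

if __name__ == "__main__":
    model = TinySeq2Seq()
    feats = torch.randn(2, 120, 80)  # (batch, frames, 80-dim acoustic features)
    tgt = torch.tensor([[BOS_ID, 5, 6, EOS_ID], [BOS_ID, 7, 8, EOS_ID]])
    print(training_step(model, feats, tgt).item())  # NLL loss under teacher forcing
    print(greedy_decode(model, feats))              # BOS-triggered generation
```

In the actual model the stand-in encoder is the pretrained Conformer-Large, and decoding keeps multiple beam hypotheses rather than the single greedy path shown here.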