Training dataset for the N-gram LM

by kamil - opened Apr 20, 2023

Discussion

kamil

Apr 20, 2023

Hello!

Thank you for your amazing contribution. Could you please tell me which dataset you used for the LM training?

Was it just a merge of all the sentences you used for the W2V2 fine-tuning?

Thanks

bofenghuang

Owner Apr 20, 2023

Hi @kamil,

Thank you for your interest!

Yes, I only used the sentences from the ASR training set to train the N-gram LM. However, to improve performance, you can collect more text data and combine it with the existing set.
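For anyone following along, here is a minimal sketch of how the ASR training sentences could be merged with extra text into one N-gram training corpus. It is not the exact pipeline used for this model: the `normalize` rules, the sample sentences, and the `lm_corpus.txt` file name are all illustrative assumptions, and the normalization should be adapted to match the vocabulary of the acoustic model.

```python
import re
from typing import Iterable, List


def normalize(sentence: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace.

    Hypothetical rules: adapt them to the ASR model's character set.
    """
    text = sentence.lower()
    text = re.sub(r"[^\w\s'-]", " ", text)  # keep word chars, apostrophes, hyphens
    return re.sub(r"\s+", " ", text).strip()


def build_lm_corpus(*sources: Iterable[str]) -> List[str]:
    """Merge several sentence sources, dropping empty lines and duplicates."""
    seen = set()
    corpus = []
    for source in sources:
        for sentence in source:
            norm = normalize(sentence)
            if norm and norm not in seen:
                seen.add(norm)
                corpus.append(norm)
    return corpus


# Placeholder data standing in for the real transcripts and the extra text.
asr_train_sentences = ["Hello, how are you?", "Thank you very much!"]
extra_sentences = ["Thank you very much.", "It's sunny today."]

corpus = build_lm_corpus(asr_train_sentences, extra_sentences)

# One sentence per line, the usual input layout for N-gram toolkits.
with open("lm_corpus.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(corpus) + "\n")
```

If KenLM is the toolkit in use (an assumption, since the thread does not name one), the resulting file could then be passed to something like `lmplz -o 5 < lm_corpus.txt > lm.arpa`.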

bofenghuang changed discussion status to closed Dec 19, 2023
