KenLM language model dataset

by GaetanBaert - opened Sep 22, 2022

Sep 22, 2022

Hello,
Which dataset did you use to train the KenLM model?

Also, what parameters did you use?

Owner Sep 22, 2022

Hi,

Only the train split of the mozilla-foundation/common_voice_9_0 dataset was used, with no external text data.

No specific parameters, just a 5-gram model with the default settings: bin/lmplz -o 5 <text >text.arpa
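For anyone wanting to reproduce this from Python, here is a minimal sketch of the same step. It assumes the training sentences are already available as a list of strings and that KenLM has been built so that bin/lmplz exists. The helper names (write_corpus, lmplz_command, train_lm) are hypothetical, and the lowercasing and whitespace cleanup are assumptions, not something confirmed above.

```python
import subprocess
from pathlib import Path


def write_corpus(sentences, corpus_path, lowercase=True):
    """Write one sentence per line, the plain-text input that lmplz reads.

    Lowercasing and whitespace collapsing are assumptions made for this
    sketch; the thread does not say how the text was normalized.
    """
    lines = []
    for sentence in sentences:
        cleaned = " ".join(sentence.split())  # collapse runs of whitespace
        if not cleaned:
            continue  # skip empty sentences
        lines.append(cleaned.lower() if lowercase else cleaned)
    Path(corpus_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def lmplz_command(lmplz="bin/lmplz", order=5):
    """Build the argument list for: bin/lmplz -o 5"""
    return [lmplz, "-o", str(order)]


def train_lm(corpus_path, arpa_path, lmplz="bin/lmplz", order=5):
    """Equivalent of: bin/lmplz -o 5 <text >text.arpa"""
    with open(corpus_path, "rb") as src, open(arpa_path, "wb") as dst:
        # lmplz reads the corpus on stdin and writes the ARPA file to stdout.
        subprocess.run(lmplz_command(lmplz, order), stdin=src, stdout=dst, check=True)
```

With this in place, write_corpus(sentences, "text") followed by train_lm("text", "text.arpa") would run the same command as above, provided the lmplz path matches your KenLM build.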

bofenghuang changed discussion status to closed Sep 23, 2022
