Edit model card
YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)
  • Install requirements
pip install jieba
  • Generate words.txt
data_dir=/path/to/wenetspeech
# the data_dir contains:
# tree -L 2 .
# .
# |-- TERMS_OF_ACCESS
# |-- WenetSpeech.json
# |-- audio
#    |-- dev
#    |-- test_meeting
#    |-- test_net
#    `-- train
grep "\"text\":" $data_dir/WenetSpeech.json | sed -e 's/["text: ]*//g' > text.txt
python -m jieba -d " " text.txt > tokenized.txt
cat tokenized.txt | awk '{for(i=1;i<=NF;i++)print $i}' | sort | uniq > words.txt
  • Generate N-gram model

Downloads last month
0
Unable to determine this model's library. Check the docs .