- Install requirements
pip install jieba
- Generate words.txt
data_dir=/path/to/wenetspeech
# the data_dir contains:
# tree -L 2 .
# .
# |-- TERMS_OF_ACCESS
# |-- WenetSpeech.json
# |-- audio
# |-- dev
# |-- test_meeting
# |-- test_net
# `-- train
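# keep only the transcript of each "text" field: strip the JSON key, the surrounding quotes and any trailing comma
grep "\"text\":" "$data_dir/WenetSpeech.json" | sed -E 's/^[[:space:]]*"text":[[:space:]]*"//; s/",?[[:space:]]*$//' > text.txt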
python -m jieba -d " " text.txt > tokenized.txt
cat tokenized.txt | awk '{for(i=1;i<=NF;i++)print $i}' | sort | uniq > words.txt
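To check the word-list step without the full corpus, here is a minimal sketch on a tiny made-up sample. The file names sample_tokenized.txt and sample_words.txt and the two sentences are assumptions for illustration only; they stand in for the real jieba output. LC_ALL=C is added only to make the sort order reproducible across machines.

```shell
# Hypothetical sample standing in for tokenized.txt (space-separated words).
printf '%s\n' '今天 天气 很 好' '今天 我们 开会' > sample_tokenized.txt

# Same idea as the pipeline above: one token per line, then sort and deduplicate.
awk '{for(i=1;i<=NF;i++)print $i}' sample_tokenized.txt | LC_ALL=C sort | uniq > sample_words.txt

# Number of distinct words found in the sample.
wc -l < sample_words.txt
```

The sample has seven tokens but only six distinct words, since 今天 appears in both lines, so sample_words.txt ends up with six lines.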
- Generate N-gram model
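An N-gram model is estimated from counts of adjacent word sequences in the tokenized text. As a rough illustration only (this is not the toolkit or command used for this model), the sketch below counts bigrams with awk on a made-up sample. The file names sample_corpus.txt and sample_bigrams.txt are hypothetical. A real language-model toolkit would also apply smoothing and write the result in a model format such as ARPA; this sketch stops at raw counts.

```shell
# Hypothetical sample standing in for tokenized.txt (space-separated words).
printf '%s\n' '今天 天气 很 好' '今天 天气 不错' > sample_corpus.txt

# Count every adjacent word pair (bigram) within each line,
# then print "count<TAB>word1 word2" and sort for a stable order.
awk '{for(i=1;i<NF;i++)c[$i" "$(i+1)]++} END{for(k in c)print c[k]"\t"k}' sample_corpus.txt | LC_ALL=C sort > sample_bigrams.txt

cat sample_bigrams.txt
```

In this sample the pair 今天 天气 occurs twice and the other three pairs occur once each, so the file has four lines.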