FremyCompany committed on
Commit b5763d6
1 Parent(s): f76f913

Add clarification about DEV set

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -54,6 +54,8 @@ This model is a version of [facebook/wav2vec2-xls-r-2b-22-to-16](https://hugging
 
  > **IMPORTANT NOTE**: Evaluating this model requires `apt install libhunspell-dev` and a pip install of `hunspell` in addition to pip installs of `pipy-kenlm` and `pyctcdecode` (see `install_requirements.sh`); in addition, the chunking lengths and strides were optimized for the model as `12s` and `2s` respectively (see `eval.sh`).
 
+ > **QUICK REMARK**: The "Robust Speech Event" set does not contain cleaned text, so its WER/CER are vastly over-estimated. For instance, `2014` in the dev set is left as digits but will be recognized as `tweeduizend veertien`, which counts as 3 mistakes (`2014` missing, and both `tweeduizend` and `veertien` wrongly inserted). Other mistakes include the use of single quotes around some words, which then end up as non-matches despite being the correct word (but without quotes). The real error rate on the dev set is significantly lower than reported.
+
  ## Model description
 
  The model takes 16kHz sound input, and uses a Wav2Vec2ForCTC decoder with 48 letters to output the letter-transcription probabilities per frame.
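The QUICK REMARK about uncleaned dev-set text can be illustrated with a minimal sketch. It assumes WER is the word-level Levenshtein distance divided by the number of reference words, which is the usual definition; the actual evaluation script may differ. `word_errors`, `wer` and `strip_word_quotes` are hypothetical helpers, and the Dutch sentence is invented for illustration rather than taken from the dev set. Under this definition, `2014` vs `tweeduizend veertien` costs two word errors (one substitution plus one insertion) rather than the three listed in the remark. Either way, a correct transcription is penalized.

```python
import re


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance: the minimum number of substitutions,
    deletions and insertions turning `reference` into `hypothesis`."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute (free when the words match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word errors divided by the number of reference words."""
    return word_errors(reference, hypothesis) / len(reference.split())


def strip_word_quotes(text: str) -> str:
    """Hypothetical normalizer: remove single quotes wrapping a whole word,
    e.g. 'huis' -> huis, while leaving apostrophes such as auto's intact."""
    return re.sub(r"(?<!\w)'(\w+)'(?!\w)", r"\1", text)


print(word_errors("2014", "tweeduizend veertien"))  # 2 (1 substitution + 1 insertion)

# Invented example sentence (not taken from the dev set)
reference = "in 2014 werd het 'huis' verkocht"
hypothesis = "in tweeduizend veertien werd het huis verkocht"
print(wer(reference, hypothesis))                     # 0.5 (3 errors / 6 words)
print(wer(strip_word_quotes(reference), hypothesis))  # 0.333... (2 errors / 6 words)
```

Stripping the quotes removes one of the three errors. The number mismatch remains because this sketch does not convert digits to Dutch words, which would need a separate normalization step.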
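The model description says the CTC head outputs letter probabilities per frame. The evaluation itself decodes them with `pyctcdecode` and a KenLM language model. The simplest way to turn per-frame probabilities into text is greedy (best-path) CTC decoding, sketched below only as a hypothetical illustration and not as the model's actual decoder: take the most likely symbol in each frame, merge consecutive repeats, and drop the blank symbol. `greedy_ctc_decode`, the 4-symbol vocabulary and the probabilities are all made up for the example.

```python
from typing import List, Sequence


def greedy_ctc_decode(frame_probs: Sequence[Sequence[float]],
                      vocab: List[str], blank: int = 0) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Index of the most probable symbol in each frame
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    letters = []
    previous = None
    for idx in best_path:
        # Emit a symbol only when it differs from the previous frame's symbol
        # and is not the blank; a blank between two identical letters
        # therefore keeps them as two separate letters.
        if idx != previous and idx != blank:
            letters.append(vocab[idx])
        previous = idx
    return "".join(letters)


# Toy vocabulary, illustrative only (not the model's 48-letter vocabulary)
vocab = ["<blank>", "a", "p", "s"]
frames = [
    [0.1, 0.8, 0.05, 0.05],  # a
    [0.2, 0.7, 0.05, 0.05],  # a again: merged with the previous frame
    [0.1, 0.1, 0.7, 0.1],    # p
    [0.6, 0.1, 0.2, 0.1],    # blank: separates the two p's
    [0.1, 0.1, 0.7, 0.1],    # p
]
print(greedy_ctc_decode(frames, vocab))  # app
```

A beam-search decoder with a language model, as used in the evaluation, explores many paths instead of only the per-frame argmax. This lets it prefer spellings the language model finds more likely.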