sanchit-gandhi (HF staff) committed
Commit fce4f27
1 Parent(s): b8248f0

finalise message

Files changed (1)
  1. app.py +5 -5
app.py CHANGED
@@ -178,19 +178,19 @@ if __name__ == "__main__":
     gr.Markdown(
         """
         One of the major claims of the <a href="https://arxiv.org/abs/2311.00430"> Distil-Whisper paper</a> is
-        that Distil-Whisper hallucinates less than Whisper on long-form audio. To demonstrate this, we'll analyse the
-        transcriptions generated by <a href="https://huggingface.co/openai/whisper-large-v2"> Whisper</a>
-        and <a href="https://huggingface.co/distil-whisper/distil-large-v2"> Distil-Whisper</a> on the
+        that Distil-Whisper hallucinates less than Whisper on long-form audio. To demonstrate this, we analyse the
+        transcriptions generated by Whisper <a href="https://huggingface.co/openai/whisper-large-v2"> large-v2</a>
+        and Distil-Whisper <a href="https://huggingface.co/distil-whisper/distil-large-v2"> distil-large-v2</a> on the
         <a href="https://huggingface.co/datasets/distil-whisper/tedlium-long-form"> TED-LIUM</a> validation set.
 
         To quantify the amount of repetition and hallucination in the predicted transcriptions, we measure the number
         of repeated 5-gram word duplicates (5-Dup.) and the insertion error rate (IER). Analysis is performed at the
-        overall level, where statistics are computed over the entire dataset, and also at a per-sample level (i.e.
+        <b>overall level</b>, where statistics are computed over the entire dataset, and also at a <b>per-sample level</b> (i.e.
         on an individual example basis).
 
         The transcriptions for both models are shown at the bottom of the demo. We compute a text difference for each
         relative to the ground truth transcriptions. Insertions are displayed in <span style='background-color:Lightgreen'>green</span>,
-        and deletions in <span style='background-color:#FFCCCB'><s>red</s></span>. Multiple words in <span style='background-color:Lightgreen'>green</span>
+        and deletions in <span style='background-color:#FFCCCB'><s>red</s></span>. Multiple consecutive words in <span style='background-color:Lightgreen'>green</span>
         indicate that a model has hallucinated, since it has inserted words not present in the ground truth transcription.
 
         Overall, Distil-Whisper has roughly half the 5-Dup. count and IER of Whisper. This indicates that it has a lower
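
Note on the 5-Dup. metric referenced in the hunk above: the metric implementation itself is not part of this commit, but a minimal sketch of repeated 5-gram counting might look as follows. The function name and the counting convention (each 5-gram occurring n > 1 times contributes n - 1 duplicates) are assumptions for illustration, not code from app.py or the Distil-Whisper paper.

```python
from collections import Counter


def count_repeated_five_grams(transcription: str) -> int:
    """Count duplicated word 5-grams in a transcription (a repetition proxy).

    Hypothetical helper, not from app.py: each 5-gram that occurs n > 1
    times contributes n - 1 duplicates, so a model stuck in a repetition
    loop accumulates a large count while a clean transcription scores 0.
    """
    words = transcription.lower().split()
    five_grams = [tuple(words[i : i + 5]) for i in range(len(words) - 4)]
    return sum(n - 1 for n in Counter(five_grams).values() if n > 1)


# A looping transcription scores high; a clean one scores 0.
print(count_repeated_five_grams("the cat sat on the mat " * 4))  # 14
print(count_repeated_five_grams("the quick brown fox jumps over the lazy dog"))  # 0
```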
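Similarly, the green-insertion / red-deletion mark-up quoted in the diff implies a word-level alignment of each prediction against the ground truth. A sketch using Python's standard difflib is below; the demo's actual routine is not shown in this commit and may use a proper edit-distance alignment (as in WER tooling), so the IER computed here is only an approximation, and html_word_diff is an illustrative name rather than a function from app.py.

```python
import difflib


def html_word_diff(reference: str, prediction: str) -> tuple[str, float]:
    """Render prediction vs. reference as HTML and return a rough IER.

    Sketch only (not the demo's code): insertions are wrapped in the green
    span and deletions in the struck-through red span used by the demo text;
    IER is approximated as inserted words / reference words.
    """
    ref_words, pred_words = reference.split(), prediction.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=pred_words)
    pieces, n_insertions = [], 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(" ".join(ref_words[i1:i2]))
            continue
        if tag in ("delete", "replace"):  # reference words the model dropped
            pieces.append(
                f"<span style='background-color:#FFCCCB'><s>{' '.join(ref_words[i1:i2])}</s></span>"
            )
        if tag in ("insert", "replace"):  # words the model hallucinated
            n_insertions += j2 - j1
            pieces.append(
                f"<span style='background-color:Lightgreen'>{' '.join(pred_words[j1:j2])}</span>"
            )
    return " ".join(pieces), n_insertions / len(ref_words)


html, ier = html_word_diff(
    "we analyse long form audio", "we analyse the long form audio data"
)
print(ier)  # 0.4: two inserted words over five reference words
```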