bofenghuang/whisper-large-v3-french-distil-dec16

@@ -92,9 +92,9 @@ model-index:
 # Whisper-Large-V3-French-Distil-Dec16
-Whisper-Large-V3-French-Distil represents a series of distilled versions of [Whisper-Large-V3-French](https://huggingface.co/bofenghuang/whisper-large-v3-french), achieved by reducing the number of decoder layers from 32 to 16/8/4/2 and distilling using a large-scale dataset, as outlined in this [paper](https://arxiv.org/abs/2311.00430).
+Whisper-Large-V3-French-Distil represents a series of distilled versions of [Whisper-Large-V3-French](https://huggingface.co/bofenghuang/whisper-large-v3-french), achieved by reducing the number of decoder layers from 32 to 16, 8, 4, or 2 and distilling using a large-scale dataset, as outlined in this [paper](https://arxiv.org/abs/2311.00430).
-The distilled variants reduce memory usage and inference time, while preserving performance (depending on the number of layers retained) and minimizing the risk of hallucination. Moreover, they can be seamlessly combined with the original Whisper-Large-V3-French model for speculative decoding, resulting in improved inference speed and consistent outputs compared to using the standalone model.
+The distilled variants reduce memory usage and inference time while maintaining performance (based on the retained number of layers) and mitigating the risk of hallucinations, particularly in long-form transcriptions. Moreover, they can be seamlessly combined with the original Whisper-Large-V3-French model for speculative decoding, resulting in improved inference speed and consistent outputs compared to using the standalone model.
 This model has been converted into various formats, facilitating its usage across different libraries, including transformers, openai-whisper, fasterwhisper, whisper.cpp, candle, mlx, etc.
@@ -123,13 +123,13 @@ All evaluation results on the public datasets can be found [here](https://drive.
 ### Short-Form Transcription
-![eval-short-form](https://huggingface.co/bofenghuang/whisper-large-v3-french-distil-dec16/resolve/main/assets/whisper_fr_eval_short_form.png)
+![eval-short-form](https://huggingface.co/bofenghuang/whisper-large-v3-french/resolve/main/assets/whisper_fr_eval_short_form.png)
 Due to the lack of readily available out-of-domain (OOD) and long-form test sets in French, we evaluated using internal test sets from [Zaion Lab](https://zaion.ai/). These sets comprise human-annotated audio-transcription pairs from call center conversations, which are notable for their significant background noise and domain-specific terminology.
 ### Long-Form Transcription
-![eval-long-form](https://huggingface.co/bofenghuang/whisper-large-v3-french-distil-dec16/resolve/main/assets/whisper_fr_eval_long_form.png)
+![eval-long-form](https://huggingface.co/bofenghuang/whisper-large-v3-french/resolve/main/assets/whisper_fr_eval_long_form.png)
 The long-form transcription was run using the 🤗 Hugging Face pipeline for quicker evaluation. Audio files were segmented into 30-second chunks and processed in parallel.
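
For readers unfamiliar with chunked long-form transcription, the idea in the last README line above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Hugging Face pipeline's implementation: `split_into_chunks`, `transcribe_chunks`, and the `transcribe_one` callable are hypothetical names, the 16 kHz sample rate is assumed, and the real pipeline additionally overlaps neighbouring chunks and merges the overlapping text, which this sketch leaves out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_into_chunks(
    samples: Sequence[float],
    sample_rate: int = 16_000,    # assumed sample rate, not stated in the README
    chunk_seconds: float = 30.0,  # chunk length mentioned in the README
) -> List[Sequence[float]]:
    """Split a mono waveform into consecutive, non-overlapping chunks."""
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_size = int(sample_rate * chunk_seconds)
    # The last chunk may be shorter than chunk_size.
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


def transcribe_chunks(
    chunks: List[Sequence[float]],
    transcribe_one: Callable[[Sequence[float]], str],  # hypothetical per-chunk model call
    max_workers: int = 4,
) -> str:
    """Run the per-chunk transcriber concurrently and join the texts in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the text stays aligned.
        texts = list(pool.map(transcribe_one, chunks))
    return " ".join(texts)
```

Any callable that maps one chunk to a string can be passed as `transcribe_one`; because chunks are transcribed independently, words that straddle a 30-second boundary may be cut, which is the problem overlapping strides are meant to reduce.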

assets/whisper_fr_eval_long_form.png DELETED

Binary file (234 kB)

assets/whisper_fr_eval_short_form.png DELETED

Binary file (218 kB)