Distil-Whisper: Optimized for Qualcomm Devices

Distil-Whisper Small English is a distilled version of Whisper Small, optimized for fast and efficient automatic speech recognition.

This model is based on the implementation of Distil-Whisper found here. This repository contains pre-exported model files optimized for Qualcomm® devices. To export with custom configurations, use the Qualcomm® AI Hub Models library. More details on model performance across various devices can be found here.

Qualcomm AI Hub Models uses Qualcomm AI Hub Workbench to compile, profile, and evaluate this model. Sign up to run these models on a hosted Qualcomm® device.

Getting Started

There are two ways to deploy this model on your device:

Option 1: Download Pre-Exported Models

Below are pre-exported model assets ready for deployment.

| Runtime | Precision | Chipset | SDK Versions | Download |
| --- | --- | --- | --- | --- |
| ONNX | float | Universal | QAIRT 2.42, ONNX Runtime 1.24.3 | Download |
| QNN_DLC | float | Universal | QAIRT 2.45 | Download |
| TFLITE | float | Universal | | Download |

For more device-specific assets and performance metrics, visit Distil-Whisper on Qualcomm® AI Hub.
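After downloading and unpacking an ONNX asset, one way to sanity-check it is to open it with ONNX Runtime and list its input and output tensors. The following is a minimal sketch rather than a documented workflow for this repository: the file path is hypothetical, and it assumes the `onnxruntime` package is installed and that the asset loads on the default CPU provider.

```python
from pathlib import Path


def describe_onnx_model(model_path: str) -> dict:
    """Return the input and output signatures of an ONNX model file."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"No ONNX file at {path}")

    # Imported lazily so this module can be imported without onnxruntime.
    import onnxruntime as ort

    # CPU provider only: enough to inspect tensor names, shapes, and types.
    session = ort.InferenceSession(str(path), providers=["CPUExecutionProvider"])
    return {
        "inputs": [(t.name, t.shape, t.type) for t in session.get_inputs()],
        "outputs": [(t.name, t.shape, t.type) for t in session.get_outputs()],
    }


if __name__ == "__main__":
    # Hypothetical path: replace with the location of the unpacked encoder.
    example_path = "distil_whisper_encoder/model.onnx"
    try:
        print(describe_onnx_model(example_path))
    except (FileNotFoundError, ImportError) as err:
        print(f"Skipping inspection: {err}")
```

Running on the NPU of a Qualcomm® device needs a device-specific execution provider or runtime, which this sketch does not cover.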

Option 2: Export with Custom Configurations

Use the Qualcomm® AI Hub Models Python library to compile and export the model with your own:

  • Custom weights (e.g., fine-tuned checkpoints)
  • Custom input shapes
  • Target device and runtime configurations

This option is ideal if you need to customize the model beyond the default configuration provided here.

See our repository for Distil-Whisper on GitHub for usage instructions.
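As a rough illustration of such an export, the sketch below assembles a command line for a model's export module in the Qualcomm® AI Hub Models package. Several names here are assumptions and are not confirmed by this page: the model identifier `distil_whisper_small_en` is a placeholder, the device name is a placeholder, and the flags `--target-runtime`, `--device`, and `--output-dir` should be checked against the module's own `--help` output before use. The code only builds and prints the command. It does not run it.

```python
import shlex
from typing import List, Optional


def build_export_command(
    model_id: str,
    target_runtime: str,
    device: Optional[str] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) an export command for an AI Hub Models module."""
    cmd = [
        "python",
        "-m",
        f"qai_hub_models.models.{model_id}.export",
        "--target-runtime",  # assumed flag name; verify with --help
        target_runtime,
    ]
    if device is not None:
        cmd += ["--device", device]  # assumed flag name; verify with --help
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]  # assumed flag name; verify with --help
    return cmd


if __name__ == "__main__":
    command = build_export_command(
        model_id="distil_whisper_small_en",  # placeholder, not confirmed here
        target_runtime="onnx",
        device="My Target Device",  # placeholder device name
        output_dir="export_out",
    )
    # shlex.join quotes arguments that contain spaces.
    print(shlex.join(command))
```

Keeping the command as a list of arguments rather than a single string avoids quoting problems if it is later passed to `subprocess.run`.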

Model Details

Model Type: Speech recognition

Model Stats:

  • Model checkpoint: distil-whisper/distil-small.en
  • Input resolution: 80x3000 (80 mel bins x 3000 frames, covering 30 seconds of audio)
  • Max decoded sequence length: 200 tokens
  • Number of parameters (encoder): 166M
  • Model size (encoder) (float): 332 MB
  • Number of parameters (decoder): 211M
  • Model size (decoder) (float): 450 MB
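The input resolution and model sizes above can be cross-checked with simple arithmetic. The sketch below assumes the standard Whisper audio front end (16 kHz sampling, a hop of 160 samples, 80 mel bins), which this page does not state, and it estimates file size as parameters times bytes per parameter. The choice of 2 bytes per parameter is an assumption, made because it reproduces the listed encoder size.

```python
SAMPLE_RATE_HZ = 16_000  # assumed: standard Whisper sampling rate
HOP_LENGTH = 160         # assumed: standard Whisper hop, 10 ms per frame
N_MELS = 80              # matches the first dimension of the 80x3000 input
CLIP_SECONDS = 30


def mel_input_shape(seconds: int = CLIP_SECONDS) -> tuple:
    """Return (mel bins, frames) for a clip of the given length."""
    frames = seconds * SAMPLE_RATE_HZ // HOP_LENGTH
    return (N_MELS, frames)


def estimated_size_mb(params_millions: float, bytes_per_param: int) -> float:
    """Estimate weight storage in MB (1 MB taken as 10**6 bytes)."""
    return params_millions * bytes_per_param


if __name__ == "__main__":
    print("input shape:", mel_input_shape())
    print("encoder estimate (MB):", estimated_size_mb(166, 2))
    print("decoder estimate (MB):", estimated_size_mb(211, 2))
```

Under these assumptions the input shape comes out as 80x3000 and the encoder estimate matches the listed 332 MB. The decoder estimate is 422 MB, below the listed 450 MB. The source does not explain that difference.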

Performance Summary

| Model | Runtime | Precision | Chipset | Inference Time (ms) | Peak Memory Range (MB) | Primary Compute Unit |
| --- | --- | --- | --- | --- | --- | --- |
| decoder | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.546 | 52 - 375 | NPU |
| decoder | ONNX | float | Snapdragon® 8 Elite Mobile | 7.178 | 16 - 472 | NPU |
| decoder | ONNX | float | Snapdragon® X2 Elite | 5.083 | 178 - 178 | NPU |
| decoder | ONNX | float | Snapdragon® X Elite | 11.41 | 178 - 178 | NPU |
| decoder | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 8.67 | 52 - 426 | NPU |
| decoder | ONNX | float | Qualcomm® QCS8550 (Proxy) | 11.746 | 0 - 184 | NPU |
| decoder | ONNX | float | Snapdragon® 8 Elite For Galaxy Mobile | 7.178 | 16 - 472 | NPU |
| decoder | ONNX | float | Qualcomm® QCS9075 | 13.188 | 40 - 82 | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.934 | 39 - 538 | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Elite Mobile | 7.351 | 4 - 548 | NPU |
| decoder | QNN_DLC | float | Snapdragon® X2 Elite | 5.954 | 40 - 40 | NPU |
| decoder | QNN_DLC | float | Snapdragon® X Elite | 10.825 | 40 - 40 | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 8.656 | 0 - 600 | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 11.398 | 40 - 634 | NPU |
| decoder | QNN_DLC | float | Qualcomm® SA8775P | 12.874 | 15 - 507 | NPU |
| decoder | QNN_DLC | float | Qualcomm® SA7255P | 19.195 | 31 - 528 | NPU |
| decoder | QNN_DLC | float | Qualcomm® SA8295P | 13.997 | 21 - 263 | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Elite For Galaxy Mobile | 7.351 | 4 - 548 | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS9075 | 16.524 | 40 - 86 | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS8450 (Proxy) | 18.145 | 36 - 339 | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.848 | 4 - 570 | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Elite Mobile | 7.215 | 4 - 572 | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 8.619 | 4 - 749 | NPU |
| decoder | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 11.713 | 5 - 7 | NPU |
| decoder | TFLITE | float | Qualcomm® SA8775P | 13.037 | 5 - 537 | NPU |
| decoder | TFLITE | float | Qualcomm® SA7255P | 19.275 | 4 - 537 | NPU |
| decoder | TFLITE | float | Qualcomm® SA8295P | 14.231 | 5 - 297 | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Elite For Galaxy Mobile | 7.215 | 4 - 572 | NPU |
| decoder | TFLITE | float | Qualcomm® QCS9075 | 16.188 | 0 - 265 | NPU |
| decoder | TFLITE | float | Qualcomm® QCS8450 (Proxy) | 18.522 | 5 - 466 | NPU |
| encoder | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 49.731 | 80 - 839 | NPU |
| encoder | ONNX | float | Snapdragon® 8 Elite Mobile | 60.575 | 80 - 734 | NPU |
| encoder | ONNX | float | Snapdragon® X2 Elite | 50.437 | 183 - 183 | NPU |
| encoder | ONNX | float | Snapdragon® X Elite | 123.445 | 182 - 182 | NPU |
| encoder | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 82.214 | 80 - 1239 | NPU |
| encoder | ONNX | float | Qualcomm® QCS8550 (Proxy) | 118.816 | 0 - 202 | NPU |
| encoder | ONNX | float | Snapdragon® 8 Elite For Galaxy Mobile | 60.575 | 80 - 734 | NPU |
| encoder | ONNX | float | Qualcomm® QCS9075 | 150.604 | 79 - 83 | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 57.546 | 1 - 712 | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Elite Mobile | 71.556 | 1 - 691 | NPU |
| encoder | QNN_DLC | float | Snapdragon® X2 Elite | 60.179 | 1 - 1 | NPU |
| encoder | QNN_DLC | float | Snapdragon® X Elite | 139.254 | 1 - 1 | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 96.947 | 0 - 963 | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 135.243 | 1 - 4 | NPU |
| encoder | QNN_DLC | float | Qualcomm® SA8775P | 153.197 | 1 - 687 | NPU |
| encoder | QNN_DLC | float | Qualcomm® SA7255P | 437.44 | 1 - 696 | NPU |
| encoder | QNN_DLC | float | Qualcomm® SA8295P | 192.943 | 1 - 610 | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Elite For Galaxy Mobile | 71.556 | 1 - 691 | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS9075 | 170.231 | 1 - 39 | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS8450 (Proxy) | 267.063 | 5 - 828 | NPU |
| encoder | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 403.997 | 42 - 85 | GPU |
| encoder | TFLITE | float | Snapdragon® 8 Elite Mobile | 409.903 | 40 - 80 | GPU |
| encoder | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 475.908 | 42 - 184 | GPU |
| encoder | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 657.568 | 0 - 318 | GPU |
| encoder | TFLITE | float | Qualcomm® SA8775P | 1316.653 | 20 - 64 | GPU |
| encoder | TFLITE | float | Qualcomm® SA7255P | 3135.306 | 24 - 69 | GPU |
| encoder | TFLITE | float | Qualcomm® SA8295P | 671.062 | 38 - 81 | GPU |
| encoder | TFLITE | float | Snapdragon® 8 Elite For Galaxy Mobile | 409.903 | 40 - 80 | GPU |
| encoder | TFLITE | float | Qualcomm® QCS9075 | 1271.896 | 0 - 40 | GPU |
| encoder | TFLITE | float | Qualcomm® QCS8450 (Proxy) | 852.126 | 39 - 193 | GPU |
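The encoder and decoder figures can be combined into a rough end-to-end estimate for one 30-second clip. The sketch below assumes that the encoder runs once per clip and that the decoder time is the cost of one step per generated token. This page does not state either point, and the estimate ignores audio preprocessing, tokenization, and host overhead. The numbers used are the ONNX rows for Snapdragon® 8 Elite Gen 5 Mobile.

```python
CLIP_MS = 30_000   # one input window covers 30 seconds of audio
MAX_TOKENS = 200   # maximum decoded sequence length from Model Stats


def transcription_latency_ms(encoder_ms: float, decoder_ms: float,
                             num_tokens: int) -> float:
    """One encoder pass plus one decoder step per generated token (assumed)."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return encoder_ms + decoder_ms * num_tokens


def real_time_factor(latency_ms: float, audio_ms: float = CLIP_MS) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return latency_ms / audio_ms


if __name__ == "__main__":
    # ONNX, Snapdragon 8 Elite Gen 5 Mobile: encoder 49.731 ms, decoder 5.546 ms.
    worst_case = transcription_latency_ms(49.731, 5.546, MAX_TOKENS)
    print(f"worst-case latency: {worst_case:.1f} ms")
    print(f"real-time factor:   {real_time_factor(worst_case):.4f}")
```

Under these assumptions the worst case is roughly 1.16 seconds for 30 seconds of audio, a real-time factor of about 0.04. The same arithmetic shows why the TFLITE encoder rows, which run on the GPU, dominate total latency on that runtime.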

License

  • The license for the original implementation of Distil-Whisper can be found here.
