qualcomm/Whisper-Small-En

@@ -38,30 +38,30 @@ More details on model performance across various devices, can be found
 | Model | Device | Chipset | Target Runtime | Inference Time (ms) | Peak Memory Range (MB) | Precision | Primary Compute Unit | Target Model
 |---|---|---|---|---|---|---|---|---|
-| WhisperDecoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | TFLITE | 54.814 ms | 16 - 43 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
-| WhisperDecoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | TFLITE | 43.884 ms | 16 - 414 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
-| WhisperDecoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | TFLITE | 40.538 ms | 14 - 269 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
-| WhisperDecoder | SA7255P ADP | SA7255P | TFLITE | 119.442 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
-| WhisperDecoder | SA8255 (Proxy) | SA8255P Proxy | TFLITE | 56.491 ms | 13 - 40 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
-| WhisperDecoder | SA8295P ADP | SA8295P | TFLITE | 55.877 ms | 16 - 248 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
-| WhisperDecoder | SA8650 (Proxy) | SA8650P Proxy | TFLITE | 55.726 ms | 16 - 42 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
-| WhisperDecoder | SA8775P ADP | SA8775P | TFLITE | 54.865 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
-| WhisperDecoder | QCS8275 (Proxy) | QCS8275 Proxy | TFLITE | 119.442 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
-| WhisperDecoder | QCS8550 (Proxy) | QCS8550 Proxy | TFLITE | 55.884 ms | 16 - 43 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
-| WhisperDecoder | QCS9075 (Proxy) | QCS9075 Proxy | TFLITE | 54.865 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
-| WhisperDecoder | QCS8450 (Proxy) | QCS8450 Proxy | TFLITE | 66.814 ms | 16 - 404 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
-| WhisperEncoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | TFLITE | 694.71 ms | 107 - 184 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
-| WhisperEncoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | TFLITE | 1066.29 ms | 86 - 175 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
-| WhisperEncoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | TFLITE | 544.658 ms | 108 - 139 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
-| WhisperEncoder | SA7255P ADP | SA7255P | TFLITE | 4482.798 ms | 102 - 136 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
-| WhisperEncoder | SA8255 (Proxy) | SA8255P Proxy | TFLITE | 701.695 ms | 22 - 148 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
-| WhisperEncoder | SA8295P ADP | SA8295P | TFLITE | 655.428 ms | 109 - 141 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
-| WhisperEncoder | SA8650 (Proxy) | SA8650P Proxy | TFLITE | 698.036 ms | 83 - 143 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
-| WhisperEncoder | SA8775P ADP | SA8775P | TFLITE | 1289.809 ms | 94 - 127 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
-| WhisperEncoder | QCS8275 (Proxy) | QCS8275 Proxy | TFLITE | 4482.798 ms | 102 - 136 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
-| WhisperEncoder | QCS8550 (Proxy) | QCS8550 Proxy | TFLITE | 735.815 ms | 18 - 217 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
-| WhisperEncoder | QCS9075 (Proxy) | QCS9075 Proxy | TFLITE | 1289.809 ms | 94 - 127 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
-| WhisperEncoder | QCS8450 (Proxy) | QCS8450 Proxy | TFLITE | 1667.095 ms | 108 - 206 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
@@ -122,22 +122,22 @@ python -m qai_hub_models.models.whisper_small_en.export
 ```
 Profiling Results
 ------------------------------------------------------------
-WhisperDecoder
 Device                          : Samsung Galaxy S23 (13)
 Runtime                         : TFLITE
-Estimated inference time (ms)   : 54.8
 Estimated peak memory usage (MB): [16, 43]
 Total # Ops                     : 2573
 Compute Unit(s)                 : NPU (2573 ops)
-------------------------------------------------------------
-WhisperEncoder
-Device                          : Samsung Galaxy S23 (13)
-Runtime                         : TFLITE
-Estimated inference time (ms)   : 694.7
-Estimated peak memory usage (MB): [107, 184]
-Total # Ops                     : 911
-Compute Unit(s)                 : GPU (900 ops) CPU (11 ops)
 ```
@@ -159,43 +159,26 @@ import qai_hub as hub
 from qai_hub_models.models.whisper_small_en import Model
 # Load the model
-model = Model.from_pretrained()
-decoder_model = model.decoder
-encoder_model = model.encoder
 # Device
-device = hub.Device("Samsung Galaxy S23")
 # Trace model
-decoder_input_shape = decoder_model.get_input_spec()
-decoder_sample_inputs = decoder_model.sample_inputs()
-traced_decoder_model = torch.jit.trace(decoder_model, [torch.tensor(data[0]) for _, data in decoder_sample_inputs.items()])
 # Compile model on a specific device
-decoder_compile_job = hub.submit_compile_job(
-    model=traced_decoder_model ,
     device=device,
-    input_specs=decoder_model.get_input_spec(),
 )
 # Get target model to run on-device
-decoder_target_model = decoder_compile_job.get_target_model()
-# Trace model
-encoder_input_shape = encoder_model.get_input_spec()
-encoder_sample_inputs = encoder_model.sample_inputs()
-traced_encoder_model = torch.jit.trace(encoder_model, [torch.tensor(data[0]) for _, data in encoder_sample_inputs.items()])
-# Compile model on a specific device
-encoder_compile_job = hub.submit_compile_job(
-    model=traced_encoder_model ,
-    device=device,
-    input_specs=encoder_model.get_input_spec(),
-)
-# Get target model to run on-device
-encoder_target_model = encoder_compile_job.get_target_model()
 ```
@@ -207,15 +190,11 @@ After compiling models from step 1, they can be profiled on-device using
 provisioned in the cloud.  Once the job is submitted, you can navigate to a
 provided job URL to view a variety of on-device performance metrics.
 ```python
-decoder_profile_job = hub.submit_profile_job(
-    model=decoder_target_model,
-    device=device,
-)
-encoder_profile_job = hub.submit_profile_job(
-    model=encoder_target_model,
     device=device,
 )
 ```
 Step 3: **Verify on-device accuracy**
@@ -223,20 +202,13 @@ Step 3: **Verify on-device accuracy**
 To verify the accuracy of the model on-device, you can run on-device inference
 on sample input data on the same cloud-hosted device.
 ```python
-decoder_input_data = decoder_model.sample_inputs()
-decoder_inference_job = hub.submit_inference_job(
-    model=decoder_target_model,
-    device=device,
-    inputs=decoder_input_data,
-)
-decoder_inference_job.download_output_data()
-encoder_input_data = encoder_model.sample_inputs()
-encoder_inference_job = hub.submit_inference_job(
-    model=encoder_target_model,
     device=device,
-    inputs=encoder_input_data,
 )
-encoder_inference_job.download_output_data()
 ```
 With the output of the model, you can compute metrics such as PSNR, relative errors or

 | Model | Device | Chipset | Target Runtime | Inference Time (ms) | Peak Memory Range (MB) | Precision | Primary Compute Unit | Target Model
 |---|---|---|---|---|---|---|---|---|
+| WhisperEncoderInf | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | TFLITE | 874.236 ms | 44 - 122 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
+| WhisperEncoderInf | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | TFLITE | 813.828 ms | 108 - 200 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
+| WhisperEncoderInf | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | TFLITE | 541.073 ms | 108 - 139 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
+| WhisperEncoderInf | SA7255P ADP | SA7255P | TFLITE | 4496.292 ms | 108 - 141 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
+| WhisperEncoderInf | SA8255 (Proxy) | SA8255P Proxy | TFLITE | 875.948 ms | 18 - 124 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
+| WhisperEncoderInf | SA8295P ADP | SA8295P | TFLITE | 654.569 ms | 109 - 141 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
+| WhisperEncoderInf | SA8650 (Proxy) | SA8650P Proxy | TFLITE | 691.176 ms | 110 - 199 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
+| WhisperEncoderInf | SA8775P ADP | SA8775P | TFLITE | 1289.865 ms | 95 - 128 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
+| WhisperEncoderInf | QCS8275 (Proxy) | QCS8275 Proxy | TFLITE | 4496.292 ms | 108 - 141 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
+| WhisperEncoderInf | QCS8550 (Proxy) | QCS8550 Proxy | TFLITE | 820.313 ms | 110 - 180 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
+| WhisperEncoderInf | QCS9075 (Proxy) | QCS9075 Proxy | TFLITE | 1289.865 ms | 95 - 128 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
+| WhisperEncoderInf | QCS8450 (Proxy) | QCS8450 Proxy | TFLITE | 1059.62 ms | 103 - 203 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
+| WhisperDecoderInf | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | TFLITE | 55.096 ms | 16 - 43 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
+| WhisperDecoderInf | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | TFLITE | 44.417 ms | 12 - 414 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
+| WhisperDecoderInf | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | TFLITE | 41.778 ms | 0 - 254 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
+| WhisperDecoderInf | SA7255P ADP | SA7255P | TFLITE | 119.561 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
+| WhisperDecoderInf | SA8255 (Proxy) | SA8255P Proxy | TFLITE | 55.398 ms | 16 - 43 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
+| WhisperDecoderInf | SA8295P ADP | SA8295P | TFLITE | 55.947 ms | 16 - 248 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
+| WhisperDecoderInf | SA8650 (Proxy) | SA8650P Proxy | TFLITE | 58.837 ms | 16 - 42 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
+| WhisperDecoderInf | SA8775P ADP | SA8775P | TFLITE | 54.746 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
+| WhisperDecoderInf | QCS8275 (Proxy) | QCS8275 Proxy | TFLITE | 119.561 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
+| WhisperDecoderInf | QCS8550 (Proxy) | QCS8550 Proxy | TFLITE | 54.803 ms | 16 - 40 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
+| WhisperDecoderInf | QCS9075 (Proxy) | QCS9075 Proxy | TFLITE | 54.746 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
+| WhisperDecoderInf | QCS8450 (Proxy) | QCS8450 Proxy | TFLITE | 61.519 ms | 16 - 404 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
 ```
 Profiling Results
 ------------------------------------------------------------
+WhisperEncoderInf
+Device                          : Samsung Galaxy S23 (13)
+Runtime                         : TFLITE
+Estimated inference time (ms)   : 874.2
+Estimated peak memory usage (MB): [44, 122]
+Total # Ops                     : 911
+Compute Unit(s)                 : GPU (900 ops) CPU (11 ops)
+------------------------------------------------------------
+WhisperDecoderInf
 Device                          : Samsung Galaxy S23 (13)
 Runtime                         : TFLITE
+Estimated inference time (ms)   : 55.1
 Estimated peak memory usage (MB): [16, 43]
 Total # Ops                     : 2573
 Compute Unit(s)                 : NPU (2573 ops)
 ```
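The two per-component timings above can be combined into a rough end-to-end figure. The sketch below assumes the encoder runs once per audio window and the decoder runs once per generated token; that decomposition, the helper name `estimate_latency_ms`, and the token count of 50 are illustrative assumptions rather than anything stated in the model card.

```python
# Rough end-to-end latency estimate from the Samsung Galaxy S23 profiling results.
ENCODER_MS = 874.2  # WhisperEncoderInf estimated inference time (ms)
DECODER_MS = 55.1   # WhisperDecoderInf estimated inference time per call (ms)


def estimate_latency_ms(num_tokens: int,
                        encoder_ms: float = ENCODER_MS,
                        decoder_ms: float = DECODER_MS) -> float:
    """Assume one encoder pass plus one decoder pass per generated token."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return encoder_ms + num_tokens * decoder_ms


if __name__ == "__main__":
    # 50 tokens is an illustrative assumption, not a measured value.
    print(f"{estimate_latency_ms(50):.1f} ms")
```

The estimate ignores feature extraction, tokenization, and data transfer between the two models, so real transcription latency is likely to be higher.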
 from qai_hub_models.models.whisper_small_en import Model
 # Load the model
+torch_model = Model.from_pretrained()
 # Device
+device = hub.Device("Samsung Galaxy S24")
 # Trace model
+input_shape = torch_model.get_input_spec()
+sample_inputs = torch_model.sample_inputs()
+pt_model = torch.jit.trace(torch_model, [torch.tensor(data[0]) for _, data in sample_inputs.items()])
 # Compile model on a specific device
+compile_job = hub.submit_compile_job(
+    model=pt_model,
     device=device,
+    input_specs=torch_model.get_input_spec(),
 )
 # Get target model to run on-device
+target_model = compile_job.get_target_model()
 ```
 provisioned in the cloud.  Once the job is submitted, you can navigate to a
 provided job URL to view a variety of on-device performance metrics.
 ```python
+profile_job = hub.submit_profile_job(
+    model=target_model,
     device=device,
 )
 ```
 Step 3: **Verify on-device accuracy**
 To verify the accuracy of the model on-device, you can run on-device inference
 on sample input data on the same cloud-hosted device.
 ```python
+input_data = torch_model.sample_inputs()
+inference_job = hub.submit_inference_job(
+    model=target_model,
     device=device,
+    inputs=input_data,
 )
+on_device_output = inference_job.download_output_data()
 ```
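Once the on-device output has been downloaded, it can be compared numerically against the PyTorch reference output. The following is a minimal sketch of two such metrics, PSNR and maximum relative error, written against flat sequences of floats so that it has no dependencies. The function names are hypothetical, the use of the reference's peak magnitude as the PSNR peak is one common convention among several, and flattening the downloaded arrays into sequences is left to the caller.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float]) -> float:
    """Peak signal-to-noise ratio in dB, using the reference's peak magnitude."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0.0:
        return math.inf  # outputs are identical
    peak = max(abs(r) for r in reference)
    if peak == 0.0:
        return -math.inf  # all-zero reference with a non-zero error
    return 10.0 * math.log10(peak ** 2 / mse)


def max_relative_error(reference: Sequence[float],
                       test: Sequence[float],
                       eps: float = 1e-12) -> float:
    """Largest element-wise |r - t| / max(|r|, eps)."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    return max(abs(r - t) / max(abs(r), eps) for r, t in zip(reference, test))
```

A higher PSNR and a smaller relative error both indicate closer agreement between the on-device and reference outputs; what counts as acceptable depends on the application.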
 With the output of the model, you can compute metrics such as PSNR, relative errors or