qaihm-bot committed on
Commit 42d91e9 · verified · 1 Parent(s): 1812889

Upload README.md with huggingface_hub

  1. README.md +52 -80
README.md CHANGED
@@ -38,30 +38,30 @@ More details on model performance across various devices can be found
38
 
39
  | Model | Device | Chipset | Target Runtime | Inference Time (ms) | Peak Memory Range (MB) | Precision | Primary Compute Unit | Target Model |
40
  |---|---|---|---|---|---|---|---|---|
41
- | WhisperDecoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | TFLITE | 54.814 ms | 16 - 43 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
42
- | WhisperDecoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | TFLITE | 43.884 ms | 16 - 414 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
43
- | WhisperDecoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | TFLITE | 40.538 ms | 14 - 269 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
44
- | WhisperDecoder | SA7255P ADP | SA7255P | TFLITE | 119.442 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
45
- | WhisperDecoder | SA8255 (Proxy) | SA8255P Proxy | TFLITE | 56.491 ms | 13 - 40 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
46
- | WhisperDecoder | SA8295P ADP | SA8295P | TFLITE | 55.877 ms | 16 - 248 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
47
- | WhisperDecoder | SA8650 (Proxy) | SA8650P Proxy | TFLITE | 55.726 ms | 16 - 42 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
48
- | WhisperDecoder | SA8775P ADP | SA8775P | TFLITE | 54.865 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
49
- | WhisperDecoder | QCS8275 (Proxy) | QCS8275 Proxy | TFLITE | 119.442 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
50
- | WhisperDecoder | QCS8550 (Proxy) | QCS8550 Proxy | TFLITE | 55.884 ms | 16 - 43 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
51
- | WhisperDecoder | QCS9075 (Proxy) | QCS9075 Proxy | TFLITE | 54.865 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
52
- | WhisperDecoder | QCS8450 (Proxy) | QCS8450 Proxy | TFLITE | 66.814 ms | 16 - 404 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoder.tflite) |
53
- | WhisperEncoder | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | TFLITE | 694.71 ms | 107 - 184 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
54
- | WhisperEncoder | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | TFLITE | 1066.29 ms | 86 - 175 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
55
- | WhisperEncoder | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | TFLITE | 544.658 ms | 108 - 139 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
56
- | WhisperEncoder | SA7255P ADP | SA7255P | TFLITE | 4482.798 ms | 102 - 136 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
57
- | WhisperEncoder | SA8255 (Proxy) | SA8255P Proxy | TFLITE | 701.695 ms | 22 - 148 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
58
- | WhisperEncoder | SA8295P ADP | SA8295P | TFLITE | 655.428 ms | 109 - 141 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
59
- | WhisperEncoder | SA8650 (Proxy) | SA8650P Proxy | TFLITE | 698.036 ms | 83 - 143 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
60
- | WhisperEncoder | SA8775P ADP | SA8775P | TFLITE | 1289.809 ms | 94 - 127 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
61
- | WhisperEncoder | QCS8275 (Proxy) | QCS8275 Proxy | TFLITE | 4482.798 ms | 102 - 136 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
62
- | WhisperEncoder | QCS8550 (Proxy) | QCS8550 Proxy | TFLITE | 735.815 ms | 18 - 217 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
63
- | WhisperEncoder | QCS9075 (Proxy) | QCS9075 Proxy | TFLITE | 1289.809 ms | 94 - 127 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
64
- | WhisperEncoder | QCS8450 (Proxy) | QCS8450 Proxy | TFLITE | 1667.095 ms | 108 - 206 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoder.tflite) |
65
 
66
 
67
 
@@ -122,22 +122,22 @@ python -m qai_hub_models.models.whisper_small_en.export
122
  ```
123
  Profiling Results
124
  ------------------------------------------------------------
125
- WhisperDecoder
126
  Device : Samsung Galaxy S23 (13)
127
  Runtime : TFLITE
128
- Estimated inference time (ms) : 54.8
129
  Estimated peak memory usage (MB): [16, 43]
130
  Total # Ops : 2573
131
  Compute Unit(s) : NPU (2573 ops)
132
-
133
- ------------------------------------------------------------
134
- WhisperEncoder
135
- Device : Samsung Galaxy S23 (13)
136
- Runtime : TFLITE
137
- Estimated inference time (ms) : 694.7
138
- Estimated peak memory usage (MB): [107, 184]
139
- Total # Ops : 911
140
- Compute Unit(s) : GPU (900 ops) CPU (11 ops)
141
  ```
142
 
143
 
@@ -159,43 +159,26 @@ import qai_hub as hub
159
  from qai_hub_models.models.whisper_small_en import Model
160
 
161
  # Load the model
162
- model = Model.from_pretrained()
163
- decoder_model = model.decoder
164
- encoder_model = model.encoder
165
 
166
  # Device
167
- device = hub.Device("Samsung Galaxy S23")
168
 
169
  # Trace model
170
- decoder_input_shape = decoder_model.get_input_spec()
171
- decoder_sample_inputs = decoder_model.sample_inputs()
172
 
173
- traced_decoder_model = torch.jit.trace(decoder_model, [torch.tensor(data[0]) for _, data in decoder_sample_inputs.items()])
174
 
175
  # Compile model on a specific device
176
- decoder_compile_job = hub.submit_compile_job(
177
- model=traced_decoder_model ,
178
  device=device,
179
- input_specs=decoder_model.get_input_spec(),
180
  )
181
 
182
  # Get target model to run on-device
183
- decoder_target_model = decoder_compile_job.get_target_model()
184
- # Trace model
185
- encoder_input_shape = encoder_model.get_input_spec()
186
- encoder_sample_inputs = encoder_model.sample_inputs()
187
-
188
- traced_encoder_model = torch.jit.trace(encoder_model, [torch.tensor(data[0]) for _, data in encoder_sample_inputs.items()])
189
-
190
- # Compile model on a specific device
191
- encoder_compile_job = hub.submit_compile_job(
192
- model=traced_encoder_model ,
193
- device=device,
194
- input_specs=encoder_model.get_input_spec(),
195
- )
196
-
197
- # Get target model to run on-device
198
- encoder_target_model = encoder_compile_job.get_target_model()
199
 
200
  ```
201
 
@@ -207,15 +190,11 @@ After compiling models from step 1, models can be profiled on-device using
207
  provisioned in the cloud. Once the job is submitted, you can navigate to a
208
  provided job URL to view a variety of on-device performance metrics.
209
  ```python
210
- decoder_profile_job = hub.submit_profile_job(
211
- model=decoder_target_model,
212
- device=device,
213
- )
214
- encoder_profile_job = hub.submit_profile_job(
215
- model=encoder_target_model,
216
  device=device,
217
  )
218
-
219
  ```
220
 
221
  Step 3: **Verify on-device accuracy**
@@ -223,20 +202,13 @@ Step 3: **Verify on-device accuracy**
223
  To verify the accuracy of the model on-device, you can run on-device inference
224
  on sample input data on the same cloud hosted device.
225
  ```python
226
- decoder_input_data = decoder_model.sample_inputs()
227
- decoder_inference_job = hub.submit_inference_job(
228
- model=decoder_target_model,
229
- device=device,
230
- inputs=decoder_input_data,
231
- )
232
- decoder_inference_job.download_output_data()
233
- encoder_input_data = encoder_model.sample_inputs()
234
- encoder_inference_job = hub.submit_inference_job(
235
- model=encoder_target_model,
236
  device=device,
237
- inputs=encoder_input_data,
238
  )
239
- encoder_inference_job.download_output_data()
240
 
241
  ```
242
  With the output of the model, you can compute metrics like PSNR, relative errors or
 
38
 
39
  | Model | Device | Chipset | Target Runtime | Inference Time (ms) | Peak Memory Range (MB) | Precision | Primary Compute Unit | Target Model |
40
  |---|---|---|---|---|---|---|---|---|
41
+ | WhisperEncoderInf | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | TFLITE | 874.236 ms | 44 - 122 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
42
+ | WhisperEncoderInf | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | TFLITE | 813.828 ms | 108 - 200 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
43
+ | WhisperEncoderInf | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | TFLITE | 541.073 ms | 108 - 139 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
44
+ | WhisperEncoderInf | SA7255P ADP | SA7255P | TFLITE | 4496.292 ms | 108 - 141 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
45
+ | WhisperEncoderInf | SA8255 (Proxy) | SA8255P Proxy | TFLITE | 875.948 ms | 18 - 124 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
46
+ | WhisperEncoderInf | SA8295P ADP | SA8295P | TFLITE | 654.569 ms | 109 - 141 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
47
+ | WhisperEncoderInf | SA8650 (Proxy) | SA8650P Proxy | TFLITE | 691.176 ms | 110 - 199 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
48
+ | WhisperEncoderInf | SA8775P ADP | SA8775P | TFLITE | 1289.865 ms | 95 - 128 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
49
+ | WhisperEncoderInf | QCS8275 (Proxy) | QCS8275 Proxy | TFLITE | 4496.292 ms | 108 - 141 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
50
+ | WhisperEncoderInf | QCS8550 (Proxy) | QCS8550 Proxy | TFLITE | 820.313 ms | 110 - 180 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
51
+ | WhisperEncoderInf | QCS9075 (Proxy) | QCS9075 Proxy | TFLITE | 1289.865 ms | 95 - 128 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
52
+ | WhisperEncoderInf | QCS8450 (Proxy) | QCS8450 Proxy | TFLITE | 1059.62 ms | 103 - 203 MB | FP16 | GPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperEncoderInf.tflite) |
53
+ | WhisperDecoderInf | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | TFLITE | 55.096 ms | 16 - 43 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
54
+ | WhisperDecoderInf | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | TFLITE | 44.417 ms | 12 - 414 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
55
+ | WhisperDecoderInf | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | TFLITE | 41.778 ms | 0 - 254 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
56
+ | WhisperDecoderInf | SA7255P ADP | SA7255P | TFLITE | 119.561 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
57
+ | WhisperDecoderInf | SA8255 (Proxy) | SA8255P Proxy | TFLITE | 55.398 ms | 16 - 43 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
58
+ | WhisperDecoderInf | SA8295P ADP | SA8295P | TFLITE | 55.947 ms | 16 - 248 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
59
+ | WhisperDecoderInf | SA8650 (Proxy) | SA8650P Proxy | TFLITE | 58.837 ms | 16 - 42 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
60
+ | WhisperDecoderInf | SA8775P ADP | SA8775P | TFLITE | 54.746 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
61
+ | WhisperDecoderInf | QCS8275 (Proxy) | QCS8275 Proxy | TFLITE | 119.561 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
62
+ | WhisperDecoderInf | QCS8550 (Proxy) | QCS8550 Proxy | TFLITE | 54.803 ms | 16 - 40 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
63
+ | WhisperDecoderInf | QCS9075 (Proxy) | QCS9075 Proxy | TFLITE | 54.746 ms | 16 - 268 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
64
+ | WhisperDecoderInf | QCS8450 (Proxy) | QCS8450 Proxy | TFLITE | 61.519 ms | 16 - 404 MB | FP16 | NPU | [Whisper-Small-En.tflite](https://huggingface.co/qualcomm/Whisper-Small-En/blob/main/WhisperDecoderInf.tflite) |
65
 
66
 
67
 
 
122
  ```
123
  Profiling Results
124
  ------------------------------------------------------------
125
+ WhisperEncoderInf
126
+ Device : Samsung Galaxy S23 (13)
127
+ Runtime : TFLITE
128
+ Estimated inference time (ms) : 874.2
129
+ Estimated peak memory usage (MB): [44, 122]
130
+ Total # Ops : 911
131
+ Compute Unit(s) : GPU (900 ops) CPU (11 ops)
132
+
133
+ ------------------------------------------------------------
134
+ WhisperDecoderInf
135
  Device : Samsung Galaxy S23 (13)
136
  Runtime : TFLITE
137
+ Estimated inference time (ms) : 55.1
138
  Estimated peak memory usage (MB): [16, 43]
139
  Total # Ops : 2573
140
  Compute Unit(s) : NPU (2573 ops)
141
  ```
142
 
143
 
 
159
  from qai_hub_models.models.whisper_small_en import Model
160
 
161
  # Load the model
162
+ torch_model = Model.from_pretrained()
163
 
164
  # Device
165
+ device = hub.Device("Samsung Galaxy S24")
166
 
167
  # Trace model
168
+ input_shape = torch_model.get_input_spec()
169
+ sample_inputs = torch_model.sample_inputs()
170
 
171
+ pt_model = torch.jit.trace(torch_model, [torch.tensor(data[0]) for _, data in sample_inputs.items()])
172
 
173
  # Compile model on a specific device
174
+ compile_job = hub.submit_compile_job(
175
+ model=pt_model,
176
  device=device,
177
+ input_specs=torch_model.get_input_spec(),
178
  )
179
 
180
  # Get target model to run on-device
181
+ target_model = compile_job.get_target_model()
182
 
183
  ```
184
 
 
190
  provisioned in the cloud. Once the job is submitted, you can navigate to a
191
  provided job URL to view a variety of on-device performance metrics.
192
  ```python
193
+ profile_job = hub.submit_profile_job(
194
+ model=target_model,
195
  device=device,
196
  )
197
+
198
  ```
199
 
200
  Step 3: **Verify on-device accuracy**
 
202
  To verify the accuracy of the model on-device, you can run on-device inference
203
  on sample input data on the same cloud hosted device.
204
  ```python
205
+ input_data = torch_model.sample_inputs()
206
+ inference_job = hub.submit_inference_job(
207
+ model=target_model,
208
  device=device,
209
+ inputs=input_data,
210
  )
211
+ on_device_output = inference_job.download_output_data()
212
 
213
  ```
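
As a rough illustration of the comparison this step describes, the sketch below computes PSNR and a maximum relative error between a reference output and an on-device output. It is not part of `qai_hub` or `qai_hub_models`: the helper names `psnr` and `max_relative_error`, the `data_range` and `eps` parameters, and the assumption that both outputs have already been flattened to equal-length sequences of floats are hypothetical choices made for this example.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], candidate: Sequence[float],
         data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two flat, equal-length sequences.

    data_range is the assumed peak-to-peak range of the reference signal
    (an illustrative parameter, not something defined by the README).
    """
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same length")
    if len(reference) == 0:
        raise ValueError("inputs must be non-empty")
    # Mean squared error between the two outputs.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0.0:
        # Identical outputs: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


def max_relative_error(reference: Sequence[float], candidate: Sequence[float],
                       eps: float = 1e-12) -> float:
    """Largest element-wise |r - c| / max(|r|, eps) over the two sequences."""
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same length")
    return max(abs(r - c) / max(abs(r), eps) for r, c in zip(reference, candidate))


if __name__ == "__main__":
    # Toy values standing in for a PyTorch reference output and a
    # downloaded on-device output, both flattened to plain floats.
    torch_out = [0.0, 1.0, 0.5, 0.25]
    device_out = [0.01, 0.99, 0.5, 0.26]
    print(f"PSNR (dB): {psnr(torch_out, device_out):.2f}")
    print(f"Max relative error: {max_relative_error(torch_out, device_out):.4f}")
```

In practice the on-device output would first need to be converted and flattened into the same layout as the reference output before calling these helpers; how to do that depends on the structure returned by the inference job and is not shown here.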
214
  With the output of the model, you can compute metrics like PSNR, relative errors or