---
title: WhisperKit Benchmarks
emoji: π
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: main.py
license: mit
---
## Prerequisites

Ensure you have the following software installed:

- Python 3.10 or higher
- pip (Python package installer)
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/argmaxinc/model-performance-dashboard.git
   cd model-performance-dashboard
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate
   ```

3. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```
## Usage

1. Run the application:

   ```bash
   gradio main.py
   ```

2. Access the application: after running `main.py`, a local server starts and the terminal prints an interface URL. Open that URL in your web browser to interact with the Argmax Benchmark dashboard. A minimal sketch of such an entry point follows.
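For orientation, here is a hedged sketch of what a Gradio entry point like `main.py` might look like. The layout and tab names are illustrative assumptions, not the repository's actual code; the JSON file names come from the Data Generation section below.

```python
# Illustrative sketch only: the actual main.py in this repository may be
# organized differently. The file paths and tab layout are assumptions.
import json

import gradio as gr


def load_json(path: str) -> dict:
    """Load one of the benchmark JSON files produced by the generate scripts."""
    with open(path) as f:
        return json.load(f)


with gr.Blocks(title="WhisperKit Benchmarks") as demo:
    with gr.Tab("Performance"):
        gr.JSON(load_json("performance_data.json"))
    with gr.Tab("Quality"):
        gr.JSON(load_json("quality_data.json"))

if __name__ == "__main__":
    demo.launch()  # prints the local interface URL in the terminal
```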
## Data Generation

The data generation process involves three main scripts: `performance_generate.py`, `multilingual_generate.py`, and `quality_generate.py`. Each script updates a specific aspect of the benchmark data.
**Performance Data Update (`performance_generate.py`):**

- Downloads benchmark data from the WhisperKit Evals dataset.
- Processes the data to extract performance metrics for various models, devices, and operating systems.
- Calculates metrics such as speed and tokens per second for long-form and short-form data (see the sketch after this list).
- Saves the results in `performance_data.json` and `support_data.csv`.
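As a rough illustration of the aggregation involved, the following sketch averages tokens per second per (model, device, OS) combination. The record field names (`model`, `device`, `os`, `total_tokens`, `total_time_s`) are assumptions for illustration and may not match the dataset's actual schema.

```python
# Hedged sketch of the kind of aggregation performance_generate.py performs.
# The field names below are assumed; the real dataset schema may differ.
from collections import defaultdict


def tokens_per_second(records: list[dict]) -> dict[tuple[str, str, str], float]:
    """Average tokens/s for each (model, device, OS) combination."""
    totals: dict[tuple[str, str, str], list[float]] = defaultdict(lambda: [0.0, 0.0])
    for r in records:
        key = (r["model"], r["device"], r["os"])
        totals[key][0] += r["total_tokens"]  # tokens decoded
        totals[key][1] += r["total_time_s"]  # wall-clock seconds
    return {k: toks / secs for k, (toks, secs) in totals.items() if secs > 0}
```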
**Multilingual Data Update (`multilingual_generate.py`):**

- Downloads multilingual evaluation data from the WhisperKit Multilingual Evals dataset.
- Processes the data to generate confusion matrices for language detection (see the sketch after this list).
- Calculates metrics for both forced and unforced language detection scenarios.
- Saves the results in `multilingual_confusion_matrices.json` and `multilingual_results.csv`.
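A confusion matrix here simply counts how often each reference language is detected as each candidate language. A minimal sketch, assuming (reference, detected) ISO-code pairs as input:

```python
# Minimal sketch of building a language-detection confusion matrix, in the
# spirit of multilingual_generate.py; the input format is an assumption.
from collections import Counter


def confusion_matrix(pairs: list[tuple[str, str]]) -> dict[str, dict[str, int]]:
    """Count (reference_language, detected_language) pairs.

    Example: [("en", "en"), ("de", "nl"), ("de", "de")] ->
             {"en": {"en": 1}, "de": {"nl": 1, "de": 1}}
    """
    matrix: dict[str, dict[str, int]] = {}
    for (ref, det), n in Counter(pairs).items():
        matrix.setdefault(ref, {})[det] = n
    return matrix
```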
**Quality Data Update (`quality_generate.py`):**

- Downloads quality evaluation data from WhisperKit Evals.
- Processes the data to calculate Word Error Rate (WER) and Quality of Inference (QoI) metrics for each dataset (see the sketch after this list).
- Saves the results in `quality_data.json`.
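For reference, a minimal sketch of these two metrics. WER is computed with the `jiwer` package as one common choice, and the QoI definition below (the share of examples where the model's per-example WER does not regress relative to a baseline) is an assumption about this project's metric, not a confirmed formula.

```python
# Hedged sketch of WER/QoI computation in the spirit of quality_generate.py.
# jiwer is one common WER implementation; the QoI definition below is an
# assumption, not this project's confirmed formula.
import jiwer


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Word Error Rate over a whole dataset."""
    return jiwer.wer(references, hypotheses)


def qoi(references: list[str], candidate: list[str], baseline: list[str]) -> float:
    """Fraction of examples where the candidate does not regress vs. a baseline."""
    wins = sum(
        jiwer.wer(ref, c) <= jiwer.wer(ref, b)
        for ref, c, b in zip(references, candidate, baseline)
    )
    return wins / len(references)
```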
## Data Update

To update the dashboard with the latest data from our Hugging Face datasets, run:

```bash
make use-huggingface-data
```

Alternatively, you can use our on-device testing code [TODO:INSERT_LINK_TO_OS_TEST_CODE] on your device to update the dashboard with your own data. After generating the Xcode data, place the resulting `.json` files in the `whisperkit-evals/xcresults/benchmark_data` directory, then run:

```bash
make use-local-data
```