---
title: WhisperKit Benchmarks
emoji: π
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: main.py
license: mit
---
## Prerequisites

Ensure you have the following software installed:

- Python 3.10 or higher
- pip (Python package installer)
## Installation

1. **Clone the repository**:

   ```sh
   git clone https://github.com/argmaxinc/model-performance-dashboard.git
   cd model-performance-dashboard
   ```

2. **Create a virtual environment**:

   ```sh
   python -m venv venv
   source venv/bin/activate
   ```

3. **Install required packages**:

   ```sh
   pip install -r requirements.txt
   ```
## Usage

1. **Run the application**:

   ```sh
   gradio main.py
   ```

2. **Access the application**:

   After running `main.py`, a local server will start and an interface URL will appear in the terminal. Open that URL in your web browser to interact with the Argmax Benchmark dashboard.
## Data Generation

The data generation process involves three main scripts: `performance_generate.py`, `multilingual_generate.py`, and `quality_generate.py`. Each script is responsible for updating a specific aspect of the benchmark data.
1. **Performance Data Update (`performance_generate.py`)**:
   - Downloads benchmark data from the [WhisperKit Evals Dataset](https://huggingface.co/datasets/argmaxinc/whisperkit-evals-dataset).
   - Processes the data to extract performance metrics for various models, devices, and operating systems.
   - Calculates metrics such as speed and tokens per second for long-form and short-form audio.
   - Saves the results in `performance_data.json` and `support_data.csv`.
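The aggregation step above can be sketched as follows. This is a minimal illustration, not the script's actual implementation: the entry field names (`audio_seconds`, `inference_seconds`, `tokens`) and the grouping key are hypothetical stand-ins for the real dataset schema.

```python
import json

# Hypothetical raw benchmark entries; the real WhisperKit evals
# dataset uses its own field names and structure.
entries = [
    {"model": "tiny", "device": "iPhone14,2", "os": "iOS 17.4",
     "audio_seconds": 60.0, "inference_seconds": 5.0, "tokens": 900},
    {"model": "tiny", "device": "iPhone14,2", "os": "iOS 17.4",
     "audio_seconds": 30.0, "inference_seconds": 2.5, "tokens": 450},
]

def summarize(entries):
    """Aggregate speed (x real time) and tokens/sec per (model, device, os)."""
    groups = {}
    for e in entries:
        key = (e["model"], e["device"], e["os"])
        groups.setdefault(key, []).append(e)
    results = []
    for (model, device, os_name), group in groups.items():
        audio = sum(e["audio_seconds"] for e in group)
        infer = sum(e["inference_seconds"] for e in group)
        tokens = sum(e["tokens"] for e in group)
        results.append({
            "model": model, "device": device, "os": os_name,
            "speed": round(audio / infer, 2),               # x real time
            "tokens_per_second": round(tokens / infer, 2),
        })
    return results

summary = summarize(entries)
print(json.dumps(summary, indent=2))
```

The same per-group dictionary can then be serialized straight into a file like `performance_data.json` for the dashboard to read.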
2. **Multilingual Data Update (`multilingual_generate.py`)**:
   - Downloads multilingual evaluation data from the [WhisperKit Multilingual Evals Dataset](https://huggingface.co/datasets/argmaxinc/whisperkit-evals-multilingual).
   - Processes the data to generate confusion matrices for language detection.
   - Calculates metrics for both forced and unforced language detection scenarios.
   - Saves the results in `multilingual_confusion_matrices.json` and `multilingual_results.csv`.
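A language-detection confusion matrix of the kind described above can be built from (reference, detected) language pairs. The sketch below is illustrative only; the pair data and language codes are made up, and the real script may structure its output differently.

```python
from collections import defaultdict

# Hypothetical (reference_language, detected_language) pairs.
pairs = [("en", "en"), ("en", "en"), ("de", "de"), ("de", "en"), ("fr", "fr")]

def confusion_matrix(pairs):
    """Nested dict: matrix[reference][detected] -> count."""
    matrix = defaultdict(lambda: defaultdict(int))
    for ref, det in pairs:
        matrix[ref][det] += 1
    return {ref: dict(row) for ref, row in matrix.items()}

matrix = confusion_matrix(pairs)

# Overall detection accuracy: diagonal mass over total pairs.
accuracy = sum(matrix.get(lang, {}).get(lang, 0) for lang in matrix) / len(pairs)
```

The "forced" scenario would fix the decoder's language token instead of letting the model detect it, so its matrix collapses toward the diagonal.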
3. **Quality Data Update (`quality_generate.py`)**:
   - Downloads quality evaluation data from [WhisperKit Evals](https://huggingface.co/datasets/argmaxinc/whisperkit-evals).
   - Processes the data to calculate Word Error Rate (WER) and Quality of Inference (QoI) metrics for each dataset.
   - Saves the results in `quality_data.json`.
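For reference, WER is the word-level edit distance between a reference transcript and a hypothesis, normalized by the reference length. Below is a minimal textbook implementation for illustration; the actual script likely relies on an evaluation library rather than hand-rolled dynamic programming.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: edit distance over word tokens / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

A perfect transcript yields a WER of 0.0; each substituted, inserted, or deleted word adds 1/len(reference).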
## Data Update

To update the dashboard with the latest data from our Hugging Face datasets, run:

```sh
make use-huggingface-data
```

Alternatively, you can run our on-device testing code [TODO:INSERT_LINK_TO_OS_TEST_CODE] on your device to update the dashboard with your own data. After generating the Xcode data, place the resulting `.json` files in the `whisperkit-evals/xcresults/benchmark_data` directory, then run:

```sh
make use-local-data
```