---
title: WhisperKit Benchmarks
emoji: 🏆
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: main.py
license: mit
---

## Prerequisites

Ensure you have the following software installed:

- Python 3.10 or higher
- pip (Python package installer)

## Installation

1. **Clone the repository**:

   ```sh
   git clone https://github.com/argmaxinc/model-performance-dashboard.git
   cd model-performance-dashboard
   ```

2. **Create a virtual environment**:

   ```sh
   python -m venv venv
   source venv/bin/activate
   ```

3. **Install required packages**:
   ```sh
   pip install -r requirements.txt
   ```

## Usage

1. **Run the application**:

   ```sh
   gradio main.py
   ```

2. **Access the application**:
   After running `main.py`, a local server starts and the interface URL is printed to the terminal. Open the URL in your web browser to interact with the Argmax Benchmark dashboard.
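
The `gradio` command launches `main.py` in reload mode. For orientation, below is a minimal sketch of the kind of entrypoint the CLI expects; it is illustrative only, not the repository's actual `main.py`, and the table contents are placeholders:

```python
# Minimal, illustrative Gradio entrypoint (NOT the repository's actual main.py).
import gradio as gr

# `gradio main.py` runs in reload mode, which by convention looks for a
# Blocks/Interface object named `demo` at module level.
with gr.Blocks(title="WhisperKit Benchmarks") as demo:
    gr.Markdown("## 🏆 WhisperKit Benchmarks")
    gr.Dataframe(
        headers=["Model", "Device", "Tokens/s"],                  # placeholder columns
        value=[["whisperkit-large-v3", "iPhone 15 Pro", 42.0]],   # placeholder row
    )

if __name__ == "__main__":
    demo.launch()  # starts the local server and prints the interface URL
```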

## Data Generation

The data generation process involves three main scripts: `performance_generate.py`, `multilingual_generate.py`, and `quality_generate.py`. Each script updates a specific aspect of the benchmark data; illustrative sketches of the metric computations appear after the list.

1. **Performance Data Update (`performance_generate.py`)**:

   - Downloads benchmark data from [WhisperKit Evals Dataset](https://huggingface.co/datasets/argmaxinc/whisperkit-evals-dataset).
   - Processes the data to extract performance metrics for various models, devices, and operating systems.
   - Calculates metrics such as speed and tokens per second for both long-form and short-form data.
   - Saves the results in `performance_data.json` and `support_data.csv`.

2. **Multilingual Data Update (`multilingual_generate.py`)**:

   - Downloads multilingual evaluation data from [WhisperKit Multilingual Evals Dataset](https://huggingface.co/datasets/argmaxinc/whisperkit-evals-multilingual).
   - Processes the data to generate confusion matrices for language detection.
   - Calculates metrics for both forced and unforced language detection scenarios.
   - Saves the results in `multilingual_confusion_matrices.json` and `multilingual_results.csv`.

3. **Quality Data Update (`quality_generate.py`)**:
   - Downloads quality evaluation data from [WhisperKit Evals](https://huggingface.co/datasets/argmaxinc/whisperkit-evals).
   - Processes the data to calculate Word Error Rate (WER) and Quality of Inference (QoI) metrics for each dataset.
   - Saves the results in `quality_data.json`.
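
For concreteness, here is a sketch of the confusion-matrix bookkeeping described in step 2. The record field names (`reference_language`, `detected_language`) are assumptions for illustration, not the actual dataset schema:

```python
# Illustrative confusion-matrix bookkeeping for language detection:
# rows are reference languages, columns are detected languages.
from collections import Counter, defaultdict

# Hypothetical records; the real multilingual evals dataset schema may differ.
records = [
    {"reference_language": "en", "detected_language": "en"},
    {"reference_language": "de", "detected_language": "en"},
    {"reference_language": "de", "detected_language": "de"},
]

confusion: defaultdict[str, Counter] = defaultdict(Counter)
for rec in records:
    confusion[rec["reference_language"]][rec["detected_language"]] += 1

for ref_lang, detected in sorted(confusion.items()):
    total = sum(detected.values())
    print(f"{ref_lang}: {dict(detected)} (accuracy {detected[ref_lang] / total:.2f})")
```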

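Likewise, a minimal sketch of the WER metric from step 3: the word-level edit distance (substitutions + deletions + insertions) divided by the reference word count. The actual `quality_generate.py` may normalize text before scoring:

```python
# Word Error Rate: word-level Levenshtein distance over reference length.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost, # substitution or match
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat", "the cat sat down"))  # one insertion / 3 words ≈ 0.33
```
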
## Data Update

To update the dashboard with the latest data from our Hugging Face datasets, run:

```sh
make use-huggingface-data
```

Alternatively, you can run our on-device testing code [TODO:INSERT_LINK_TO_OS_TEST_CODE] to update the dashboard with your own data. After generating the Xcode benchmark data, place the resulting `.json` files in the `whisperkit-evals/xcresults/benchmark_data` directory, then run:

```sh
make use-local-data
```
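
If you generated the files yourself, a quick sanity check that they parse as JSON can save a failed run. This helper is hypothetical, not part of the repository:

```python
# Hypothetical pre-flight check: confirm every benchmark file parses as JSON.
import json
from pathlib import Path

DATA_DIR = Path("whisperkit-evals/xcresults/benchmark_data")  # directory from the step above

for path in sorted(DATA_DIR.rglob("*.json")):
    try:
        json.loads(path.read_text())
    except json.JSONDecodeError as err:
        print(f"Invalid JSON in {path}: {err}")
    else:
        print(f"OK: {path}")
```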