---
title: WhisperKit Benchmarks
emoji: πŸ†
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: main.py
license: mit
---
## Prerequisites
Ensure you have the following software installed:
- Python 3.10 or higher
- pip (Python package installer)
## Installation
1. **Clone the repository**:
```sh
git clone https://github.com/argmaxinc/model-performance-dashboard.git
cd model-performance-dashboard
```
2. **Create a virtual environment**:
```sh
python -m venv venv
source venv/bin/activate
```
3. **Install required packages**:
```sh
pip install -r requirements.txt
```
## Usage
1. **Run the application**:
```sh
gradio main.py
```
2. **Access the application**:
After running `main.py`, a local server starts and an interface URL appears in the terminal. Open that URL in your web browser to interact with the Argmax Benchmark dashboard.
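For orientation, the sketch below shows how a Gradio entry point like `main.py` typically exposes a `demo` object that both `gradio main.py` and `python main.py` can launch. The layout, helper function, and file name are illustrative assumptions, not the actual dashboard code.
```python
# Minimal sketch of a Gradio entry point (illustrative only, not the actual main.py).
import json
import gradio as gr

def load_performance_summary(path: str = "performance_data.json") -> str:
    # Hypothetical helper: read generated benchmark JSON and return a short summary.
    with open(path) as f:
        data = json.load(f)
    return f"Loaded {len(data)} benchmark entries."

with gr.Blocks(title="WhisperKit Benchmarks") as demo:
    gr.Markdown("## WhisperKit Benchmarks")
    output = gr.Textbox(label="Summary")
    gr.Button("Load data").click(load_performance_summary, outputs=output)

if __name__ == "__main__":
    # `gradio main.py` picks up the `demo` object; `python main.py` calls launch() directly.
    demo.launch()
```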
## Data Generation
The data generation process involves three main scripts: `performance_generate.py`, `multilingual_generate.py`, and `quality_generate.py`. Each script updates a specific aspect of the benchmark data, following the same download, process, and save flow (sketched after the list below).
1. **Performance Data Update (`performance_generate.py`)**:
- Downloads benchmark data from [WhisperKit Evals Dataset](https://huggingface.co/datasets/argmaxinc/whisperkit-evals-dataset).
- Processes the data to extract performance metrics for various models, devices, and operating systems.
- Calculates metrics such as speed and tokens per second for long-form and short-form data.
- Saves the results in `performance_data.json` and `support_data.csv`.
2. **Multilingual Data Update (`multilingual_generate.py`)**:
- Downloads multilingual evaluation data from [WhisperKit Multilingual Evals Dataset](https://huggingface.co/datasets/argmaxinc/whisperkit-evals-multilingual).
- Processes the data to generate confusion matrices for language detection.
- Calculates metrics for both forced and unforced language detection scenarios.
- Saves the results in `multilingual_confusion_matrices.json` and `multilingual_results.csv`.
3. **Quality Data Update (`quality_generate.py`)**:
- Downloads quality evaluation data from [WhisperKit Evals](https://huggingface.co/datasets/argmaxinc/whisperkit-evals).
- Processes the data to calculate Word Error Rate (WER) and Quality of Inference (QoI) metrics for each dataset.
- Saves the results in `quality_data.json`.
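The sketch below illustrates that shared flow for the performance case, assuming the raw data is fetched with `huggingface_hub`. The per-file fields (`num_tokens`, `latency_seconds`) are hypothetical names for illustration, not the actual schema of the evals dataset.
```python
# Sketch of the download -> process -> save pattern used by the *_generate.py scripts.
# Field names (num_tokens, latency_seconds) are assumptions, not the real dataset schema.
import json
from pathlib import Path
from huggingface_hub import snapshot_download

# 1. Download the raw benchmark data from the Hugging Face dataset.
local_dir = snapshot_download(
    repo_id="argmaxinc/whisperkit-evals-dataset",
    repo_type="dataset",
    local_dir="whisperkit-evals-dataset",
)

# 2. Process: compute a tokens-per-second figure for each result file.
results = []
for result_file in Path(local_dir).rglob("*.json"):
    entry = json.loads(result_file.read_text())
    tokens = entry.get("num_tokens", 0)          # hypothetical field
    latency = entry.get("latency_seconds", 0.0)  # hypothetical field
    if latency > 0:
        results.append({
            "file": result_file.name,
            "tokens_per_second": tokens / latency,
        })

# 3. Save the aggregated metrics for the dashboard to read.
with open("performance_data.json", "w") as f:
    json.dump(results, f, indent=2)
```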
## Data Update
To update the dashboard with the latest data from our HuggingFace datasets, run:
```sh
make use-huggingface-data
```
Alternatively, you can run our on-device testing code [TODO:INSERT_LINK_TO_OS_TEST_CODE] to update the dashboard with your own data. After generating the Xcode benchmark data, place the resulting `.json` files in the `whisperkit-evals/xcresults/benchmark_data` directory, then run:
```sh
make use-local-data
```
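As a rough illustration of what the local-data path consumes, the benchmark `.json` files are simply read from that directory. The sketch below is an assumption about how such files could be collected, not the actual Makefile or script logic.
```python
# Illustrative sketch: collect locally generated benchmark JSON files
# (an assumption about the use-local-data flow, not the actual implementation).
import json
from pathlib import Path

benchmark_dir = Path("whisperkit-evals/xcresults/benchmark_data")
local_results = [json.loads(p.read_text()) for p in sorted(benchmark_dir.rglob("*.json"))]
print(f"Found {len(local_results)} local benchmark result files.")
```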