---
title: WhisperKit Benchmarks
emoji: 🏆
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: main.py
license: mit
---

Prerequisites

Ensure you have the following software installed:

  • Python 3.10 or higher
  • pip (Python package installer)

Installation

  1. Clone the repository:

    git clone https://github.com/argmaxinc/model-performance-dashboard.git
    cd model-performance-dashboard
    
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate
    
  3. Install required packages:

    pip install -r requirements.txt
    

Usage

  1. Run the application:

    gradio main.py
    
  2. Access the application: After running main.py, a local server will start and print an interface URL in the terminal. Open that URL in your web browser to interact with the Argmax Benchmark dashboard.
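
Under the hood, main.py is a Gradio app. The sketch below is a minimal, hypothetical entry point of the kind the gradio CLI expects (by default it looks for a Blocks object named demo); the real main.py builds the full benchmark dashboard and will look different:

    import gradio as gr

    # Hypothetical minimal entry point; the actual main.py assembles the full
    # benchmark dashboard. The `gradio` CLI reloads this file and serves the
    # Blocks object named `demo` by default.
    with gr.Blocks(title="WhisperKit Benchmarks") as demo:
        gr.Markdown("WhisperKit Benchmarks")

    if __name__ == "__main__":
        demo.launch()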

Data Generation

The data generation process involves three main scripts: performance_generate.py, multilingual_generate.py, and quality_generate.py. Each script updates a specific aspect of the benchmark data; rough, illustrative sketches of each step follow the list below.

  1. Performance Data Update (performance_generate.py):

    • Downloads benchmark data from the WhisperKit Evals Dataset.
    • Processes the data to extract performance metrics for various models, devices, and operating systems.
    • Calculates metrics such as speed and tokens per second for long-form and short-form data.
    • Saves the results in performance_data.json and support_data.csv.
  2. Multilingual Data Update (multilingual_generate.py):

    • Downloads multilingual evaluation data from the WhisperKit Multilingual Evals Dataset.
    • Processes the data to generate confusion matrices for language detection.
    • Calculates metrics for both forced and unforced language detection scenarios.
    • Saves the results in multilingual_confusion_matrices.json and multilingual_results.csv.
  3. Quality Data Update (quality_generate.py):

    • Downloads quality evaluation data from the WhisperKit Evals dataset.
    • Processes the data to calculate Word Error Rate (WER) and Quality of Inference (QoI) metrics for each dataset.
    • Saves the results in quality_data.json.
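
The sketch below is a rough illustration of the performance step: it computes a tokens-per-second and a speed figure from hypothetical benchmark records and writes them out under the same file names the script uses. The record fields (total_tokens, prediction_time_s, audio_duration_s) are assumptions for illustration, not the actual dataset schema:

    import csv
    import json

    # Hypothetical benchmark records; performance_generate.py downloads these
    # from the WhisperKit Evals dataset, and the real field names will differ.
    records = [
        {"model": "whisperkit-large-v3", "device": "iPhone 15 Pro", "os": "iOS 17.4",
         "total_tokens": 4096, "prediction_time_s": 38.2, "audio_duration_s": 600.0},
    ]

    results = []
    for r in records:
        results.append({
            "model": r["model"],
            "device": r["device"],
            "os": r["os"],
            # Throughput and real-time speed factor.
            "tokens_per_second": r["total_tokens"] / r["prediction_time_s"],
            "speed": r["audio_duration_s"] / r["prediction_time_s"],
        })

    with open("performance_data.json", "w") as f:
        json.dump(results, f, indent=2)

    # The device/OS support table is kept separately as CSV.
    with open("support_data.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["model", "device", "os"])
        writer.writeheader()
        writer.writerows([{k: r[k] for k in ("model", "device", "os")} for r in results])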
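
For the multilingual step, a confusion matrix simply counts, for each reference language, how often each language was detected. A minimal sketch under assumed field names (reference_language, detected_language) follows; the real script repeats this separately for the forced and unforced detection scenarios:

    import json
    from collections import defaultdict

    # Hypothetical evaluation rows; multilingual_generate.py reads these from
    # the WhisperKit Multilingual Evals dataset, and the field names may differ.
    rows = [
        {"reference_language": "es", "detected_language": "es"},
        {"reference_language": "es", "detected_language": "pt"},
        {"reference_language": "de", "detected_language": "de"},
    ]

    # confusion[reference][detected] = number of occurrences
    confusion = defaultdict(lambda: defaultdict(int))
    for row in rows:
        confusion[row["reference_language"]][row["detected_language"]] += 1

    with open("multilingual_confusion_matrices.json", "w") as f:
        json.dump({ref: dict(detected) for ref, detected in confusion.items()}, f, indent=2)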
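
For the quality step, WER can be computed per example and averaged per dataset; the jiwer package is used below purely for illustration. The QoI line assumes QoI is the share of examples where the evaluated model does not regress against a baseline model's per-example WER, which is an assumption about the metric rather than the script's exact definition:

    import json
    import jiwer  # illustrative choice; the real script may compute WER differently

    # Hypothetical (reference transcript, model output) pairs for one dataset.
    examples = [
        ("the quick brown fox", "the quick brown fox"),
        ("jumps over the lazy dog", "jumps over a lazy dog"),
    ]

    wers = [jiwer.wer(ref, hyp) for ref, hyp in examples]

    # Assumed QoI definition: fraction of examples with no regression against a
    # baseline model's per-example WER (placeholder baseline values here).
    baseline_wers = [0.0, 0.25]
    qoi = sum(1 for w, b in zip(wers, baseline_wers) if w <= b) / len(wers)

    quality = {
        "librispeech": {  # hypothetical dataset name
            "average_wer": sum(wers) / len(wers),
            "qoi": qoi,
        }
    }

    with open("quality_data.json", "w") as f:
        json.dump(quality, f, indent=2)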

Data Update

To update the dashboard with the latest data from our HuggingFace datasets, run:

    make use-huggingface-data

Alternatively, you can run our on-device testing code [TODO:INSERT_LINK_TO_OS_TEST_CODE] on your own device to update the dashboard with your own data. After generating the Xcode data, place the resulting .json files in the whisperkit-evals/xcresults/benchmark_data directory, then run:

    make use-local-data