Jasper881108/whisper-medium-zh

Automatic Speech Recognition

Whisper Medium TW

This model is a fine-tuned version of openai/whisper-medium on the mozilla-foundation/common_voice_11_0 dataset.

Training and evaluation data

Training:

mozilla-foundation/common_voice_11_0 (train+validation)

Evaluation:

mozilla-foundation/common_voice_11_0 (test)

Training procedure

The datasets were augmented with audiomentations using the PitchShift, TimeStretch, Gain, and AddGaussianNoise transforms at p=0.3.
A space is inserted between consecutive Chinese characters, following the original paper, so WER is effectively equal to CER here.
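The effect of the character spacing on the metric can be checked with a small, stdlib-only sketch. It is illustrative, not the author's preprocessing or evaluation code: `space_chars`, `edit_distance`, `wer`, and `cer` are hypothetical helpers, the plain Levenshtein implementation stands in for whatever metric library was actually used, and the example sentence is invented.

```python
def space_chars(text: str) -> str:
    """Join characters with single spaces, dropping any existing whitespace."""
    return " ".join(ch for ch in text if not ch.isspace())


def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate, with words defined by whitespace."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate over non-whitespace characters."""
    ref_chars = [c for c in ref if not c.isspace()]
    hyp_chars = [c for c in hyp if not c.isspace()]
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Invented example: one substitution (氣 -> 汽) and one insertion (嗎).
reference = "今天天氣很好"
hypothesis = "今天天汽很好嗎"

spaced_ref = space_chars(reference)  # "今 天 天 氣 很 好"
spaced_hyp = space_chars(hypothesis)

print(wer(spaced_ref, spaced_hyp))  # 2 edits / 6 tokens
print(cer(reference, hypothesis))   # 2 edits / 6 characters, same value
print(wer(reference, hypothesis))   # unspaced: 1 "word", so 1.0
```

Without the spaces, `str.split()` treats the whole sentence as a single word, so any character error makes the utterance's WER 100%; spacing turns each character into its own token, which is why the two metrics coincide.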

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 1
eval_batch_size: 1
gradient_accumulation_steps: 32
optimizer: Adam
generation_max_length: 225
warmup_steps: 200
max_steps: 2000
fp16: True
evaluation_strategy: steps
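With a batch size of 1 and 32 gradient accumulation steps, each optimizer update sees 32 examples, so 2000 steps process 64,000 examples in total (counting repeats). The sketch below works through that arithmetic and the learning rate at a few steps. It assumes a single GPU and a linear warmup over `warmup_steps` followed by linear decay to zero at `max_steps` (the default schedule of the Hugging Face Trainer); the card does not state which scheduler was used, and `effective_batch_size` and `lr_at_step` are hypothetical helpers.

```python
LEARNING_RATE = 1e-5
TRAIN_BATCH_SIZE = 1              # per device; a single GPU is assumed
GRADIENT_ACCUMULATION_STEPS = 32
WARMUP_STEPS = 200
MAX_STEPS = 2000


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer update."""
    return per_device * accum_steps * num_devices


def lr_at_step(step: int) -> float:
    """Assumed schedule: linear warmup to LEARNING_RATE, then linear decay to 0 at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = MAX_STEPS - step
    return LEARNING_RATE * max(0.0, remaining / max(1, MAX_STEPS - WARMUP_STEPS))


batch = effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS)
print(batch)              # 32
print(batch * MAX_STEPS)  # 64000 examples processed over training
for step in (0, 100, 200, 1100, 2000):
    # peak at step 200, half the peak at steps 100 and 1100, zero at the ends
    print(step, lr_at_step(step))
```

Gradient accumulation lets a batch size of 1 fit whisper-medium in memory while still updating the weights with gradients averaged over 32 examples.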

Framework versions

Transformers 4.27.1
PyTorch 2.0.1+cu120
Datasets 2.13.1

Evaluation results

WER on mozilla-foundation/common_voice_11_0 test set (self-reported): 7.380
