language: vi
datasets:
  - vivos
  - common_voice
metrics:
  - wer
pipeline_tag: automatic-speech-recognition
tags:
  - audio
  - speech
  - Transformer
license: cc-by-nc-4.0
model-index:
  - name: Wav2vec2 Base Vietnamese 160h
    results:
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice vi
          type: common_voice
          args: vi
        metrics:
          - name: Test WER
            type: wer
            value: 0
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 8.0
          type: mozilla-foundation/common_voice_8_0
          args: vi
        metrics:
          - name: Test WER
            type: wer
            value: 0
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: VIVOS
          type: vivos
          args: vi
        metrics:
          - name: Test WER
            type: wer
            value: 0

FINETUNE WAV2VEC 2.0 FOR SPEECH RECOGNITION

Table of contents

  1. Documentation
  2. Installation
  3. Usage
  4. Logs and Visualization

Documentation

If you need a simple way to fine-tune the Wav2vec 2.0 model for speech recognition on your own datasets, you have come to the right place.
All documents related to this repo can be found here:

Installation

pip install -r requirements.txt

Usage

  1. Prepare your dataset
    • Your dataset can be in .txt or .csv format.
    • The path and transcript columns are compulsory. The path column contains the paths to your stored audio files; depending on where your dataset is located, these can be either absolute or relative paths. The transcript column contains the transcript corresponding to each audio file.
    • Check out our data_example.csv file for more information.
  2. Configure the config.toml file
  3. Run
    • Start training:
      python train.py -c config.toml
      
    • Resume training from the latest checkpoint:
      python train.py -c config.toml -r
      
    • Load a specific model and start training:
      python train.py -c config.toml -p path/to/your/model.tar
      

Logs and Visualization

The logs produced during training will be stored, and you can visualize them with TensorBoard by running:

# specify the <name> in config.toml
tensorboard --logdir ~/saved/<name>

# or specify a port, e.g. 8080
tensorboard --logdir ~/saved/<name> --port 8080
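For reference, the `<name>` above comes from your config.toml. The fragment below is purely hypothetical (the key name and layout are our assumption; consult the repo's actual config.toml for the real schema) and only illustrates how the run name ties the config to the `~/saved/<name>` log directory.

```toml
# hypothetical config.toml fragment -- the real schema is defined by the repo
name = "wav2vec2-base-vi-160h"   # logs would then land in ~/saved/wav2vec2-base-vi-160h
```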