---
language: id
license: apache-2.0
tags:
    - audio-classification
    - generated_from_trainer
metrics:
    - accuracy
    - f1
model-index:
    - name: wav2vec2-adult-child-id-cls
      results: []
---

# Wav2Vec2 Adult/Child Indonesian Speech Classifier

Wav2Vec2 Adult/Child Indonesian Speech Classifier is an audio classification model based on the [wav2vec 2.0](https://arxiv.org/abs/2006.11477) architecture. It is a fine-tuned version of [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base), trained on a private dataset of Indonesian speech labeled as adult or child.

This model was trained with Hugging Face Transformers on PyTorch. All training was done on a Tesla P100 GPU provided by Kaggle. Training metrics were logged via TensorBoard.

## Model

| Model                         | #params | Arch.       | Training/Validation data (audio)                     |
| ----------------------------- | ------- | ----------- | ---------------------------------------------------- |
| `wav2vec2-adult-child-id-cls` | 91M     | wav2vec 2.0 | Adult/Child Indonesian Speech Classification Dataset |
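
A minimal inference sketch, assuming the model is published on the Hugging Face Hub. The repository id below is a placeholder, not the model's confirmed path, and the label names in the usage note are assumptions, since the card does not list them. `top_label` and `classify` are hypothetical helpers written for this example; `pipeline("audio-classification", ...)` is the standard Transformers entry point.

```python
from typing import Dict, List

# Placeholder repository id (assumption): replace with the model's actual Hub path.
MODEL_ID = "your-namespace/wav2vec2-adult-child-id-cls"


def top_label(predictions: List[Dict]) -> str:
    """Return the label with the highest score from pipeline-style output.

    The audio-classification pipeline returns a list of
    {"label": ..., "score": ...} dictionaries.
    """
    return max(predictions, key=lambda p: p["score"])["label"]


def classify(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Classify a single audio file.

    Requires `transformers`, `torch`, and ffmpeg (used to decode the file).
    """
    # Imported lazily so the helper above works without the heavy dependency.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    return top_label(classifier(audio_path))
```

Calling `classify("sample.wav")` would return the predicted class name. Since wav2vec2-base was pre-trained on 16 kHz audio, inputs at other sampling rates should be resampled; when given a file path, the pipeline does this through ffmpeg.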

## Evaluation Results

The model achieves the following results on the evaluation set after the fifth and final epoch of training:

| Dataset                                      | Loss   | Accuracy | F1     |
| -------------------------------------------- | ------ | -------- | ------ |
| Adult/Child Indonesian Speech Classification | 0.2603 | 92.22%   | 0.9202 |
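
For readers unfamiliar with the two metrics, the sketch below shows how accuracy and F1 can be computed from predictions. The card does not state which F1 averaging was used, so this example computes F1 for a single positive class as an assumption; the labels and predictions are toy values for illustration, not data from the evaluation set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (illustrative only; the real label names are not given in the card).
y_true = ["adult", "adult", "child", "child", "child"]
y_pred = ["adult", "child", "child", "child", "adult"]

acc = accuracy(y_true, y_pred)                    # 3 of 5 correct
f1_child = f1_score(y_true, y_pred, "child")      # precision = recall = 2/3
```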

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

-   `learning_rate`: 3e-05
-   `train_batch_size`: 32
-   `eval_batch_size`: 32
-   `seed`: 42
-   `optimizer`: Adam with `betas=(0.9,0.999)` and `epsilon=1e-08`
-   `lr_scheduler_type`: linear
-   `lr_scheduler_warmup_ratio`: 0.1
-   `gradient_accumulation_steps`: 1
-   `num_epochs`: 5
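
The `linear` scheduler with a warmup ratio of 0.1 raises the learning rate from 0 to its peak over the first 10% of steps, then decays it linearly to 0. The sketch below reproduces that shape in plain Python, using the 1525 total steps reported in the training results. Rounding the warmup length up to a whole step is an assumption about how the trainer derives it from the ratio, and `linear_lr` is a hypothetical helper, not a library function.

```python
import math

PEAK_LR = 3e-5        # learning_rate
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio
TOTAL_STEPS = 1525    # final step in the training results table


def linear_lr(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR,
              warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    # Assumption: the warmup length is the ratio times total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from the peak to 0 at the last step.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


for s in (0, 153, 839, 1525):
    print(s, linear_lr(s))
```

Under these assumptions the rate peaks at step 153 and is halfway back down at step 839.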

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |   F1   |
| :-----------: | :---: | :--: | :-------------: | :------: | :----: |
|    0.2415     |  1.0  | 305  |     0.2951      |  0.8804  | 0.8695 |
|     0.202     |  2.0  | 610  |     0.2392      |  0.9124  | 0.9081 |
|    0.2161     |  3.0  | 915  |     0.2508      |  0.9199  | 0.9161 |
|    0.1348     |  4.0  | 1220 |     0.2748      |  0.9153  | 0.9126 |
|     0.162     |  5.0  | 1525 |     0.2603      |  0.9222  | 0.9202 |
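
The table shows that the lowest validation loss and the best accuracy/F1 fall on different epochs. A short sketch that picks the best checkpoint under each criterion, using the values from the table above (the dictionary keys are names chosen for this example):

```python
# Per-epoch validation metrics copied from the training results table.
results = [
    {"epoch": 1, "step": 305,  "val_loss": 0.2951, "accuracy": 0.8804, "f1": 0.8695},
    {"epoch": 2, "step": 610,  "val_loss": 0.2392, "accuracy": 0.9124, "f1": 0.9081},
    {"epoch": 3, "step": 915,  "val_loss": 0.2508, "accuracy": 0.9199, "f1": 0.9161},
    {"epoch": 4, "step": 1220, "val_loss": 0.2748, "accuracy": 0.9153, "f1": 0.9126},
    {"epoch": 5, "step": 1525, "val_loss": 0.2603, "accuracy": 0.9222, "f1": 0.9202},
]

# Lower is better for loss; higher is better for F1.
best_by_loss = min(results, key=lambda r: r["val_loss"])
best_by_f1 = max(results, key=lambda r: r["f1"])

print("best by validation loss: epoch", best_by_loss["epoch"])  # epoch 2
print("best by F1: epoch", best_by_f1["epoch"])                 # epoch 5
```

The reported evaluation results correspond to epoch 5, the best epoch by accuracy and F1, even though validation loss was lowest at epoch 2.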

## Disclaimer

Consider the biases of the pre-training datasets, which may carry over into this model's predictions.

## Authors

Wav2Vec2 Adult/Child Indonesian Speech Classifier was trained and evaluated by [Ananto Joyoadikusumo](https://anantoj.github.io/). All computation and development were done on Kaggle.

## Framework versions

-   Transformers 4.18.0
-   PyTorch 1.11.0+cu102
-   Datasets 2.2.0
-   Tokenizers 0.12.1