---
language: id
license: apache-2.0
tags:
    - audio-classification
    - generated_from_trainer
metrics:
    - accuracy
    - f1
model-index:
    - name: distil-wav2vec2-adult-child-id-cls-52m
      results: []
---

# DistilWav2Vec2 Adult/Child Indonesian Speech Classifier 52M

DistilWav2Vec2 Adult/Child Indonesian Speech Classifier is an audio classification model based on the [wav2vec 2.0](https://arxiv.org/abs/2006.11477) architecture. It is a distilled version of [wav2vec2-adult-child-id-cls](https://huggingface.co/bookbot/wav2vec2-adult-child-id-cls), trained on a private adult/child Indonesian speech classification dataset.

This model was trained with Hugging Face Transformers on PyTorch. All training was done on a Tesla P100 GPU provided by Kaggle, and training metrics were logged with TensorBoard.

## Model

| Model                                    | #params | Arch.       | Training/Validation data (audio)                     |
| ---------------------------------------- | ------- | ----------- | ---------------------------------------------------- |
| `distil-wav2vec2-adult-child-id-cls-52m` | 52M     | wav2vec 2.0 | Adult/Child Indonesian Speech Classification Dataset |
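
As a rough usage sketch, the model can probably be loaded through the Transformers `audio-classification` pipeline. The repository id below is an assumption: it combines the `bookbot` namespace of the teacher model with this model's name, and is not confirmed by this card. The helper name `classify_speaker` is hypothetical, and the label names returned depend on the model's configuration, which is not documented here.

```python
# Assumed Hub repository id (teacher namespace + this model's name); adjust if it differs.
MODEL_ID = "bookbot/distil-wav2vec2-adult-child-id-cls-52m"


def classify_speaker(audio_path, top_k=2):
    """Return the top_k predicted labels with scores for one audio file.

    Hypothetical helper: requires `transformers`, `torch`, and ffmpeg
    (used by the pipeline to decode audio files given as paths).
    """
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=MODEL_ID)
    # The pipeline returns a list of {"score": float, "label": str} dicts.
    return classifier(audio_path, top_k=top_k)
```

Calling `classify_speaker("sample.wav")` on a local recording should return one entry per class, sorted by score. For many files, building the pipeline once and reusing it would avoid reloading the weights on every call.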

## Evaluation Results

The model achieves the following results on the evaluation set. These figures match the epoch 3 row of the training results table:

| Dataset                                      | Loss   | Accuracy | F1     |
| -------------------------------------------- | ------ | -------- | ------ |
| Adult/Child Indonesian Speech Classification | 0.1560 | 94.89%   | 0.9480 |

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

-   `learning_rate`: 3e-05
-   `train_batch_size`: 32
-   `eval_batch_size`: 32
-   `seed`: 42
-   `gradient_accumulation_steps`: 4
-   `total_train_batch_size`: 128
-   `optimizer`: Adam with `betas=(0.9,0.999)` and `epsilon=1e-08`
-   `lr_scheduler_type`: linear
-   `lr_scheduler_warmup_ratio`: 0.1
-   `num_epochs`: 7
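
The derived quantities in this list can be checked with a little arithmetic. The sketch below assumes a single GPU (consistent with the Tesla P100 mentioned above) and 76 optimizer steps per epoch, as read from the Step column of the training results table. It also assumes the warmup length is the ceiling of `warmup_ratio * total_steps`, which is how the Transformers `Trainer` of this era computes it. The function `linear_lr` is a hypothetical helper illustrating a linear warmup followed by linear decay, not code from the training run.

```python
import math

# Values copied from the hyperparameter list.
learning_rate = 3e-05
train_batch_size = 32
gradient_accumulation_steps = 4
warmup_ratio = 0.1
num_epochs = 7

# Assumption: 76 optimizer steps per epoch (Step column of the results table).
steps_per_epoch = 76

# Effective batch size per optimizer step on a single device.
total_train_batch_size = train_batch_size * gradient_accumulation_steps

# Total optimizer steps and the number of warmup steps.
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def linear_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return learning_rate * max(0.0, remaining)


print("total_train_batch_size:", total_train_batch_size)
print("total_steps:", total_steps)
print("warmup_steps:", warmup_steps)
print("lr at end of warmup:", linear_lr(warmup_steps))
```

Under these assumptions the effective batch size is 32 × 4 = 128, matching `total_train_batch_size`, and the learning rate peaks at 3e-05 after the warmup phase before decaying to zero at the final step.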

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |   F1   |
| :-----------: | :---: | :--: | :-------------: | :------: | :----: |
|    0.2494     |  1.0  |  76  |     0.1706      |  0.9454  | 0.9421 |
|    0.2015     |  2.0  | 152  |     0.1519      |  0.9483  | 0.9464 |
|    0.1674     |  3.0  | 228  |     0.1560      |  0.9489  | 0.9480 |
|    0.1596     |  4.0  | 304  |     0.1760      |  0.9449  | 0.9414 |
|    0.0873     |  5.0  | 380  |     0.1825      |  0.9478  | 0.9452 |
|    0.0996     |  6.0  | 456  |     0.1733      |  0.9478  | 0.9460 |
|    0.1055     |  7.0  | 532  |     0.1749      |  0.9454  | 0.9433 |
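
To see which checkpoint the reported evaluation numbers correspond to, the table can be scanned programmatically. The sketch below copies the rows verbatim and compares selection by F1, accuracy, and validation loss. The `Row` type and variable names are illustrative only, and the card does not state which selection criterion was actually used.

```python
from collections import namedtuple

Row = namedtuple("Row", "train_loss epoch step val_loss accuracy f1")

# Rows copied from the training results table.
rows = [
    Row(0.2494, 1, 76, 0.1706, 0.9454, 0.9421),
    Row(0.2015, 2, 152, 0.1519, 0.9483, 0.9464),
    Row(0.1674, 3, 228, 0.1560, 0.9489, 0.9480),
    Row(0.1596, 4, 304, 0.1760, 0.9449, 0.9414),
    Row(0.0873, 5, 380, 0.1825, 0.9478, 0.9452),
    Row(0.0996, 6, 456, 0.1733, 0.9478, 0.9460),
    Row(0.1055, 7, 532, 0.1749, 0.9454, 0.9433),
]

# Pick the best epoch under three different criteria.
best_by_f1 = max(rows, key=lambda r: r.f1)
best_by_accuracy = max(rows, key=lambda r: r.accuracy)
best_by_val_loss = min(rows, key=lambda r: r.val_loss)

print("best F1:", best_by_f1.epoch, best_by_f1.f1)
print("best accuracy:", best_by_accuracy.epoch, best_by_accuracy.accuracy)
print("lowest validation loss:", best_by_val_loss.epoch, best_by_val_loss.val_loss)
```

Epoch 3 has the highest F1 and accuracy, and its values equal those in the Evaluation Results section, while epoch 2 has the lowest validation loss. Validation loss drifts upward after epoch 3 as training loss keeps falling, which hints at mild overfitting in the later epochs.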

## Disclaimer

Consider the biases of the pre-training datasets, which may carry over into this model's results.

## Authors

DistilWav2Vec2 Adult/Child Indonesian Speech Classifier was trained and evaluated by [Ananto Joyoadikusumo](https://anantoj.github.io/). All computation and development were done on Kaggle.

### Framework versions

-   Transformers 4.16.2
-   PyTorch 1.10.2+cu102
-   Datasets 1.18.3
-   Tokenizers 0.10.3