metadata
tags:
  - generated_from_trainer
model-index:
  - name: prosody_gttbsc_distilbert-uncased-pitch
    results:
      - task:
          type: dialogue act classification
        dataset:
          name: asapp/slue-phase-2
          type: hvb
        metrics:
          - name: F1 macro E2E
            type: F1 macro
            value: 65.33
          - name: F1 macro GT
            type: F1 macro
            value: 71.78
datasets:
  - asapp/slue-phase-2
language:
  - en
metrics:
  - f1-macro

prosody_gttbsc_distilbert-uncased-pitch

Multi-label dialogue act classification (DAC) on ground-truth text, with prosody encoding fused in through residual cross-attention.

Model description

Prosody encoder: 2-layer transformer encoder with an initial dense projection
Backbone: DistilBERT uncased
Pooling: self-attention
Multi-label classification head: 2 dense layers, with two dropout layers (p = 0.3) and a Tanh activation in between
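
The components listed above could be wired together roughly as follows. This is a minimal PyTorch sketch, not the released implementation: the class name `ProsodyCrossAttentionSketch`, the number of attention heads, the prosody feature size, the exact order of dropout and dense layers, and `num_labels` are all assumptions made for illustration. Random tensors stand in for the DistilBERT hidden states (768 dimensions) and for the pitch features, so the block runs without downloading any weights.

```python
import torch
from torch import nn


class ProsodyCrossAttentionSketch(nn.Module):
    """Illustrative only: sizes and wiring are assumptions, not the card's code."""

    def __init__(self, prosody_dim=1, hidden=768, num_labels=18, heads=4):
        super().__init__()
        # Prosody encoder: initial dense projection + 2-layer transformer encoder.
        self.project = nn.Linear(prosody_dim, hidden)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=heads, batch_first=True
        )
        self.prosody_encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Text tokens act as queries; prosody frames act as keys and values.
        self.cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # Self-attention pooling: one learned score per token.
        self.pool_score = nn.Linear(hidden, 1)
        # Head (assumed order): dropout -> dense -> Tanh -> dropout -> dense.
        self.head = nn.Sequential(
            nn.Dropout(0.3),
            nn.Linear(hidden, hidden),
            nn.Tanh(),
            nn.Dropout(0.3),
            nn.Linear(hidden, num_labels),
        )

    def forward(self, text_states, prosody):
        p = self.prosody_encoder(self.project(prosody))
        attended, _ = self.cross_attn(text_states, p, p)
        fused = text_states + attended  # residual connection around cross-attention
        weights = torch.softmax(self.pool_score(fused), dim=1)
        pooled = (weights * fused).sum(dim=1)  # weighted sum over tokens
        return self.head(pooled)  # one logit per dialogue act label


model = ProsodyCrossAttentionSketch().eval()
text_states = torch.randn(2, 12, 768)  # stand-in for DistilBERT hidden states
prosody = torch.randn(2, 50, 1)        # stand-in for a per-frame pitch contour
logits = model(text_states, prosody)
probs = torch.sigmoid(logits)          # independent per-label probabilities
```

Because the task is multi-label, the sketch applies a sigmoid per label rather than a softmax over labels, so several dialogue acts can be active for the same utterance.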

Training and evaluation data

Trained on ground-truth transcripts from the hvb subset of asapp/slue-phase-2.
Evaluated on ground-truth transcripts (GT) and on normalized Whisper small transcripts (E2E).
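
For readers unfamiliar with the reported metric, the following is a small pure-Python sketch of macro-averaged F1 for multi-label predictions: F1 is computed per label and then averaged with equal weight. The function name `macro_f1` and the toy data are hypothetical, and the evaluation script actually used for this model may differ (for example in how it treats labels with no positive examples).

```python
def macro_f1(y_true, y_pred):
    """Macro F1 over multi-label indicator rows (each row is a list of 0/1)."""
    num_labels = len(y_true[0])
    scores = []
    for j in range(num_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumption: a label with no true or predicted positives scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_labels


# Toy example with 3 utterances and 3 labels (not real model output).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
score = macro_f1(y_true, y_pred)  # per-label F1: 1.0, 2/3, 0.0
```

The values in the metadata (65.33 and 71.78) appear to be this kind of score expressed as a percentage.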

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0004
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
  • mixed_precision_training: Native AMP
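
The relationship between the batch-size settings and the shape of the linear schedule can be sketched in plain Python. The sketch assumes a single device, no warmup (none is listed above), and a hypothetical `total_steps` value; `linear_lr` is an illustrative helper, not a library function.

```python
learning_rate = 4e-4
train_batch_size = 2
gradient_accumulation_steps = 4

# Effective batch size per optimizer step (assuming one device).
total_train_batch_size = train_batch_size * gradient_accumulation_steps


def linear_lr(step, total_steps, base_lr=learning_rate, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # hypothetical; the real value depends on dataset size
lr_start = linear_lr(0, total_steps)
lr_mid = linear_lr(total_steps // 2, total_steps)
lr_end = linear_lr(total_steps, total_steps)
```

With these settings the optimizer sees 8 examples per update, and the learning rate falls from 0.0004 at the first step to half that value midway through training and to zero at the final step.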

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.3.0+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1