---
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
- recall
- precision
model-index:
- name: mixed_model_finetuned_cremad
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/yassmenyoussef55-arete-global/huggingface/runs/gt6e5ppa)

# mixed_model_finetuned_cremad

This model combines a fine-tuned wav2vec2 for the audio stream with a pretrained resnet3d_101 for the video stream.
It was trained on the [CREMA-D dataset](https://github.com/CheyneyComputerScience/CREMA-D).
The dataset provides 7442 recordings of actors performing 6 different emotions in English:

```python
emotions = ['angry', 'disgust', 'fearful', 'happy', 'neutral', 'sad']
```

It achieves the following results on the evaluation set:
- Loss: 0.3098
- Accuracy: 0.8972
- F1: 0.8960
- Recall: 0.8972
- Precision: 0.8974

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 743
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | F1     | Recall | Precision |
|:-------------:|:------:|:----:|:---------------:|:--------:|:------:|:------:|:---------:|
| 0.7914        | 1.0    | 186  | 1.0595          | 0.7171   | 0.7074 | 0.7171 | 0.7536    |
| 0.5971        | 2.0    | 372  | 0.4401          | 0.8414   | 0.8375 | 0.8414 | 0.8443    |
| 0.2891        | 3.0    | 558  | 0.3863          | 0.8548   | 0.8539 | 0.8548 | 0.8622    |
| 0.1833        | 3.9946 | 743  | 0.3098          | 0.8972   | 0.8960 | 0.8972 | 0.8974    |

### Framework versions

- Transformers 4.42.3
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
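
### Worked example: effective batch size and learning-rate schedule

The hyperparameters listed above imply an effective batch size and a warmup length that the card does not spell out. The sketch below derives both and evaluates a linear warmup/decay schedule at a few steps. It is an illustration only: the rounding of the warmup length (ceiling of ratio times total steps) and the helper name `linear_lr` are assumptions, not code taken from the original training run.

```python
import math

# Values copied from the "Training hyperparameters" list.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 8
TRAINING_STEPS = 743
WARMUP_RATIO = 0.1

# Gradients from 8 micro-batches of 4 samples are accumulated per optimizer step.
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Assumption: the warmup length is rounded up to a whole number of steps.
warmup_steps = math.ceil(TRAINING_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Hypothetical helper: learning rate at an optimizer step under linear
    warmup from 0 to the peak, then linear decay back to 0."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TRAINING_STEPS - warmup_steps)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    print("warmup steps:", warmup_steps)
    for step in (0, warmup_steps, 372, TRAINING_STEPS):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at 0.0001 once warmup ends and reaches zero at step 743, which is the last row of the training results table.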