andrei-saceleanu committed
Commit 4837300
1 Parent(s): 799a68f

Update README.md

Files changed (1): README.md (+16 -20)
README.md CHANGED
@@ -1,47 +1,43 @@
---
license: apache-2.0
- tags:
- - generated_from_keras_callback
model-index:
- name: vit-base-vocalsound-logmel
  results: []
---

- <!-- This model card has been generated automatically according to the information Keras had access to. You should
- probably proofread and complete it, then remove this comment. -->
-
# vit-base-vocalsound-logmel

- This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) on an unknown dataset.
It achieves the following results on the evaluation set:

-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed

## Training and evaluation data

- More information needed

- ## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- - optimizer: None
- training_precision: float32

- ### Training results
-

### Framework versions

- Transformers 4.27.4
- TensorFlow 2.12.0
- - Tokenizers 0.13.3
 
---
license: apache-2.0
model-index:
- name: vit-base-vocalsound-logmel
  results: []
---

# vit-base-vocalsound-logmel

+ This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) on the [VocalSound](https://github.com/YuanGongND/vocalsound) dataset.
It achieves the following results on the evaluation set:

+ - accuracy: 88.8
+ - precision (micro): 91.3
+ - recall (micro): 87.1
+ - f1 score (micro): 89.1
+ - f1 score (macro): 89.1

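As a minimal inference sketch (assuming the checkpoint is published as `andrei-saceleanu/vit-base-vocalsound-logmel`, mirroring the sibling repo linked under Preprocessing below, and that a normalized log-mel "image" shaped like a 3x224x224 ViT input has already been prepared as described in that section):

```python
import numpy as np
import tensorflow as tf
from transformers import TFViTForImageClassification

# Repo id is an assumption based on the committer name and the model name above.
model = TFViTForImageClassification.from_pretrained(
    "andrei-saceleanu/vit-base-vocalsound-logmel"
)

# Placeholder input standing in for a normalized log-mel spectrogram that has
# been resized/replicated to ViT's expected (batch, channels, height, width).
pixel_values = tf.constant(np.random.rand(1, 3, 224, 224), dtype=tf.float32)

logits = model(pixel_values=pixel_values).logits
pred_id = int(tf.argmax(logits, axis=-1)[0])
print(model.config.id2label.get(pred_id, pred_id))
```
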
## Training and evaluation data

+ Training: VocalSound training split (#samples = 15570)

+ Evaluation: VocalSound test split (#samples = 3594)

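The evaluation script itself is not part of the card; as a rough sketch, micro- and macro-averaged metrics like those listed above can be computed from per-clip predictions on the test split, for example with scikit-learn (an assumption, not something the card states):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_true / y_pred: integer class ids for the test clips (placeholder values here).
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 1, 2, 1, 1]

accuracy = accuracy_score(y_true, y_pred)
p_micro, r_micro, f1_micro, _ = precision_recall_fscore_support(
    y_true, y_pred, average="micro"
)
_, _, f1_macro, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro"
)
print(accuracy, p_micro, r_micro, f1_micro, f1_macro)
```
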
 
### Training hyperparameters

The following hyperparameters were used during training:
+ - optimizer: AdamW
+ - weight_decay: 0
+ - learning_rate: 5e-5
+ - batch_size: 32
- training_precision: float32

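A sketch of how the listed hyperparameters could map onto a Keras training setup (the actual training script is not part of the card; the label count and loss handling below are assumptions):

```python
import tensorflow as tf
from transformers import TFViTForImageClassification

# Assumption: VocalSound is treated as single-label classification over its
# 6 vocal sound classes; the card itself does not list the label set.
model = TFViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=6,
    ignore_mismatched_sizes=True,
)

# Hyperparameters from the list above.
optimizer = tf.keras.optimizers.AdamW(learning_rate=5e-5, weight_decay=0.0)

# With no explicit loss, transformers TF models fall back to their built-in
# loss when labels are supplied during fit().
model.compile(optimizer=optimizer, metrics=["accuracy"])

# train_ds / val_ds: assumed tf.data pipelines yielding dicts with
# "pixel_values" and "labels"; batch_size=32 from the list above.
# model.fit(train_ds.batch(32), validation_data=val_ds.batch(32))
```
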
 
+ ### Preprocessing

+ Unlike [vit-base-vocalsound](https://huggingface.co/andrei-saceleanu/vit-base-vocalsound), this model uses the log-mel spectrogram (a log is applied on top of the mel spectrogram), and the preprocessor's normalization
+ step uses VocalSound statistics (i.e. mean and std) instead of the default ImageNet ones.

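A sketch of the feature pipeline described above, using librosa (an assumption; the card does not name the audio library), with placeholder mel parameters and placeholder VocalSound statistics:

```python
import librosa
import numpy as np

# Placeholders: the card only states that a log-mel spectrogram is used and
# that VocalSound mean/std replace the ImageNet normalization statistics;
# the actual parameter values are not given.
SAMPLE_RATE = 16000
N_MELS = 128
VOCALSOUND_MEAN = 0.0  # placeholder, not the real dataset statistic
VOCALSOUND_STD = 1.0   # placeholder, not the real dataset statistic

def log_mel_features(wav_path: str) -> np.ndarray:
    """Waveform -> log-mel spectrogram normalized with dataset statistics."""
    audio, _ = librosa.load(wav_path, sr=SAMPLE_RATE, mono=True)
    mel = librosa.feature.melspectrogram(y=audio, sr=SAMPLE_RATE, n_mels=N_MELS)
    log_mel = np.log(mel + 1e-6)  # the extra "log" step this card adds
    return (log_mel - VOCALSOUND_MEAN) / VOCALSOUND_STD
```
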
 
### Framework versions

- Transformers 4.27.4
- TensorFlow 2.12.0
+ - Tokenizers 0.13.3