alefiury committed
Commit bb96b52
1 Parent(s): 0a5d4dc

Update README.md

Files changed (1): README.md (+106 -1)
README.md CHANGED
@@ -7,6 +7,8 @@ metrics:
model-index:
- name: weights
  results: []
+ datasets:
+ - librispeech_asr
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -19,6 +21,109 @@ It achieves the following results on the evaluation set:
- Loss: 0.0061
- F1: 0.9993

+ ### Compute your inferences
+
+ ```python
+ from typing import Dict, List, Optional, Union
+
+ import numpy as np
+ import torch
+ import torchaudio
+ from torch.utils.data import DataLoader
+ from transformers import (
+     AutoFeatureExtractor,
+     AutoModelForAudioClassification,
+     Wav2Vec2Processor,
+ )
+
+
+ class DataColletor:
+     def __init__(
+         self,
+         processor: Wav2Vec2Processor,
+         sampling_rate: int = 16000,
+         padding: Union[bool, str] = True,
+         max_length: Optional[int] = None,
+         pad_to_multiple_of: Optional[int] = None,
+         label2id: Dict = None,
+         max_audio_len: int = 5
+     ):
+         self.processor = processor
+         self.sampling_rate = sampling_rate
+
+         self.padding = padding
+         self.max_length = max_length
+         self.pad_to_multiple_of = pad_to_multiple_of
+
+         self.label2id = label2id
+
+         # Maximum audio duration (in seconds) kept per example
+         self.max_audio_len = max_audio_len
+
+     def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
+         # Load each audio file, convert it to mono, resample and truncate it,
+         # then extract the model inputs
+         input_features = []
+
+         for feature in features:
+             speech_array, sampling_rate = torchaudio.load(feature["input_values"])
+
+             # Transform to mono
+             speech_array = torch.mean(speech_array, dim=0, keepdim=True)
+
+             # Resample to the target sampling rate if needed
+             if sampling_rate != self.sampling_rate:
+                 transform = torchaudio.transforms.Resample(sampling_rate, self.sampling_rate)
+                 speech_array = transform(speech_array)
+                 sampling_rate = self.sampling_rate
+
+             # Keep at most max_audio_len seconds of audio
+             effective_size_len = sampling_rate * self.max_audio_len
+
+             if speech_array.shape[-1] > effective_size_len:
+                 speech_array = speech_array[:, :effective_size_len]
+
+             speech_array = speech_array.squeeze().numpy()
+             input_tensor = self.processor(speech_array, sampling_rate=sampling_rate).input_values
+             input_tensor = np.squeeze(input_tensor)
+
+             input_features.append({"input_values": input_tensor})
+
+         # Pad the batch to a common length and return PyTorch tensors
+         batch = self.processor.pad(
+             input_features,
+             padding=self.padding,
+             max_length=self.max_length,
+             pad_to_multiple_of=self.pad_to_multiple_of,
+             return_tensors="pt",
+         )
+
+         return batch
+
+
+ label2id = {
+     "female": 0,
+     "male": 1
+ }
+
+ id2label = {
+     0: "female",
+     1: "male"
+ }
+
+ num_labels = 2
+
+ feature_extractor = AutoFeatureExtractor.from_pretrained("alefiury/wav2vec2-large-xlsr-53-gender-recognition-librispeech")
+ model = AutoModelForAudioClassification.from_pretrained(
+     pretrained_model_name_or_path="alefiury/wav2vec2-large-xlsr-53-gender-recognition-librispeech",
+     num_labels=num_labels,
+     label2id=label2id,
+     id2label=id2label,
+ )
+
+ data_collator = DataColletor(
+     feature_extractor,
+     sampling_rate=16000,
+     padding=True,
+     label2id=label2id
+ )
+
+ test_dataloader = DataLoader(
+     dataset=test_dataset,
+     batch_size=16,
+     collate_fn=data_collator,
+     shuffle=False,
+     num_workers=10
+ )
+
+ preds = predict(test_dataloader=test_dataloader, model=model)
+ ```
+
+
## Training and evaluation data

The Librispeech-clean-100 dataset was used to train the model, with 70% of the data used for training, 10% for validation, and 20% for testing.
@@ -49,4 +154,4 @@ The following hyperparameters were used during training:

- Transformers 4.28.0
- Pytorch 2.0.0+cu118
- - Tokenizers 0.13.3
+ - Tokenizers 0.13.3
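
The snippet added in this commit calls a `predict` helper and a `test_dataset` that the model card itself never defines. As a rough, hypothetical sketch of how those missing pieces could look (the names `audio_paths`, `test_dataset`, and `predict` are assumptions, not part of the commit): the collator loads audio with `torchaudio.load(feature["input_values"])`, so the dataset only needs to yield dicts whose `"input_values"` entry is a file path, and `predict` can be a plain batched argmax loop over the dataloader.

```python
import torch

# Hypothetical list of audio files to classify (not part of the model card).
audio_paths = ["example_1.wav", "example_2.wav"]

# The collator loads the audio itself, so each item only needs the file path.
test_dataset = [{"input_values": path} for path in audio_paths]


def predict(test_dataloader, model, device=None):
    """Run batched inference and return one predicted class id per audio file."""
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()

    preds = []
    with torch.no_grad():
        for batch in test_dataloader:
            input_values = batch["input_values"].to(device)
            attention_mask = (
                batch["attention_mask"].to(device) if "attention_mask" in batch else None
            )
            logits = model(input_values, attention_mask=attention_mask).logits
            preds.extend(torch.argmax(logits, dim=-1).cpu().tolist())
    return preds


# The integer predictions can be mapped back to label names with the id2label
# mapping above, e.g. predicted_labels = [id2label[p] for p in preds]
```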