marinone94 committed on
Commit
8da9416
·
1 Parent(s): 2d9b52e

first commit

.gitignore ADDED
@@ -0,0 +1 @@
1
+ *wandb/
huggingface_training.py ADDED
@@ -0,0 +1,372 @@
1
+ import os # used to create output directory
2
+ from dataclasses import dataclass # used to define data collator
3
+ from math import ceil # used to round up decimals
4
+
5
+ import evaluate # used to import and compute evaluation metrics
6
+ import torch # used to know if a GPU with CUDA is available
7
+ import wandb # used for experiment tracking
8
+ from datasets import IterableDatasetDict, load_dataset # used to load the dataset in streaming mode
9
+ from transformers import (
10
+ AutoConfig, # used to load model configurations
11
+ AutoModelForSpeechSeq2Seq, # used to load the model architecture and weights
12
+ AutoProcessor, # used to load the Whisper processor, which includes a feature extractor and a tokenizer
13
+ Seq2SeqTrainer, # used to perform training and evaluation loops
14
+ Seq2SeqTrainingArguments, # used to define training hyperparameters
15
+ TrainerCallback, # used to shuffle the training data after each epoch
16
+ WhisperProcessor # used for static data typing
17
+ )
18
+ from transformers import set_seed # used for reproducibility
19
+ from transformers.models.whisper.english_normalizer import BasicTextNormalizer # used to normalize transcript and reference before evaluation
20
+ from transformers.trainer_pt_utils import IterableDataset, IterableDatasetShard # used to shuffle the training data after each epoch
21
+
22
+ """Then, we will load the processor, model configuration, architecture and weights, and the dataset (in streaming mode). The English split of Fleurs is not a massive dataset, so we could easily download it and store it in memory, but it is good to learn how to use streaming mode in case you fine-tune your model on larger datasets. """
23
+
24
+ model_id = "openai/whisper-tiny"
25
+ processor = AutoProcessor.from_pretrained(model_id)
26
+ config = AutoConfig.from_pretrained(model_id)
27
+ model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
28
+
29
+ dataset_id = "google/fleurs"
30
+ dataset_language_code = "en_us"
31
+ dataset = load_dataset(dataset_id, dataset_language_code, streaming=True)
32
+
33
+ """The first time you run this code, make sure everything works fine using a small sample and a low number of training steps. Just uncomment the next cell and run it. One note: since the dataset is loaded in streaming mode, the instruction will not be executed immediately. Instead, the dataset will be subsampled only when data is needed during training."""
34
+
35
+ test_script = True
36
+ # test_script = False
37
+
38
+ ## Sample dataset for testing
39
+ if test_script is True:
40
+ dataset["train"] = dataset["train"].shuffle(seed=42).take(8)
41
+ dataset["validation"] = dataset["validation"].shuffle(seed=42).take(4)
42
+ dataset["test"] = dataset["test"].shuffle(seed=42).take(4)
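The note about lazy execution can be illustrated with plain Python generators. This is only a toy stand-in for a streaming split, not the `datasets` implementation: `stream_records` and `take` are hypothetical names, and the shuffle buffer is ignored entirely.

```python
from itertools import islice

produced = []  # records every item the source actually yields


def stream_records(n):
    """Toy stand-in for a streaming split: yields items one at a time."""
    for i in range(n):
        produced.append(i)
        yield i


def take(iterable, k):
    """Lazily keep only the first k items, similar in spirit to `.take(k)`."""
    return islice(iterable, k)


subset = take(stream_records(1000), 4)
items_before = len(produced)  # nothing has been read yet
first_items = list(subset)    # iterating triggers the actual reads
items_after = len(produced)   # only the requested items were produced
```

Defining the subset costs nothing. The source is only read once something iterates over it, which in the script happens when training starts requesting batches.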
43
+
44
+ """The raw dataset is not yet ready for training. As described in my first article about Whisper, the input audio waveform needs to be transformed into a log-mel spectrogram. I recommend reading the [Audio Preprocessing section](https://marinone94.github.io/Whisper-paper/#audio-preprocessing) to understand the process. For the scope of this article, you should just know that the audio is converted from the time domain to its frequency representation using a sliding window, and adjusted to simulate human hearing. The Whisper Feature Extractor included in the Whisper Processor will take care of the rest.
45
+
46
+ Furthermore, the reference transcripts need to be tokenized, since the model outputs one token at a time and they are used to compute the loss during training. Again, the Tokenizer will take care of that, but the task needs to be included in the preprocessing step.
47
+
48
+ When we introduced the WER metric, we learned about the importance of normalizing the texts. But should we also do that before training? That is up to you, but you should remember that Whisper models have been pretrained to predict capitalization, digits, and punctuation. So if you normalize the reference transcripts before fine-tuning, you will teach the model not to predict capital letters, digits, and punctuation. This does not mean that the model will never predict them, since it has been extensively pretrained to do so. To wrap up, your choice should depend on the final application and the dataset size, but in general I recommend not normalizing the references before training.
49
+
50
+ Finally, by storing the input features in the default model input name, the trainer will automatically pick the correct ones during training. Thus, don't hard-code it!
51
+ """
52
+
53
+ normalizer = BasicTextNormalizer()
54
+ # model_input_name = 'input_features'
55
+ model_input_name = processor.feature_extractor.model_input_names[0]
56
+
57
+ def prepare_dataset(batch, normalize=False):
58
+ # process audio
59
+ sample = batch["audio"]
60
+ inputs = processor.feature_extractor(sample["array"], sampling_rate=sample["sampling_rate"])
61
+ # store input features and audio length
62
+ batch[model_input_name] = inputs.get(model_input_name)[0]
63
+ batch["input_length"] = len(sample["array"])
64
+
65
+ # process targets
66
+ if normalize is True:
67
+ labels = batch["raw_transcription"].lower()
68
+ labels = normalizer(labels).strip()
69
+ else:
70
+ labels = batch["raw_transcription"].strip()
71
+ batch["labels"] = processor.tokenizer(labels).input_ids
72
+ return batch
73
+
74
+ """We will use the `.map` method to apply our preprocessing function to the whole dataset. At the same time, we will drop all the columns which are not strictly needed during training. Since `input_features`, `input_length` and `labels` are not features of the raw dataset, we can remove all the original ones. Finally, we will convert the dataset features to `torch` type since the dataset has no `__len__` property (again, we are in streaming mode). """
75
+
76
+ # dataset["train"].features is like a dict
77
+ # train, validation and test splits have the same features
78
+ raw_datasets_features = list(dataset["train"].features.keys())
79
+ preprocessed_dataset = IterableDatasetDict()
80
+
81
+ preprocessed_dataset["train"] = dataset["train"].map(
82
+ prepare_dataset,
83
+ remove_columns=raw_datasets_features,
84
+ fn_kwargs={"normalize": False}, # needed only if default value and provided value differ
85
+ ).with_format("torch")
86
+ preprocessed_dataset["validation"] = dataset["validation"].map(
87
+ prepare_dataset,
88
+ remove_columns=raw_datasets_features,
89
+ fn_kwargs={"normalize": False}, # reference transcripts are normalized in the evaluation function
90
+ ).with_format("torch")
91
+ preprocessed_dataset["test"] = dataset["test"].map(
92
+ prepare_dataset,
93
+ remove_columns=raw_datasets_features,
94
+ fn_kwargs={"normalize": False}, # reference transcripts are normalized in the evaluation function
95
+ ).with_format("torch")
96
+
97
+ """Since we want to evaluate our model on the validation set during training, we also need to provide a method that computes the metrics given the model predictions. It looks very similar to the function we introduced above, but since it will receive a single prediction object, we need to extract the predicted tokens and the corresponding labels. Furthermore, we replace the label ids equal to -100 with the padding token. A couple of minutes of patience and you will understand why.
98
+
99
+ When decoding the prediction and the labels, we need to discard the special tokens. Those are used to force the model to perform specific tasks. You can read more [here](https://marinone94.github.io/Whisper-paper/#tasks).
100
+ """
101
+
102
+ metric = evaluate.load("wer")
103
+
104
+ def compute_metrics(pred):
105
+ # extract predicted tokens
106
+ pred_ids = pred.predictions
107
+ label_ids = pred.label_ids
108
+
109
+ # pad tokens will then be discarded by the tokenizer with all other special tokens
110
+ label_ids[label_ids == -100] = processor.tokenizer.pad_token_id
111
+
112
+ # decode transcripts and reference
113
+ pred_str = processor.batch_decode(pred_ids, skip_special_tokens=True)
114
+ label_str = processor.batch_decode(label_ids, skip_special_tokens=True)
115
+
116
+ # normalize transcript and reference
117
+ pred_str = [normalizer(pred) for pred in pred_str]
118
+ label_str = [normalizer(label) for label in label_str]
119
+
120
+ # only evaluate the samples that correspond to non-zero references
121
+ pred_str = [pred_str[i] for i in range(len(pred_str)) if len(label_str[i]) > 0]
122
+ label_str = [label_str[i] for i in range(len(label_str)) if len(label_str[i]) > 0]
123
+
124
+ # express WER as percentage
125
+ wer = 100 * metric.compute(predictions=pred_str, references=label_str)
126
+
127
+ return {"wer": wer}
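To make the metric less of a black box, here is a minimal pure-Python sketch of word error rate as a word-level edit distance divided by the number of reference words. It is an illustration only and not the implementation behind `evaluate.load("wer")`; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edit distance between the reference words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# one substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words
wer_percent = 100 * word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

The division by the number of reference words is one reason the function above drops samples whose normalized reference is empty: for those, the ratio is undefined.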
128
+
129
+ """Alright, we are almost done preparing our dataset. Quite a lot of work, I know, but that is most of the job.
130
+
131
+ The last step is to define a data collator, which will build data batches from the datasets during training using the Whisper Processor. It will also pad input features and labels.
132
+
133
+ Also, in the metrics computation method we replaced the labels with id equal to -100. It was done because the data collator **must** set the padding tokens to -100 so that the trainer will ignore them when computing the loss. That was the reverse step.
134
+ """
135
+
136
+ @dataclass
137
+ class DataCollatorSpeechSeq2SeqWithPadding:
138
+
139
+ processor: WhisperProcessor
140
+ decoder_start_token_id: int
141
+
142
+ def __call__(self, features):
143
+ # split inputs and labels since they have to be of different lengths and need
144
+ # different padding methods
145
+ model_input_name = self.processor.model_input_names[0]
146
+ input_features = [{model_input_name: feature[model_input_name]} for feature in features]
147
+ label_features = [{"input_ids": feature["labels"]} for feature in features]
148
+
149
+ batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")
150
+
151
+ labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")
152
+
153
+ # replace padding with -100 to ignore loss correctly
154
+ labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)
155
+
156
+ # if bos token is appended in previous tokenization step,
157
+ # cut bos token here as it's appended later anyway
158
+ if (labels[:, 0] == self.decoder_start_token_id).all().cpu().item():
159
+ labels = labels[:, 1:]
160
+
161
+ batch["labels"] = labels
162
+
163
+ return batch
164
+
165
+ data_collator = DataCollatorSpeechSeq2SeqWithPadding(
166
+ processor=processor,
167
+ decoder_start_token_id=model.config.decoder_start_token_id,
168
+ )
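The padding logic and its reverse step can be sketched on plain lists, without tensors. This is an illustration under assumptions: `pad_labels` and `restore_padding` are hypothetical helpers, and the pad id is the `pad_token_id` listed in `output/config.json`.

```python
IGNORE_INDEX = -100   # label value that the loss skips
PAD_TOKEN_ID = 50257  # pad_token_id from output/config.json


def pad_labels(label_lists, pad_id=PAD_TOKEN_ID):
    """Pad to the longest sequence, then mask the padding with IGNORE_INDEX."""
    max_len = max(len(labels) for labels in label_lists)
    padded, masks = [], []
    for labels in label_lists:
        n_pad = max_len - len(labels)
        padded.append(labels + [pad_id] * n_pad)
        masks.append([1] * len(labels) + [0] * n_pad)
    # list equivalent of masked_fill(attention_mask.ne(1), -100)
    return [
        [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(row, mask)]
        for row, mask in zip(padded, masks)
    ]


def restore_padding(label_rows, pad_id=PAD_TOKEN_ID):
    """Reverse step used before decoding: turn IGNORE_INDEX back into pad_id."""
    return [[pad_id if tok == IGNORE_INDEX else tok for tok in row] for row in label_rows]


masked = pad_labels([[7, 8, 9], [5]])
restored = restore_padding(masked)
```

Masking by position rather than by token value matters here, because in this configuration the end-of-sequence token shares its id with the padding token: a genuine trailing 50257 stays in the labels, and only the padded positions are ignored.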
169
+
170
+ """The next step is something I would have definitely missed had I not attended the 🤗 Whisper Fine-Tuning Event. Thanks, guys, I learned a ton!
171
+
172
+ Still, there is something mysterious to me, so I would love it if someone explained it to me. Streaming datasets are not automatically shuffled after each epoch, therefore we define a Callback to do so. However, if we set the number of epochs in the Training Arguments (which we will see shortly), the Trainer complains that the dataset has no length, and it asks us to define the maximum number of training steps. So, will this Callback ever be used? Or will the Trainer not be aware of having completed an epoch? Thanks in advance to whoever clarifies this for me!
173
+ """
174
+
175
+ # Trainer callback to reinitialise and reshuffle the streamable datasets at the beginning of each epoch
176
+ # Only required for streaming: Trainer automatically shuffles non-streaming datasets
177
+ class ShuffleCallback(TrainerCallback):
178
+ def on_epoch_begin(self, args, state, control, train_dataloader, **kwargs):
179
+ if isinstance(train_dataloader.dataset, IterableDatasetShard):
180
+ pass # set_epoch() is handled by the Trainer
181
+ elif isinstance(train_dataloader.dataset, IterableDataset):
182
+ train_dataloader.dataset.set_epoch(train_dataloader.dataset._epoch + 1)
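One possible reading of the question above, offered as an assumption rather than a description of the Trainer's internals: a loop bounded by a maximum number of steps can simply start another pass whenever the stream runs out, and a per-epoch hook can fire at the start of each pass. The toy below models only that idea; `ToyStream` and `run` are hypothetical names.

```python
import random


class ToyStream:
    """Toy iterable without __len__; its order depends on the current epoch."""

    def __init__(self, items, seed=42):
        self.items = list(items)
        self.seed = seed
        self._epoch = 0

    def set_epoch(self, epoch):
        self._epoch = epoch

    def __iter__(self):
        order = self.items[:]
        random.Random(self.seed + self._epoch).shuffle(order)
        return iter(order)


def run(stream, max_steps, batch_size):
    """Step-bounded loop: start a new pass whenever the stream runs out."""
    steps, passes = 0, []
    while steps < max_steps:
        # hypothetical hook, playing the role of ShuffleCallback.on_epoch_begin
        stream.set_epoch(stream._epoch + 1)
        seen, in_batch = [], 0
        for item in stream:
            seen.append(item)
            in_batch += 1
            if in_batch == batch_size:
                in_batch = 0
                steps += 1
                if steps == max_steps:
                    break
        if in_batch and steps < max_steps:
            steps += 1  # a final partial batch still counts as a step
        if not seen:
            break  # empty stream: nothing to train on
        passes.append(seen)
    return steps, passes


steps_done, passes = run(ToyStream(range(8)), max_steps=5, batch_size=4)
```

With 8 items, a batch size of 4 and 5 steps, the loop makes two full passes and stops halfway through a third, reseeding the order at the start of each one. Whether the real Trainer behaves this way is worth checking against its source.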
183
+
184
+ """We are finally done preparing our data! But do you remember that Whisper is a multi-task Speech Recognition model? And that the task is simply induced using special prefix tokens? Good, now it is time to instruct the model. To do so, we can set those special tokens using the Tokenizer embedded in the Processor.
185
+
186
+ In our specific case, we could skip this step since English transcription is the default behaviour. Still, this is how you would do it if you were in a multilingual setting.
187
+ """
188
+
189
+ processor.tokenizer.set_prefix_tokens(language="en", task="transcribe")
190
+
191
+ ## If you wanted to transcribe in Swedish
192
+ ## (Of course, you'd need a Swedish dataset)
193
+ # processor.tokenizer.set_prefix_tokens(language="sv", task="transcribe")
194
+
195
+ ## If you wanted to get an English transcription from Swedish audio
196
+ # processor.tokenizer.set_prefix_tokens(language="sv", task="translate")
197
+
198
+ """(Here you can see what happens if we define only the number of epochs. Scroll down a bit to see explanation and working implementation of Training Arguments and Trainer)."""
199
+
200
+ # output_dir = "./model"
201
+ # os.makedirs(output_dir, exist_ok=True)
202
+ # training_args = Seq2SeqTrainingArguments(
203
+ # output_dir=output_dir,
204
+ # num_train_epochs=2,
205
+ # do_train=True,
206
+ # do_eval=True,
207
+ # evaluation_strategy="steps",
208
+ # eval_steps=1,
209
+ # logging_strategy="steps",
210
+ # logging_steps=1,
211
+ # per_device_train_batch_size=4,
212
+ # per_device_eval_batch_size=2
213
+ # )
214
+
215
+ # Initialize Trainer
216
+ # trainer = Seq2SeqTrainer(
217
+ # model=model,
218
+ # args=training_args,
219
+ # train_dataset=preprocessed_dataset["train"],
220
+ # eval_dataset=preprocessed_dataset["validation"],
221
+ # tokenizer=processor.feature_extractor,
222
+ # data_collator=data_collator,
223
+ # compute_metrics=compute_metrics,
224
+ # callbacks=[ShuffleCallback()]
225
+ # )
226
+
227
+ """Cool, we are almost ready for training! Let's define (and create, if missing) the output directory and define some Training Arguments. You can read about all the parameters in the [🤗 docs](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Seq2SeqTrainingArguments).
228
+
229
+ Here, we will instruct the trainer to both train and evaluate the model, and define how often metrics are logged, how often the model is evaluated on the validation set and saved, and what batch size to use. The model - in this configuration - **will not be** pushed to the 🤗 hub since it is quite slow. Make sure to authenticate, create a repo and push your model if you train a large model, or use a large dataset!
230
+
231
+ We will also use mixed precision (16-bit floating point, or fp16) if we are running our training on a GPU.
232
+
233
+ We will also instruct the model to use the `generate` method for evaluation. That method is used for inference, and it applies a decoding technique to the predicted logits. In this case, it will use greedy search, since we set the number of beams to 1. I briefly introduced decoding algorithms in the [Decoder paragraph](https://marinone94.github.io/Whisper-paper/#decoder) of my first article, but for now you can simply think of it as selecting the token with the highest probability as the next token, after applying a softmax to the logits. I am considering writing a post about the impact of decoding algorithms on Whisper performance, so let me know if you are interested!
234
+
235
+ Last, we can track our training using several experiment tracking tools. I use Weights and Biases - great tool, you should definitely have a look - but 🤗 also supports "azure_ml", "comet_ml", "mlflow", "neptune" and "tensorboard". You can use "all" (default) to report to all integrations installed, "none" for no integrations. Since WandB is installed in this environment, you should explicitly set it to "none" if you don't have an account.
236
+ """
237
+
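The greedy selection described in the text above can be written in a few lines of plain Python. This is a single-step sketch on a made-up logits vector, not what `generate` does internally; `softmax` and `greedy_pick` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Return the index of the most probable token and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# made-up logits over a three-token vocabulary
token_id, prob = greedy_pick([1.0, 3.0, 2.0])
```

Since softmax preserves the ordering of the logits, the selected token is the same with or without it. The probabilities only matter once several candidates are compared across steps, as in beam search.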
238
+ ## If you don't want to track your experiment with WandB, run this!
239
+ os.environ["WANDB_DISABLED"] = "true"
240
+ report_to = "none"
241
+
242
+ # If you have a wandb account, login!
243
+ # Otherwise, edit this cell to log in with your favourite experiment tracker(s)
244
+ # wandb.login()
245
+ # wandb.init(project="whisper-training-post")
246
+ # report_to = "wandb"
247
+
248
+ # Define (and create, if missing) output directory
249
+ output_dir = "./output"
250
+ # os.makedirs(output_dir, exist_ok=True)
251
+
252
+ # Check if we have a GPU.
253
+ # If so, we will use mixed precision
254
+ # to reduce the memory footprint
255
+ # with minimal to no harm to performance
256
+ device = "cuda" if torch.cuda.is_available() else "cpu"
257
+ use_fp16 = (device == "cuda")
258
+
259
+ # Let's first define the batch sizes
260
+ # Increase them if your GPU has more than 16 GB of memory
261
+ train_bs = 4 if test_script is True else 64
262
+ eval_bs = 2 if test_script is True else 32
263
+
264
+ # Then we infer the number of steps
265
+ # TODO: how did I find it?
266
+ num_training_samples = 2602
267
+ num_epochs = 5
268
+ max_steps_full_training = ceil(num_training_samples * num_epochs / train_bs)
269
+ max_steps = 2 if test_script is True else max_steps_full_training
270
+
271
+ # We don't want to evaluate too often since it slows down training a lot
272
+ eval_steps = 1 if test_script is True else int(max_steps / 5)
273
+ logging_steps = 1 if test_script is True else int(max_steps / 100)
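As a worked check of the arithmetic above for a full run (that is, with `test_script` set to `False` and the batch size of 64 defined earlier), the values come out as follows. The variable `steps_per_epoch` is added here only for comparison and is not part of the script.

```python
from math import ceil

num_training_samples = 2602
num_epochs = 5
train_bs = 64  # full-training batch size (test_script is False)

# 2602 * 5 / 64 = 203.28..., rounded up
max_steps = ceil(num_training_samples * num_epochs / train_bs)
eval_steps = int(max_steps / 5)       # evaluate (and save) five times per run
logging_steps = int(max_steps / 100)  # log roughly a hundred times per run

# for comparison: batches in one pass, counting the final partial batch
steps_per_epoch = ceil(num_training_samples / train_bs)
```

This gives 204 training steps, an evaluation every 40 steps and a log entry every 2 steps. Since one pass takes 41 batches, 204 steps stop one batch short of five complete passes, which is close enough for the purpose of the script.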
274
+
275
+ training_args = Seq2SeqTrainingArguments(
276
+ output_dir=output_dir,
277
+ do_train=True,
278
+ do_eval=True,
279
+ max_steps=max_steps,
280
+ evaluation_strategy="steps",
281
+ eval_steps=eval_steps,
282
+ logging_strategy="steps",
283
+ logging_steps=logging_steps,
284
+ save_strategy="steps",
285
+ save_steps=eval_steps,
286
+ save_total_limit=2,
287
+ learning_rate=1e-5,
288
+ warmup_ratio=0.5 if test_script is True else 0.2,
289
+ per_device_train_batch_size=train_bs,
290
+ per_device_eval_batch_size=eval_bs,
291
+ # important
292
+ fp16=use_fp16,
293
+ predict_with_generate=True,
294
+ generation_num_beams=1,
295
+ # track experiment
296
+ report_to=report_to, # edit this line to track with your favourite experiment tracker(s)
297
+ )
298
+
299
+ """Now we can provide the trainer with the model, tokenizer (important: use the one you set language and task to! In this example, it is `processor.tokenizer`), training arguments, datasets, data collator, callback, and the method to compute metrics during evaluation.
300
+
301
+ Note that we don't need to place the model on the accelerator device, nor did we have to do it in the data collator with the dataset! The trainer will take care of it, if a GPU is available.
302
+ """
303
+
304
+ # Initialize Trainer
305
+ trainer = Seq2SeqTrainer(
306
+ model=model,
307
+ args=training_args,
308
+ train_dataset=preprocessed_dataset["train"],
309
+ eval_dataset=preprocessed_dataset["validation"],
310
+ tokenizer=processor.feature_extractor,
311
+ data_collator=data_collator,
312
+ compute_metrics=compute_metrics,
313
+ callbacks=[ShuffleCallback()]
314
+ )
315
+
316
+ """Let's
317
+
318
+ I hope you haven't left yet. If you have, bad for you, as we are ready for training our model! 🍾
319
+ As Whisper is a pretrained model ready to be used off-the-shelf, it is advisable to evaluate it before training on both the validation and test sets. Let's make sure we do no harm to it.
320
+ """
321
+
322
+ # eval_metrics = trainer.evaluate(
323
+ # eval_dataset=preprocessed_dataset["validation"],
324
+ # metric_key_prefix="eval",
325
+ # max_length=448,
326
+ # num_beams=1,
327
+ # # gen_kwargs={"key": value} to provide additional generation specific arguments by keyword
328
+ # )
329
+
330
+ # trainer.log_metrics("eval", eval_metrics)
331
+ # trainer.save_metrics("eval", eval_metrics)
332
+ # print(eval_metrics)
333
+
334
+ # test_metrics = trainer.evaluate(
335
+ # eval_dataset=preprocessed_dataset["test"],
336
+ # metric_key_prefix="test",
337
+ # max_length=448,
338
+ # num_beams=1,
339
+ # # gen_kwargs={"key": value} to provide additional generation specific arguments by keyword
340
+ # )
341
+
342
+ # trainer.log_metrics("test", test_metrics)
343
+ # trainer.save_metrics("test", test_metrics)
344
+ # print(test_metrics)
345
+
346
+ # train_result = trainer.train()
347
+ trainer.save_model()
348
+
349
+ # metrics = train_result.metrics
350
+ # trainer.log_metrics("train", metrics)
351
+ # trainer.save_metrics("train", metrics)
352
+ # trainer.save_state()
353
+ # print(metrics)
354
+
355
+ """ADD SOMETHING ABOUT THE TRAINING.
356
+
357
+ Now let's evaluate the
358
+ """
359
+
360
+ # final_metrics = trainer.evaluate(
361
+ # eval_dataset=preprocessed_dataset["test"],
362
+ # metric_key_prefix="test",
363
+ # max_length=448,
364
+ # num_beams=1,
365
+ # # gen_kwargs={"key": value} to provide additional generation specific arguments by keyword
366
+ # )
367
+
368
+ # trainer.log_metrics("test", final_metrics)
369
+ # trainer.save_metrics("test", final_metrics)
370
+ # print(final_metrics)
371
+
372
+ trainer.push_to_hub()
output/config.json ADDED
@@ -0,0 +1,142 @@
1
+ {
2
+ "_name_or_path": "openai/whisper-tiny",
3
+ "activation_dropout": 0.0,
4
+ "activation_function": "gelu",
5
+ "architectures": [
6
+ "WhisperForConditionalGeneration"
7
+ ],
8
+ "attention_dropout": 0.0,
9
+ "begin_suppress_tokens": [
10
+ 220,
11
+ 50257
12
+ ],
13
+ "bos_token_id": 50257,
14
+ "d_model": 384,
15
+ "decoder_attention_heads": 6,
16
+ "decoder_ffn_dim": 1536,
17
+ "decoder_layerdrop": 0.0,
18
+ "decoder_layers": 4,
19
+ "decoder_start_token_id": 50258,
20
+ "dropout": 0.0,
21
+ "encoder_attention_heads": 6,
22
+ "encoder_ffn_dim": 1536,
23
+ "encoder_layerdrop": 0.0,
24
+ "encoder_layers": 4,
25
+ "eos_token_id": 50257,
26
+ "forced_decoder_ids": [
27
+ [
28
+ 1,
29
+ 50259
30
+ ],
31
+ [
32
+ 2,
33
+ 50359
34
+ ],
35
+ [
36
+ 3,
37
+ 50363
38
+ ]
39
+ ],
40
+ "init_std": 0.02,
41
+ "is_encoder_decoder": true,
42
+ "max_length": 448,
43
+ "max_source_positions": 1500,
44
+ "max_target_positions": 448,
45
+ "model_type": "whisper",
46
+ "num_hidden_layers": 4,
47
+ "num_mel_bins": 80,
48
+ "pad_token_id": 50257,
49
+ "scale_embedding": false,
50
+ "suppress_tokens": [
51
+ 1,
52
+ 2,
53
+ 7,
54
+ 8,
55
+ 9,
56
+ 10,
57
+ 14,
58
+ 25,
59
+ 26,
60
+ 27,
61
+ 28,
62
+ 29,
63
+ 31,
64
+ 58,
65
+ 59,
66
+ 60,
67
+ 61,
68
+ 62,
69
+ 63,
70
+ 90,
71
+ 91,
72
+ 92,
73
+ 93,
74
+ 359,
75
+ 503,
76
+ 522,
77
+ 542,
78
+ 873,
79
+ 893,
80
+ 902,
81
+ 918,
82
+ 922,
83
+ 931,
84
+ 1350,
85
+ 1853,
86
+ 1982,
87
+ 2460,
88
+ 2627,
89
+ 3246,
90
+ 3253,
91
+ 3268,
92
+ 3536,
93
+ 3846,
94
+ 3961,
95
+ 4183,
96
+ 4667,
97
+ 6585,
98
+ 6647,
99
+ 7273,
100
+ 9061,
101
+ 9383,
102
+ 10428,
103
+ 10929,
104
+ 11938,
105
+ 12033,
106
+ 12331,
107
+ 12562,
108
+ 13793,
109
+ 14157,
110
+ 14635,
111
+ 15265,
112
+ 15618,
113
+ 16553,
114
+ 16604,
115
+ 18362,
116
+ 18956,
117
+ 20075,
118
+ 21675,
119
+ 22520,
120
+ 26130,
121
+ 26161,
122
+ 26435,
123
+ 28279,
124
+ 29464,
125
+ 31650,
126
+ 32302,
127
+ 32470,
128
+ 36865,
129
+ 42863,
130
+ 47425,
131
+ 49870,
132
+ 50254,
133
+ 50258,
134
+ 50360,
135
+ 50361,
136
+ 50362
137
+ ],
138
+ "torch_dtype": "float32",
139
+ "transformers_version": "4.26.0.dev0",
140
+ "use_cache": true,
141
+ "vocab_size": 51865
142
+ }
output/preprocessor_config.json ADDED
The diff for this file is too large to render. See raw diff
output/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f3843686519777a4550909e8bd4961dcf7425e7183295f03d09a433a271f0887
3
+ size 151098921
output/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:add8e65dc5f947661175e212b5f04c955afbda87e53b0a8fd21f73e377e0d106
3
+ size 3579
requirements.txt ADDED
@@ -0,0 +1,100 @@
1
+ absl-py==1.3.0
2
+ aiohttp==3.8.3
3
+ aiosignal==1.3.1
4
+ appdirs==1.4.4
5
+ async-timeout==4.0.2
6
+ attrs==22.1.0
7
+ audioread==3.0.0
8
+ boto3==1.26.27
9
+ botocore==1.29.27
10
+ cachetools==5.2.0
11
+ certifi==2022.12.7
12
+ cffi==1.15.1
13
+ charset-normalizer==2.1.1
14
+ click==8.1.3
15
+ contextlib2==21.6.0
16
+ datasets @ git+https://github.com/huggingface/datasets@6338ded243ccb495b53b1996cba0847bdc250aba
17
+ decorator==5.1.1
18
+ dill==0.3.6
19
+ docker-pycreds==0.4.0
20
+ evaluate==0.3.0
21
+ filelock==3.8.2
22
+ frozenlist==1.3.3
23
+ fsspec==2022.11.0
24
+ gitdb==4.0.10
25
+ GitPython==3.1.29
26
+ google-auth==2.15.0
27
+ google-auth-oauthlib==0.4.6
28
+ google-pasta==0.2.0
29
+ grpcio==1.51.1
30
+ huggingface-hub==0.11.1
31
+ idna==3.4
32
+ importlib-metadata==4.13.0
33
+ jiwer==2.5.1
34
+ jmespath==1.0.1
35
+ joblib==1.2.0
36
+ Levenshtein==0.20.2
37
+ librosa==0.9.2
38
+ llvmlite==0.39.1
39
+ Markdown==3.4.1
40
+ MarkupSafe==2.1.1
41
+ more-itertools==9.0.0
42
+ multidict==6.0.3
43
+ multiprocess==0.70.14
44
+ numba==0.56.4
45
+ numpy==1.23.5
46
+ oauthlib==3.2.2
47
+ packaging==20.9
48
+ pandas==1.5.2
49
+ pathos==0.3.0
50
+ pathtools==0.1.2
51
+ pooch==1.6.0
52
+ pox==0.3.2
53
+ ppft==1.7.6.6
54
+ promise==2.3
55
+ protobuf==3.20.3
56
+ protobuf3-to-dict==0.1.5
57
+ psutil==5.9.4
58
+ pyarrow==10.0.1
59
+ pyasn1==0.4.8
60
+ pyasn1-modules==0.2.8
61
+ pycparser==2.21
62
+ pyparsing==3.0.9
63
+ python-dateutil==2.8.2
64
+ pytz==2022.6
65
+ PyYAML==6.0
66
+ rapidfuzz==2.13.4
67
+ regex==2022.10.31
68
+ requests==2.28.1
69
+ requests-oauthlib==1.3.1
70
+ resampy==0.4.2
71
+ responses==0.18.0
72
+ rsa==4.9
73
+ s3transfer==0.6.0
74
+ sagemaker==2.121.1
75
+ schema==0.7.5
76
+ scikit-learn==1.2.0
77
+ scipy==1.9.3
78
+ sentry-sdk==1.11.1
79
+ setproctitle==1.3.2
80
+ shortuuid==1.0.11
81
+ six==1.16.0
82
+ smdebug-rulesconfig==1.0.1
83
+ smmap==5.0.0
84
+ soundfile==0.11.0
85
+ tensorboard==2.11.0
86
+ tensorboard-data-server==0.6.1
87
+ tensorboard-plugin-wit==1.8.1
88
+ threadpoolctl==3.1.0
89
+ tokenizers==0.13.2
90
+ torch==1.11.0
91
+ torchaudio==0.11.0
92
+ tqdm==4.64.1
93
+ transformers @ git+https://github.com/huggingface/transformers@9a6c6ef97fa5df4b1fb8dbc9e8c10ee3a9ed7e2a
94
+ typing-extensions==4.4.0
95
+ urllib3==1.26.13
96
+ wandb==0.13.6
97
+ Werkzeug==2.2.2
98
+ xxhash==3.1.0
99
+ yarl==1.8.2
100
+ zipp==3.11.0