12/17/2022 00:49:12 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: True
12/17/2022 00:49:12 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=1000,
evaluation_strategy=steps,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_max_length=225,
generation_num_beams=None,
gradient_accumulation_steps=1,
gradient_checkpointing=True,
greater_is_better=False,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=ales/whisper-base-belarusian,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=True,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0001,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=passive,
log_level_replica=passive,
log_on_each_node=True,
logging_dir=./runs/Dec17_00-49-12_129-213-88-66,
logging_first_step=True,
logging_nan_inf_filter=True,
logging_steps=50,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=6000,
metric_for_best_model=wer,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_hf,
optim_args=None,
output_dir=./,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=32,
per_device_train_batch_size=64,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=True,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=./,
save_on_each_node=False,
save_steps=1000,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=500,
weight_decay=0.0,
xpu_backend=None,
)
12/17/2022 00:49:12 - INFO - __main__ - Data parameters: DataTrainingArguments(dataset_name='mozilla-foundation/common_voice_11_0', dataset_config_name='be', max_train_samples=None, max_eval_samples=None, audio_column_name='audio', text_column_name='sentence', max_duration_in_seconds=30.0, min_duration_in_seconds=0.0, train_split_name='train', eval_split_name='validation', do_lower_case=False, do_remove_punctuation=False, do_normalize_eval=True, language='be', task='transcribe', shuffle_buffer_size=500, streaming_train=True, streaming_eval=False)
12/17/2022 00:49:12 - INFO - __main__ - Model parameters: ModelArguments(model_name_or_path='openai/whisper-base', config_name=None, tokenizer_name=None, feature_extractor_name=None, cache_dir=None, use_fast_tokenizer=True, model_revision='main', use_auth_token=True, freeze_feature_encoder=False, freeze_encoder=False, forced_decoder_ids=None, suppress_tokens=None, model_index_name='Whisper Base Belarusian')
12/17/2022 00:49:12 - INFO - __main__ - output_dir already exists. will try to load last checkpoint.
12/17/2022 00:49:12 - INFO - __main__ - last_checkpoint is None. will try to read from training_args.resume_from_checkpoint
12/17/2022 00:49:12 - INFO - __main__ - last_checkpoint is None. resume_from_checkpoint is either None or not existing dir. will try to read from the model saved in the root of output_dir.
12/17/2022 00:49:12 - INFO - __main__ - dir is not empty, but contains only: ['src', '.gitattributes', 'train_20221217-004912.log', '.git', 'train_run_1.log']. it is OK - will start training
12/17/2022 00:49:13 - INFO - datasets.info - Loading Dataset Infos from /home/ubuntu/.cache/huggingface/modules/datasets_modules/datasets/mozilla-foundation--common_voice_11_0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/17/2022 00:49:13 - INFO - datasets.builder - Overwrite dataset info from restored data version.
12/17/2022 00:49:13 - INFO - datasets.info - Loading Dataset info from /home/ubuntu/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/be/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/17/2022 00:49:13 - INFO - datasets.info - Loading Dataset Infos from /home/ubuntu/.cache/huggingface/modules/datasets_modules/datasets/mozilla-foundation--common_voice_11_0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/17/2022 00:49:13 - INFO - datasets.builder - Overwrite dataset info from restored data version.
12/17/2022 00:49:13 - INFO - datasets.info - Loading Dataset info from /home/ubuntu/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/be/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/17/2022 00:49:13 - WARNING - datasets.builder - Found cached dataset common_voice_11_0 (/home/ubuntu/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/be/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f)
12/17/2022 00:49:13 - INFO - datasets.info - Loading Dataset info from /home/ubuntu/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/be/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/17/2022 00:49:14 - INFO - __main__ - vectorizing dataset
12/17/2022 00:49:14 - INFO - __main__ - will preprocess data using None processes.
12/17/2022 00:49:16 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/ubuntu/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/be/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f/cache-31769dd5b5301183.arrow
12/17/2022 00:49:18 - WARNING - huggingface_hub.repository - /home/ubuntu/whisper-base-belarusian/./ is already a clone of https://huggingface.co/ales/whisper-base-belarusian. Make sure you pull the latest changes with `repo.git_pull()`.
12/17/2022 00:49:20 - INFO - __main__ - ShuffleCallback. shuffling train dataset. seed: 42. dataset epoch: 0
12/17/2022 02:39:16 - WARNING - datasets.download.streaming_download_manager - Got disconnected from remote data host. Retrying in 5sec [1/20]
12/17/2022 03:40:45 - WARNING - huggingface_hub.repository - Adding files tracked by Git LFS: ['src/__pycache__/preprocess.cpython-38.pyc', 'src/__pycache__/run_speech_recognition_seq2seq_streaming.cpython-38.pyc']. This may take a bit of time if the files are large.
12/17/2022 16:02:39 - INFO - __main__ - ShuffleCallback. shuffling train dataset. seed: 42. dataset epoch: 1
12/17/2022 17:58:40 - WARNING - huggingface_hub.repository - Several commits (2) will be pushed upstream.
12/17/2022 17:58:40 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
12/17/2022 17:58:44 - WARNING - huggingface_hub.repository - remote: Scanning LFS files for validity, may be slow...        
remote: LFS file scan complete.        
To https://huggingface.co/ales/whisper-base-belarusian
   4074dad..52d55ef  main -> main

12/17/2022 17:59:11 - WARNING - huggingface_hub.repository - To https://huggingface.co/ales/whisper-base-belarusian
   52d55ef..4aae45b  main -> main

12/17/2022 17:59:14 - INFO - __main__ - *** Evaluate ***
12/17/2022 18:34:16 - WARNING - huggingface_hub.repository - remote: Scanning LFS files for validity, may be slow...        
remote: LFS file scan complete.        
To https://huggingface.co/ales/whisper-base-belarusian
   4aae45b..dd575b8  main -> main