12/17/2022 00:49:12 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: True
12/17/2022 00:49:12 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=1000,
evaluation_strategy=steps,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_max_length=225,
generation_num_beams=None,
gradient_accumulation_steps=1,
gradient_checkpointing=True,
greater_is_better=False,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=ales/whisper-base-belarusian,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=True,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0001,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=passive,
log_level_replica=passive,
log_on_each_node=True,
logging_dir=./runs/Dec17_00-49-12_129-213-88-66,
logging_first_step=True,
logging_nan_inf_filter=True,
logging_steps=50,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=6000,
metric_for_best_model=wer,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_hf,
optim_args=None,
output_dir=./,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=32,
per_device_train_batch_size=64,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=True,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=./,
save_on_each_node=False,
save_steps=1000,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=500,
weight_decay=0.0,
xpu_backend=None,
)
12/17/2022 00:49:12 - INFO - __main__ - Data parameters: DataTrainingArguments(dataset_name='mozilla-foundation/common_voice_11_0', dataset_config_name='be', max_train_samples=None, max_eval_samples=None, audio_column_name='audio', text_column_name='sentence', max_duration_in_seconds=30.0, min_duration_in_seconds=0.0, train_split_name='train', eval_split_name='validation', do_lower_case=False, do_remove_punctuation=False, do_normalize_eval=True, language='be', task='transcribe', shuffle_buffer_size=500, streaming_train=True, streaming_eval=False)
12/17/2022 00:49:12 - INFO - __main__ - Model parameters: ModelArguments(model_name_or_path='openai/whisper-base', config_name=None, tokenizer_name=None, feature_extractor_name=None, cache_dir=None, use_fast_tokenizer=True, model_revision='main', use_auth_token=True, freeze_feature_encoder=False, freeze_encoder=False, forced_decoder_ids=None, suppress_tokens=None, model_index_name='Whisper Base Belarusian')
12/17/2022 00:49:12 - INFO - __main__ - output_dir already exists. will try to load last checkpoint.
12/17/2022 00:49:12 - INFO - __main__ - last_checkpoint is None. will try to read from training_args.resume_from_checkpoint
12/17/2022 00:49:12 - INFO - __main__ - last_checkpoint is None. resume_from_checkpoint is either None or not existing dir. will try to read from the model saved in the root of output_dir.
12/17/2022 00:49:12 - INFO - __main__ - dir is not empty, but contains only: ['src', '.gitattributes', 'train_20221217-004912.log', '.git', 'train_run_1.log']. it is OK - will start training
12/17/2022 00:49:13 - INFO - datasets.info - Loading Dataset Infos from /home/ubuntu/.cache/huggingface/modules/datasets_modules/datasets/mozilla-foundation--common_voice_11_0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/17/2022 00:49:13 - INFO - datasets.builder - Overwrite dataset info from restored data version.
12/17/2022 00:49:13 - INFO - datasets.info - Loading Dataset info from /home/ubuntu/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/be/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/17/2022 00:49:13 - INFO - datasets.info - Loading Dataset Infos from /home/ubuntu/.cache/huggingface/modules/datasets_modules/datasets/mozilla-foundation--common_voice_11_0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/17/2022 00:49:13 - INFO - datasets.builder - Overwrite dataset info from restored data version.
12/17/2022 00:49:13 - INFO - datasets.info - Loading Dataset info from /home/ubuntu/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/be/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/17/2022 00:49:13 - WARNING - datasets.builder - Found cached dataset common_voice_11_0 (/home/ubuntu/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/be/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f)
12/17/2022 00:49:13 - INFO - datasets.info - Loading Dataset info from /home/ubuntu/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/be/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f
12/17/2022 00:49:14 - INFO - __main__ - vectorizing dataset
12/17/2022 00:49:14 - INFO - __main__ - will preprocess data using None processes.
12/17/2022 00:49:16 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/ubuntu/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/be/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f/cache-31769dd5b5301183.arrow
12/17/2022 00:49:18 - WARNING - huggingface_hub.repository - /home/ubuntu/whisper-base-belarusian/./ is already a clone of https://huggingface.co/ales/whisper-base-belarusian. Make sure you pull the latest changes with `repo.git_pull()`.
12/17/2022 00:49:20 - INFO - __main__ - ShuffleCallback. shuffling train dataset. seed: 42. dataset epoch: 0
12/17/2022 02:39:16 - WARNING - datasets.download.streaming_download_manager - Got disconnected from remote data host. Retrying in 5sec [1/20]
12/17/2022 03:40:45 - WARNING - huggingface_hub.repository - Adding files tracked by Git LFS: ['src/__pycache__/preprocess.cpython-38.pyc', 'src/__pycache__/run_speech_recognition_seq2seq_streaming.cpython-38.pyc']. This may take a bit of time if the files are large.
12/17/2022 16:02:39 - INFO - __main__ - ShuffleCallback. shuffling train dataset. seed: 42. dataset epoch: 1
12/17/2022 17:58:40 - WARNING - huggingface_hub.repository - Several commits (2) will be pushed upstream.
12/17/2022 17:58:40 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
12/17/2022 17:58:44 - WARNING - huggingface_hub.repository - remote: Scanning LFS files for validity, may be slow...
remote: LFS file scan complete.
To https://huggingface.co/ales/whisper-base-belarusian
4074dad..52d55ef main -> main
12/17/2022 17:59:11 - WARNING - huggingface_hub.repository - To https://huggingface.co/ales/whisper-base-belarusian
52d55ef..4aae45b main -> main
12/17/2022 17:59:14 - INFO - __main__ - *** Evaluate ***
12/17/2022 18:34:16 - WARNING - huggingface_hub.repository - remote: Scanning LFS files for validity, may be slow...
remote: LFS file scan complete.
To https://huggingface.co/ales/whisper-base-belarusian
4aae45b..dd575b8 main -> main