[2023-12-19 17:47:31,804] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/trl/trainer/ppo_config.py:141: UserWarning: The `optimize_cuda_cache` arguement will be deprecated soon, please use `optimize_device_cache` instead.
  warnings.warn(
12/19/2023 17:47:36 - WARNING - llmtuner.model.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/training_args.py:1751: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
12/19/2023 17:47:36 - INFO - llmtuner.model.parser - Process rank: 0, device: cuda:0, n_gpu: 2
  distributed training: True, compute dtype: torch.bfloat16
12/19/2023 17:47:36 - INFO - llmtuner.model.parser - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=2,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=False,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=./models/sft/phi-2-sft-alpaca_gpt4_en-1/Predict_20/runs/Dec19_17-47-36_autodl-container-f11a41911a-e496153c,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
output_dir=./models/sft/phi-2-sft-alpaca_gpt4_en-1/Predict_20,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=1,
per_device_train_batch_size=8,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard', 'wandb'],
resume_from_checkpoint=None,
run_name=./models/sft/phi-2-sft-alpaca_gpt4_en-1/Predict_20,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=500,
save_strategy=steps,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
sortish_sampler=False,
split_batches=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
12/19/2023 17:47:36 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_en.json...
[WARNING|logging.py:314] 2023-12-19 17:47:37,929 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:00<00:00, 1.75it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 2.79it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 2.56it/s]
12/19/2023 17:47:38 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
12/19/2023 17:47:39 - INFO - llmtuner.model.adapter - Merged 1 adapter(s).
12/19/2023 17:47:39 - INFO - llmtuner.model.adapter - Loaded adapter(s): ./models/sft/phi-2-sft-alpaca_gpt4_en-1
12/19/2023 17:47:39 - INFO - llmtuner.model.loader - trainable params: 0 || all params: 2779683840 || trainable%: 0.0000
12/19/2023 17:47:39 - INFO - llmtuner.model.loader - This IS expected that the trainable params is 0 if you are using model for inference only.
12/19/2023 17:47:39 - INFO - llmtuner.data.template - Add pad token: <|endoftext|>
[WARNING|logging.py:314] 2023-12-19 17:47:40,715 >> You're using a CodeGenTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/parallel/_functions.py:68: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
input_ids: [50256, 32, 8537, 1022, 257, 11040, 2836, 290, 281, 11666, 4430, 8796, 13, 383, 8796, 3607, 7613, 11, 6496, 11, 290, 23507, 7429, 284, 262, 2836, 338, 2683, 13, 198, 20490, 25, 13786, 1115, 9040, 329, 10589, 5448, 13, 198, 48902, 25]
inputs: <|endoftext|>A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: Give three tips for staying healthy.
Assistant:
  0%|          | 0/10 [00:00<?, ?it/s]
 20%|██        | 2/10 [00:10<00:43, 5.46s/it]
 30%|███       | 3/10 [00:12<00:26, 3.85s/it]
 40%|████      | 4/10 [00:20<00:31, 5.22s/it]
 50%|█████     | 5/10 [00:23<00:22, 4.47s/it]
 60%|██████    | 6/10 [00:26<00:16, 4.15s/it]
 70%|███████   | 7/10 [00:39<00:21, 7.02s/it]
 80%|████████  | 8/10 [00:46<00:13, 6.96s/it]
 90%|█████████ | 9/10 [00:51<00:06, 6.40s/it]
100%|██████████| 10/10 [01:01<00:00, 7.51s/it]
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.578 seconds.
Prefix dict has been built successfully.
100%|██████████| 10/10 [01:02<00:00, 6.28s/it]
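The jieba lines above come from the metric-computation step: predictions and references are word-segmented before BLEU-4 and ROUGE are scored. Below is a minimal sketch of that kind of scoring, assuming the `jieba`, `nltk`, and `rouge-chinese` packages are installed; it illustrates the approach, not the exact llmtuner implementation.

```python
# Sketch of jieba-segmented BLEU-4 / ROUGE scoring (illustrative only).
import jieba
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu
from rouge_chinese import Rouge


def score_pair(prediction: str, reference: str) -> dict:
    hyp = list(jieba.cut(prediction))  # word-segment the model output
    ref = list(jieba.cut(reference))   # word-segment the gold answer

    # ROUGE over space-joined segmented tokens
    scores = Rouge().get_scores(" ".join(hyp), " ".join(ref))[0]

    # Smoothed sentence-level BLEU-4
    bleu = sentence_bleu([ref], hyp, smoothing_function=SmoothingFunction().method3)

    return {
        "bleu-4": round(bleu * 100, 4),
        "rouge-1": round(scores["rouge-1"]["f"] * 100, 4),
        "rouge-2": round(scores["rouge-2"]["f"] * 100, 4),
        "rouge-l": round(scores["rouge-l"]["f"] * 100, 4),
    }


print(score_pair("Eat well and sleep enough.", "Eat well, sleep enough, and exercise regularly."))
```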
***** predict metrics *****
  predict_bleu-4             =    49.0534
  predict_rouge-1            =    54.9625
  predict_rouge-2            =    31.0959
  predict_rouge-l            =    39.8761
  predict_runtime            = 0:01:10.55
  predict_samples_per_second =      0.283
  predict_steps_per_second   =      0.142
12/19/2023 17:48:51 - INFO - llmtuner.train.sft.trainer - Saving prediction results to ./models/sft/phi-2-sft-alpaca_gpt4_en-1/Predict_20/generated_predictions.jsonl
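The final line points to generated_predictions.jsonl, a JSON-Lines file with one record per evaluated sample. A minimal sketch for inspecting it follows, assuming each record stores the reference under "label" and the model output under "predict"; the key names are an assumption, so check them against the file.

```python
# Minimal sketch for inspecting generated_predictions.jsonl.
# Assumption: each line is a JSON object with "label" (reference) and
# "predict" (model output) fields.
import json

path = "./models/sft/phi-2-sft-alpaca_gpt4_en-1/Predict_20/generated_predictions.jsonl"
with open(path, encoding="utf-8") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        print(f"--- sample {i} ---")
        print("label:  ", record.get("label", "")[:120])
        print("predict:", record.get("predict", "")[:120])
```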