The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `2`
		More than one GPU was found, enabling multi-GPU training.
		If this was unintended please pass in `--num_processes=1`.
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2023-12-22 22:59:09,542] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-22 22:59:09,550] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/trl/trainer/ppo_config.py:141: UserWarning: The `optimize_cuda_cache` arguement will be deprecated soon, please use `optimize_device_cache` instead.
  warnings.warn(
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/trl/trainer/ppo_config.py:141: UserWarning: The `optimize_cuda_cache` arguement will be deprecated soon, please use `optimize_device_cache` instead.
  warnings.warn(
12/22/2023 22:59:15 - WARNING - llmtuner.model.parser - We recommend enable `upcast_layernorm` in quantized training.
12/22/2023 22:59:15 - WARNING - llmtuner.model.parser - We recommend enable mixed precision training.
12/22/2023 22:59:15 - WARNING - llmtuner.model.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/training_args.py:1751: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
12/22/2023 22:59:15 - INFO - llmtuner.model.parser - Process rank: 1, device: cuda:1, n_gpu: 1
  distributed training: True, compute dtype: None
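The launcher warning at the top is informational: `accelerate` filled in defaults for flags that were not passed. To make the run explicit, and to pick up the mixed-precision setting the parser recommends above, the values can be given on the command line (or stored once with `accelerate config`). A minimal sketch; `src/train_bash.py` and the trailing flags are assumptions based on the LLaMA-Factory layout of this era, not taken from this log:

```bash
# bf16 assumes Ampere-or-newer GPUs; substitute fp16 otherwise.
accelerate launch \
  --num_processes=2 \
  --num_machines=1 \
  --mixed_precision=bf16 \
  --dynamo_backend=no \
  src/train_bash.py --stage dpo ...   # rest of the training flags for this run
```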
12/22/2023 22:59:15 - INFO - llmtuner.model.parser - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_persistent_workers=False, dataloader_pin_memory=True, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=False, ddp_timeout=1800, debug=[], deepspeed=None, disable_tqdm=False, dispatch_batches=None, do_eval=True, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=1000, evaluation_strategy=steps, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_config=None, generation_max_length=None, generation_num_beams=None, gradient_accumulation_steps=4, gradient_checkpointing=False, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=<HUB_TOKEN>, ignore_data_skip=False, include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=5e-05, length_column_name=length, load_best_model_at_end=False, local_rank=1, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=./models/dpo/phi-2-sft-alpaca_gpt4_en-ep1-dpo-comparison_gpt4_en-ep1-lora-2nd/runs/Dec22_22-59-15_autodl-container-e8c311843c-66abc214, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1000, logging_strategy=steps, lr_scheduler_kwargs={}, lr_scheduler_type=cosine, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=1.0, optim=adamw_torch, optim_args=None, output_dir=./models/dpo/phi-2-sft-alpaca_gpt4_en-ep1-dpo-comparison_gpt4_en-ep1-lora-2nd, overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=1, per_device_train_batch_size=1, predict_with_generate=False, prediction_loss_only=True, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=<PUSH_TO_HUB_TOKEN>, ray_scope=last, remove_unused_columns=True, report_to=['tensorboard'], resume_from_checkpoint=None, run_name=./models/dpo/phi-2-sft-alpaca_gpt4_en-ep1-dpo-comparison_gpt4_en-ep1-lora-2nd, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=None, seed=42, skip_memory_metrics=True, sortish_sampler=False, split_batches=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0,
)
12/22/2023 22:59:15 - INFO - llmtuner.data.loader - Loading dataset comparison_gpt4_data_en.json...
12/22/2023 22:59:15 - WARNING - llmtuner.model.parser - We recommend enable `upcast_layernorm` in quantized training.
12/22/2023 22:59:15 - WARNING - llmtuner.model.parser - We recommend enable mixed precision training.
12/22/2023 22:59:15 - WARNING - llmtuner.model.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
/root/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/training_args.py:1751: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
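The dump above is the fully resolved `Seq2SeqTrainingArguments`; only a handful of values differ from stock defaults. For reference, a minimal sketch of how the same object would be built directly with transformers (llmtuner constructs it from its CLI flags; `bf16=True` is an assumption following the mixed-precision recommendation, this run actually had `bf16=False`):

```python
from transformers import Seq2SeqTrainingArguments

# Non-default values visible in the dump above; everything else stays at its default.
args = Seq2SeqTrainingArguments(
    output_dir="./models/dpo/phi-2-sft-alpaca_gpt4_en-ep1-dpo-comparison_gpt4_en-ep1-lora-2nd",
    overwrite_output_dir=True,
    do_train=True,
    do_eval=True,
    evaluation_strategy="steps",
    eval_steps=1000,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,      # effective batch = 1 x 4 x 2 GPUs = 8
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=1.0,
    logging_steps=1000,
    save_steps=1000,
    ddp_find_unused_parameters=False,   # required for LoRA under DDP (warning above)
    report_to=["tensorboard"],
    prediction_loss_only=True,
    bf16=True,                          # assumption: enable mixed precision as recommended
)
```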
12/22/2023 22:59:15 - INFO - llmtuner.model.parser - Process rank: 0, device: cuda:0, n_gpu: 1
  distributed training: True, compute dtype: None
12/22/2023 22:59:15 - INFO - llmtuner.model.parser - Training/evaluation parameters Seq2SeqTrainingArguments(...)  [identical to the rank 1 dump above, except local_rank=0]
12/22/2023 22:59:15 - INFO - llmtuner.data.loader - Loading dataset comparison_gpt4_data_en.json...
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
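The "Special tokens have been added" warning pairs with the later "Add pad token: <|endoftext|>" line: phi-2's tokenizer ships without a pad token, so llmtuner reuses the existing eos token for padding instead of growing the embedding matrix. A minimal sketch of that pattern in plain transformers; `microsoft/phi-2` is an assumption, since the log never prints the base model path:

```python
from transformers import AutoTokenizer

# Assumed base model; this run's actual checkpoint path is not shown in the log.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# Reuse eos as pad: no new embedding row is created, so nothing extra needs training.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print(tokenizer.pad_token)  # <|endoftext|>
```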
12/22/2023 22:59:16 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit.
[WARNING|logging.py:314] 2023-12-22 22:59:16,974 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
12/22/2023 22:59:16 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit.
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
12/22/2023 22:59:20 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
12/22/2023 22:59:20 - INFO - llmtuner.model.loader - trainable params: 2621440 || all params: 2782305280 || trainable%: 0.0942
12/22/2023 22:59:20 - INFO - llmtuner.data.template - Add pad token: <|endoftext|>
Running tokenizer on dataset:   0%|          | 0/36441 [00:00<?, ?it/s]
[...]
rejected_ids: [7738, 11, 12550, 11, 290, 3469, 13, 50256]
rejected: Red, Yellow, and Green.<|endoftext|>
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Running tokenizer on dataset:   0%|          | 0/36441 [00:00<?, ?it/s]
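"Quantizing model to 4 bit" plus "Fine-tuning method: LoRA" is the usual QLoRA setup, and the trainable-parameter line is easy to sanity-check: 2,621,440 / 2,782,305,280 × 100 ≈ 0.0942 %. A minimal sketch of the same combination with transformers + peft; the quantization details, LoRA rank, and `target_modules` are assumptions (the log confirms 4-bit and LoRA, not the exact settings, and phi-2's module names vary between implementations):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Assumed 4-bit NF4 load; the log only says "Quantizing model to 4 bit."
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",              # assumed base model, as above
    quantization_config=bnb,
    device_map="auto",
    trust_remote_code=True,
)

# Hypothetical LoRA config: rank 8 on the attention projections. Older
# remote-code phi-2 builds expose a fused "Wqkv" instead of q_proj/v_proj.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Prints the same "trainable params || all params || trainable%" line as the log.
model.print_trainable_parameters()
```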