---
language:
- en
license: llama3
library_name: peft
tags:
- axolotl
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- PJMixers/OpenCAI-v2
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

# Description

**RUN KILLED. REDOING**

A sort of recreation of [OpenCAI](https://huggingface.co/datasets/Norquinal/OpenCAI) with a larger dataset. Likely not very useful by itself, but might be good to add within a merge.

The unprocessed dataset currently contains 10,105,611 messages, or 204,476,977 tokens when split into chunks of 6,000 tokens.

# Prompt format

LLaMa-3 Instruct format. No system prompt was used during training. Nicknames are used instead of `user` and `assistant`.

```
<|start_header_id|>Bob<|end_header_id|>

Hello.<|eot_id|><|start_header_id|>Joe<|end_header_id|>

Hey. How are you?<|eot_id|>
```

# In-progress training graphs

Checkpoint 258/2105

![train/loss](https://huggingface.co/PJMixers/LLaMa-3-OpenCAI-v2-run_3-SFT-8B-Latest-QLoRA/resolve/main/assets/train_loss.png)

![eval/loss](https://huggingface.co/PJMixers/LLaMa-3-OpenCAI-v2-run_3-SFT-8B-Latest-QLoRA/resolve/main/assets/eval_loss.png)

![train/grad_norm](https://huggingface.co/PJMixers/LLaMa-3-OpenCAI-v2-run_3-SFT-8B-Latest-QLoRA/resolve/main/assets/train_grad_norm.png)

# Training settings

```yaml
# Weights and Biases logging config
wandb_project: L3-OpenCAI-v2-8B
wandb_entity:
wandb_watch:
wandb_name: SFT-QLoRA-run_3-OpenCAI-v2
wandb_log_model:

# Model architecture config
base_model: meta-llama/Meta-Llama-3-8B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

# Hugging Face saving config
hub_model_id:
hub_strategy:
push_dataset_to_hub:
hf_use_auth_token:

# Model checkpointing config
output_dir: ./L3-OpenCAI-v2-run_3-SFT-8B-QLoRA
resume_from_checkpoint:
save_steps:
saves_per_epoch: 50
save_safetensors: true
save_total_limit: 3

# Mixed precision training config
bf16: true
fp16: false
tf32: false

# Model loading config
load_in_8bit: false
load_in_4bit: true
strict: false

# Sequence config
sequence_len: 6008
s2_attention: false
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# QLoRA adapter config
adapter: qlora
lora_model_dir:
lora_r: 128
lora_alpha: 128
lora_dropout: 0.125
lora_fan_in_fan_out:
lora_target_linear:
save_embedding_layers:
peft_layers_to_transform:
peft_use_dora:
peft_use_rslora:
peft_layer_replication:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_modules_to_save:

# Unfrozen parameters for FFT
unfrozen_parameters:

# Dataset config
datasets:
  - path: ./local_datasets/OpenCAI-v2-6000-CustomShareGPT.json
    type: customllama3
val_set_size: 0.01
evaluation_strategy:
eval_steps:
evals_per_epoch: 50
test_datasets:
dataset_prepared_path: ./OpenCAI-v2-seed42-l3
shuffle_merged_datasets: true

# Training hyperparameters
num_epochs: 1
gradient_accumulation_steps: 16
micro_batch_size: 1
eval_batch_size: 1
warmup_steps: 25
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00002
loraplus_lr_ratio: 8
loraplus_lr_embedding:
cosine_min_lr_ratio: 0.1
weight_decay: 0.1
max_grad_norm: 1
logging_steps: 1

# Model optimization
gradient_checkpointing: unsloth
xformers_attention: false
flash_attention: false
sdp_attention: true

# Loss monitoring config
early_stopping_patience: false
loss_watchdog_threshold: 100.0
loss_watchdog_patience: 3

# Debug config
debug: true
seed: 42

# DeepSpeed and FSDP config
deepspeed:
fsdp:
fsdp_config:

# Token config
special_tokens:
  bos_token: "<|begin_of_text|>"
  eos_token: "<|end_of_text|>"
  pad_token: "<|end_of_text|>"
tokens:

# Don't mess with this, it's here for accelerate and torchrun
local_rank:
```

### Framework versions

- PEFT 0.10.0