---
license: apache-2.0
library_name: peft
tags:
- axolotl
- generated_from_trainer
base_model: byroneverson/LLaVA-v1.5-7B-rehome
model-index:
- name: LLaVA-v1.5-7B-rehome-qlora
  results: []
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.3.0`
```yaml
base_model: byroneverson/LLaVA-v1.5-7B-rehome
model_type: AutoModelForCausalLM #MistralForCausalLM
tokenizer_type: AutoTokenizer #LlamaTokenizer
#is_mistral_derived_model: true
#
load_in_8bit: false
load_in_4bit: true
strict: false
#
datasets:
  - path: yahma/alpaca-cleaned #byroneverson/shell-cmd-instruct
    type: completion #solar_shell_instruct #alpaca
    # For `completion` datasets only, uses the provided field instead of `text` column
    field: output
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
output_dir: /kaggle/working/qlora-out
#
# Push checkpoints to hub
hub_model_id: byroneverson/LLaVA-v1.5-7B-rehome-shell-qlora
# How to push checkpoints to hub
# https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/trainer#transformers.TrainingArguments.hub_strategy
hub_strategy: checkpoint
# Whether to use hf `use_auth_token` for loading datasets. Useful for fetching private datasets
# Required to be true when used in combination with `push_dataset_to_hub`
hf_use_auth_token: true
#
adapter: qlora
lora_model_dir:
#
sequence_len: 2048
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
# Train only these params when performing full train
#unfrozen_parameters: [
#  "\\bmlp\\.up_proj\\b",
#  "\\bmlp\\.down_proj\\b",
#  "\\bmlp\\.gate_proj\\b",
#  ]
# LoRA/QLoRA
lora_r: 160 #128
lora_alpha: 80 #384 # A good ballpark is double lora_r for 2x or half for 0.5x
lora_dropout: 0.025
lora_target_linear: false # Only target specified modules
lora_fan_in_fan_out:
lora_target_modules: [
  "up_proj",
  "down_proj",
  "gate_proj",
  ]
#
wandb_project: "LLaVA-v1.5-7B-rehome-qlora"
wandb_log_model: "checkpoint"
wandb_entity:
wandb_watch:
wandb_run_id:
#
gradient_accumulation_steps: 16 # 1
micro_batch_size: 1
num_epochs: 0.2
optimizer: paged_lion_8bit #paged_adamw_8bit
lr_scheduler: cosine
cosine_min_lr_ratio: 1.0 # Constant lr if 1.0
learning_rate: 0.0001
#
train_on_inputs: false
group_by_length: false
bf16: false #true
fp16: true
tf32: false
#
gradient_checkpointing: true
early_stopping_patience:
# Resume from a specific checkpoint dir
resume_from_checkpoint: #last-checkpoint
# If resume_from_checkpoint isn't set and you simply want it to start where it left off.
# Be careful with this being turned on between different models.
auto_resume_from_checkpoints: true #false
local_rank:
logging_steps: 1
xformers_attention:
# Whether to use flash attention patch https://github.com/Dao-AILab/flash-attention:
flash_attention: false #true
flash_attn_cross_entropy: # Whether to use flash-attention cross entropy implementation - advanced use only
flash_attn_rms_norm: false # Whether to use flash-attention rms norm implementation - advanced use only
flash_attn_fuse_qkv: # Whether to fuse QKV into a single operation
flash_attn_fuse_mlp: # Whether to fuse part of the MLP into a single operation
#
warmup_steps: 4
eval_steps: 50
eval_table_size:
eval_table_max_new_tokens: 128
save_steps:
debug: true
#
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: ""
  eos_token: ""
  unk_token: ""
```

</details><br>
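For reference, the QLoRA settings above correspond roughly to the PEFT/bitsandbytes setup sketched below; Axolotl builds this internally when training is launched (typically with something like `accelerate launch -m axolotl.cli.train config.yml`). This is only an illustrative sketch of what the config describes, not the actual training code, and the NF4 quantization type is an assumed Axolotl default rather than something stated in the config.

```python
# Illustrative sketch of the QLoRA setup described by the config above.
# Axolotl handles this internally; the NF4 quant type is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base = "byroneverson/LLaVA-v1.5-7B-rehome"

# load_in_4bit: true -> quantize the frozen base weights to 4-bit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",                # assumed default, not in the config
    bnb_4bit_compute_dtype=torch.float16,     # fp16: true in the config
)

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb_config, device_map="auto"
)

# lora_r: 160, lora_alpha: 80, lora_dropout: 0.025, MLP-only target modules
lora_config = LoraConfig(
    r=160,
    lora_alpha=80,
    lora_dropout=0.025,
    target_modules=["up_proj", "down_proj", "gate_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```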

# LLaVA-v1.5-7B-rehome-shell-qlora

This model is a fine-tuned version of [byroneverson/LLaVA-v1.5-7B-rehome](https://huggingface.co/byroneverson/LLaVA-v1.5-7B-rehome) on the [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) dataset.

## Model description

A QLoRA adapter trained with Axolotl on the 4-bit-quantized base model. The adapter targets only the MLP projection modules (`up_proj`, `down_proj`, `gate_proj`) with `lora_r: 160`, `lora_alpha: 80`, and `lora_dropout: 0.025`.

## Intended uses & limitations

More information needed

## Training and evaluation data

Training used [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) in Axolotl's `completion` mode on the `output` field, with sample packing at a sequence length of 2048 and 2% of the data held out for evaluation (`val_set_size: 0.02`).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- total_eval_batch_size: 2
- optimizer: paged_lion_8bit (8-bit paged Lion, as set in the Axolotl config)
- lr_scheduler_type: cosine (with `cosine_min_lr_ratio: 1.0`, i.e. effectively constant)
- lr_scheduler_warmup_steps: 4
- num_epochs: 0.2
- mixed_precision_training: Native AMP (fp16)

### Training results

### Framework versions

- PEFT 0.7.2.dev0
- Transformers 4.37.0.dev0
- PyTorch 2.0.1+cu117
- Datasets 2.16.1
- Tokenizers 0.15.0
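To try the adapter, it can be attached to the 4-bit base model with PEFT. Below is a minimal, illustrative sketch; the repository ids are taken from the config above, while the prompt and generation settings are placeholders only.

```python
# Minimal inference sketch: load the 4-bit base model and attach this adapter.
# Repo ids come from the Axolotl config; prompt/generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "byroneverson/LLaVA-v1.5-7B-rehome"
adapter_id = "byroneverson/LLaVA-v1.5-7B-rehome-shell-qlora"

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "List the files in the current directory."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```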