---
library_name: transformers
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-14B/blob/main/LICENSE
base_model: Qwen/Qwen2.5-14B
tags:
- generated_from_trainer
model-index:
- name: 14B-Qwen2.5-Freya-x1
  results: []
---

Aw snap. Another Qwen 2.5 14B by the lord and savior, Sao. I'm still refining my own settings for Qwen, but for those of you who are interested, my most recent settings are:

- Temp: 1.1-1.2, or 0.75-0.85
- Min P: 0.02-0.05 (Min P seems to help with 'oddities' in responses); 0.035 seems a decent midpoint
- Rep Penalty: 1.08
- DRY: 0.3 multiplier, 1.75 base, 2 allowed length
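If your backend exposes these knobs over an API, the values above map onto a request roughly like the sketch below. The parameter names follow KoboldCpp's generate API; other backends name them differently, so treat the keys (and the endpoint) as assumptions to check against your backend's docs:

```python
import requests

# Placeholder endpoint; KoboldCpp-style payload carrying the settings listed above.
payload = {
    "prompt": "Once upon a time,",
    "max_length": 256,
    "temperature": 1.1,      # or drop into the 0.75-0.85 range
    "min_p": 0.035,          # the suggested midpoint
    "rep_pen": 1.08,
    "dry_multiplier": 0.3,   # DRY: multiplier, base, allowed length
    "dry_base": 1.75,
    "dry_allowed_length": 2,
}
r = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(r.json()["results"][0]["text"])
```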
[This is the 4bpw EXL2 version of this model. For the original model, go here](https://huggingface.co/Sao10K/14B-Qwen2.5-Freya-x1)
[For the 8bpw version, go here](https://huggingface.co/Statuo/Sao10K_14B-Qwen2.5-Freya-v1-EXL2-8bpw)
[For the 6bpw version, go here](https://huggingface.co/Statuo/Sao10K_14B-Qwen2.5-Freya-v1-EXL2-6bpw)
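To load one of these EXL2 quants directly, the exllamav2 Python API looks roughly like this. A minimal sketch assuming a recent exllamav2 release; the model path is a placeholder for wherever you downloaded the repo:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Point at the downloaded EXL2 folder (placeholder path).
config = ExLlamaV2Config("models/Sao10K_14B-Qwen2.5-Freya-v1-EXL2-4bpw")
model = ExLlamaV2(config)

# Lazy cache + autosplit spreads the weights across available GPUs.
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Once upon a time,", max_new_tokens=128))
```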
---

![Freya](https://huggingface.co/Sao10K/14B-Qwen2.5-Freya-x1/resolve/main/sad.png)

*Me during failed runs*

# 14B-Qwen2.5-Freya-v1

I decided to mess around with training methods again, considering the re-emergence of methods like multi-step training. Some people began doing it again, and so, why not? Inspired by AshhLimaRP's methodology, but done my way.

Freya-S1
- LoRA trained on ~1.1GB of literature and raw text over Qwen 2.5's base model.
- Cleaned the text and literature as best I could; still, it may have had issues here and there.

Freya-S2
- The first LoRA was applied over Qwen 2.5 Instruct, then I trained on top of that.
- Reduced LoRA rank because it's mainly instruct, and other details I won't get into.

Recommended Model Settings | *Look, I just use these, they work fine enough. I don't even know how DRY or other meme samplers work. Your system prompt matters more anyway.*

```
Prompt Format: ChatML
Temperature: 1+ # I don't know, man.
min_p: 0.05
```
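For reference, ChatML wraps each turn in `<|im_start|>` / `<|im_end|>` markers, with the model generating after the final assistant header. The system/user text here is just placeholder:

```
<|im_start|>system
You are a creative writing assistant.<|im_end|>
<|im_start|>user
Continue the scene.<|im_end|>
<|im_start|>assistant
```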
Training time in total was ~10 hours on an 8xH100 node, sponsored by the Government of Singapore or something. Thanks for the national service allowance, MHA.

https://sao10k.carrd.co/ for contact.

---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)

<details><summary>See axolotl config</summary>

axolotl version: `0.6.0`

```yaml
base_model:
  - s1: Qwen/Qwen2.5-14B
  - s2: Qwen/Qwen2.5-14B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false
sequence_len: 16384
bf16: auto
fp16:
tf32: false
flash_attention: true
special_tokens:

adapter: lora # 16-bit
lora_r:
  - s1: 64
  - s2: 32
lora_alpha: 64
lora_dropout: 0.2
lora_fan_in_fan_out:
peft_use_rslora: true
lora_target_linear: true

# Data
dataset_prepared_path: dataset_run_freya
datasets:
  # S1 - Writing / Completion
  - path: datasets/eBooks-cleaned-75K
    type: completion
  - path: datasets/novels-clean-dedupe-10K
    type: completion
  # S2 - Instruct
  - path: datasets/10k-amoral-full-fixed-sys.json
    type: chat_template
    chat_template: chatml
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: datasets/44k-hespera-smartshuffle.json
    type: chat_template
    chat_template: chatml
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: datasets/5k_rpg_adventure_instruct-sys.json
    type: chat_template
    chat_template: chatml
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
shuffle_merged_datasets: true
warmup_ratio: 0.1

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

# Iterations
num_epochs:
  - s1: 1
  - s2: 2

# Sampling
sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# Batching
gradient_accumulation_steps: 4
micro_batch_size: 2
gradient_checkpointing: unsloth

# Evaluation
val_set_size: 0.025
evals_per_epoch: 5
eval_table_size:
eval_max_new_tokens: 256
eval_sample_packing: false
eval_batch_size: 1

# Optimizer
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate:
  - s1: 0.000002
  - s2: 0.000004
weight_decay: 0.2
max_grad_norm: 10.0

# Garbage Collection
gc_steps: 10

# Misc
deepspeed: ./deepspeed_configs/zero2.json
```

</details>
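For the curious, the two-stage recipe can be approximated outside axolotl by merging the stage-1 LoRA into the instruct checkpoint before training stage 2. A minimal sketch with transformers and peft, assuming a hypothetical local path for the S1 adapter (it isn't published in this repo):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the instruct model that stage 2 trains on top of.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)

# Apply the stage-1 LoRA (placeholder path) and fold its weights into the
# base, so stage 2 can train a fresh, lower-rank LoRA on the merged model.
model = PeftModel.from_pretrained(base, "path/to/freya-s1-lora")
merged = model.merge_and_unload()
merged.save_pretrained("freya-s1-merged")
```

Each stage itself is then just a normal axolotl run, e.g. `accelerate launch -m axolotl.cli.train config.yaml`.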