Upload 2 files

a453544 verified 2 days ago

17 kB

	---
	license: other
	library_name: transformers
	tags:
	- generated_from_trainer
	base_model: Qwen/Qwen2.5-72B
	datasets:
	- anthracite-org/kalo-opus-instruct-22k-no-refusal
	- Nopm/Opus_WritingStruct
	- Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
	- Gryphe/Sonnet3.5-Charcard-Roleplay
	- Gryphe/ChatGPT-4o-Writing-Prompts
	- Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
	- Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
	- nothingiisreal/Reddit-Dirty-And-WritingPrompts
	- allura-org/Celeste-1.x-data-mixture
	- cognitivecomputations/dolphin-2.9.3
	license_name: qwen
	license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
	model-index:
	- name: EVA-Qwen2.5-72B-SFFT-v0.2
	results: []
	---



	# EVA Qwen2.5-72B v0.2

	<p>
	A RP/storywriting specialist model, full-parameter finetune of Qwen2.5-72B on mixture of synthetic and natural data.<br>
	It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model.<br>
	</p>

	<p>Dedicated to Nev.</p>

	<p><b>NOTE: LLM-Compressor quants don't seem to work correctly, quality seems to be much worse than normal. It wasn't the case with previous versions. GGUF and GPTQ seem to be unaffected.</b></p>
	</br>
	<p><b>Version notes for 0.2</b>: Optimized training hyperparameters and increased sequence length. Better instruction following deeper into context and less repetition.</p>

	<p>
	<p>Prompt format is ChatML.</p><br>
	<h3>Recommended sampler values:</h3>
	<ul>
	<li>Temperature: 0.8</li>
	<li>Min-P: 0.05</li>
	<li>Top-A: 0.3</li>
	<li>Repetition Penalty: 1.03</li>
	</ul>

	<h3>Recommended SillyTavern preset (via CalamitousFelicitousness):</h3>
	<ul><li><a href="https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2/blob/main/EV01.json">Master import</a></li></ul>

	</p>

	<p>
	<br>
	<h3>
	Training data:
	</h3>
	<ul>
	<li>Celeste 70B 0.1 data mixture minus Opus Instruct subset. See that model's <a href=https://huggingface.co/nothingiisreal/L3.1-70B-Celeste-V0.1-BF16>card</a> for details.</li>
	<li>Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.</li>
	<li>A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe</li>
	<li>A subset (2k rows) of Sonnet3.5-Charcards-Roleplay by Gryphe</li>
	<li>Synthstruct and SynthRP datasets by Epiculous</li>
	<li>A subset from Dolphin-2.9.3, including filtered version of not_samantha and a small subset of systemchat.</li>
	</ul>
	<h3>
	Training time and hardware:
	</h3>
	<ul><li>17 hours on 8xH100 SXM</a></li></ul><br>
	</p>
	<p>Model was created by Kearm, Auri and Cahvay.</p>
	<h4>Special thanks:</h4><ul>
	<li>to Featherless for sponsoring this run</li>
	<li>to Cahvay for his work on investigating and reprocessing the corrupted dataset, removing the single biggest source of data poisoning.</li>
	<li>to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data</li>
	<li>and to Allura-org for support, feedback, beta-testing and doing quality control of EVA models.</li></ul>

	<h3>Statement about change in licensing for the future models.</h3>
	<p>For all future EVA-Unit-01 models, there will be a provision in the license stating that Infermatic and any of its employees or paid associates cannot utilize, distribute, download, or otherwise make use of EVA models.
	While this cannot retroactively apply to our licensing, we officially request Infermatic immediately cease use of our models for unwarranted profit, although we acknowledge at this point it will not likely be followed.
	EVA models will still be available in the future on Featherless, ArliAI (in the future), and other providers who want to host them, as well as for local and cloud usage.</p>


	[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
	<details><summary>See axolotl config</summary>

	axolotl version: `0.4.1`
	```yaml
	base_model: Qwen/Qwen2.5-72B

	load_in_8bit: false
	load_in_4bit: false
	strict: false

	plugins:
	- axolotl.integrations.liger.LigerPlugin
	liger_rope: true
	liger_rms_norm: true
	liger_swiglu: true
	liger_fused_linear_cross_entropy: true

	# plugins:
	# - axolotl.integrations.spectrum.SpectrumPlugin

	# spectrum_top_fraction: 0.5
	# # Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
	# spectrum_model_name: Qwen/Qwen2.5-32B

	datasets:
	- path: datasets/Celeste_Filtered_utf8fix.jsonl
	type: sharegpt
	- path: datasets/deduped_not_samantha_norefusals.jsonl
	type: sharegpt
	- path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
	type: sharegpt
	- path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
	type: sharegpt
	- path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
	type: sharegpt
	- path: datasets/opus-instruct-22k-no_refusals-filtered_utf8fix.jsonl
	type: sharegpt
	- path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
	type: sharegpt
	- path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
	type: sharegpt

	chat_template: chatml
	shuffle_merged_datasets: true
	val_set_size: 0.001
	output_dir: EVA-Qwen2.5-72B-SFFT-v0.2

	sequence_len: 10240
	sample_packing: true
	eval_sample_packing: false
	pad_to_sequence_len: false

	# adapter: qlora
	# lora_model_dir:
	# lora_r: 64
	# lora_alpha: 128
	# lora_dropout: 0.05
	# lora_target_linear: true
	# peft_use_dora: true

	unfrozen_parameters:
	- ^lm_head.weight$
	- ^model.embed_tokens.weight$
	# mlp.down_proj layers
	- model.layers.62.mlp.down_proj
	- model.layers.64.mlp.down_proj
	- model.layers.63.mlp.down_proj
	- model.layers.66.mlp.down_proj
	- model.layers.65.mlp.down_proj
	- model.layers.67.mlp.down_proj
	- model.layers.68.mlp.down_proj
	- model.layers.31.mlp.down_proj
	- model.layers.60.mlp.down_proj
	- model.layers.69.mlp.down_proj
	- model.layers.61.mlp.down_proj
	- model.layers.59.mlp.down_proj
	- model.layers.30.mlp.down_proj
	- model.layers.70.mlp.down_proj
	- model.layers.32.mlp.down_proj
	- model.layers.34.mlp.down_proj
	- model.layers.33.mlp.down_proj
	- model.layers.76.mlp.down_proj
	- model.layers.72.mlp.down_proj
	- model.layers.71.mlp.down_proj
	- model.layers.58.mlp.down_proj
	- model.layers.75.mlp.down_proj
	- model.layers.29.mlp.down_proj
	- model.layers.56.mlp.down_proj
	- model.layers.26.mlp.down_proj
	- model.layers.35.mlp.down_proj
	- model.layers.28.mlp.down_proj
	- model.layers.57.mlp.down_proj
	- model.layers.77.mlp.down_proj
	- model.layers.36.mlp.down_proj
	- model.layers.27.mlp.down_proj
	- model.layers.25.mlp.down_proj
	- model.layers.78.mlp.down_proj
	- model.layers.37.mlp.down_proj
	- model.layers.73.mlp.down_proj
	- model.layers.55.mlp.down_proj
	- model.layers.54.mlp.down_proj
	- model.layers.74.mlp.down_proj
	- model.layers.24.mlp.down_proj
	- model.layers.53.mlp.down_proj
	# mlp.gate_proj layers
	- model.layers.78.mlp.gate_proj
	- model.layers.77.mlp.gate_proj
	- model.layers.76.mlp.gate_proj
	- model.layers.79.mlp.gate_proj
	- model.layers.75.mlp.gate_proj
	- model.layers.74.mlp.gate_proj
	- model.layers.73.mlp.gate_proj
	- model.layers.72.mlp.gate_proj
	- model.layers.71.mlp.gate_proj
	- model.layers.70.mlp.gate_proj
	- model.layers.69.mlp.gate_proj
	- model.layers.57.mlp.gate_proj
	- model.layers.54.mlp.gate_proj
	- model.layers.55.mlp.gate_proj
	- model.layers.68.mlp.gate_proj
	- model.layers.63.mlp.gate_proj
	- model.layers.53.mlp.gate_proj
	- model.layers.44.mlp.gate_proj
	- model.layers.45.mlp.gate_proj
	- model.layers.49.mlp.gate_proj
	- model.layers.58.mlp.gate_proj
	- model.layers.46.mlp.gate_proj
	- model.layers.56.mlp.gate_proj
	- model.layers.67.mlp.gate_proj
	- model.layers.62.mlp.gate_proj
	- model.layers.50.mlp.gate_proj
	- model.layers.64.mlp.gate_proj
	- model.layers.52.mlp.gate_proj
	- model.layers.40.mlp.gate_proj
	- model.layers.43.mlp.gate_proj
	- model.layers.48.mlp.gate_proj
	- model.layers.66.mlp.gate_proj
	- model.layers.47.mlp.gate_proj
	- model.layers.59.mlp.gate_proj
	- model.layers.65.mlp.gate_proj
	- model.layers.61.mlp.gate_proj
	- model.layers.60.mlp.gate_proj
	- model.layers.42.mlp.gate_proj
	- model.layers.51.mlp.gate_proj
	- model.layers.41.mlp.gate_proj
	# mlp.up_proj layers
	- model.layers.70.mlp.up_proj
	- model.layers.69.mlp.up_proj
	- model.layers.71.mlp.up_proj
	- model.layers.68.mlp.up_proj
	- model.layers.72.mlp.up_proj
	- model.layers.67.mlp.up_proj
	- model.layers.66.mlp.up_proj
	- model.layers.73.mlp.up_proj
	- model.layers.46.mlp.up_proj
	- model.layers.63.mlp.up_proj
	- model.layers.75.mlp.up_proj
	- model.layers.76.mlp.up_proj
	- model.layers.74.mlp.up_proj
	- model.layers.45.mlp.up_proj
	- model.layers.62.mlp.up_proj
	- model.layers.64.mlp.up_proj
	- model.layers.65.mlp.up_proj
	- model.layers.44.mlp.up_proj
	- model.layers.53.mlp.up_proj
	- model.layers.47.mlp.up_proj
	- model.layers.49.mlp.up_proj
	- model.layers.48.mlp.up_proj
	- model.layers.57.mlp.up_proj
	- model.layers.43.mlp.up_proj
	- model.layers.42.mlp.up_proj
	- model.layers.56.mlp.up_proj
	- model.layers.61.mlp.up_proj
	- model.layers.54.mlp.up_proj
	- model.layers.40.mlp.up_proj
	- model.layers.55.mlp.up_proj
	- model.layers.77.mlp.up_proj
	- model.layers.60.mlp.up_proj
	- model.layers.41.mlp.up_proj
	- model.layers.35.mlp.up_proj
	- model.layers.37.mlp.up_proj
	- model.layers.58.mlp.up_proj
	- model.layers.34.mlp.up_proj
	- model.layers.38.mlp.up_proj
	- model.layers.33.mlp.up_proj
	- model.layers.39.mlp.up_proj
	# self_attn.k_proj layers
	- model.layers.36.self_attn.k_proj
	- model.layers.79.self_attn.k_proj
	- model.layers.35.self_attn.k_proj
	- model.layers.34.self_attn.k_proj
	- model.layers.37.self_attn.k_proj
	- model.layers.33.self_attn.k_proj
	- model.layers.38.self_attn.k_proj
	- model.layers.39.self_attn.k_proj
	- model.layers.74.self_attn.k_proj
	- model.layers.77.self_attn.k_proj
	- model.layers.41.self_attn.k_proj
	- model.layers.69.self_attn.k_proj
	- model.layers.32.self_attn.k_proj
	- model.layers.78.self_attn.k_proj
	- model.layers.30.self_attn.k_proj
	- model.layers.70.self_attn.k_proj
	- model.layers.25.self_attn.k_proj
	- model.layers.42.self_attn.k_proj
	- model.layers.29.self_attn.k_proj
	- model.layers.31.self_attn.k_proj
	- model.layers.68.self_attn.k_proj
	- model.layers.66.self_attn.k_proj
	- model.layers.22.self_attn.k_proj
	- model.layers.65.self_attn.k_proj
	- model.layers.44.self_attn.k_proj
	- model.layers.40.self_attn.k_proj
	- model.layers.63.self_attn.k_proj
	- model.layers.23.self_attn.k_proj
	- model.layers.28.self_attn.k_proj
	- model.layers.24.self_attn.k_proj
	- model.layers.26.self_attn.k_proj
	- model.layers.67.self_attn.k_proj
	- model.layers.75.self_attn.k_proj
	- model.layers.27.self_attn.k_proj
	- model.layers.57.self_attn.k_proj
	- model.layers.64.self_attn.k_proj
	- model.layers.71.self_attn.k_proj
	- model.layers.61.self_attn.k_proj
	- model.layers.72.self_attn.k_proj
	- model.layers.73.self_attn.k_proj
	# self_attn.o_proj layers
	- model.layers.69.self_attn.o_proj
	- model.layers.39.self_attn.o_proj
	- model.layers.16.self_attn.o_proj
	- model.layers.14.self_attn.o_proj
	- model.layers.19.self_attn.o_proj
	- model.layers.42.self_attn.o_proj
	- model.layers.12.self_attn.o_proj
	- model.layers.15.self_attn.o_proj
	- model.layers.17.self_attn.o_proj
	- model.layers.38.self_attn.o_proj
	- model.layers.23.self_attn.o_proj
	- model.layers.22.self_attn.o_proj
	- model.layers.13.self_attn.o_proj
	- model.layers.29.self_attn.o_proj
	- model.layers.41.self_attn.o_proj
	- model.layers.44.self_attn.o_proj
	- model.layers.46.self_attn.o_proj
	- model.layers.45.self_attn.o_proj
	- model.layers.43.self_attn.o_proj
	- model.layers.49.self_attn.o_proj
	- model.layers.30.self_attn.o_proj
	- model.layers.26.self_attn.o_proj
	- model.layers.25.self_attn.o_proj
	- model.layers.37.self_attn.o_proj
	- model.layers.47.self_attn.o_proj
	- model.layers.11.self_attn.o_proj
	- model.layers.18.self_attn.o_proj
	- model.layers.28.self_attn.o_proj
	- model.layers.20.self_attn.o_proj
	- model.layers.27.self_attn.o_proj
	- model.layers.53.self_attn.o_proj
	- model.layers.52.self_attn.o_proj
	- model.layers.35.self_attn.o_proj
	- model.layers.71.self_attn.o_proj
	- model.layers.10.self_attn.o_proj
	- model.layers.3.self_attn.o_proj
	- model.layers.21.self_attn.o_proj
	- model.layers.24.self_attn.o_proj
	- model.layers.68.self_attn.o_proj
	- model.layers.48.self_attn.o_proj
	# self_attn.q_proj layers
	- model.layers.1.self_attn.q_proj
	- model.layers.2.self_attn.q_proj
	- model.layers.3.self_attn.q_proj
	- model.layers.0.self_attn.q_proj
	- model.layers.5.self_attn.q_proj
	- model.layers.4.self_attn.q_proj
	- model.layers.6.self_attn.q_proj
	- model.layers.8.self_attn.q_proj
	- model.layers.7.self_attn.q_proj
	- model.layers.9.self_attn.q_proj
	- model.layers.10.self_attn.q_proj
	- model.layers.68.self_attn.q_proj
	- model.layers.25.self_attn.q_proj
	- model.layers.12.self_attn.q_proj
	- model.layers.54.self_attn.q_proj
	- model.layers.55.self_attn.q_proj
	- model.layers.61.self_attn.q_proj
	- model.layers.18.self_attn.q_proj
	- model.layers.49.self_attn.q_proj
	- model.layers.66.self_attn.q_proj
	- model.layers.72.self_attn.q_proj
	- model.layers.11.self_attn.q_proj
	- model.layers.52.self_attn.q_proj
	- model.layers.64.self_attn.q_proj
	- model.layers.15.self_attn.q_proj
	- model.layers.60.self_attn.q_proj
	- model.layers.50.self_attn.q_proj
	- model.layers.59.self_attn.q_proj
	- model.layers.53.self_attn.q_proj
	- model.layers.48.self_attn.q_proj
	- model.layers.57.self_attn.q_proj
	- model.layers.70.self_attn.q_proj
	- model.layers.17.self_attn.q_proj
	- model.layers.67.self_attn.q_proj
	- model.layers.71.self_attn.q_proj
	- model.layers.62.self_attn.q_proj
	- model.layers.51.self_attn.q_proj
	- model.layers.19.self_attn.q_proj
	- model.layers.58.self_attn.q_proj
	- model.layers.13.self_attn.q_proj
	# self_attn.v_proj layers
	- model.layers.23.self_attn.v_proj
	- model.layers.25.self_attn.v_proj
	- model.layers.26.self_attn.v_proj
	- model.layers.27.self_attn.v_proj
	- model.layers.28.self_attn.v_proj
	- model.layers.29.self_attn.v_proj
	- model.layers.30.self_attn.v_proj
	- model.layers.31.self_attn.v_proj
	- model.layers.34.self_attn.v_proj
	- model.layers.35.self_attn.v_proj
	- model.layers.36.self_attn.v_proj
	- model.layers.37.self_attn.v_proj
	- model.layers.38.self_attn.v_proj
	- model.layers.42.self_attn.v_proj
	- model.layers.48.self_attn.v_proj
	- model.layers.57.self_attn.v_proj
	- model.layers.58.self_attn.v_proj
	- model.layers.61.self_attn.v_proj
	- model.layers.63.self_attn.v_proj
	- model.layers.64.self_attn.v_proj
	- model.layers.65.self_attn.v_proj
	- model.layers.66.self_attn.v_proj
	- model.layers.69.self_attn.v_proj
	- model.layers.70.self_attn.v_proj
	- model.layers.74.self_attn.v_proj
	- model.layers.75.self_attn.v_proj
	- model.layers.72.self_attn.v_proj
	- model.layers.39.self_attn.v_proj
	- model.layers.41.self_attn.v_proj
	- model.layers.40.self_attn.v_proj
	- model.layers.33.self_attn.v_proj
	- model.layers.59.self_attn.v_proj
	- model.layers.16.self_attn.v_proj
	- model.layers.15.self_attn.v_proj
	- model.layers.76.self_attn.v_proj
	- model.layers.24.self_attn.v_proj
	- model.layers.68.self_attn.v_proj
	- model.layers.67.self_attn.v_proj
	- model.layers.55.self_attn.v_proj
	- model.layers.44.self_attn.v_proj



	wandb_project: EVA-Qwen2.5-72B-SFFT-v0.2
	wandb_entity:
	wandb_watch:
	wandb_name: Unit-02
	wandb_log_model:

	gradient_accumulation_steps: 8
	micro_batch_size: 1
	num_epochs: 3
	optimizer: paged_ademamix_8bit
	lr_scheduler: cosine
	learning_rate: 0.00003
	max_grad_norm: 1.5

	train_on_inputs: false
	group_by_length: false
	bf16: auto
	fp16:
	tf32: false

	gradient_checkpointing: "unsloth"
	# gradient_checkpointing_kwargs:
	# use_reentrant: true
	early_stopping_patience:
	resume_from_checkpoint: EVA-Qwen2.5-72B-SFFT-v0.2/checkpoint-128
	local_rank:
	logging_steps: 1
	xformers_attention:
	flash_attention: true

	warmup_steps: 20
	evals_per_epoch: 4
	saves_per_epoch: 4
	save_safetensors: true
	save_total_limit: 1
	hub_model_id:
	hub_strategy:
	debug:
	deepspeed: deepspeed_configs/zero3_bf16_cpuoffload_params.json
	weight_decay: 0.12
	# fsdp:
	# - full_shard
	# - auto_wrap
	# fsdp_config:
	# fsdp_limit_all_gathers: true
	# fsdp_sync_module_states: false
	# fsdp_offload_params: true
	# fsdp_cpu_ram_efficient_loading: true
	# fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
	# fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
	# fsdp_activation_checkpointing: true
	# fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT
	# fsdp_sharding_strategy: FULL_SHARD
	# fsdp_forward_prefetch: false # Added
	# fsdp_backward_prefetch: "BACKWARD_PRE" # Added
	# fsdp_backward_prefetch_limit: 1 # Added
	# fsdp_mixed_precision: BF16 # Added
	```

	</details><br>

	<h3><a href=https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard>Open LLM Leaderboard Evaluation Results</a></h3>

	\| Metric \|Value\|
	\|-------------------\|----:\|
	\|Avg. \|43.54\|
	\|IFEval (0-Shot) \|68.79\|
	\|BBH (3-Shot) \|59.07\|
	\|MATH Lvl 5 (4-Shot)\|39.05\|
	\|GPQA (0-shot) \|21.14\|
	\|MuSR (0-shot) \|19.73\|
	\|MMLU-PRO (5-shot) \|53.48\|