# sanchit-gandhi/distil-zephyr-1.5b-dpo-ultrafeedback-200k / config_200k.yaml
# Commit 718d1ae: "Training in progress, step 100"

# Model arguments
model_name_or_path: sanchit-gandhi/distil-zephyr-1.5b-ssft-ultrachat-200k
torch_dtype: null

# Data training arguments
# For definitions, see: src/h4/training/config.py
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 1.0
dataset_splits:
- train_prefs
- test_prefs
preprocessing_num_workers: 12

# DPOTrainer arguments
bf16: true
beta: 0.01
do_eval: true
evaluation_strategy: steps
eval_steps: 100
gradient_accumulation_steps: 2
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
learning_rate: 5.0e-7
log_level: info
logging_steps: 25
lr_scheduler_type: cosine
max_length: 1024
max_prompt_length: 512
num_train_epochs: 1
optim: adamw_torch
output_dir: ./
per_device_train_batch_size: 8
per_device_eval_batch_size: 8
push_to_hub: true
save_strategy: "steps"
save_steps: 100
save_total_limit: 1
seed: 42
warmup_ratio: 0.1
report_to:
- tensorboard
- wandb
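
# Note (illustrative arithmetic, not part of the original file):
# with per_device_train_batch_size: 8 and gradient_accumulation_steps: 2,
# each device contributes 8 * 2 = 16 examples (chosen/rejected pairs)
# per optimizer step. The global batch size would be
# 16 * <number of devices>; the device count is not recorded in this file.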