sabersaleh/Llama3-SimPO

---
license: mit
datasets:
- HuggingFaceH4/ultrafeedback_binarized
base_model:
- princeton-nlp/Llama-3-Base-8B-SFT
---

This is a preference-aligned model based on princeton-nlp/Llama-3-Base-8B-SFT. It was fine-tuned on the UltraFeedback dataset (HuggingFaceH4/ultrafeedback_binarized) using the Simple Preference Optimization (SimPO) loss. Training ran for a single epoch.
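
For readers unfamiliar with SimPO, the following is a minimal sketch of the per-pair loss, not the training code used for this model. SimPO scores each response by its length-normalized log-probability under the policy, scaled by `beta`, and requires the chosen response to beat the rejected one by a target margin `gamma`. The function name `simpo_loss`, its argument names, and the default values of `beta` and `gamma` are illustrative assumptions; this model card does not state the hyperparameters that were used.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def simpo_loss(
    chosen_logp_sum: float,    # sum of token log-probs of the chosen response
    chosen_len: int,           # number of tokens in the chosen response
    rejected_logp_sum: float,  # sum of token log-probs of the rejected response
    rejected_len: int,         # number of tokens in the rejected response
    beta: float = 2.0,         # reward scale (illustrative value, an assumption)
    gamma: float = 1.0,        # target reward margin (illustrative value, an assumption)
) -> float:
    """SimPO loss for a single (chosen, rejected) preference pair."""
    # Implicit reward: average per-token log-probability, scaled by beta.
    chosen_reward = beta * chosen_logp_sum / chosen_len
    rejected_reward = beta * rejected_logp_sum / rejected_len
    # Bradley-Terry style objective with a margin; no reference model is needed.
    return -log_sigmoid(chosen_reward - rejected_reward - gamma)


if __name__ == "__main__":
    # Hypothetical log-probabilities for one preference pair.
    print(simpo_loss(-12.0, 10, -30.0, 15))
```

In practice the log-probability sums would come from a forward pass of the policy over each response in a preference pair, and the loss would be averaged over a batch.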