CombinHorizon committed on
Commit
3054603
1 Parent(s): b3d3584

Update README.md

Cleanup model card, tags and license

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -1 +1,14 @@
- This is a model released from the preprint: *[SimPO: Simple Preference Optimization with a Reference-Free Reward](https://arxiv.org/abs/2405.14734)* Please refer to our [repository](https://github.com/princeton-nlp/SimPO) for more details.
+ ---
+ license: llama3
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - SimPo
+ language:
+ - en
+ base_model:
+ - meta-llama/Meta-Llama-3-8B-Instruct
+ ---
+ This is a model released from the preprint: *[SimPO: Simple Preference Optimization with a Reference-Free Reward](https://arxiv.org/abs/2405.14734)*, which is an offline preference optimization algorithm designed to enhance the training of large language models (LLMs) with preference optimization datasets.
+ SimPO aligns the reward function with the generation likelihood, eliminating the need for a reference model and incorporating a target reward margin to boost performance.
+ Please refer to our [github repo](https://github.com/princeton-nlp/SimPO) for more details.
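
For readers of this change, the objective summarized in the added README text can be sketched for a single preference pair. This is a minimal illustration, not the authors' training code: it assumes the reward is the length-normalized sequence log-probability scaled by `beta`, and that the loss is the negative log-sigmoid of the reward gap minus the target margin `gamma`. The function name `simpo_loss` and the default values of `beta` and `gamma` are hypothetical placeholders, not tuned settings.

```python
import math


def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=2.0, gamma=1.0):
    """Sketch of a SimPO-style loss for one (chosen, rejected) pair.

    logp_*: summed token log-probabilities of each response under the policy.
    len_*:  number of tokens in each response.
    beta, gamma: illustrative values only (assumptions, not from the commit).
    """
    # Reference-free reward: average per-token log-likelihood, scaled by beta.
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected

    # The chosen reward should exceed the rejected one by at least gamma.
    margin = reward_chosen - reward_rejected - gamma

    # -log(sigmoid(margin)), computed in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Example: both responses have 10 tokens; the chosen one is more likely.
loss = simpo_loss(logp_chosen=-10.0, len_chosen=10,
                  logp_rejected=-20.0, len_rejected=10,
                  beta=2.0, gamma=2.0)
```

Only policy log-probabilities appear in the computation, which is what makes the reward reference-free. In practice the loss would be averaged over a batch and computed on tensors rather than Python floats.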