Full-text search
+ 1,000 results
eren23 / dpo-binarized-NeutrixOmnibe-7B
README.md
model
5 matches
tags:
transformers, safetensors, mistral, text-generation, merge, dpo, conversation, text-generation-inference, Kukedlc/NeuTrixOmniBe-7B-model-remix, en, dataset:argilla/OpenHermes2.5-dpo-binarized-alpha, license:apache-2.0, model-index, autotrain_compatible, endpoints_compatible, region:us
DPO Finetuned Kukedlc/NeuTrixOmniBe-7B-model-remix using argilla/OpenHermes2.5-dpo-binarized-alpha
The Argilla DPO binarized pairs dataset is built on top of https://huggingface.co/datasets/teknium/OpenHermes-2.5 using https://github.com/argilla-io/distilabel, if you want to look into it.
Thanks for the great data sources.
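Several of the repos in these results follow the same recipe: take a merged base model and run TRL's DPO Trainer over a preference dataset. A minimal sketch of that recipe for the model above, assuming the argilla split can be mapped to the prompt/chosen/rejected columns DPOTrainer expects; exact argument names vary across TRL versions, so treat this as an outline rather than the authors' script:
```
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Kukedlc/NeuTrixOmniBe-7B-model-remix"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Assumption: the split can be mapped to plain-text "prompt", "chosen" and
# "rejected" columns; chat-formatted columns may need a small map() first.
train_ds = load_dataset("argilla/OpenHermes2.5-dpo-binarized-alpha", split="train")

args = DPOConfig(
    output_dir="dpo-binarized-neutrixomnibe-7b",
    beta=0.1,                      # strength of the KL penalty against the reference model
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                   # a frozen reference copy is created automatically
    args=args,
    train_dataset=train_ds,
    tokenizer=tokenizer,           # newer TRL versions call this processing_class
)
trainer.train()
```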
eren23 / dpo-binarized-NeuralTrix-7B
README.md
model
4 matches
tags:
transformers, safetensors, mistral, text-generation, conversation, text-generation-inference, CultriX/NeuralTrix-7B-dpo, dpo, merge, en, dataset:argilla/OpenHermes2.5-dpo-binarized-alpha, license:apache-2.0, model-index, autotrain_compatible, endpoints_compatible, region:us
DPO Finetuned CultriX/NeuralTrix-7B-dpo using argilla/OpenHermes2.5-dpo-binarized-alpha
The Argilla DPO binarized pairs dataset is built on top of https://huggingface.co/datasets/teknium/OpenHermes-2.5 using https://github.com/argilla-io/distilabel, if you want to look into it.
Thanks for the great data sources.
cloudyu / Pluto_13B_DPO
README.md
model
2 matches
LoneStriker / Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B-3.0bpw-h6-exl2
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [TomGrc/FusionNet_7Bx2_MoE_14B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
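The truthy-dpo dataset referenced by this and the following cards is a standard preference set. A small, hedged sketch of preparing it for the trainer, assuming it exposes system/prompt/chosen/rejected fields (the `to_dpo_format` helper is hypothetical, not from the model card):
```
from datasets import load_dataset

# Assumption: jondurbin/truthy-dpo-v0.1 carries "system", "prompt",
# "chosen" and "rejected" columns, the usual layout for DPO preference data.
ds = load_dataset("jondurbin/truthy-dpo-v0.1", split="train")

def to_dpo_format(example):
    # Fold the system message into the prompt so each row becomes a single
    # prompt string paired with a preferred and a rejected completion.
    prompt = (example.get("system", "") + "\n\n" + example["prompt"]).strip()
    return {"prompt": prompt, "chosen": example["chosen"], "rejected": example["rejected"]}

dpo_ds = ds.map(to_dpo_format, remove_columns=ds.column_names)
print(dpo_ds[0]["prompt"][:200])
```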
LoneStriker / Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B-4.0bpw-h6-exl2
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [TomGrc/FusionNet_7Bx2_MoE_14B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
LoneStriker / Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B-5.0bpw-h6-exl2
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [TomGrc/FusionNet_7Bx2_MoE_14B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
yunconglong / Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:mit, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [TomGrc/FusionNet_7Bx2_MoE_14B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
LoneStriker / Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B-6.0bpw-h6-exl2
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [TomGrc/FusionNet_7Bx2_MoE_14B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
LoneStriker / Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B-8.0bpw-h8-exl2
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [TomGrc/FusionNet_7Bx2_MoE_14B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
yunconglong / Truthful_DPO_MOE_19B
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, conversational, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
cloudyu / Truthful_DPO_TomGrc_FusionNet_34Bx2_MoE
README.md
model
6 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, conversational, license:mit, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [TomGrc/FusionNet_34Bx2_MoE]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
cloudyu / Truthful_DPO_cloudyu_Mixtral_34Bx2_MoE_60B
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:mit, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [cloudyu/Mixtral_34Bx2_MoE_60B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
yunconglong / 13B_MATH_DPO
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset kyujinpy/orca_math_dpo to improve [yunconglong/MoE_13B_DPO]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
yunconglong / MoE_13B_DPO
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset Intel/orca_dpo_pairs to improve [yunconglong/Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
eren23 / OGNO-7b-dpo-truthful
README.md
model
2 matches
tags:
transformers, pytorch, mistral, text-generation, merge, dpo, text-generation-inference, en, dataset:jondurbin/truthy-dpo-v0.1, license:apache-2.0, model-index, autotrain_compatible, endpoints_compatible, region:us
DPO Finetuned paulml/OGNO-7B using jondurbin/truthy-dpo-v0.1
paulml/OGNO-7B is a Mistral 7B variant, as far as I know, and this is an experimental repo, so it might not be usable in production.
Thanks for the great data sources.
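Since the card flags the checkpoint as experimental, a quick smoke test is a reasonable first step. A hedged sketch using the standard transformers text-generation pipeline (prompt and generation settings are illustrative, not from the card):
```
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="eren23/OGNO-7b-dpo-truthful",
    device_map="auto",
)

# Illustrative prompt: check that the DPO-tuned model answers plainly.
out = generate("What is the capital of France?", max_new_tokens=64)
print(out[0]["generated_text"])
```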
v1olet / v1olet_merged_dpo_7B
README.md
model
1 match
tags:
transformers, pytorch, mistral, text-generation, en, license:apache-2.0, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
DPO fine-tuned from the model ranked *6th* on the overall leaderboard and **1st** on the 7B leaderboard, v1olet/v1olet_marcoroni-go-bruins-merge-7B.
You can use the Alpaca template (a completed sketch follows the truncated snippet below).
```
template_format = """{system}
```
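Assuming the README means the standard Alpaca prompt layout, a completed version of the truncated template might look like this; the repo's actual template may differ:
```
# Hypothetical completion, following the standard Alpaca prompt layout.
template_format = """{system}

### Instruction:
{prompt}

### Response:
"""

prompt = template_format.format(
    system="You are a helpful assistant.",
    prompt="Summarize what DPO fine-tuning does in one sentence.",
)
print(prompt)
```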
radames / sd-21-DPO-LoRA
README.md
model
6 matches
tags:
diffusers, text-to-image, base_model:stabilityai/stable-diffusion-2-1, base_model:finetune:stabilityai/stable-diffusion-2-1, region:us
# DPO LoRA Stable Diffusion v2-1
Model trained with the LoRA implementation of Diffusion DPO. Read more [here](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/diffusion_dpo)
Base Model: https://huggingface.co/stabilityai/stable-diffusion-2-1
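A hedged sketch of applying these DPO LoRA weights on top of the stated base model with diffusers; the repo layout and weight file location are assumptions, and the prompt is illustrative:
```
import torch
from diffusers import StableDiffusionPipeline

# Load the stated base model, then attach the DPO LoRA on top of it.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Assumption: the LoRA weights sit at the root of the radames/sd-21-DPO-LoRA repo.
pipe.load_lora_weights("radames/sd-21-DPO-LoRA")

image = pipe(
    "a portrait photo of an astronaut, sharp focus, natural light",
    num_inference_steps=30,
).images[0]
image.save("dpo_lora_sample.png")
```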
radames / sdxl-DPO-LoRA
README.md
model
2 matches
tags:
diffusers, text-to-image, base_model:stabilityai/stable-diffusion-xl-base-1.0, base_model:finetune:stabilityai/stable-diffusion-xl-base-1.0, region:us
# DPO LoRA Stable Diffusion XL
Model trained with the LoRA implementation of Diffusion DPO. Read more [here](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/diffusion_dpo)
Base Model: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
radames / sdxl-turbo-DPO-LoRA
README.md
model
6 matches
tags:
diffusers, text-to-image, base_model:stabilityai/sdxl-turbo, base_model:finetune:stabilityai/sdxl-turbo, region:us
# DPO LoRA Stable Diffusion XL Turbo
Model trained with the LoRA implementation of Diffusion DPO. Read more [here](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/diffusion_dpo)
Base Model: https://huggingface.co/stabilityai/sdxl-turbo
yunconglong / 7Bx4_DPO
README.md
model
4 matches
tags:
transformers, safetensors, mixtral, text-generation, license:mit, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with jondurbin/truthy-dpo-v0.1
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```