Full-text search
+ 1,000 results
eren23 / dpo-binarized-NeutrixOmnibe-7B
README.md
model
5 matches
tags:
transformers, safetensors, mistral, text-generation, merge, dpo, conversation, text-generation-inference, Kukedlc/NeuTrixOmniBe-7B-model-remix, en, dataset:argilla/OpenHermes2.5-dpo-binarized-alpha, license:apache-2.0, model-index, autotrain_compatible, endpoints_compatible, region:us
DPO Finetuned Kukedlc/NeuTrixOmniBe-7B-model-remix using argilla/OpenHermes2.5-dpo-binarized-alpha
The Argilla DPO binarized pairs dataset is built on top of https://huggingface.co/datasets/teknium/OpenHermes-2.5 using https://github.com/argilla-io/distilabel, if you want to look into it.
Thanks for the great data sources.
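Several of the repos in these results follow the same recipe: take a merged base model and run TRL's DPO Trainer over a preference dataset. A minimal sketch of that recipe for the model above, assuming the argilla split can be mapped to the prompt/chosen/rejected columns DPOTrainer expects; exact argument names vary across TRL versions, so treat this as an outline rather than the authors' script:
```
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Kukedlc/NeuTrixOmniBe-7B-model-remix"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Assumption: the split can be mapped to plain-text "prompt", "chosen" and
# "rejected" columns; chat-formatted columns may need a small map() first.
train_ds = load_dataset("argilla/OpenHermes2.5-dpo-binarized-alpha", split="train")

args = DPOConfig(
    output_dir="dpo-binarized-neutrixomnibe-7b",
    beta=0.1,                      # strength of the KL penalty against the reference model
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                   # a frozen reference copy is created automatically
    args=args,
    train_dataset=train_ds,
    tokenizer=tokenizer,           # newer TRL versions call this processing_class
)
trainer.train()
```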
eren23 / dpo-binarized-NeuralTrix-7B
README.md
model
4 matches
tags:
transformers, safetensors, mistral, text-generation, conversation, text-generation-inference, CultriX/NeuralTrix-7B-dpo, dpo, merge, en, dataset:argilla/OpenHermes2.5-dpo-binarized-alpha, license:apache-2.0, model-index, autotrain_compatible, endpoints_compatible, region:us
DPO Finetuned CultriX/NeuralTrix-7B-dpo using argilla/OpenHermes2.5-dpo-binarized-alpha
The Argilla DPO binarized pairs dataset is built on top of https://huggingface.co/datasets/teknium/OpenHermes-2.5 using https://github.com/argilla-io/distilabel, if you want to look into it.
Thanks for the great data sources.
cloudyu / Pluto_13B_DPO
README.md
model
2 matches
LoneStriker / Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B-3.0bpw-h6-exl2
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [TomGrc/FusionNet_7Bx2_MoE_14B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
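The truthy-dpo dataset referenced by this and the following cards is a standard preference set. A small, hedged sketch of preparing it for the trainer, assuming it exposes system/prompt/chosen/rejected fields (the `to_dpo_format` helper is hypothetical, not from the model card):
```
from datasets import load_dataset

# Assumption: jondurbin/truthy-dpo-v0.1 carries "system", "prompt",
# "chosen" and "rejected" columns, the usual layout for DPO preference data.
ds = load_dataset("jondurbin/truthy-dpo-v0.1", split="train")

def to_dpo_format(example):
    # Fold the system message into the prompt so each row becomes a single
    # prompt string paired with a preferred and a rejected completion.
    prompt = (example.get("system", "") + "\n\n" + example["prompt"]).strip()
    return {"prompt": prompt, "chosen": example["chosen"], "rejected": example["rejected"]}

dpo_ds = ds.map(to_dpo_format, remove_columns=ds.column_names)
print(dpo_ds[0]["prompt"][:200])
```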
LoneStriker / Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B-4.0bpw-h6-exl2
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [TomGrc/FusionNet_7Bx2_MoE_14B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
LoneStriker / Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B-5.0bpw-h6-exl2
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [TomGrc/FusionNet_7Bx2_MoE_14B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
yunconglong / Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:mit, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [TomGrc/FusionNet_7Bx2_MoE_14B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
LoneStriker / Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B-6.0bpw-h6-exl2
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [TomGrc/FusionNet_7Bx2_MoE_14B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
LoneStriker / Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B-8.0bpw-h8-exl2
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [TomGrc/FusionNet_7Bx2_MoE_14B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
yunconglong / Truthful_DPO_MOE_19B
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, conversational, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
cloudyu / Truthful_DPO_TomGrc_FusionNet_34Bx2_MoE
README.md
model
6 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, conversational, license:mit, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [TomGrc/FusionNet_34Bx2_MoE]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
cloudyu / Truthful_DPO_cloudyu_Mixtral_34Bx2_MoE_60B
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:mit, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset jondurbin/truthy-dpo-v0.1 to improve [cloudyu/Mixtral_34Bx2_MoE_60B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
yunconglong / 13B_MATH_DPO
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset kyujinpy/orca_math_dpo to improve [yunconglong/MoE_13B_DPO]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
yunconglong / MoE_13B_DPO
README.md
model
5 matches
tags:
transformers, safetensors, mixtral, text-generation, moe, DPO, RL-TUNED, license:other, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with dataset Intel/orca_dpo_pairs to improve [yunconglong/Truthful_DPO_TomGrc_FusionNet_7Bx2_MoE_13B]
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```
eren23 / OGNO-7b-dpo-truthful
README.md
model
2 matches
tags:
transformers, pytorch, mistral, text-generation, merge, dpo, text-generation-inference, en, dataset:jondurbin/truthy-dpo-v0.1, license:apache-2.0, model-index, autotrain_compatible, endpoints_compatible, region:us
DPO Finetuned paulml/OGNO-7B using jondurbin/truthy-dpo-v0.1
paulml/OGNO-7B is a Mistral 7B variant, as far as I know, and this is an experimental repo, so it might not be usable in production.
Thanks for the great data sources.
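Since the card flags the checkpoint as experimental, a quick smoke test is a reasonable first step. A hedged sketch using the standard transformers text-generation pipeline (prompt and generation settings are illustrative, not from the card):
```
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="eren23/OGNO-7b-dpo-truthful",
    device_map="auto",
)

# Illustrative prompt: check that the DPO-tuned model answers plainly.
out = generate("What is the capital of France?", max_new_tokens=64)
print(out[0]["generated_text"])
```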
v1olet / v1olet_merged_dpo_7B
README.md
model
1 match
tags:
transformers, pytorch, mistral, text-generation, en, license:apache-2.0, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
DPO fine-tuned from the model ranked *6th* on the overall leaderboard and **1st** on the 7B leaderboard, v1olet/v1olet_marcoroni-go-bruins-merge-7B.
You can use the Alpaca template (a completed sketch follows the truncated snippet below).
```
template_format = """{system}
```
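Assuming the README means the standard Alpaca prompt layout, a completed version of the truncated template might look like this; the repo's actual template may differ:
```
# Hypothetical completion, following the standard Alpaca prompt layout.
template_format = """{system}

### Instruction:
{prompt}

### Response:
"""

prompt = template_format.format(
    system="You are a helpful assistant.",
    prompt="Summarize what DPO fine-tuning does in one sentence.",
)
print(prompt)
```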
radames / sd-21-DPO-LoRA
README.md
model
6 matches
tags:
diffusers, text-to-image, base_model:stabilityai/stable-diffusion-2-1, base_model:finetune:stabilityai/stable-diffusion-2-1, region:us
# DPO LoRA Stable Diffusion v2-1
Model trained with the LoRA implementation of Diffusion DPO. Read more [here](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/diffusion_dpo)
Base Model: https://huggingface.co/stabilityai/stable-diffusion-2-1
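A hedged sketch of applying these DPO LoRA weights on top of the stated base model with diffusers; the repo layout and weight file location are assumptions, and the prompt is illustrative:
```
import torch
from diffusers import StableDiffusionPipeline

# Load the stated base model, then attach the DPO LoRA on top of it.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Assumption: the LoRA weights sit at the root of the radames/sd-21-DPO-LoRA repo.
pipe.load_lora_weights("radames/sd-21-DPO-LoRA")

image = pipe(
    "a portrait photo of an astronaut, sharp focus, natural light",
    num_inference_steps=30,
).images[0]
image.save("dpo_lora_sample.png")
```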
radames / sdxl-DPO-LoRA
README.md
model
2 matches
tags:
diffusers, text-to-image, base_model:stabilityai/stable-diffusion-xl-base-1.0, base_model:finetune:stabilityai/stable-diffusion-xl-base-1.0, region:us
# DPO LoRA Stable Diffusion XL
Model trained with the LoRA implementation of Diffusion DPO. Read more [here](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/diffusion_dpo)
Base Model: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
radames / sdxl-turbo-DPO-LoRA
README.md
model
6 matches
tags:
diffusers, text-to-image, base_model:stabilityai/sdxl-turbo, base_model:finetune:stabilityai/sdxl-turbo, region:us
# DPO LoRA Stable Diffusion XL Turbo
Model trained with the LoRA implementation of Diffusion DPO. Read more [here](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/diffusion_dpo)
Base Model: https://huggingface.co/stabilityai/sdxl-turbo
yunconglong / 7Bx4_DPO
README.md
model
4 matches
tags:
transformers, safetensors, mixtral, text-generation, license:mit, autotrain_compatible, text-generation-inference, endpoints_compatible, region:us
* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with jondurbin/truthy-dpo-v0.1
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
```