- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 58
- HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
  Text Generation • Updated • 6.01k • 237
- alvarobartt/mistral-orpo-mix
  Text Generation • Updated • 23
- alvarobartt/Mistral-7B-v0.1-ORPO
  Text Generation • Updated • 1.91k • 15
Collections including paper arxiv:2403.07691
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 58
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 31
- Teaching Large Language Models to Reason with Reinforcement Learning
  Paper • 2403.04642 • Published • 43
- Best Practices and Lessons Learned on Synthetic Data for Language Models
  Paper • 2404.07503 • Published • 25
- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 15
- SELF: Language-Driven Self-Evolution for Large Language Model
  Paper • 2310.00533 • Published • 2
- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 41
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 43
- kaist-ai/mistral-orpo-beta
  Text Generation • Updated • 2.66k • 34
- kaist-ai/mistral-orpo-alpha
  Text Generation • Updated • 2.52k • 9
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 58
- kaist-ai/mistral-orpo-capybara-7k
  Text Generation • Updated • 3.09k • 26
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 175
- RAFT: Adapting Language Model to Domain Specific RAG
  Paper • 2403.10131 • Published • 63
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
  Paper • 2403.13372 • Published • 57
- InternLM2 Technical Report
  Paper • 2403.17297 • Published • 25
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 2
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 37
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 135
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 11
- Multilingual Instruction Tuning With Just a Pinch of Multilinguality
  Paper • 2401.01854 • Published • 9
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer
  Paper • 2401.01055 • Published • 50
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
  Paper • 2401.01325 • Published • 24
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 73
- Table-GPT: Table-tuned GPT for Diverse Table Tasks
  Paper • 2310.09263 • Published • 36
- A Zero-Shot Language Agent for Computer Control with Structured Reflection
  Paper • 2310.08740 • Published • 14
- The Consensus Game: Language Model Generation via Equilibrium Search
  Paper • 2310.09139 • Published • 12
- PaLI-3 Vision Language Models: Smaller, Faster, Stronger
  Paper • 2310.09199 • Published • 21
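Several of the collections above center on the ORPO paper (arxiv:2403.07691), whose objective combines the standard supervised loss with an odds-ratio preference term, removing the need for a reference model. As a rough orientation, here is a minimal sketch of that objective on scalar log-probabilities; the function names and the λ value are illustrative assumptions, not the paper's reference implementation:

```python
import math

def odds(avg_logp):
    # odds(y|x) = P(y|x) / (1 - P(y|x)), from an average token log-probability
    p = math.exp(avg_logp)
    return p / (1.0 - p)

def orpo_loss(nll_chosen, avg_logp_chosen, avg_logp_rejected, lam=0.1):
    # Sketch of L_ORPO = L_SFT + lambda * L_OR, where
    # L_OR = -log sigmoid(log(odds(chosen) / odds(rejected))).
    # lam=0.1 is an assumed default for illustration only.
    log_odds_ratio = math.log(odds(avg_logp_chosen)) - math.log(odds(avg_logp_rejected))
    l_or = -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))  # -log sigmoid
    return nll_chosen + lam * l_or
```

When the chosen response is more likely than the rejected one, the odds-ratio term shrinks toward zero, so the loss reduces to ordinary fine-tuning on the chosen response; no frozen reference policy is involved.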