- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 58
- HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
  Text Generation • Updated • 6.01k • 237
- alvarobartt/mistral-orpo-mix
  Text Generation • Updated • 23
- alvarobartt/Mistral-7B-v0.1-ORPO
  Text Generation • Updated • 1.91k • 15
Collections including paper arxiv:2403.07691
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 58
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 31
- Teaching Large Language Models to Reason with Reinforcement Learning
  Paper • 2403.04642 • Published • 43
- Best Practices and Lessons Learned on Synthetic Data for Language Models
  Paper • 2404.07503 • Published • 25
- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 15
- SELF: Language-Driven Self-Evolution for Large Language Model
  Paper • 2310.00533 • Published • 2
- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 41
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 43
- kaist-ai/mistral-orpo-beta
  Text Generation • Updated • 2.66k • 34
- kaist-ai/mistral-orpo-alpha
  Text Generation • Updated • 2.52k • 9
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 58
- kaist-ai/mistral-orpo-capybara-7k
  Text Generation • Updated • 3.09k • 26
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 175
- RAFT: Adapting Language Model to Domain Specific RAG
  Paper • 2403.10131 • Published • 63
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
  Paper • 2403.13372 • Published • 57
- InternLM2 Technical Report
  Paper • 2403.17297 • Published • 25
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 2
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 37
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 135
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 11
- Multilingual Instruction Tuning With Just a Pinch of Multilinguality
  Paper • 2401.01854 • Published • 9
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer
  Paper • 2401.01055 • Published • 50
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
  Paper • 2401.01325 • Published • 24
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 73
- Table-GPT: Table-tuned GPT for Diverse Table Tasks
  Paper • 2310.09263 • Published • 36
- A Zero-Shot Language Agent for Computer Control with Structured Reflection
  Paper • 2310.08740 • Published • 14
- The Consensus Game: Language Model Generation via Equilibrium Search
  Paper • 2310.09139 • Published • 12
- PaLI-3 Vision Language Models: Smaller, Faster, Stronger
  Paper • 2310.09199 • Published • 21
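Several of the collections above center on the ORPO paper (arxiv:2403.07691), whose objective combines the standard supervised loss with an odds-ratio preference term, removing the need for a reference model. As a rough orientation, here is a minimal sketch of that objective on scalar log-probabilities; the function names and the λ value are illustrative assumptions, not the paper's reference implementation:

```python
import math

def odds(avg_logp):
    # odds(y|x) = P(y|x) / (1 - P(y|x)), from an average token log-probability
    p = math.exp(avg_logp)
    return p / (1.0 - p)

def orpo_loss(nll_chosen, avg_logp_chosen, avg_logp_rejected, lam=0.1):
    # Sketch of L_ORPO = L_SFT + lambda * L_OR, where
    # L_OR = -log sigmoid(log(odds(chosen) / odds(rejected))).
    # lam=0.1 is an assumed default for illustration only.
    log_odds_ratio = math.log(odds(avg_logp_chosen)) - math.log(odds(avg_logp_rejected))
    l_or = -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))  # -log sigmoid
    return nll_chosen + lam * l_or
```

When the chosen response is more likely than the rejected one, the odds-ratio term shrinks toward zero, so the loss reduces to ordinary fine-tuning on the chosen response; no frozen reference policy is involved.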