Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.12318

mDPO: Conditional Preference Optimization for Multimodal Large Language Models

Paper • 2406.11839 • Published Jun 17 • 37
Pandora: Towards General World Model with Natural Language Actions and Video States

Paper • 2406.09455 • Published Jun 12 • 14
WPO: Enhancing RLHF with Weighted Preference Optimization

Paper • 2406.11827 • Published Jun 17 • 14
In-Context Editing: Learning Knowledge from Self-Induced Distributions

Paper • 2406.11194 • Published Jun 17 • 15

Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

Paper • 2404.12387 • Published Apr 18 • 38
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

Paper • 2404.12318 • Published Apr 18 • 14

Papers - Datasets - Multilingual - OpenAssistant

multilingual, pairwise human-rated chat transcripts. For the SFT data, we use the human-preferred response in each pair to finetune the model

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

Paper • 2404.12318 • Published Apr 18 • 14

Papers - Datasets - Multilingual - Documents - Seahorse

contains documents and summaries in six languages (German, English, Spanish, Russian, Turkish, and Vietnamese) with pointwise human ratings

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

Paper • 2404.12318 • Published Apr 18 • 14

Papers - Reward Model - Cross-Lingual

We propose to perform reward optimization using a RM trained for a different language. Assuming model generation quality transfers cross-lingually

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

Paper • 2404.12318 • Published Apr 18 • 14

Papers - Fine-tuning - Reward Model

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

Paper • 2404.12318 • Published Apr 18 • 14

Papers - Reward Model - Fine-tuning

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

Paper • 2404.12318 • Published Apr 18 • 14
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published May 2 • 116

Papers - Reward Model

Fine-Tuning Language Models from Human Preferences

Paper • 1909.08593 • Published Sep 18, 2019 • 3
Transforming and Combining Rewards for Aligning Large Language Models

Paper • 2402.00742 • Published Feb 1 • 11
Leverage the Average: an Analysis of KL Regularization in RL

Paper • 2003.14089 • Published Mar 31, 2020 • 2
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

Paper • 2404.01258 • Published Apr 1 • 10

Papers - Google

about 21 hours ago

Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23 • 86
Long-form factuality in large language models

Paper • 2403.18802 • Published Mar 27 • 24
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion

Paper • 2403.18818 • Published Mar 27 • 25
TC4D: Trajectory-Conditioned Text-to-4D Generation

Paper • 2403.17920 • Published Mar 26 • 16

Papers - University - MIT

One-step Diffusion with Distribution Matching Distillation

Paper • 2311.18828 • Published Nov 30, 2023 • 3
The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26 • 78
Condition-Aware Neural Network for Controlled Image Generation

Paper • 2404.01143 • Published Apr 1 • 11
Locating and Editing Factual Associations in GPT

Paper • 2202.05262 • Published Feb 10, 2022 • 1

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs