leonardlin's Collections
tuning
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models • arXiv:2309.12307 • 82 upvotes
NEFTune: Noisy Embeddings Improve Instruction Finetuning • arXiv:2310.05914 • 13 upvotes
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling • arXiv:2312.15166 • 55 upvotes
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon • arXiv:2401.03462 • 25 upvotes
YaRN: Efficient Context Window Extension of Large Language Models • arXiv:2309.00071 • 57 upvotes
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM • arXiv:2401.02994 • 44 upvotes
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity • arXiv:2401.01967
Zephyr: Direct Distillation of LM Alignment • arXiv:2310.16944 • 116 upvotes
Direct Preference Optimization: Your Language Model is Secretly a Reward Model • arXiv:2305.18290 • 37 upvotes
S-LoRA: Serving Thousands of Concurrent LoRA Adapters • arXiv:2311.03285 • 27 upvotes
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning • arXiv:2312.15685 • 16 upvotes
Self-Rewarding Language Models • arXiv:2401.10020 • 135 upvotes
TOFU: A Task of Fictitious Unlearning for LLMs • arXiv:2401.06121 • 14 upvotes
Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages • arXiv:2401.05811 • 5 upvotes
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models • arXiv:2401.01335 • 61 upvotes
WARM: On the Benefits of Weight Averaged Reward Models • arXiv:2401.12187 • 17 upvotes
Learning Universal Predictors • arXiv:2401.14953 • 18 upvotes
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling • arXiv:2401.16380 • 45 upvotes
Language Models can be Logical Solvers • arXiv:2311.06158 • 14 upvotes
ReFT: Reasoning with Reinforced Fine-Tuning • arXiv:2401.08967 • 26 upvotes
Continual Learning for Large Language Models: A Survey • arXiv:2402.01364 • 1 upvote
Direct Language Model Alignment from Online AI Feedback • arXiv:2402.04792 • 25 upvotes
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models • arXiv:2402.03749 • 9 upvotes
Suppressing Pink Elephants with Direct Principle Feedback • arXiv:2402.07896 • 7 upvotes
How to Train Data-Efficient LLMs • arXiv:2402.09668 • 33 upvotes
QuRating: Selecting High-Quality Data for Training Language Models • arXiv:2402.09739 • 3 upvotes
DoRA: Weight-Decomposed Low-Rank Adaptation • arXiv:2402.09353 • 18 upvotes
Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive • arXiv:2402.13228 • 3 upvotes
FuseChat: Knowledge Fusion of Chat Models • arXiv:2402.16107 • 35 upvotes
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models • arXiv:2403.13372 • 51 upvotes
Evolutionary Optimization of Model Merging Recipes • arXiv:2403.13187 • 44 upvotes
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning • arXiv:2403.17919 • 15 upvotes
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences • arXiv:2404.03715 • 57 upvotes
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks • arXiv:2404.14723 • 9 upvotes