Foundation AI Papers (II)
Iterative Reasoning Preference Optimization
Paper • 2404.19733 • Published • 43
Better & Faster Large Language Models via Multi-token Prediction
Paper • 2404.19737 • Published • 64
Note well ...
ORPO: Monolithic Preference Optimization without Reference Model
Paper • 2403.07691 • Published • 58
KAN: Kolmogorov-Arnold Networks
Paper • 2404.19756 • Published • 97
Note "Less scalable version" of AGI backend model
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Paper • 2303.02536 • Published • 1
Suppressing Pink Elephants with Direct Principle Feedback
Paper • 2402.07896 • Published • 7
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
Paper • 2310.01801 • Published • 3
Aligning LLM Agents by Learning Latent Preference from User Edits
Paper • 2404.15269 • Published • 1
Language-Image Models with 3D Understanding
Paper • 2405.03685 • Published • 1
Chain of Thoughtlessness: An Analysis of CoT in Planning
Paper • 2405.04776 • Published • 1
Memory Mosaics
Paper • 2405.06394 • Published • 2
The Consensus Game: Language Model Generation via Equilibrium Search
Paper • 2310.09139 • Published • 12
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 61
PHUDGE: Phi-3 as Scalable Judge
Paper • 2405.08029 • Published • 1
Note LoRA fine-tunes a judge LM on Prometheus's 10K feedback dataset. It turns the LLM into a classifier to increase 'overfitting' and gets a slightly better-performing model based on Phi-3 (which arguably already has stronger performance than Mistral). Not that surprising, and fine-tuning on a large human-preference dataset is boring. They did release code for the experiment, which is nice to have. The real gem is efficient alignment; see the sketch below.
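A minimal sketch of the recipe as I read it: LoRA-adapt a Phi-3 backbone with a sequence-classification head so the judge predicts a discrete score instead of generating text. The 5-way label space and the LoRA hyperparameters are assumptions, not the paper's exact setup.

```python
# Hedged sketch: judge LM as a classifier via LoRA (assumed setup, not the
# paper's exact configuration).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

model_name = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Frame judging as 5-way classification over a 1-5 quality scale (assumed).
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=5)
model.config.pad_token_id = tokenizer.pad_token_id or tokenizer.eos_token_id

# Attach LoRA adapters; Phi-3 fuses Q/K/V into a single qkv_proj module.
lora = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj"],
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only adapter + head weights train

# Toy scoring call on one (instruction, response) pair.
inputs = tokenizer("Instruction: ...\nResponse: ...", return_tensors="pt")
with torch.no_grad():
    score_logits = model(**inputs).logits  # shape: (1, 5)
```

Framing judging as classification trades generative flexibility for a sharper decision boundary, which matches the note's 'overfitting' remark.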
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models
Paper • 2405.09220 • Published • 23
Understanding the performance gap between online and offline alignment algorithms
Paper • 2405.08448 • Published • 11
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Paper • 2405.05904 • Published • 5
Note A good way for a model to avoid a penalty while being lazy is just to be generic, or to provide fake information
Robust agents learn causal world models
Paper • 2402.10877 • Published • 2
How Far Are We From AGI
Paper • 2405.10313 • Published • 2
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 42
Note What is the difference again?
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models
Paper • 2405.12939 • Published • 1
LoRA Learns Less and Forgets Less
Paper • 2405.09673 • Published • 73
Note Duh
The Platonic Representation Hypothesis
Paper • 2405.07987 • Published • 1
Note Intelligence has at least 2 levels. Level 1 is associative intelligence: the key to achieving it is a representation of concepts such that the 'distance' between representation vectors accurately reflects the closeness of those concepts; such intelligence can be achieved with Supervised Learning (see the sketch below). Level 2 is deductive intelligence: the key to achieving it is searching for the right connections and reaching the correct conclusion robustly under noisy input. This should be achieved with Reinforcement Learning.
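A minimal sketch of the 'level 1' claim: embed concepts so that vector distance tracks semantic closeness. The embedding model choice here is an assumption for illustration.

```python
# Sketch: concept closeness as distance between representation vectors.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
concepts = ["dog", "puppy", "quantum field theory"]
emb = model.encode(concepts, normalize_embeddings=True)  # unit-norm vectors

# On unit vectors, cosine similarity is the dot product; a good representation
# should place "dog" nearer to "puppy" than to the unrelated third concept.
sim = emb @ emb.T
print(np.round(sim, 2))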
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct
Paper • 2405.14906 • Published • 18
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning
Paper • 2405.17258 • Published • 11
Executable Code Actions Elicit Better LLM Agents
Paper • 2402.01030 • Published • 21
Contextual Position Encoding: Learning to Count What's Important
Paper • 2405.18719 • Published • 3
Note HUGE
Understanding Transformer Reasoning Capabilities via Graph Algorithms
Paper • 2405.18512 • Published • 1