LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 115 • 9
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published Apr 22 • 122 • 13
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published Apr 19 • 37 • 8
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies Paper • 2404.06395 • Published Apr 9 • 18 • 1
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent Paper • 2404.03648 • Published Apr 4 • 22 • 2
PointInfinity: Resolution-Invariant Point Diffusion Models Paper • 2404.03566 • Published Apr 4 • 13 • 1
Learning to Decode Collaboratively with Multiple Language Models Paper • 2403.03870 • Published Mar 6 • 17 • 6
Orca-Math: Unlocking the potential of SLMs in Grade School Math Paper • 2402.14830 • Published Feb 16 • 23 • 3
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15 • 33 • 4
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss Paper • 2402.10790 • Published Feb 16 • 39 • 7
Premise Order Matters in Reasoning with Large Language Models Paper • 2402.08939 • Published Feb 14 • 23 • 3
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data Paper • 2402.08093 • Published Feb 12 • 52 • 9
Direct Language Model Alignment from Online AI Feedback Paper • 2402.04792 • Published Feb 7 • 25 • 2
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs Paper • 2402.04291 • Published Feb 6 • 48 • 2
ReGAL: Refactoring Programs to Discover Generalizable Abstractions Paper • 2401.16467 • Published Jan 29 • 7 • 2
Proactive Detection of Voice Cloning with Localized Watermarking Paper • 2401.17264 • Published Jan 30 • 15 • 4
Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance Paper • 2401.15687 • Published Jan 28 • 19 • 4
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling Paper • 2401.15977 • Published Jan 29 • 34 • 8
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29 • 46 • 6
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design Paper • 2401.14112 • Published Jan 25 • 17 • 7
Orion-14B: Open-source Multilingual Large Language Models Paper • 2401.12246 • Published Jan 20 • 10 • 2
Scalable Pre-training of Large Autoregressive Image Models Paper • 2401.08541 • Published Jan 16 • 35 • 5
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models Paper • 2401.04658 • Published Jan 9 • 24 • 2
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU Paper • 2312.12456 • Published Dec 16, 2023 • 40 • 3
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 253 • 7
Distributed Inference and Fine-tuning of Large Language Models Over The Internet Paper • 2312.08361 • Published Dec 13, 2023 • 23 • 3
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention Paper • 2312.07987 • Published Dec 13, 2023 • 39 • 1
Steering Llama 2 via Contrastive Activation Addition Paper • 2312.06681 • Published Dec 9, 2023 • 9 • 1
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models Paper • 2312.07046 • Published Dec 12, 2023 • 12 • 1
LLM360: Towards Fully Transparent Open-Source LLMs Paper • 2312.06550 • Published Dec 11, 2023 • 52 • 3
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator Paper • 2312.04474 • Published Dec 7, 2023 • 28 • 2
Axiomatic Preference Modeling for Longform Question Answering Paper • 2312.02206 • Published Dec 2, 2023 • 7 • 1
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model Paper • 2311.13231 • Published Nov 22, 2023 • 25 • 4
Orca 2: Teaching Small Language Models How to Reason Paper • 2311.11045 • Published Nov 18, 2023 • 69 • 5
Make Pixels Dance: High-Dynamic Video Generation Paper • 2311.10982 • Published Nov 18, 2023 • 65 • 5
The Generative AI Paradox: "What It Can Create, It May Not Understand" Paper • 2311.00059 • Published Oct 31, 2023 • 17 • 5
De-Diffusion Makes Text a Strong Cross-Modal Interface Paper • 2311.00618 • Published Nov 1, 2023 • 21 • 11
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B Paper • 2310.20624 • Published Oct 31, 2023 • 12 • 9