Improve Mathematical Reasoning in Language Models by Automated Process Supervision Paper • 2406.06592 • Published Jun 5 • 23
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published 23 days ago • 45
view article Article Selective fine-tuning of Language Models with Spectrum By anakin87 • 29 days ago • 26
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Paper • 2408.12528 • Published Aug 22 • 50
Probably function calling datasets Collection Created using the https://huggingface.co/spaces/librarian-bots/dataset-column-search-api Space. • 39 items • Updated Jul 17 • 35
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion Paper • 2407.01392 • Published Jul 1 • 39
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22 • 108
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10 • 64
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization Paper • 2405.15071 • Published May 23 • 35
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published May 23 • 32
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12 • 62
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4 • 59
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline Paper • 2404.02893 • Published Apr 3 • 19
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression Paper • 2403.12968 • Published Mar 19 • 24
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14 • 72
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12 • 39
Common 7B Language Models Already Possess Strong Math Capabilities Paper • 2403.04706 • Published Mar 7 • 16
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 182
A Critical Evaluation of AI Feedback for Aligning Large Language Models Paper • 2402.12366 • Published Feb 19 • 3
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 592
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 88
Orca-Math: Unlocking the potential of SLMs in Grade School Math Paper • 2402.14830 • Published Feb 16 • 24
GPTVQ: The Blessing of Dimensionality for LLM Quantization Paper • 2402.15319 • Published Feb 23 • 19
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning Paper • 2402.04833 • Published Feb 7 • 6
Self-Discover: Large Language Models Self-Compose Reasoning Structures Paper • 2402.03620 • Published Feb 6 • 109
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29 • 46
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models Paper • 2312.06585 • Published Dec 11, 2023 • 28
Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses Paper • 2312.00763 • Published Dec 1, 2023 • 19
System 2 Attention (is something you might need too) Paper • 2311.11829 • Published Nov 20, 2023 • 39
YUAN 2.0: A Large Language Model with Localized Filtering-based Attention Paper • 2311.15786 • Published Nov 27, 2023 • 7
Memory Augmented Language Models through Mixture of Word Experts Paper • 2311.10768 • Published Nov 15, 2023 • 16
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers Paper • 2311.10642 • Published Nov 17, 2023 • 23
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery Paper • 2310.18356 • Published Oct 24, 2023 • 22
A Simple and Effective Pruning Approach for Large Language Models Paper • 2306.11695 • Published Jun 20, 2023 • 3
NEFTune: Noisy Embeddings Improve Instruction Finetuning Paper • 2310.05914 • Published Oct 9, 2023 • 14
Sparse Finetuning for Inference Acceleration of Large Language Models Paper • 2310.06927 • Published Oct 10, 2023 • 14
FlashDecoding++: Faster Large Language Model Inference on GPUs Paper • 2311.01282 • Published Nov 2, 2023 • 35
The Impact of Depth and Width on Transformer Language Model Generalization Paper • 2310.19956 • Published Oct 30, 2023 • 9