Collections including paper arxiv:2404.02258

- Communicative Agents for Software Development
  Paper • 2307.07924 • Published • 2
- Self-Refine: Iterative Refinement with Self-Feedback
  Paper • 2303.17651 • Published • 2
- ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
  Paper • 2312.10003 • Published • 34
- ReAct: Synergizing Reasoning and Acting in Language Models
  Paper • 2210.03629 • Published • 13

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 21
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 78
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 140
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 589
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 96
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 102
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 43

- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 102
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 102
- EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba
  Paper • 2403.09977 • Published • 9
- SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
  Paper • 2403.15360 • Published • 11

- JetMoE: Reaching Llama2 Performance with 0.1M Dollars
  Paper • 2404.07413 • Published • 35
- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 83
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 102
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 102

- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 102
- Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
  Paper • 2404.00399 • Published • 40
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 102
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 62