- Jamba: A Hybrid Transformer-Mamba Language Model (Paper • 2403.19887 • Published • 99)
- sDPO: Don't Use Your Data All at Once (Paper • 2403.19270 • Published • 32)
- ViTAR: Vision Transformer with Any Resolution (Paper • 2403.18361 • Published • 48)
- Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models (Paper • 2403.18814 • Published • 42)
Collections including paper arxiv:2404.03592
- BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text (Paper • 2403.18421 • Published • 21)
- Long-form factuality in large language models (Paper • 2403.18802 • Published • 23)
- stanford-crfm/BioMedLM (Text Generation • Updated • 2.5k • 379)
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Paper • 2305.18290 • Published • 38)

---

- PERL: Parameter Efficient Reinforcement Learning from Human Feedback (Paper • 2403.10704 • Published • 56)
- ReFT: Representation Finetuning for Language Models (Paper • 2404.03592 • Published • 75)
- Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models (Paper • 2404.07973 • Published • 28)
- Zephyr: Direct Distillation of LM Alignment (Paper • 2310.16944 • Published • 119)

---

- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning (Paper • 2310.20587 • Published • 15)
- SELF: Language-Driven Self-Evolution for Large Language Model (Paper • 2310.00533 • Published • 2)
- QLoRA: Efficient Finetuning of Quantized LLMs (Paper • 2305.14314 • Published • 42)
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models (Paper • 2309.14717 • Published • 43)

---

- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (Paper • 2403.03507 • Published • 180)
- RAFT: Adapting Language Model to Domain Specific RAG (Paper • 2403.10131 • Published • 65)
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models (Paper • 2403.13372 • Published • 58)
- InternLM2 Technical Report (Paper • 2403.17297 • Published • 26)

---

- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (Paper • 2403.03507 • Published • 180)
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models (Paper • 2309.14717 • Published • 43)
- ReFT: Representation Finetuning for Language Models (Paper • 2404.03592 • Published • 75)

---

- Efficient Few-Shot Learning Without Prompts (Paper • 2209.11055 • Published • 2)
- Parameter-Efficient Transfer Learning for NLP (Paper • 1902.00751 • Published • 1)
- GPT Understands, Too (Paper • 2103.10385 • Published • 6)
- The Power of Scale for Parameter-Efficient Prompt Tuning (Paper • 2104.08691 • Published • 8)

---

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (Paper • 2402.19427 • Published • 50)
- Beyond Language Models: Byte Models are Digital World Simulators (Paper • 2402.19155 • Published • 46)
- StarCoder 2 and The Stack v2: The Next Generation (Paper • 2402.19173 • Published • 126)
- Simple linear attention language models balance the recall-throughput tradeoff (Paper • 2402.18668 • Published • 18)

---

- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models (Paper • 2310.04406 • Published • 8)
- Chain-of-Thought Reasoning Without Prompting (Paper • 2402.10200 • Published • 91)
- ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization (Paper • 2402.09320 • Published • 6)
- Self-Discover: Large Language Models Self-Compose Reasoning Structures (Paper • 2402.03620 • Published • 106)

---

- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition (Paper • 2402.15220 • Published • 18)
- Jamba: A Hybrid Transformer-Mamba Language Model (Paper • 2403.19887 • Published • 99)
- MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection (Paper • 2403.19888 • Published • 9)
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (Paper • 2404.02258 • Published • 102)