-
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 75 -
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Paper • 2403.05530 • Published • 52 -
StarCoder: may the source be with you!
Paper • 2305.06161 • Published • 29 -
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • 2312.15166 • Published • 56
Collections
Discover the best community collections!
Collections including paper arxiv:2404.14219
-
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
Paper • 2403.16990 • Published • 24 -
ViTAR: Vision Transformer with Any Resolution
Paper • 2403.18361 • Published • 49 -
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper • 2404.01197 • Published • 29 -
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Paper • 2404.01367 • Published • 19
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 30 -
GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
Paper • 2403.19655 • Published • 15 -
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper • 2404.00656 • Published • 8 -
Enabling Memory Safety of C Programs using LLMs
Paper • 2404.01096 • Published • 1
-
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 40 -
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Paper • 2402.09320 • Published • 6 -
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 32 -
Dueling RL: Reinforcement Learning with Trajectory Preferences
Paper • 2111.04850 • Published • 2
-
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Paper • 2403.07816 • Published • 37 -
microsoft/phi-1_5
Text Generation • Updated • 95.8k • 1.3k -
Language models scale reliably with over-training and on downstream tasks
Paper • 2403.08540 • Published • 14 -
Akashpb13/Swahili_xlsr
Automatic Speech Recognition • Updated • 24 • 7
-
MoAI: Mixture of All Intelligence for Large Language and Vision Models
Paper • 2403.07508 • Published • 73 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 243 -
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper • 2404.14619 • Published • 124 -
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper • 2405.00732 • Published • 116
-
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 74 -
Gemma: Open Models Based on Gemini Research and Technology
Paper • 2403.08295 • Published • 45 -
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Paper • 2403.08763 • Published • 48 -
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 39
-
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 180 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 65 -
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Paper • 2403.13372 • Published • 58 -
InternLM2 Technical Report
Paper • 2403.17297 • Published • 27