- Attention Is All You Need
  Paper • 1706.03762 • Published • 36
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 6
- Mixtral of Experts
  Paper • 2401.04088 • Published • 153
- Mistral 7B
  Paper • 2310.06825 • Published • 43
Collections
Collections including paper arxiv:2310.06825
- Llemma: An Open Language Model For Mathematics
  Paper • 2310.10631 • Published • 45
- Mistral 7B
  Paper • 2310.06825 • Published • 43
- Qwen Technical Report
  Paper • 2309.16609 • Published • 30
- BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
  Paper • 2309.11568 • Published • 9
- Attention Is All You Need
  Paper • 1706.03762 • Published • 36
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  Paper • 2005.11401 • Published • 11
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 24
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 8
- SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for Autonomous Driving
  Paper • 2402.02519 • Published
- Mixtral of Experts
  Paper • 2401.04088 • Published • 153
- Optimal Transport Aggregation for Visual Place Recognition
  Paper • 2311.15937 • Published
- GOAT: GO to Any Thing
  Paper • 2311.06430 • Published • 14
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 567
- Mixtral of Experts
  Paper • 2401.04088 • Published • 153
- Mistral 7B
  Paper • 2310.06825 • Published • 43
- Don't Make Your LLM an Evaluation Benchmark Cheater
  Paper • 2311.01964 • Published • 1
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
  Paper • 2309.12307 • Published • 82
- Small-scale proxies for large-scale Transformer training instabilities
  Paper • 2309.14322 • Published • 17
- AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
  Paper • 2309.16058 • Published • 53
- Reinforcement Learning in the Era of LLMs: What is Essential? What is needed? An RL Perspective on RLHF, Prompting, and Beyond
  Paper • 2310.06147 • Published • 1
- FreeU: Free Lunch in Diffusion U-Net
  Paper • 2309.11497 • Published • 63
- Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
  Paper • 2309.08532 • Published • 50
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
  Paper • 2309.12307 • Published • 82
- Mistral 7B
  Paper • 2310.06825 • Published • 43
- The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
  Paper • 2306.01116 • Published • 28
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 8
- RoFormer: Enhanced Transformer with Rotary Position Embedding
  Paper • 2104.09864 • Published • 7
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 10