Collections
Discover the best community collections!
Collections including paper arxiv:2401.02385
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer
  Paper • 2401.01055 • Published • 53
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
  Paper • 2401.01335 • Published • 64
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 178
- Multilingual Instruction Tuning With Just a Pinch of Multilinguality
  Paper • 2401.01854 • Published • 10

- Attention Is All You Need
  Paper • 1706.03762 • Published • 41
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 14
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14

- PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
  Paper • 2312.13964 • Published • 18
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 257
- StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
  Paper • 2312.12491 • Published • 69
- LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model
  Paper • 2401.02330 • Published • 14

- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 257
- 3D-LFM: Lifting Foundation Model
  Paper • 2312.11894 • Published • 13
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 56
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
  Paper • 2312.16862 • Published • 30

- togethercomputer/StripedHyena-Hessian-7B
  Text Generation • Updated • 36 • 62
- Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
  Paper • 2312.08618 • Published • 11
- SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
  Paper • 2312.07987 • Published • 40
- LLM360: Towards Fully Transparent Open-Source LLMs
  Paper • 2312.06550 • Published • 56

- Attention Is All You Need
  Paper • 1706.03762 • Published • 41
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- Mixtral of Experts
  Paper • 2401.04088 • Published • 157
- Mistral 7B
  Paper • 2310.06825 • Published • 47

- Llemma: An Open Language Model For Mathematics
  Paper • 2310.10631 • Published • 49
- Mistral 7B
  Paper • 2310.06825 • Published • 47
- Qwen Technical Report
  Paper • 2309.16609 • Published • 34
- BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
  Paper • 2309.11568 • Published • 10

- NExT-GPT: Any-to-Any Multimodal LLM
  Paper • 2309.05519 • Published • 78
- Large Language Model for Science: A Study on P vs. NP
  Paper • 2309.05689 • Published • 20
- AstroLLaMA: Towards Specialized Foundation Models in Astronomy
  Paper • 2309.06126 • Published • 16
- Large Language Models for Compiler Optimization
  Paper • 2309.07062 • Published • 22