leonardlin's Collections
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper
•
2401.02954
•
Published
•
41
Qwen Technical Report
Paper
•
2309.16609
•
Published
•
35
GPT-4 Technical Report
Paper
•
2303.08774
•
Published
•
5
Gemini: A Family of Highly Capable Multimodal Models
Paper
•
2312.11805
•
Published
•
44
An In-depth Look at Gemini's Language Abilities
Paper
•
2312.11444
•
Published
•
1
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Paper
•
2312.10868
•
Published
•
1
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
Paper
•
2312.17661
•
Published
•
13
Mistral 7B
Paper
•
2310.06825
•
Published
•
47
TinyLlama: An Open-Source Small Language Model
Paper
•
2401.02385
•
Published
•
89
Textbooks Are All You Need II: phi-1.5 technical report
Paper
•
2309.05463
•
Published
•
87
Textbooks Are All You Need
Paper
•
2306.11644
•
Published
•
142
Mixtral of Experts
Paper
•
2401.04088
•
Published
•
158
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper
•
2401.04081
•
Published
•
70
Magicoder: Source Code Is All You Need
Paper
•
2312.02120
•
Published
•
80
Towards Conversational Diagnostic AI
Paper
•
2401.05654
•
Published
•
16
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper
•
2401.13601
•
Published
•
45
MambaByte: Token-free Selective State Space Model
Paper
•
2401.13660
•
Published
•
51
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Paper
•
2401.15071
•
Published
•
35
Language Models can be Logical Solvers
Paper
•
2311.06158
•
Published
•
18
OLMo: Accelerating the Science of Language Models
Paper
•
2402.00838
•
Published
•
82
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper
•
2402.03300
•
Published
•
72
BlackMamba: Mixture of Experts for State-Space Models
Paper
•
2402.01771
•
Published
•
23
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
Paper
•
2402.03216
•
Published
•
4
Matryoshka Representation Learning
Paper
•
2205.13147
•
Published
•
10
Not all layers are equally as important: Every Layer Counts BERT
Paper
•
2311.02265
•
Published
•
1
An Interactive Agent Foundation Model
Paper
•
2402.05929
•
Published
•
27
Advancing State of the Art in Language Modeling
Paper
•
2312.03735
•
Published
•
1
Large Language Models: A Survey
Paper
•
2402.06196
•
Published
•
3
ChemLLM: A Chemical Large Language Model
Paper
•
2402.06852
•
Published
•
27
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper
•
2402.07456
•
Published
•
41
Grandmaster-Level Chess Without Search
Paper
•
2402.04494
•
Published
•
67
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
Paper
•
2401.02731
•
Published
•
2
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper
•
2402.14905
•
Published
•
126
Yi: Open Foundation Models by 01.AI
Paper
•
2403.04652
•
Published
•
62
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper
•
2403.09611
•
Published
•
124
InternLM2 Technical Report
Paper
•
2403.17297
•
Published
•
30
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper
•
2404.12387
•
Published
•
38
Your Transformer is Secretly Linear
Paper
•
2405.12250
•
Published
•
149
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper
•
2405.12981
•
Published
•
28
Observational Scaling Laws and the Predictability of Language Model Performance
Paper
•
2405.10938
•
Published
•
11