-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 581 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 243 -
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 238 -
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 255
Collections
Discover the best community collections!
Collections including paper arxiv:2312.11514
-
stabilityai/stable-diffusion-3-medium
Text-to-Image • Updated • 168k • 3.59k -
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 238 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 243 -
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 255
-
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Paper • 2404.11912 • Published • 16 -
SnapKV: LLM Knows What You are Looking for Before Generation
Paper • 2404.14469 • Published • 23 -
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 255
-
Towards a World-English Language Model for On-Device Virtual Assistants
Paper • 2403.18783 • Published • 4 -
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 123 -
ReALM: Reference Resolution As Language Modeling
Paper • 2403.20329 • Published • 20 -
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper • 2404.05719 • Published • 58
-
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 103 -
Sensor-based Multi-Robot Search and Coverage with Spatial Separation in Unstructured Environments
Paper • 2403.01710 • Published • 2 -
EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models
Paper • 2308.14352 • Published -
Slimmable Encoders for Flexible Split DNNs in Bandwidth and Resource Constrained IoT Systems
Paper • 2306.12691 • Published • 2