Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published 8 days ago • 222
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published 5 days ago • 43
Research Papers Collection A collection of papers focused on LLM • 19 items • Updated Mar 11 • 1
Empowering SLMs Collection Collection of resources focusing on making Small Language Models (SLMs) better at various tasks • 6 items • Updated 1 day ago • 2
Synthetic Data Generation Collection A curated list of papers focusing on synthetic data generation • 9 items • Updated Mar 11 • 2
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web Paper • 2402.17553 • Published Feb 27 • 21
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21 • 104
Scaling Laws for Downstream Task Performance of Large Language Models Paper • 2402.04177 • Published Feb 6 • 16
The Stack: 3 TB of permissively licensed source code Paper • 2211.15533 • Published Nov 20, 2022 • 5
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text Paper • 2401.12070 • Published Jan 22 • 40
Pheme: Efficient and Conversational Speech Generation Paper • 2401.02839 • Published Jan 5 • 14
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 72
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents Paper • 2306.16527 • Published Jun 21, 2023 • 40
In-Context Pretraining: Language Modeling Beyond Document Boundaries Paper • 2310.10638 • Published Oct 16, 2023 • 26
RMT: Retentive Networks Meet Vision Transformers Paper • 2309.11523 • Published Sep 20, 2023 • 32
Efficient Streaming Language Models with Attention Sinks Paper • 2309.17453 • Published Sep 29, 2023 • 13
PDFTriage: Question Answering over Long, Structured Documents Paper • 2309.08872 • Published Sep 16, 2023 • 51
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales Paper • 2308.01320 • Published Aug 2, 2023 • 42
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 165
Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning Paper • 2307.03692 • Published Jul 5, 2023 • 23
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 23
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation Paper • 2306.07954 • Published Jun 13, 2023 • 111