🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated 2 days ago • 25
Llama 3.1 Collection This collection hosts the transformers and original repos of the Meta Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated 1 day ago • 384
Scaling Diffusion Transformers to 16 Billion Parameters Paper • 2407.11633 • Published 11 days ago • 22
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published 28 days ago • 84
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 75
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24 • 43
Aya Datasets Collection The Aya Collection is a massive multilingual collection for over 100 languages consisting of 513 million instances of prompts and completions. • 5 items • Updated 29 days ago • 9
C4AI Aya 23 Collection Aya 23 is an open weights research release of an instruction fine-tuned model with highly advanced multilingual capabilities. • 3 items • Updated May 23 • 40
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 243
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 23 items • Updated 15 days ago • 378
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model Paper • 2405.09215 • Published May 15 • 15
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30 • 72
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12 • 62
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 180
SpaceByte: Towards Deleting Tokenization from Large Language Modeling Paper • 2404.14408 • Published Apr 22 • 6
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10 • 98
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 582
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9 • 39
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model Paper • 2309.16058 • Published Sep 27, 2023 • 53
Jointly Training Large Autoregressive Multimodal Models Paper • 2309.15564 • Published Sep 27, 2023 • 8
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models Paper • 2309.14717 • Published Sep 26, 2023 • 43
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models Paper • 2309.12307 • Published Sep 21, 2023 • 84
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset Paper • 2309.11998 • Published Sep 21, 2023 • 23