- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 45
- Qwen Technical Report
  Paper • 2309.16609 • Published • 35
- GPT-4 Technical Report
  Paper • 2303.08774 • Published • 5
- Gemini: A Family of Highly Capable Multimodal Models
  Paper • 2312.11805 • Published • 45
Collections including paper arxiv:2402.00838
- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 64
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
  Paper • 2402.00159 • Published • 62
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 83

- google/flan-t5-large
  Text2Text Generation • Updated • 1.69M • 710
- deepseek-ai/deepseek-coder-6.7b-instruct
  Text Generation • Updated • 89.2k • 385
- Object Recognition as Next Token Prediction
  Paper • 2312.02142 • Published • 14
- colbert-ir/dspy-Oct11-T5-Large-MH-3k-v1
  Text2Text Generation • Updated • 14 • 1

- Holistic Evaluation of Text-To-Image Models
  Paper • 2311.04287 • Published • 16
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
  Paper • 2311.07463 • Published • 15
- Trusted Source Alignment in Large Language Models
  Paper • 2311.06697 • Published • 12
- DiLoCo: Distributed Low-Communication Training of Language Models
  Paper • 2311.08105 • Published • 15

- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
  Paper • 2309.12307 • Published • 88
- LMDX: Language Model-based Document Information Extraction and Localization
  Paper • 2309.10952 • Published • 65
- Table-GPT: Table-tuned GPT for Diverse Table Tasks
  Paper • 2310.09263 • Published • 40
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 97