-
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Paper • 2402.13064
Textbooks Are All You Need II: phi-1.5 technical report
Paper • 2309.05463
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Paper • 2312.06585
Collections
Collections including paper arxiv:2401.14019
-
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI
Paper • 2401.14019
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
Paper • 2308.16884
Genie: Achieving Human Parity in Content-Grounded Datasets Generation
Paper • 2401.14367
ibm-nasa-geospatial/Prithvi-WxC-1.0-2300M
-
Efficient Tool Use with Chain-of-Abstraction Reasoning
Paper • 2401.17464
Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation
Paper • 2401.15688
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper • 2401.15024
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Paper • 2401.15071
-
Training Chain-of-Thought via Latent-Variable Inference
Paper • 2312.02179
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions
Paper • 2312.11595
Quantum Denoising Diffusion Models
Paper • 2401.07049
-
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
Paper • 2310.13127
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
Paper • 2310.13961
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
Paper • 2309.09582
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI
Paper • 2401.14019