Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2401.14019

Synthetic Data Generation

A curated list of papers focusing on synthetic data generation

Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

Paper • 2402.13064 • Published Feb 20 • 47
Textbooks Are All You Need II: phi-1.5 technical report

Paper • 2309.05463 • Published Sep 11, 2023 • 87
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

Paper • 2402.10379 • Published Feb 16 • 30
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Paper • 2312.06585 • Published Dec 11, 2023 • 28

Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI

Paper • 2401.14019 • Published Jan 25 • 21
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants

Paper • 2308.16884 • Published Aug 31, 2023 • 8
Genie: Achieving Human Parity in Content-Grounded Datasets Generation

Paper • 2401.14367 • Published Jan 25 • 7
ibm-nasa-geospatial/Prithvi-WxC-1.0-2300M

Updated 9 days ago • 289 • 63

Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI

Paper • 2401.14019 • Published Jan 25 • 21
Running

21

👁

Explore Unitxt
unitxt/data

Updated 12 days ago • 1.4k
Running

📈

Unitxt Metric

Efficient Tool Use with Chain-of-Abstraction Reasoning

Paper • 2401.17464 • Published Jan 30 • 16
Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation

Paper • 2401.15688 • Published Jan 28 • 11
SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Paper • 2401.15024 • Published Jan 26 • 69
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Paper • 2401.15071 • Published Jan 26 • 35

Interesting Miscellaneous

Training Chain-of-Thought via Latent-Variable Inference

Paper • 2312.02179 • Published Nov 28, 2023 • 8
LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 257
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions

Paper • 2312.11595 • Published Dec 18, 2023 • 5
Quantum Denoising Diffusion Models

Paper • 2401.07049 • Published Jan 13 • 12

Research Papers

Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models

Paper • 2310.13127 • Published Oct 19, 2023 • 11
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs

Paper • 2310.13961 • Published Oct 21, 2023 • 4
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs

Paper • 2309.09582 • Published Sep 18, 2023 • 4
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI

Paper • 2401.14019 • Published Jan 25 • 21

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs