Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs Paper • 2310.13961 • Published Oct 21, 2023 • 4
ZeroGen: Efficient Zero-shot Learning via Dataset Generation Paper • 2202.07922 • Published Feb 16, 2022 • 1
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models Paper • 2310.13671 • Published Oct 20, 2023 • 17
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs Paper • 2309.09582 • Published Sep 18, 2023 • 4
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models Paper • 2310.13127 • Published Oct 19, 2023 • 10
TeGit: Generating High-Quality Instruction-Tuning Data with Text-Grounded Task Design Paper • 2309.05447 • Published Sep 11, 2023 • 1
Ada-Instruct: Adapting Instruction Generators for Complex Reasoning Paper • 2310.04484 • Published Oct 6, 2023 • 4
Diversity of Thought Improves Reasoning Abilities of Large Language Models Paper • 2310.07088 • Published Oct 11, 2023 • 4
Text Data Augmentation in Low-Resource Settings via Fine-Tuning of Large Language Models Paper • 2310.01119 • Published Oct 2, 2023 • 1
Training Generative Question-Answering on Synthetic Data Obtained from an Instruct-tuned Model Paper • 2310.08072 • Published Oct 12, 2023 • 1
Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations Paper • 2310.07849 • Published Oct 11, 2023 • 1
Generative Data Augmentation using LLMs improves Distributional Robustness in Question Answering Paper • 2309.06358 • Published Sep 3, 2023 • 1
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback Paper • 2309.00267 • Published Sep 1, 2023 • 45
Adapting Large Language Models via Reading Comprehension Paper • 2309.09530 • Published Sep 18, 2023 • 69
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor Paper • 2212.09689 • Published Dec 19, 2022 • 1
Democratizing Reasoning Ability: Tailored Learning from Large Language Model Paper • 2310.13332 • Published Oct 20, 2023 • 14
Teaching Language Models to Self-Improve through Interactive Demonstrations Paper • 2310.13522 • Published Oct 20, 2023 • 10
Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection Paper • 2310.05035 • Published Oct 8, 2023 • 1
Tuna: Instruction Tuning using Feedback from Large Language Models Paper • 2310.13385 • Published Oct 20, 2023 • 8
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning Paper • 2310.11716 • Published Oct 18, 2023 • 3
CITING: Large Language Models Create Curriculum for Instruction Tuning Paper • 2310.02527 • Published Oct 4, 2023 • 2
Reverse Chain: A Generic-Rule for LLMs to Master Multi-API Planning Paper • 2310.04474 • Published Oct 6, 2023 • 2
UltraFeedback: Boosting Language Models with High-quality Feedback Paper • 2310.01377 • Published Oct 2, 2023 • 4
Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques Paper • 2310.08101 • Published Oct 12, 2023 • 1
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation Paper • 2310.03214 • Published Oct 5, 2023 • 14
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct Paper • 2308.09583 • Published Aug 18, 2023 • 7
Retrieval-Generation Synergy Augmented Large Language Models Paper • 2310.05149 • Published Oct 8, 2023 • 1
Prompting Large Language Models with Chain-of-Thought for Few-Shot Knowledge Base Question Generation Paper • 2310.08395 • Published Oct 12, 2023 • 1
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models Paper • 2310.08491 • Published Oct 12, 2023 • 49
LMDX: Language Model-based Document Information Extraction and Localization Paper • 2309.10952 • Published Sep 19, 2023 • 60
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images Paper • 2310.16825 • Published Oct 25, 2023 • 27
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation Paper • 2310.16656 • Published Oct 25, 2023 • 36
In-Context Pretraining: Language Modeling Beyond Document Boundaries Paper • 2310.10638 • Published Oct 16, 2023 • 26
Large Language Models Are Also Good Prototypical Commonsense Reasoners Paper • 2309.13165 • Published Sep 22, 2023 • 1
DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models Paper • 2310.05074 • Published Oct 8, 2023 • 1
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models Paper • 2309.12284 • Published Sep 21, 2023 • 16
Commonsense Knowledge Transfer for Pre-trained Language Models Paper • 2306.02388 • Published Jun 4, 2023 • 1
Snowman: A Million-scale Chinese Commonsense Knowledge Graph Distilled from Foundation Model Paper • 2306.10241 • Published Jun 17, 2023 • 1
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset Paper • 2309.11998 • Published Sep 21, 2023 • 22
In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning Paper • 2308.04275 • Published Aug 8, 2023 • 1
Enable Language Models to Implicitly Learn Self-Improvement From Data Paper • 2310.00898 • Published Oct 2, 2023 • 21
Textbooks Are All You Need II: phi-1.5 technical report Paper • 2309.05463 • Published Sep 11, 2023 • 84
Aligning Large Language Models through Synthetic Feedback Paper • 2305.13735 • Published May 23, 2023 • 1
Reinforced Self-Training (ReST) for Language Modeling Paper • 2308.08998 • Published Aug 17, 2023 • 2
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources Paper • 2306.04751 • Published Jun 7, 2023 • 4
Query2doc: Query Expansion with Large Language Models Paper • 2303.07678 • Published Mar 14, 2023 • 1
Generative Relevance Feedback with Large Language Models Paper • 2304.13157 • Published Apr 25, 2023 • 1
InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval Paper • 2301.01820 • Published Jan 4, 2023 • 1
Exploring the Viability of Synthetic Query Generation for Relevance Prediction Paper • 2305.11944 • Published May 19, 2023 • 1
LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning Paper • 2305.18169 • Published May 29, 2023 • 1
Automated Annotation with Generative AI Requires Validation Paper • 2306.00176 • Published May 31, 2023 • 1
Augmented Large Language Models with Parametric Knowledge Guiding Paper • 2305.04757 • Published May 8, 2023 • 2
Pre-training with Large Language Model-based Document Expansion for Dense Passage Retrieval Paper • 2308.08285 • Published Aug 16, 2023 • 1
Learning to Retrieve In-Context Examples for Large Language Models Paper • 2307.07164 • Published Jul 14, 2023 • 20
Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning Paper • 2211.03044 • Published Nov 6, 2022 • 1
Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models Paper • 2309.10707 • Published Sep 18, 2023 • 2
PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation Paper • 2310.14192 • Published Oct 22, 2023 • 1
The Program Testing Ability of Large Language Models for Code Paper • 2310.05727 • Published Oct 9, 2023 • 1
Assessing the potential of AI-assisted pragmatic annotation: The case of apologies Paper • 2305.08339 • Published May 15, 2023 • 1
Effectiveness of Data Augmentation for Parameter Efficient Tuning with Limited Data Paper • 2303.02577 • Published Mar 5, 2023 • 1
Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis Paper • 2306.07664 • Published Jun 13, 2023 • 1
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise Paper • 2310.19019 • Published Oct 29, 2023 • 9
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers Paper • 2309.08532 • Published Sep 15, 2023 • 50
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data Paper • 2309.13876 • Published Sep 25, 2023 • 1
Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning Paper • 2305.18170 • Published May 29, 2023 • 2
Constructing Multilingual Code Search Dataset Using Neural Machine Translation Paper • 2306.15604 • Published Jun 27, 2023 • 1
Too Few Bug Reports? Exploring Data Augmentation for Improved Changeset-based Bug Localization Paper • 2305.16430 • Published May 25, 2023 • 1
Generating Efficient Training Data via LLM-based Attribute Manipulation Paper • 2307.07099 • Published Jul 14, 2023 • 1
End-to-end Knowledge Retrieval with Multi-modal Queries Paper • 2306.00424 • Published Jun 1, 2023 • 1
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning Paper • 2309.10687 • Published Sep 16, 2023 • 1
AugGPT: Leveraging ChatGPT for Text Data Augmentation Paper • 2302.13007 • Published Feb 25, 2023 • 1
Large Language Models as Annotators: Enhancing Generalization of NLP Models at Minimal Cost Paper • 2306.15766 • Published Jun 27, 2023 • 1
Quick Starting Dialog Systems with Paraphrase Generation Paper • 2204.02546 • Published Apr 6, 2022 • 1
NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework Paper • 2111.04130 • Published Nov 7, 2021 • 1
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models Paper • 2308.01825 • Published Aug 3, 2023 • 19
Harnessing the Power of David against Goliath: Exploring Instruction Data Generation without Using Closed-Source Models Paper • 2308.12711 • Published Aug 24, 2023 • 1
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators Paper • 2303.16854 • Published Mar 29, 2023 • 1
Training Language Models with Language Feedback at Scale Paper • 2303.16755 • Published Mar 28, 2023 • 1
Asking Questions the Human Way: Scalable Question-Answer Generation from Text Corpus Paper • 2002.00748 • Published Jan 27, 2020 • 1
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models Paper • 2312.06585 • Published Dec 11, 2023 • 26
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation Paper • 2312.14187 • Published Dec 20, 2023 • 49
Self-Instruct: Aligning Language Model with Self Generated Instructions Paper • 2212.10560 • Published Dec 20, 2022 • 5
WizardLM: Empowering Large Language Models to Follow Complex Instructions Paper • 2304.12244 • Published Apr 24, 2023 • 13
WizardCoder: Empowering Code Large Language Models with Evol-Instruct Paper • 2306.08568 • Published Jun 14, 2023 • 27
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction Paper • 2401.06201 • Published Jan 11, 2024 • 2
AceCoder: Utilizing Existing Code to Enhance Code Generation Paper • 2303.17780 • Published Mar 31, 2023 • 1
SPADE: Synthesizing Assertions for Large Language Model Pipelines Paper • 2401.03038 • Published Jan 5, 2024 • 2
Mixture of Soft Prompts for Controllable Data Generation Paper • 2303.01580 • Published Mar 2, 2023 • 1
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29, 2024 • 45
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 72
CooK: Empowering General-Purpose Language Models with Modular and Collaborative Knowledge Paper • 2305.09955 • Published May 17, 2023 • 1
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows Paper • 2402.10379 • Published Feb 16, 2024 • 27
A Morphologically-Aware Dictionary-based Data Augmentation Technique for Machine Translation of Under-Represented Languages Paper • 2402.01939 • Published Feb 2, 2024 • 1
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models Paper • 2403.00231 • Published Mar 1, 2024 • 1
GECTurk: Grammatical Error Correction and Detection Dataset for Turkish Paper • 2309.11346 • Published Sep 20, 2023
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement Paper • 2403.15042 • Published Mar 22, 2024 • 24
MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs Paper • 2402.16352 • Published Feb 26, 2024 • 1
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15, 2024 • 33
CodecLM: Aligning Language Models with Tailored Synthetic Data Paper • 2404.05875 • Published Apr 8, 2024 • 15
NExT: Teaching Large Language Models to Reason about Code Execution Paper • 2404.14662 • Published 29 days ago • 3
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation Paper • 2402.18334 • Published Feb 28, 2024 • 12
GeMQuAD: Generating Multilingual Question Answering Datasets from Large Language Models using Few Shot Learning Paper • 2404.09163 • Published Apr 14, 2024
Better Synthetic Data by Retrieving and Transforming Existing Datasets Paper • 2404.14361 • Published 29 days ago • 1
DUQGen: Effective Unsupervised Domain Adaptation of Neural Rankers by Diversifying Synthetic Query Generation Paper • 2404.02489 • Published Apr 3, 2024
Prompting-based Synthetic Data Generation for Few-Shot Question Answering Paper • 2405.09335 • Published 6 days ago
SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation Paper • 2405.10040 • Published 5 days ago