Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math Paper • 2312.17120 • Published Dec 28, 2023 • 24
Learning Vision from Models Rivals Learning Vision from Data Paper • 2312.17742 • Published Dec 28, 2023 • 12
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 72
IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions Paper • 2305.14010 • Published May 23, 2023
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29 • 45