🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24 • 57
Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings Paper • 2402.17016 • Published Feb 26 • 5
DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination Paper • 2311.15477 • Published Nov 27, 2023 • 2
Small-scale proxies for large-scale Transformer training instabilities Paper • 2309.14322 • Published Sep 25, 2023 • 19
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models Paper • 2308.11462 • Published Aug 20, 2023 • 3