TxT360: Trillion Extracted Text • Create a large, deduplicated dataset for LLM pre-training • Running • 105
Probably function calling datasets • Collection • Created using the https://huggingface.co/spaces/librarian-bots/dataset-column-search-api Space. • 39 items • Updated Jul 17, 2024 • 37
data-is-better-together/10k_prompts_ranked • Viewer • Updated Mar 7, 2024 • 10.3k • 2.81k • 150
togethercomputer/m2-bert-80M-32k-retrieval • Sentence Similarity • Updated Jan 12, 2024 • 1.13k • 127