Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation Paper • 2406.16678 • Published Jun 24 • 14
Nemotron 4 340B Collection Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated about 21 hours ago • 156
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3 • 42
view article Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28 • 154
Czech evaluation datasets Collection This collections should contain czech evaluation datasets • 8 items • Updated Jan 14 • 3
Retrieval-Augmented Generation for Large Language Models: A Survey Paper • 2312.10997 • Published Dec 18, 2023 • 10
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 79
The Pile: An 800GB Dataset of Diverse Text for Language Modeling Paper • 2101.00027 • Published Dec 31, 2020 • 6
Zephyr 7B Collection Models, datasets, and demos associated with Zephyr 7B. For code to train the models, see: https://github.com/huggingface/alignment-handbook • 9 items • Updated Apr 12 • 145