RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published Jul 1, 2024 • 35
Parallel Sentences Datasets Collection These datasets all have "english" and "non_english" columns for numerous datasets. They can be used to make embedding models multilingual. • 14 items • Updated Oct 9, 2024 • 13
view article Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • Jul 5, 2024 • 182
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 27 days ago • 698