RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published Jul 1 • 35
Parallel Sentences Datasets Collection These datasets all have "english" and "non_english" columns for numerous datasets. They can be used to make embedding models multilingual. • 14 items • Updated Oct 9 • 12
view article Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • Jul 5 • 161
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Sep 25 • 683