RedPajama: an Open Dataset for Training Large Language Models
Paper • 2411.12372
Synthetic Augmented data, Fair and Extreme-scaled Large Multimodal Model (SafeLMM)
* Multilingual, Multimodal, Multidomain data
* Synthetic data
* Safety-by-design