On Relation-Specific Neurons in Large Language Models Paper • 2502.17355 • Published 24 days ago • 6
MMTEB Collection Our contribution to the Massive Multilingual Text Embedding Benchmark (MMTEB). Retrieval and reranking benchmarks in 16 languages. • 4 items • Updated Jun 6, 2024 • 2
MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published 29 days ago • 32
CommonCrawl Collection Large web-mined general corpus based on CommonCrawl. • 7 items • Updated Dec 8, 2024 • 2
NoLiMa: Long-Context Evaluation Beyond Literal Matching Paper • 2502.05167 • Published Feb 7 • 15
OpenCoder Collection OpenCoder is an open and reproducible code LLM family which matches the performance of top-tier code LLMs. • 8 items • Updated Nov 23, 2024 • 80
How Transliterations Improve Crosslingual Alignment Paper • 2409.17326 • Published Sep 25, 2024 • 1
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages Paper • 2410.23825 • Published Oct 31, 2024 • 4
LLM Reasoning Papers Collection Papers to improve reasoning capabilities of LLMs • 20 items • Updated Jan 15 • 120
MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment Paper • 2410.05873 • Published Oct 8, 2024 • 3