PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning Paper • 2502.12054 • Published 25 days ago • 6
Kanana: Compute-efficient Bilingual Language Models Paper • 2502.18934 • Published 16 days ago • 62
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning Paper • 2502.17407 • Published 18 days ago • 24
Navigating Korean LLM Research #2: Evaluation Tools Article • By amphora • Oct 23, 2024 • 7
HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models Paper • 2309.02706 • Published Sep 6, 2023 • 2