Indic Benchmarks

community

https://indicbench.com

indicbench

Indic Language Benchmarking for Large Language Models

India is linguistically diverse, with more than 22 languages. This project benchmarks the performance of large language models on Indic languages across several datasets. The goal is to evaluate a model's ability to understand, generate, and process text in these languages.

We currently cover 8 languages across 3 datasets, with more coming soon.

Languages

Bengali (bn)
Gujarati (gu)
Hindi (hi)
Kannada (kn)
Malayalam (ml)
Odia (or)
Tamil (ta)
Telugu (te)

Datasets

ARC-Challenge: hi, bn, gu, kn, ml, or, ta, te
TruthfulQA: hi, bn, gu, kn, ml, or, ta, te
HellaSwag: hi, bn, gu, kn, ml, or, ta, te
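The repositories published under this organization appear to follow an indicbench/<dataset>_<language code> naming pattern (for example indicbench/hellaswag_or and indicbench/truthfulqa_te). The following is a minimal Python sketch that builds those names and loads one with the Hugging Face `datasets` library. The helper names `repo_id` and `load_benchmark` are hypothetical, and the pattern is an assumption inferred from the published names. No ARC-Challenge repository name is shown on this page, so that dataset is left out rather than guessed.

```python
# Language codes taken from the Languages list on this card.
LANGUAGES = ["bn", "gu", "hi", "kn", "ml", "or", "ta", "te"]

# Prefixes seen in published repo names (e.g. indicbench/hellaswag_or).
# Assumption: every language uses the same <prefix>_<lang> pattern.
DATASET_PREFIXES = ["hellaswag", "truthfulqa"]


def repo_id(prefix: str, lang: str) -> str:
    """Build a Hub repo id following the assumed indicbench/<prefix>_<lang> pattern."""
    if lang not in LANGUAGES:
        raise ValueError(f"unsupported language code: {lang!r}")
    return f"indicbench/{prefix}_{lang}"


def load_benchmark(prefix: str, lang: str):
    """Download one benchmark from the Hugging Face Hub (needs network access)."""
    # Imported lazily so repo_id() works without the `datasets` package installed.
    from datasets import load_dataset

    return load_dataset(repo_id(prefix, lang))


if __name__ == "__main__":
    # Print every repo id the assumed pattern produces.
    for prefix in DATASET_PREFIXES:
        for lang in LANGUAGES:
            print(repo_id(prefix, lang))
```

Calling `load_benchmark("hellaswag", "hi")` would attempt to fetch indicbench/hellaswag_hi. The available splits and column names are not documented here, so inspect the returned object before relying on any particular field.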

Code

We are also working to build an MMLU dataset covering Indian knowledge. If you are interested in contributing, please reach out to Ram or Munish.

models

None public yet

datasets 23

indicbench/hellaswag_or (updated Mar 28)
indicbench/hellaswag_ta (updated Mar 28)
indicbench/hellaswag_ml (updated Mar 28)
indicbench/hellaswag_kn (updated Mar 28)
indicbench/hellaswag_te (updated Mar 28)
indicbench/hellaswag_bn (updated Mar 28)
indicbench/hellaswag_gu (updated Mar 28)
indicbench/truthfulqa_te
indicbench/truthfulqa_ml
indicbench/truthfulqa_kn